datastore = detectImportOptions Pro Max?

I compared the capabilities of datastore and detectImportOptions, and datastore seems more powerful: it appears to cover almost everything detectImportOptions does.
1. datastore can read multiple files, or a specified file, from different folders at once, while detectImportOptions handles only one file at a time.
2. readall on a datastore automatically concatenates the data from multiple files.
3. detectImportOptions lets you set MissingRule, but after T = readall(ds), where ds is the datastore, missing values can also be handled with anymissing | rmmissing | fillmissing | missing | isnan | ismissing.
So it seems there is nothing detectImportOptions can do that datastore cannot. Is there any situation where only detectImportOptions can be used and datastore cannot?
Can I use fds = fileDatastore(location,"ReadFcn",@fcn) to read a file in a specific format, instead of using detectImportOptions?
I think this comment is very helpful. Thanks a lot for Walter Roberson's help!

Answers (1)

These days, when you call readmatrix() or readtable() on a file, the import options are detected automatically. But the automatically detected options are not always the correct ones for the situation. Sometimes you need to call detectImportOptions() to get a basic options object, modify the detected options, and then pass the modified options to the appropriate reading routine.
The default reading routines for datastores use the default options, so they might not always read the data correctly. However, if you are aware that this is happening, you can specify a custom reading function that takes appropriate steps to read the data correctly.
detectImportOptions is a utility routine that was never intended to manage sets of data, and never intended to read the data and make the read data available: it is only intended to give good guesses about the format of specific data files in order to inform the reading routines such as readtable() .
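The detect, modify, then read workflow described above might look like the following minimal sketch. The sample file and the variable names ID, Reading, and Notes are made up for illustration; only detectImportOptions, setvartype, and readtable are MATLAB functions.

```matlab
% Hypothetical sample file, written to a temporary location for the demo.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'ID,Reading,Notes\n');
fprintf(fid, '101,3.5,ok\n');
fprintf(fid, '102,4.1,restarted\n');
fprintf(fid, '103,2.9,ok\n');
fclose(fid);

% 1. Let MATLAB guess the format.
opts = detectImportOptions(fname);

% 2. Adjust the guess: treat ID as text rather than a number,
%    and import only two of the three columns.
opts = setvartype(opts, 'ID', 'string');
opts.SelectedVariableNames = {'ID', 'Reading'};

% 3. Pass the modified options to the reading routine.
T = readtable(fname, opts);

delete(fname);   % clean up the temporary file
```

Here T contains only the ID and Reading columns, with ID imported as a string array instead of whatever type was detected.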

Comments (5)

Thanks for your answer and comment. It is very helpful.
"detectImportOptions is a utility routine that was never intended to manage sets of data, and never intended to read the data and make the read data available: it is only intended to give good guesses about the format of specific data files in order to inform the reading routines such as readtable()."
Yes, I think so, but I think datastore does this job better than detectImportOptions.
For example, SelectedVariableNames in the import options lets you control which variables are returned when reading, but datastore's SelectedVariableNames can do this too.
At the same time, read on a datastore can read multiple files, or a specified file, from different folders at once, whereas T = readtable('patients.xls',opts) reads just one file at a time.
datastore is more powerful than detectImportOptions and can completely replace it.
Is my opinion correct?
As I indicated, datastore has a default reading routine for text files and csv and spreadsheet files such as xlsx files. That default reading routine calls readmatrix() or readtable() as appropriate. Internally, those functions will call detectImportOptions or similar routines, to try to guess the input file format. So those functions would not work properly without the utility routine detectImportOptions (or related functions) being called on behalf of the user.
When readtable() and readmatrix() call detectImportOptions automatically, it does not always guess the file format correctly. Sometimes you have to write your own file reading function to pass to datastore(), where that function calls detectImportOptions while overriding some defaults, or changes the values that detectImportOptions returns.
For example, if you have 10000 empty rows for a variable, detectImportOptions (called by readmatrix() or readtable(), called by the datastore routines) will probably guess that the variable is double precision. But you might have reason to know that the variable is character (such as notes about the equipment having been restarted), so you might need to specifically configure the datatype of the variable.
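As a rough illustration of the empty-column case, the sketch below builds a small file whose Notes column is blank on every row, then forces that column to be read as text. The file and its column names are hypothetical, and the type that detection picks for the empty column may vary by release, so the sketch only records it instead of assuming a value.

```matlab
% Hypothetical file with a Notes column that is empty on every row.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Time,Value,Notes\n');
for k = 1:5
    fprintf(fid, '%d,%.1f,\n', k, 10 + k);   % nothing after the last comma
end
fclose(fid);

opts = detectImportOptions(fname);
detectedTypes = opts.VariableTypes;          % what MATLAB guessed for each column

% We know Notes holds free text, so override the guess before reading.
opts = setvartype(opts, 'Notes', 'char');
T = readtable(fname, opts);

delete(fname);   % clean up the temporary file
```

After the override, T.Notes is a cell array of character vectors, so later files that do contain notes would be read consistently with this one.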
Thanks for your help.
"For example, if you have 10000 empty rows for a variable, detectImportOptions (called by readmatrix() or readtable(), called by the datastore routines) will probably guess that the variable is double precision. But you might have reason to know that the variable is character (such as notes about the equipment having been restarted), so you might need to specifically configure the datatype of the variable."
If I want to configure the datatype, I can do it with datastore instead of detectImportOptions. Do I understand your meaning correctly?
str = datastore('local');
str.SelectedVariableTypes{2} = 'char';   % read the second selected variable as text
K = readall(str);
When datastore() internally and automatically calls readmatrix() or readtable(), those routines call detectImportOptions() or similar routines. The detection of the import options can be relatively expensive: the detection functions will read up to the first 100 megabytes to try to guess the file format accurately.
Because of that, it can be more efficient to call detectImportOptions() once ahead of time, on one representative file, make small adjustments (like setting variable types or setting datetime timezone formats), and store the resulting import options. Then configure the reading routine (that datastore will invoke each time it needs to read a file from the list of files) to use the stored import options. That avoids having to guess the file structure for every file.
This can be especially important for efficiency if you are reading from a network drive such as OneDrive or Google Drive, as reading from network drives can be fairly slow.
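One way to combine the two, sketched under some assumptions: detect the options once on a representative file, adjust them, and hand a custom read function to fileDatastore so every file is read with the stored options. The folder, file names, and columns below are hypothetical, and fileDatastore with a ReadFcn is just one possible way to plug in a custom reading routine.

```matlab
% Hypothetical folder with two small CSV files that share one layout.
folder = tempname;
mkdir(folder);
for k = 1:2
    fid = fopen(fullfile(folder, sprintf('part%d.csv', k)), 'w');
    fprintf(fid, 'ID,Value,Notes\n');
    for r = 1:3
        fprintf(fid, '%d,%.1f,\n', 10*k + r, r / 2);   % Notes left empty
    end
    fclose(fid);
end

% Detect the import options once, on one representative file, and adjust them.
files = dir(fullfile(folder, '*.csv'));
opts = detectImportOptions(fullfile(folder, files(1).name));
opts = setvartype(opts, 'Notes', 'char');

% Every file is read with the stored options instead of a fresh guess.
readFcn = @(f) readtable(f, opts);
fds = fileDatastore(folder, 'ReadFcn', readFcn, 'FileExtensions', '.csv');

parts = readall(fds);      % cell array with one table per file
T = vertcat(parts{:});     % combine into a single table

rmdir(folder, 's');        % clean up the temporary folder
```

Because opts is captured by the anonymous function, the adjusted variable types apply uniformly to every file in the datastore.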
Thanks a lot for your comment. It is very helpful!
"Because of that, it can be more efficient to call detectImportOptions() once ahead of time, on one representative file, make small adjustments (like setting variable types or setting datetime timezone formats), and store the resulting import options. Then configure the reading routine (that datastore will invoke each time it needs to read a file from the list of files) to use the stored import options. That avoids having to guess the file structure for every file."
It sounds like detectImportOptions and datastore work together? Is there any example code?


Release: R2020a

Asked: 2023-7-22

Edited: 2023-7-24
