datastore=detectImportOptions Pro Max?
I compared the functions of datastore and detectImportOptions. I feel that datastore is more powerful: it covers almost everything detectImportOptions can do.
1. datastore can read multiple files, or a specified file from different folders, at once, while detectImportOptions can only work on one file at a time.
2. readall on a datastore automatically concatenates the data from multiple files.
3. detectImportOptions can set MissingRule, but after T = ds.readall (where ds is the datastore) I can use anymissing | rmmissing | fillmissing | missing | isnan | ismissing, so missing values can also be handled.
It seems that there is nothing detectImportOptions can do that datastore cannot. Is there any situation where only detectImportOptions can be used and datastore cannot?
In that case, should detectImportOptions be combined with datastore?
Can I use fds = fileDatastore(location,"ReadFcn",@fcn) to read a file in a specific format instead of using detectImportOptions?
I think this comment is very helpful. Thanks a lot to Walter Roberson for the help!
Answers (1)
Walter Roberson
2023-7-22
When you call readmatrix() or readtable() on a file, these days the import options are detected automatically. But the automatically detected options are not always the correct options for the situation. Sometimes you need to call detectImportOptions() to get a baseline import options object, then modify the detected options, and then pass the modified options into the appropriate reading routine.
The default reading routines for datastores use the default options, so they might not always read the data correctly. However, if you are aware that this is happening, you can specify a custom reading function that takes appropriate steps to read the data correctly.
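The detect, modify, then read workflow described above can be sketched as follows. This is only a minimal illustration: the sample file, its column names (`id`, `code`, `value`) and the decision to read `code` as text are assumptions made up for the example, not something taken from the question.

```matlab
% Write a tiny sample CSV so the sketch is self-contained (hypothetical data).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'id,code,value\n1,007,3.5\n2,012,4.0\n');
fclose(fid);

% Step 1: let MATLAB guess the import options for this file.
opts = detectImportOptions(fname);

% Step 2: override the guess where we know better. Here the 'code' column
% looks numeric, but we want it as text so the leading zeros survive.
opts = setvartype(opts, 'code', 'string');

% Step 3: pass the modified options to the reading routine.
T = readtable(fname, opts);
disp(T)

delete(fname);   % clean up the temporary file
```

Without step 2, `readtable(fname)` would rely on the automatic guess alone, which is the situation where the detected options may not match what you know about the data.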
detectImportOptions is a utility routine that was never intended to manage sets of data, and never intended to read the data and make the read data available: it is only intended to give good guesses about the format of specific data files in order to inform the reading routines such as readtable() .
5 Comments
fa wu
2023-7-22
Walter Roberson
2023-7-23
As I indicated, datastore has a default reading routine for text files and csv and spreadsheet files such as xlsx files. That default reading routine calls readmatrix() or readtable() as appropriate. Internally, those functions will call detectImportOptions or similar routines, to try to guess the input file format. So those functions would not work properly without the utility routine detectImportOptions (or related functions) being called on behalf of the user.
When readtable() and readmatrix() call detectImportOptions automatically, detectImportOptions does not always guess the file format correctly. Sometimes you have to configure your own file reading function to pass to datastore(), with that file reading function calling detectImportOptions and then overriding some defaults, or making changes to the values output by detectImportOptions.
For example, if you have 10000 empty rows for a variable, detectImportOptions (called by readmatrix() or readtable(), called by the datastore routines) will probably guess that the variable is double precision. But you might have reason to know that the variable is character (such as notes about the equipment having been restarted), so you might need to specifically configure the datatype of the variable.
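A small sketch of that empty-column case, under stated assumptions: the file below and its column names (`time`, `notes`, `reading`) are invented for illustration, and only three rows are used instead of 10000. The detected type is displayed rather than assumed, since the guess can differ between releases.

```matlab
% Sample file whose 'notes' column is entirely empty (hypothetical data).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'time,notes,reading\n1,,2.5\n2,,2.7\n3,,2.9\n');
fclose(fid);

opts = detectImportOptions(fname);
disp(opts.VariableTypes)     % inspect what was guessed for each column

% We know 'notes' is meant to hold free text, so set its type explicitly
% instead of relying on the guess made from empty fields.
opts = setvartype(opts, 'notes', 'char');

T = readtable(fname, opts);
disp(class(T.notes))         % char variables are returned as a cell array

delete(fname);               % clean up the temporary file
```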
fa wu
2023-7-23
Walter Roberson
2023-7-23
When datastore() internally automatically calls readmatrix() or readtable(), those routines call detectImportOptions() or similar routines. The detection of the import options can be relatively expensive -- the detection functions will read up to the first 100 megabytes to try to guess the file format accurately.
Because of that, it can be more efficient to call detectImportOptions() once ahead of time, on one representative file, make small adjustments (like setting variable types or setting datetime timezone formats), and store the resulting import options. Then configure the reading routine (that datastore will invoke each time it needs to read a file from the list of files) to use the stored import options. That avoids having to guess the file structure for every file.
This can be especially important for efficiency if you are reading from a network drive such as OneDrive or Google Drive, as reading from network drives can be fairly slow.
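One way to sketch the "detect once, reuse for every file" idea is with fileDatastore and a custom ReadFcn. The folder, the file names (`part1.csv` and so on) and the column names are hypothetical and are generated by the example itself; the sketch also assumes every file shares the layout of the first one.

```matlab
% Create a temporary folder with three small CSV files (hypothetical data).
folder = tempname;
mkdir(folder);
for k = 1:3
    Tk = table((1:2)' + 10*k, ["a"; "b"], 'VariableNames', {'id', 'label'});
    writetable(Tk, fullfile(folder, sprintf('part%d.csv', k)));
end

% Detect the import options once, on one representative file, and adjust them.
files = dir(fullfile(folder, '*.csv'));
opts = detectImportOptions(fullfile(folder, files(1).name));
opts = setvartype(opts, 'label', 'string');

% Reuse the stored options for every file: the ReadFcn passes opts to
% readtable, so no per-file format detection is needed.
fds = fileDatastore(folder, ...
    'ReadFcn', @(f) readtable(f, opts), ...
    'FileExtensions', '.csv', ...
    'UniformRead', true);     % lets readall concatenate the tables vertically

allData = readall(fds);
disp(allData)

rmdir(folder, 's');           % clean up the temporary folder
```

This is where the two tools complement each other: detectImportOptions describes how one file should be parsed, and the datastore applies that description across the whole collection.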
fa wu
2023-7-24