datastore=detectImportOptions Pro Max?

Question

I compared the capabilities of datastore and detectImportOptions. datastore seems more powerful to me: it covers almost everything detectImportOptions can do.

1. datastore can read multiple files, or specified files from different folders, at once, while detectImportOptions can only handle one file at a time.

2. readall on a datastore automatically concatenates the data from multiple files.

3. detectImportOptions can set MissingRule, but missing values can also be handled after T = readall(ds).

It seems there is nothing detectImportOptions can do that datastore cannot. Is there any situation where only detectImportOptions can be used and datastore cannot?

Or, in that case, should detectImportOptions be combined with datastore?

Can fds = fileDatastore(location,"ReadFcn",@fcn) be used to read files in a specific format, instead of using detectImportOptions?

I think this comment is very helpful. Thanks a lot to Walter Roberson for the help!

Answer 1

Walter Roberson 2023-7-22

When you readmatrix() or readtable() a file, these days the import options are detected automatically. But the automatically detected options are not always the right ones for the situation. Sometimes you need to call detectImportOptions() to get a basic options object, modify the detected options, and then pass the modified options to the appropriate reading routine.
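A minimal sketch of that detect, modify, read workflow. The file name and its contents are made up for illustration; the idea is that an ID column that would be detected as numeric is forced to be read as text.

```matlab
% Write a small sample file so the sketch is self-contained (hypothetical data).
fname = fullfile(tempdir, "sample_readings.csv");
T0 = table([1;2], [3.5;4.1], ["ok";"restarted"], ...
    'VariableNames', {'id','reading','note'});
writetable(T0, fname);

% Step 1: let MATLAB guess the import options for this file.
opts = detectImportOptions(fname);

% Step 2: override a guess. Here the numeric-looking id column is
% forced to be read as string instead of double.
opts = setvartype(opts, "id", "string");

% Step 3: pass the modified options to the reading routine.
T = readtable(fname, opts);
disp(class(T.id))
```

Any other property of the options object (for example MissingRule, which the question mentions) can be adjusted in step 2 in the same way before the file is read.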

The default reading routines for datastores use the default options, so they might not always read the data correctly. However, if you are aware that this is happening, you can specify a custom reading function that takes appropriate steps to read the data correctly.

detectImportOptions is a utility routine that was never intended to manage sets of data, and was never intended to read the data and make it available. It is only intended to give good guesses about the format of specific data files, in order to inform reading routines such as readtable().

Walter Roberson 2023-7-23

As I indicated, datastore has a default reading routine for text files and csv and spreadsheet files such as xlsx files. That default reading routine calls readmatrix() or readtable() as appropriate. Internally, those functions will call detectImportOptions or similar routines, to try to guess the input file format. So those functions would not work properly without the utility routine detectImportOptions (or related functions) being called on behalf of the user.

When readtable() and readmatrix() call detectImportOptions automatically, detectImportOptions does not always guess the file format correctly. Sometimes you have to write your own file reading function to pass to datastore(), with that function either calling detectImportOptions with some defaults overridden, or changing the values that detectImportOptions returns.

For example, if you have 10000 empty rows for a variable, detectImportOptions (called by readmatrix() or readtable(), called by the datastore routines) will probably guess that the variable is double precision. But you might have reason to know that the variable is character (such as notes about the equipment having been restarted), so you might need to specifically configure the datatype of the variable.
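A small sketch of that situation, assuming a made-up file in which the note column is empty in every row. The detected type is only displayed, not assumed, since the guess may differ between releases; the column is then explicitly set to char.

```matlab
% Build a sample file whose "note" column is empty in every row
% (hypothetical file name and data).
fname = fullfile(tempdir, "empty_notes.csv");
fid = fopen(fname, "w");
fprintf(fid, "id,note,reading\n");
fprintf(fid, "%d,,%.1f\n", [1 2 3; 3.5 4.1 2.9]);  % one row per column of the matrix
fclose(fid);

% Show what detection guessed for each column.
opts = detectImportOptions(fname);
disp(opts.VariableTypes)

% We know the column holds free-text notes, so set the type explicitly.
opts = setvartype(opts, "note", "char");
T = readtable(fname, opts);
disp(class(T.note))
```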

Walter Roberson 2023-7-23

When datastore() internally automatically calls readmatrix() or readtable(), those routines call detectImportOptions() or similar routines. The detection of the import options can be relatively expensive -- the detection functions will read up to the first 100 megabytes to try to guess the file format accurately.

Because of that, it can be more efficient to call detectImportOptions() once ahead of time, on one representative file, make small adjustments (like setting variable types or datetime time zone formats), and store the resulting import options. Then configure the reading routine (the one datastore will invoke each time it needs to read a file from the list of files) to use the stored import options. That avoids having to guess the file structure for every file.

This can be especially important for efficiency if you are reading from a network drive such as OneDrive or Google Drive, as reading from network drives can be fairly slow.
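One way the detect-once approach could look, as a sketch rather than a definitive recipe. The folder, file names, and data are invented for illustration, and it assumes every file shares the layout of the representative file.

```matlab
% Create two small CSV files with the same layout (hypothetical sample data).
folder = fullfile(tempdir, "ds_demo");
if ~isfolder(folder), mkdir(folder); end
writetable(table([1;2], ["a";"b"], 'VariableNames', {'id','label'}), ...
    fullfile(folder, "part1.csv"));
writetable(table([3;4], ["c";"d"], 'VariableNames', {'id','label'}), ...
    fullfile(folder, "part2.csv"));

% Detect the options once, on one representative file, then adjust them.
opts = detectImportOptions(fullfile(folder, "part1.csv"));
opts = setvartype(opts, "label", "string");

% The anonymous function captures opts, so every file is read with the
% stored options and detection is not repeated per file.
fds = fileDatastore(fullfile(folder, "*.csv"), ...
    "ReadFcn", @(f) readtable(f, opts), ...
    "UniformRead", true);   % readall then concatenates the tables vertically

T = readall(fds);
disp(T)
```

Here detectImportOptions supplies the format description and the datastore supplies the file management, so the two complement each other rather than compete.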

fa wu 2023-7-24

Thanks a lot for your comment. It is very helpful!

"Because of that, it can be more efficient to call detectImportOptions() once ahead of time, on one representative file, make small adjustments (like setting variable types or datetime time zone formats), and store the resulting import options. Then configure the reading routine (the one datastore will invoke each time it needs to read a file from the list of files) to use the stored import options. That avoids having to guess the file structure for every file."

So it sounds like detectImportOptions and datastore work together? Is there any example code?
