Help using textscan on .csv files
Hello,
I'm writing a script to process a number of CSV files generated each day and build a MAT-file from them. I try readtable first, but when inconsistent data causes a format-specification error, readtable fails. In that case, I use textscan to read a chunk of data, skip the bad row, and resume reading from the next row until I reach the end of the file.
Is there a way to replace the cells which are out of spec with NaNs and read them? These rows only have one bad cell, so I would want to retain information in the other cells if possible. I don't want to use auto-generated code for this, as I will be using this script for other files also, which have different parameters.
This is the inconsistency:
For a column of numbers, there is '3402823593150348600000000000000â…¦äŸå†ï°¸â…¦ï´´â…¦ì¬Ü…' in some rows. I want to replace it with 'NaN' or something else which is convenient to process, and read the rest of the row. I can put it in 'TreatAsEmpty', but I don't know if I'll have the same expression every time. Any help would be greatly appreciated.
Thanks,
Koustubh
2 Comments
per isakson
2016-7-23
Edited: per isakson
2016-7-24
"Is there a way to replace the cells which are out of spec with NaNs and read them?"   Yes, I think it is possible to read the good data in the bad rows, but how to do that depends on the circumstances.
Questions:
- Is speed crucial?
- Does an entire file fit comfortably in memory (RAM)?
- Is the data comma separated? Does a comma, ",", appear among the garbage characters?
- What do you mean by "I don't want to use auto-generated code for this"?
- Could you upload a sample file?
My first idea is to
- read the entire file as text
- fix the text with regexp
- read and parse the text with textscan
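The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the sample text, the three-column numeric layout, and the pattern numPat are hypothetical, and the garbage is assumed not to contain commas or line breaks. With a real file, txt would come from fileread.

```matlab
% Hypothetical sample standing in for: txt = fileread(filename);
% Assumed layout: three numeric columns, no header line.
txt = sprintf('1,2.5,3\n4,3402823593150348600000000000000#@!x,6\n7,8.5,9\n');

% Pattern for a plain number (optional sign, decimals, exponent).
% This pattern is an assumption; adjust it to the file's number format.
numPat = '^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?$';

% Fix the text: replace every field that is not a plain number with NaN.
rows = regexp(strtrim(txt), '\r?\n', 'split');
for k = 1:numel(rows)
    fields = regexp(rows{k}, ',', 'split');
    bad = cellfun(@isempty, regexp(fields, numPat, 'once'));
    fields(bad) = {'NaN'};
    rows{k} = strjoin(fields, ',');
end
fixed = strjoin(rows, sprintf('\n'));

% Parse the cleaned text with textscan.
C = textscan(fixed, '%f%f%f', 'Delimiter', ',');
data = [C{:}];
```

If the garbage can contain commas, splitting on ',' would shift the columns, so the field-by-field replacement would need a different rule for locating the bad cell.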
Stephen23
2016-7-25
@Koustubh Gohad: please edit your question and upload a sample file by clicking the paperclip button.
Answers (1)
Thorsten
2016-7-25
To check if your string contains any invalid characters, i.e., non-digits, you can use
~isempty(regexp(s, '\D'))
and then set the cell to NaN.
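As a small sketch of how that check might be applied to a whole column, assuming the column was first read as text (for example with '%s') into a cell array; the variable names and sample values here are hypothetical:

```matlab
% Hypothetical column contents, read as text.
raw = {'12'; '3402823593150348600000000000000#@!x'; '7'; '3.5'};

% The check from the answer: true where a string contains a non-digit.
hasNonDigit = ~cellfun(@isempty, regexp(raw, '\D', 'once'));

% Caveat: '\D' also matches '.', '-' and 'e', so '3.5' is flagged as well.
% str2double is one alternative: it returns NaN for any string that it
% cannot convert to a numeric scalar.
vals = str2double(raw);
```

Whether the digit-only test or str2double fits better depends on whether the column can hold decimals, signs, or exponents.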