Why does MATLAB save strings from a delimited text file as individual characters, and how can I prevent it?

So, I have a cell structure in Matlab (containing words, dates and numbers separated by ";", loaded from a very large file) from which I take certain lines, do some calculations on them, and finally write each field to a separate file as a table (the words being the headers, the dates and numbers the data).
I have the script functioning more or less okay, except that I keep running into one particular problem: when I split the lines using strsplit, all entries are stored as character arrays. So when I select a cell entry and add an index, for example A.a{1,1}(2), it returns the second letter of the string. It also does this for numbers, which makes manipulation difficult. Because they are split strings, Matlab treats multi-digit numbers as sequences of characters: A.a{1,2} displays 122, but A.a{1,2}*2 gives ans = 98 100 100 rather than 244 (the character codes of '1', '2', '2', each doubled).

I could use str2num, but that doesn't work for words or dates, so it becomes pretty cumbersome... I have a hard time finding the right command to convert all entries to single 'words'. I've also tried the cell2array and array2table commands, but I somehow keep running into issues. Any help would be appreciated!
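The behaviour described above can be reproduced in a few lines. This is a minimal sketch, assuming a hypothetical sample line in the same ';'-separated layout (date first, then numbers). It shows that strsplit returns character vectors, that arithmetic on them operates on character codes, and that str2double (a function not mentioned in the thread, used here in place of str2num) converts the numeric fields, while datetime handles the date. The day-month-year date format is an assumption.

```matlab
% Hypothetical sample line in the same ';'-separated layout (date;number;number).
txt = '07-09-2017 08:25:33;122;3.5';
parts = strsplit(txt, ';');              % 1x3 cell array of char vectors

% Each cell holds text, so arithmetic acts on the character codes:
% '122' is [49 50 50], and doubling gives [98 100 100].
codes = parts{2} * 2;
disp(codes)

% str2double converts numeric text (non-numeric text would give NaN).
nums = str2double(parts(2:end));         % 1x2 double: [122 3.5]
disp(nums(1) * 2)                        % 244

% The date needs its own conversion; day-month-year is an assumption.
t = datetime(parts{1}, 'InputFormat', 'dd-MM-yyyy HH:mm:ss');
```

Unlike str2num, str2double never evaluates its input and returns NaN for words, so it can be applied to a whole cell array without errors.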
4 Comments
Stephen23 2017-9-8
@Sjouke Rinsma: Thank you for uploading some sample data. I note that all of the columns appear to be numeric, except for the date in the first column. I have no idea why you are wasting your time with importing that data as characters. Why not simply import the data directly as numeric?
Sjouke Rinsma 2017-9-8
Hi Stephen, I get what you're saying, though I'm somewhat fuzzy on how to import a ;-delimited text file as numeric data when it also contains the non-numeric dates. dlmread does not recognize these, and readtable still imports everything as chars. Maybe I'm just not familiar with the right function to use in this case, or I'm completely overlooking something.
Nevertheless, as far as I can see, by the time I've reached line 22 I have a completely numeric array (if I remove the ; at the end), in which I then rewrite the date. Also, for the files I've uploaded, the script seems to work fine. However, as I mentioned before, when I'm working with the larger file I somehow get a matrix where, toward the rightmost columns of a field, the data types become mixed (randomly quoted and non-quoted entries in the same column). This also results in written files where some numbers are written as numeric and others as chars (?), giving different numbers of digits, which makes everything look really messy (I've uploaded the resulting mat-file of the result structure and the final text file for one field, if you're interested).
Especially that last part has got me puzzled... I would assume it's not because of the large data set, since that is actually the reason I'm using Matlab in the first place.
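For a column that has ended up mixed (some entries numeric, some still text), one possible cleanup is to convert only the text cells before calling cell2mat. This is a minimal sketch on a hypothetical column `col`; it is not the approach taken in the thread, which instead avoids importing the data as characters at all.

```matlab
% Hypothetical column where some entries were stored as numbers and
% others as quoted text, as described above.
col = {1.5; '2.25'; 3; '4'};

isTxt = cellfun(@ischar, col);                     % true for the text entries
col(isTxt) = num2cell(str2double(col(isTxt)));     % convert only those entries
x = cell2mat(col);                                 % now a plain 4x1 double
```

Once every column is a plain double array, writing it out gives a consistent number format instead of a mix of numeric and character output.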


Accepted Answer

Stephen23 2017-9-8
Edited: Stephen23 2017-9-8
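Rather than wasting time importing the data as characters, you would be much better off using textscan to import the numeric values as numeric data. For example, this reads your entire example file:
opt = {'Delimiter',';', 'CollectOutput',true};   % ';'-separated; merge adjacent numeric columns into one matrix
fid = fopen('merged.txt','rt');
hdr = fgetl(fid);                                 % read (and skip) the header line
fmt = ['%s',repmat('%f',1,nnz(hdr==';'))];        % one date string, then one %f per remaining column
C = textscan(fid,fmt,opt{:});                     % C{1}: dates as text, C{2}: numeric matrix
fclose(fid);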
and checking:
>> size(C{1}) % the number of date strings
ans =
6076 1
>> size(C{2}) % the size of the numeric matrix
ans =
6076 47
>> C{1}{[1,end]} % the first and last dates
ans = 07-09-2017 08:25:33
ans = 07-09-2017 10:40:54
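Since the dates come back as text in C{1}, a possible next step (not part of the original answer) is to convert them to datetime and combine them with the numeric matrix in a table, ready for writing. This minimal sketch uses a small hypothetical stand-in for C so that it runs on its own, and assumes the dates are day-month-year, which fits 07-09-2017 in a file posted on 2017-9-8. The variable name 'Time' is hypothetical.

```matlab
% Small stand-in for the textscan output (hypothetical values):
% C{1} = N-by-1 cell array of date text, C{2} = N-by-M numeric matrix.
C = {{'07-09-2017 08:25:33'; '07-09-2017 10:40:54'}, [1.5 2; 3 4]};

% Convert the date text to datetime (day-month-year is an assumption).
t = datetime(C{1}, 'InputFormat', 'dd-MM-yyyy HH:mm:ss');

% One datetime column followed by the numeric columns.
T = [table(t, 'VariableNames', {'Time'}), array2table(C{2})];
```

With the real data, C{2} would have 47 columns, and you could pass 'VariableNames' to array2table using the header words split from hdr.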
"I work with a 200M+ lines file"
If you have a very large file that cannot be imported at once, then you can adapt the code I have shown above using the method given in the MATLAB documentation, which reads blocks of data at a time.
Basically the trick is to use the optional third input of textscan, N, to specify how many times the format is applied (i.e. how many lines to read), and to call textscan in a loop.
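A minimal sketch of that block-wise loop, assuming the same layout as above (a header line, then a date followed by numbers). It first writes a small hypothetical demo file of 25 lines so it runs on its own; with the real file you would drop that part and open merged.txt instead. blockSize and the per-block processing placeholder are assumptions.

```matlab
% Hypothetical demo file in the same layout: a header, then date;number;number.
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'Date;A;B\n');
for k = 1:25
    fprintf(fid, '07-09-2017 08:25:%02d;%d;%g\n', k, k, k/2);
end
fclose(fid);

% Block-wise import: textscan's third input N applies the format N times,
% so each call reads at most blockSize lines and the next call continues
% from the current file position.
blockSize = 10;                               % assumed; use a much larger value for real files
opt = {'Delimiter',';', 'CollectOutput',true};
fid = fopen(fname, 'rt');
hdr = fgetl(fid);                             % header line, read once
fmt = ['%s', repmat('%f', 1, nnz(hdr==';'))]; % date text + one %f per numeric column
nRows = 0;
while ~feof(fid)
    C = textscan(fid, fmt, blockSize, opt{:});
    if isempty(C{1})
        break                                 % nothing left that matches the format
    end
    nRows = nRows + numel(C{1});
    % ... process this block here: C{1} holds the dates, C{2} the numbers ...
end
fclose(fid);
delete(fname);                                % remove the demo file
```

Only one block is held in memory at a time, so the block size can be tuned to the available RAM.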
1 Comment
Sjouke Rinsma 2017-9-8
Edited: Sjouke Rinsma 2017-9-12
Should've refreshed before answering that previous post... nevertheless thanks for this, I will definitely look into it!
And so I did. Seems to be working fine now, thanks :)

