textscan difficulties with mixed datatypes
Hi
I am having difficulty solving a particular problem. I might just be failing to see the wood for the trees, but here goes:
I have a large (> 1 million rows) cellstr with the following format (a 3-row example is shown):
blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};
I then call textscan on each cell in a for loop (textscan is not "vectorized" over a cellstr), using one of the following two syntaxes:
temp = textscan(blockCSV{i},'%s%f%f%s%s','delimiter',',','CollectOutput',0)
or
temp = textscan(blockCSV{i},'%s%f%f%s%s','delimiter',',','CollectOutput',1)
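A minimal sketch of that per-row loop, assuming each `temp` is stored as one row of a preallocated N-by-5 cell array. The name `t` is illustrative (it matches the variable in dpb's answer), not necessarily the original code. With `CollectOutput` set to 0, every `%s` field comes back as its own 1x1 cell, which is where the nesting comes from.

```matlab
% Sample data from the question (the real cellstr has > 1 million rows)
blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};

n = numel(blockCSV);
t = cell(n, 5);   % illustrative output: one row per input line
for i = 1:n
    % textscan returns a 1x5 cell: %s fields as cell arrays, %f fields as doubles
    temp = textscan(blockCSV{i}, '%s%f%f%s%s', 'Delimiter', ',', 'CollectOutput', 0);
    t(i,:) = temp;   % string columns end up holding nested 1x1 cells
end

class(t{1,1})   % 'cell'   (nested, i.e. {'record1'})
class(t{1,2})   % 'double' (plain number 2)
```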
Now, the problem is that temp comes out as a cell that contains both cells and matrices, i.e. nested indexing on different datatypes. With a dataset this large I can't afford to index each element individually inside the loop, but I need the output to look like this:
ans =
'record1' [ 2] [ 3] 'string4' 's5'
'rec2' [ 22] [ 33] 'str4' 'str5'
'r3' [222] [333] 's4' 'st5'
[Edited for clarity (hopefully)]: Instead I get something like this (CollectOutput is false):
ans =
{1x1 cell} [2] [3] {1x1 cell} {1x1 cell}
{1x1 cell} [2] [3] {1x1 cell} {1x1 cell}
{1x1 cell} [2] [3] {1x1 cell} {1x1 cell}
or (CollectOutput is true):
ans =
{1x1 cell} [1x2 double] {1x2 cell}
{1x1 cell} [1x2 double] {1x2 cell}
{1x1 cell} [1x2 double] {1x2 cell}
With CollectOutput == false I would expect to get the layout shown above, rather than cells nested within cells, which make any indexing very difficult.
I hope this makes sense. I'm sure I'm missing something simple.
PS: I think textscan is inconsistent, because when I read the same example from an actual file (instead of a cellstr) it gives exactly the outcome I want, without any for loop or indexing.
Regards, Phillip
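For comparison with the PS, a minimal sketch of reading the same three rows from a file, assuming a temporary file created with `tempname` (the names `fname` and `fileData` are illustrative). When reading from a file, textscan returns one cell per column, which is the same column-oriented layout as in the accepted answer, so there is no per-row nesting.

```matlab
% Write the sample rows to a temporary file, one row per line
blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', blockCSV{:});   % format is reused for each row
fclose(fid);

% Read it back with the same format string and delimiter
fid = fopen(fname, 'r');
fileData = textscan(fid, '%s%f%f%s%s', 'Delimiter', ',');
fclose(fid);
delete(fname);

% fileData is {3x1 cell, 3x1 double, 3x1 double, 3x1 cell, 3x1 cell}
```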
Accepted Answer
Cedric
2014-5-28
Edited: Cedric
2014-5-28
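Why do you get the CSV content as a cell array of rows? If you cannot change this, you could simply concatenate all these rows, inserting line breaks between them, and call TEXTSCAN once on the whole string.
% Pair each row with a newline, then transpose so that {:} reads row, newline, row, newline, ...
merger = [blockCSV, repmat({sprintf('\n')}, numel(blockCSV), 1)].' ;
% A single TEXTSCAN call on the joined string
data = textscan([merger{:}], '%s%f%f%s%s', 'Delimiter', ',') ;
With that you get: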
>> data
data =
{3x1 cell} [3x1 double] [3x1 double] {3x1 cell} {3x1 cell}
This is the most appropriate layout memory-wise and for further indexing, as numeric entries are stored in numeric arrays and non-numeric entries in cell arrays.
More Answers (1)
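If the row-per-line layout from the question is still needed downstream, one option is to convert `data` once, after the single textscan call, so nothing is indexed per row. This is a sketch; `rowCells` is an illustrative name. Wrapping the numeric columns with `num2cell` stores every number in its own cell, so the result takes more memory than `data` itself.

```matlab
% Rebuild the joined-string read from the accepted answer
blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};
merger = [blockCSV, repmat({sprintf('\n')}, numel(blockCSV), 1)].';
data = textscan([merger{:}], '%s%f%f%s%s', 'Delimiter', ',');

% Turn the numeric columns into Nx1 cells so all five columns are cells,
% then concatenate them side by side into an N-by-5 cell array
rowCells = [data{1}, num2cell(data{2}), num2cell(data{3}), data{4}, data{5}];
```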
dpb
2014-5-27
This is only one of the many inconsistencies/quirks in textscan...
AFAIK, about the best you can do is add a post-processing step that replaces each nested cell in the three string columns with the value it contains. With a for loop, it's:
>> for i=1:3,t(i,1)=t{i,1};t(i,4)=t{i,4};t(i,5)=t{i,5};end
>> t
t =
'record1' [ 2] [ 3] 'string4' 's5'
'rec2' [ 22] [ 33] 'str4' 'str5'
'r3' [222] [333] 's4' 'st5'
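The same unwrapping can also be written without indexing each element by hand, using `cellfun` over the three string columns at once. This is a sketch; `strCols` is an illustrative name, and since `cellfun` still iterates internally, it mainly tidies the code and is not guaranteed to be faster than the loop.

```matlab
% Recreate the nested N-by-5 result from the per-row loop
blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};
n = numel(blockCSV);
t = cell(n, 5);
for i = 1:n
    t(i,:) = textscan(blockCSV{i}, '%s%f%f%s%s', 'Delimiter', ',');
end

% Replace every nested 1x1 cell in columns 1, 4 and 5 with its string
strCols = [1 4 5];
t(:,strCols) = cellfun(@(c) c{1}, t(:,strCols), 'UniformOutput', false);
```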