textscan difficulties with mixed datatypes

Question

Phillip 2014-5-27

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/131389-textscan-difficulties-with-mixed-datatypes

Commented: Phillip 2014-5-28

Accepted Answer: Cedric

Hi

I am having difficulty solving a particular problem. I might just be missing the wood for the trees but here goes:

I have a large cellstr (more than 1 million rows) with the following format (only a 3-row example shown):

    blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};

I then attempt to textscan each element of the cellstr (in a for loop, as textscan is not "vectorized" for cellstr) using one of the following two syntaxes:

    temp = textscan(blockCSV{i},'%s%f%f%s%s','delimiter',',','CollectOutput',0)

or

    temp = textscan(blockCSV{i},'%s%f%f%s%s','delimiter',',','CollectOutput',1)

Now, the problem is that temp comes out as a cell that contains cells and matrices, i.e. indexing within indexing on different datatypes. I can't afford to index each one individually inside the loop (large dataset, as mentioned), but I need the output to come out as:

   ans = 
    'record1'    [  2]    [  3]    'string4'    's5'  
    'rec2'       [ 22]    [ 33]    'str4'       'str5'
    'r3'         [222]    [333]    's4'         'st5'

[Edited for clarity (hopefully)]: Instead I get something like (CollectOutput is false):

ans =

    {1x1 cell}    [2]    [3]    {1x1 cell}    {1x1 cell}
    {1x1 cell}    [2]    [3]    {1x1 cell}    {1x1 cell}
    {1x1 cell}    [2]    [3]    {1x1 cell}    {1x1 cell}

or (CollectOutput is true):

ans =

    {1x1 cell}    [1x2 double]    {1x2 cell}
    {1x1 cell}    [1x2 double]    {1x2 cell}
    {1x1 cell}    [1x2 double]    {1x2 cell}

With CollectOutput set to false I would expect to see what I stated above. Instead I get a cell within a cell, which makes any indexing very difficult.

I hope this makes sense. I'm sure I'm missing something simple.

PS: I think textscan is inconsistent because when you read the example from an actual file (instead of a cellstr) it works exactly like I want the outcome to be without any for loop or indexing.

Regards, Phillip
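For reference, the nesting described in the question follows from how textscan returns its results: one output cell per format specifier, with each %s column itself stored as a cell array. The following is a minimal sketch that reuses blockCSV from the question; the variable names one, txt and many are illustrative only and do not come from the thread.

```matlab
blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};

% One row at a time: each %s column comes back as a 1x1 cell wrapping the
% string, and each %f column as a 1x1 double.
one = textscan(blockCSV{1}, '%s%f%f%s%s', 'Delimiter', ',');
class(one{1})     % 'cell'
class(one{2})     % 'double'

% All rows in one char vector, separated by newlines: textscan treats this
% like file content and returns one column per format specifier.
txt  = sprintf('%s\n', blockCSV{:});
many = textscan(txt, '%s%f%f%s%s', 'Delimiter', ',');
size(many{1})     % [3 1]
```

So the per-row call and the whole-file call use the same output layout. The per-row call just has a single row in every column, which is what shows up as the {1x1 cell} entries.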

2 Comments

per isakson 2014-5-27

Why use textscan in the first place?

Phillip 2014-5-28

Why not? I have tried a couple of things and it seemed to be the best option. Please elaborate if you think it's not, so that I can reply appropriately.

Answer 1

Cedric 2014-5-28

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/131389-textscan-difficulties-with-mixed-datatypes#answer_138554

Edited: Cedric 2014-5-28

Why do you get the CSV content as a cell array of rows? If you cannot change this, you could just merge/concatenate all these rows inserting line breaks, and use TEXTSCAN on the whole.

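 % Pair each row with a newline, then transpose so that {:} expands in the
 % order row, newline, row, newline, ... (blockCSV must be a column cell array).
 merger = [blockCSV, repmat({sprintf('\n')}, numel(blockCSV), 1)].' ;
 % Concatenate into one char vector and parse it with a single TEXTSCAN call.
 data   = textscan([merger{:}], '%s%f%f%s%s', 'Delimiter', ',') ;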

with that you get

 >> data
 data = 
    {3x1 cell}    [3x1 double]    [3x1 double]    {3x1 cell}    {3x1 cell}

which is most appropriate memory-wise and for further indexing, as numeric entries are stored in numeric arrays, and non-numeric entries in cell arrays.
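As a rough sketch of how this column-wise output can be used, the example below indexes individual columns and, only if the row-wise layout from the question is really required, flattens everything into a single cell array with num2cell. The names names, firstNum and flat are assumptions made for illustration. Note that the flattened form gives up the memory advantage described above.

```matlab
blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};
merger   = [blockCSV, repmat({sprintf('\n')}, numel(blockCSV), 1)].';
data     = textscan([merger{:}], '%s%f%f%s%s', 'Delimiter', ',');

% Column-wise access: strings stay in cell arrays, numbers in numeric arrays.
names    = data{1};        % 3x1 cell of char vectors
firstNum = data{2};        % 3x1 double

% Optional: build the 3x5 mixed cell array shown in the question.
flat = [data{1}, num2cell(data{2}), num2cell(data{3}), data{4}, data{5}];
```

Here flat{2,1} is 'rec2' and flat{2,2} is 22, with no nested cells left to index through.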

1 Comment

Phillip 2014-5-28

Nice use of the "inconsistency". Should have thought of that. Speeds up the code nicely and now I can finally generalise the larger code. Thanks!


Answer 2

dpb 2014-5-27

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/131389-textscan-difficulties-with-mixed-datatypes#answer_138540

This is only one of the many inconsistencies/quirks in textscan...

AFAIK, about the best you can do is to post-process in another step, replacing each nested cell in the three string columns with its contents. With t holding the stacked per-row textscan output, a for loop does it:

>> for i=1:3,t(i,1)=t{i,1};t(i,4)=t{i,4};t(i,5)=t{i,5};end
>> t
t = 
  'record1'    [  2]    [  3]    'string4'    's5'  
  'rec2'       [ 22]    [ 33]    'str4'       'str5'
  'r3'         [222]    [333]    's4'         'st5'
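The same unwrapping can also be written without an explicit loop over the string columns. This is only a sketch under the assumption that t is built by stacking the per-row textscan results, as in the question; the helper variable strCols is hypothetical.

```matlab
blockCSV = {'record1,2,3,string4,s5';'rec2,22,33,str4,str5';'r3,222,333,s4,st5'};

% Build t row by row, as in the question (CollectOutput left at its default).
n = numel(blockCSV);
t = cell(n, 5);
for i = 1:n
    t(i, :) = textscan(blockCSV{i}, '%s%f%f%s%s', 'Delimiter', ',');
end

% Replace each 1x1 cell in the string columns with the string it contains.
strCols = [1 4 5];
t(:, strCols) = cellfun(@(c) c{1}, t(:, strCols), 'UniformOutput', false);
```

This still calls textscan once per row, so for more than a million rows it is unlikely to be faster than parsing the merged text in a single call.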

1 Comment

Phillip 2014-5-28

Yes, it's a bit frustrating to be honest. The solution from Cedric below uses that inconsistency nicely to get it working though. Thanks for the response.
