Cannot Load CSV file

Question

Screen Shot 2018-07-31 at 19.52.46.png

I am trying to load a csv file using the import tool.

It takes forever (a whole weekend was not enough).

I've included the screenshot of what I am doing.

The file has numbers from H2 to AEQ639774. From A1 to AEQ1 I have headers. From A2 to G639774 I have identifiers.

I was trying to first load the numbers into a numeric matrix, and then repeat the process for headers and identifiers separately. But not even this works.

The file is 1.28 GB, so big, but not that big.

My machine has 16 GB of RAM, so that should be enough.

I am probably doing something wrong!

Thanks in advance!


Answer 1

Adam Danz 2018-7-31

That sounds fishy. What version of MATLAB are you using? I assume the problem persists after exiting and restarting MATLAB.

You could try rehashing the toolbox cache in case 3rd party toolboxes are interfering.

You could use an alternative method of importing the data such as xlsread() which bypasses some of the processing done by the import tool.
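
If the spreadsheet-oriented route does not accept the file, one lower-level alternative for a delimited text file is `textscan`, which can skip header rows and unwanted columns and stop after a fixed number of rows. The following is a minimal sketch, not a tested recipe for DOT.csv: the file contents, the column layout, and every variable name are made up for illustration, and it assumes the selected block holds only numbers.

```matlab
% Build a tiny stand-in CSV (hypothetical data) so the sketch runs on its own.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'id1,id2,a,b,c,d\n');
fprintf(fid, 'x,p,1,2,3,4\n');
fprintf(fid, 'y,q,5,6,7,8\n');
fprintf(fid, 'z,s,9,10,11,12\n');
fclose(fid);

% Target block: file rows 3-4, columns 4-5 (spreadsheet range D3:E4).
nSkipRows = 2;   % header line plus one data line above the block
nSkipCols = 3;   % two identifier columns plus column "a"
nReadCols = 2;   % columns "b" and "c"
nReadRows = 2;

% %*s skips a field, %f reads a number, %*[^\n] discards the rest of the line.
fmt = [repmat('%*s', 1, nSkipCols), repmat('%f', 1, nReadCols), '%*[^\n]'];

fid = fopen(fname, 'r');
C = textscan(fid, fmt, nReadRows, 'Delimiter', ',', ...
    'HeaderLines', nSkipRows, 'CollectOutput', true);
fclose(fid);
delete(fname);

block = C{1};    % nReadRows-by-nReadCols numeric matrix
disp(block)
```

Because only the requested rows are parsed, the same pattern can be tried on a small corner of a large file before committing to a full import.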

romulo alves 2018-7-31

Edited: romulo alves 2018-7-31

So, if I do xlsread('DOT.csv','H7:T20'), trying to extract just a small part of the numeric data, I get the message

Unable to read XLS file "path" File is not in recognized format.

If I do:

chunk_nRows = 2e4 ;
% - Open file.
fId  = fopen( 'DOT.csv' ) ;
% - Read first line, convert to double, determine #columns.
line  = fgetl( fId ) ;
row   = sscanf( line, '%f,' )' ;
nCols = numel( row ) ;
% - Prealloc data, copy first row, init loop counter.
data      = zeros( chunk_nRows, nCols ) ;
data(1,:) = row ;
rowCnt    = 1 ;
% - Loop over rest of the file.
while ~feof( fId )
    rowCnt = rowCnt + 1 ;
    % - Realloc + a chunk if rowCnt larger than data array.
    if rowCnt > size( data, 1 )
        fprintf( 'Realloc ..\n' ) ;
        data(size(data, 1)+chunk_nRows, nCols) = 0 ;
    end
    % - Read line, convert and store.
    line = fgetl( fId ) ;
    data(rowCnt,:) = sscanf( line, '%f,' )' ;
end
% - Truncate data to last row (truncate last chunk).
data = data(1:rowCnt,:) ;
% - Close file.
fclose( fId ) ;

I get the message

Subscript indices must either be real positive integers or logicals.

I checked and the code stops when

rowCnt = 20001
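
One plausible explanation, assuming the first line of DOT.csv is the header row described in the question: `sscanf( line, '%f,' )` stops at the first non-numeric character, so for a text header it returns an empty array and `nCols` becomes 0. The loop then appears to run until the first reallocation, where `nCols` is used as a column subscript, and 0 is not a valid index. The short sketch below reproduces that behaviour on a made-up header line (the header text and the small sizes are illustrative assumptions).

```matlab
% Hypothetical header line standing in for the first line of the file.
headerLine = 'id,name,v1,v2';

% sscanf stops at the first character that does not match %f.
row   = sscanf(headerLine, '%f,')';
nCols = numel(row);                        % no numbers parsed from text

% Counting delimited fields gives the real column count instead.
nFields = numel(strsplit(headerLine, ','));

% Same growth step as in the loop, with small sizes for illustration.
chunk_nRows = 3;
data = zeros(chunk_nRows, nCols);
try
    data(size(data, 1) + chunk_nRows, nCols) = 0;   % column subscript is nCols
    msg = 'no error';
catch err
    msg = err.message;                     % wording depends on the release
end

fprintf('nCols = %d, nFields = %d\n', nCols, nFields);
fprintf('Growth step: %s\n', msg);
```

If that is the cause, skipping the header line and the identifier columns before converting numbers would be the first thing to try.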

Walter Roberson 2018-7-31

The 'e' and 'r' are probably the reason that most numbers are coded as if they are strings.

What do you want done with the 'e' and 'r'? Is it okay to treat both of them the same way as empty cells, by changing all three of them into NaN?
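
If treating all three the same way is acceptable, `textscan` has a `TreatAsEmpty` option that maps listed placeholder strings in numeric fields to the empty value, which is NaN by default. A minimal sketch on made-up data (the file contents and variable names are assumptions, not taken from DOT.csv):

```matlab
% Build a tiny stand-in CSV (hypothetical data) with 'e', 'r' and a blank cell.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'id,x,y\n');
fprintf(fid, 'a,1.5,e\n');
fprintf(fid, 'b,r,2.5\n');
fprintf(fid, 'c,,3.5\n');
fclose(fid);

% One text column followed by two numeric columns; skip the header line.
fid = fopen(fname, 'r');
C = textscan(fid, '%s %f %f', 'Delimiter', ',', 'HeaderLines', 1, ...
    'TreatAsEmpty', {'e', 'r'});
fclose(fid);
delete(fname);

ids = C{1};              % cell array of identifiers
x   = C{2};              % numeric column, placeholders and blanks become NaN
y   = C{3};

fprintf('NaN count in x: %d, in y: %d\n', sum(isnan(x)), sum(isnan(y)));
```

Keeping the placeholders out of the numeric columns this way lets them stay of type double instead of being imported as text.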

Walter Roberson 2018-8-1

The file turns out to be UTF-8 encoded, because it contains accented characters at various points. That leads to some problems.

I started working with reading in the entire file at one time to process as a single string (there can be a lot of advantages to working that way), but I encountered a MathWorks bug with native2unicode at the point of 1 gigabyte of decoded characters.
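
One way to sidestep decoding a gigabyte in a single call is to let `fopen` handle the encoding and read the file in blocks of rows. This is only a sketch of that idea, not the approach described above: the file contents, the block size, and the variable names are hypothetical.

```matlab
% Build a tiny UTF-8 stand-in file (hypothetical data) with accented text.
fname = [tempname '.csv'];
fid = fopen(fname, 'w', 'n', 'UTF-8');
fprintf(fid, 'name,v1,v2\n');
fprintf(fid, '%s,1,2\n', ['caf' char(233)]);        % e with acute accent
fprintf(fid, 'plain,3,4\n');
fprintf(fid, '%s,5,6\n', ['ni' char(241) 'o']);     % n with tilde
fclose(fid);

blockRows = 2;            % rows per block; a real file would use far more

% Declaring the encoding at fopen lets the read functions decode the text.
fid = fopen(fname, 'r', 'n', 'UTF-8');
header = fgetl(fid);      % consume the header line
names  = {};
values = zeros(0, 2);
while ~feof(fid)
    C = textscan(fid, '%s %f %f', blockRows, 'Delimiter', ',');
    if isempty(C{1})
        break;            % nothing more to read
    end
    names  = [names; C{1}];           %#ok<AGROW>
    values = [values; C{2}, C{3}];    %#ok<AGROW>
end
fclose(fid);
delete(fname);

fprintf('Read %d rows in blocks of %d\n', size(values, 1), blockRows);
```

Each block could be processed or written out before the next one is read, so the whole file never has to sit in memory as one decoded string.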
