How to read grid data from a text file?

Question

pruth 2017-9-23

Votes: 0

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/358030-how-to-read-grid-data-from-text-file

Commented: dpb 2019-7-12

L3_tropo_ozone_column_jan14.txt

Hi, I have a text file (attached) that contains ozone data, and I am not able to read it because it is not in a regular format. Only the latitude is given (-59.5S to 59.5N, in 1.00 degree steps), and for every latitude all the ozone data follow, one value per longitude (-179.375W to 179.375E, in 1.25 degree steps), so there are 288 data points per latitude. The problem is that all the data are in string format and need to be split after every 3 digits. Some spaces also appear in the middle of the data, and those have to be removed as well; otherwise the data will not split into the correct 3-digit groups.

Later I will use inpolygon to extract the data for a specific region, but first I need to read this text file and get the data out.

Hope you understand.

2 comments

Cedric 2017-9-23

Does this format have a name? Is it the original format in which the data is distributed?

pruth 2017-9-23

Edited: pruth 2017-9-23

Yes, the attached file is the original. The data I used earlier was arranged very differently and was a bit simpler. I am not good at programming, so I am finding this hard. I hope you can help.


Answer 1

Cedric 2017-9-23

Votes: 1

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/358030-how-to-read-grid-data-from-text-file#answer_282826

Edited: Cedric 2017-9-23

The format seems to be GridTOMS as mentioned here. There is an IDL reader and there may be MATLAB ones.

If you need a stable reader, I advise you to look for a MATLAB implementation "endorsed" by NASA. If you need a quick hack to perform early tests, you can try the following (where I assume that spaces code for trailing zeros):

 content = fileread( 'L3_tropo_ozone_column_jan14.txt' ) ;
 % - Remove first space on all data rows.
 content = regexprep( content, '(?<=[\r\n]) ', '' ) ;
 % - Split by "lat = ..." separator.
 blocks = regexp( content, '\s+lat[^\r\n]+', 'split' ) ;
 % - Extract header from block 1.
 pos = regexp( blocks{1}, '\)\s+\d', 'start' ) ;
 header = blocks{1}(1:pos) ;
 blocks{1} = blocks{1}(pos+1:end) ;
 % - Merge blocks, remove \r\n, replace spaces by 0s.
 blocks = [blocks{:}] ;
 blocks = regexprep( blocks, '[\r\n]', '' ) ;
 blocks(blocks == ' ') = '0' ;
 % - Convert to 120x288 numeric array.
 data = reshape( sscanf( blocks, '%3d' ), 288, 120 ).' ;

Note that it is easy to wrap this in a function and call it while iterating through files from a folder (using the output of DIR). It is also easy to extract meta information from the header if relevant.
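
As a small illustration of the conversion step in the quick hack above, here is a sketch on a made-up row of five 3-character fields (the string and the variable names are assumptions for the example, not taken from the data file). It shows why the blanks are replaced by '0' before calling sscanf: without the replacement, sscanf skips the blanks and the fields drift out of step.

```matlab
% Made-up row of five 3-character fields: '267', '259', ' 69', '  4', '251'.
row = '267259 69  4251';

% Naive read: sscanf skips blanks, so the 4th field swallows digits of the 5th.
naive = sscanf(row, '%3d').';        % 267 259 69 425 1

% Replace every blank by '0' in place so each field is exactly 3 digits.
padded = row;
padded(padded == ' ') = '0';

% Read consecutive 3-character integers.
values = sscanf(padded, '%3d').';    % 267 259 69 4 251
```

The same idea scales to the full file because every value occupies exactly three characters once the blanks are filled.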

9 comments

Cedric 2017-9-23

Edited: Cedric 2017-9-23

If these files all have the same format, there should not be any problem, but check a few years/months to be sure. Pick e.g. the first and the last value for a random latitude, so you can easily compare what is in the file and what is in the array.

If all files are in the same folder, e.g. "Originals", you can automate the process:

 dataFolder = 'Originals' ;
 dirListing = dir( fullfile( dataFolder, '*.txt' )) ;
 ozoneData = struct( 'year', [], 'month', {}, 'monthId', [], 'data', {} ) ;
 monthStr = {'January', 'February', 'March', 'April', 'May', 'June', 'July', ...
    'August', 'September', 'October', 'November', 'December'} ;
 % - Iterate through files and process.
 for fileId = 1 : numel( dirListing )
    % - Read relevant file.
    locator = fullfile( dataFolder, dirListing(fileId).name ) ;
    fprintf( 'Processing %s ..\n', locator ) ;   
    content = fileread( locator ) ;
    % - Remove first space on all data rows.
    content = regexprep( content, '(?<=[\r\n]) ', '' ) ;
    % - Split by "lat = ..." separator.
    blocks = regexp( content, '\s+lat[^\r\n]+', 'split' ) ;
    % - Extract header from block 1.
    pos = regexp( blocks{1}, '\)\s+\d', 'start' ) ;
    header = blocks{1}(1:pos) ;
    blocks{1} = blocks{1}(pos+1:end) ;
    % - Merge blocks, remove \r\n, replace spaces by 0s.
    blocks = [blocks{:}] ;
    blocks = regexprep( blocks, '[\r\n]', '' ) ;
    blocks(blocks == ' ') = '0' ;
    % - Convert to 120x288 numeric array.
    ozoneData(fileId).data = reshape( sscanf( blocks, '%3d' ), 288, 120 ).' ;
    % - Extract year and month from header, compute month ID.
    monthYear = regexp( header, '(\w+)\s+(\d+)', 'tokens', 'once' ) ;
    ozoneData(fileId).year    = str2double( monthYear{2} ) ;
    ozoneData(fileId).month   = monthYear{1} ;
    ozoneData(fileId).monthId = find( strcmpi( monthYear{1}, monthStr )) ;
 end
 % - Sort by year and month (as file naming is messing up the order).
 [~, reIndex] = sortrows( [ozoneData.year; ozoneData.monthId].' ) ;
 ozoneData = ozoneData(reIndex) ;

and then you have a struct array that you can access as follows (note that I had just a few files, so entry #2 won't be the same on your system):

 >> ozoneData(2)
 ans = 
  struct with fields:
       year: 2014
      month: 'February'
    monthId: 2
       data: [120×288 double]
 >> ozoneData(2).data
 ans =
   267   259   269   274   251   241   243   267   258   294   258   262 ...
   ...
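
Since the question mentions using inpolygon afterwards, here is a minimal sketch of how one 120x288 array could be paired with coordinates and masked by a region. It assumes the grid described in the question and that row 1 of the array corresponds to latitude -59.5 (check this against the "lat = ..." lines of your file). The random stand-in data and the polygon vertices are hypothetical.

```matlab
% Stand-in for one parsed month; replace with e.g. ozoneData(k).data.
data = rand(120, 288);

% Grid from the question: latitude -59.5 to 59.5 in 1.00 degree steps,
% longitude -179.375 to 179.375 in 1.25 degree steps.
lat = -59.5 : 1.00 : 59.5;           % 120 values
lon = -179.375 : 1.25 : 179.375;     % 288 values
[LON, LAT] = meshgrid(lon, lat);     % both 120x288, same shape as data

% Hypothetical region of interest (a rectangle here; any polygon works).
polyLon = [60 100 100 60];
polyLat = [ 5   5  40  40];

% Logical mask of grid points inside the polygon, then a regional mean.
inside     = inpolygon(LON, LAT, polyLon, polyLat);
regionMean = mean(data(inside));
```

If the file stores latitudes in the opposite order, flipping lat (or the rows of data) before calling meshgrid keeps the mask aligned.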

pirapts Raptis 2019-7-12

Edited: pirapts Raptis 2019-7-12

Hello everybody,

I am processing some similar files (asc again, from the same dataset, but for another variable: link).

The problem is that there are 4-digit numbers in the files,

so I changed the call to sscanf( blocks, '%4d' ),

which gives the correct dimensions for the output (720x1440),

but some numbers are misread.

In the ASCII file a record looks like

559 584 656 84811281610184216791461128412291089 667 574

but MATLAB formats the output as

559 584 656 8481 1281 6101 ...

instead of

559 584 656 848 1128 1610...

I have tried processing the file line by line and the same fault appears.

I have also noticed that the blocks char array has length 4108186.

I still don't understand how I get the correct dimensions (720*1440*4=4147200 for 4 digits), or why it stops reading at the wrong digit when 4-digit numbers appear.

Any idea on how to handle this would be really useful.

(MATLAB 2014b)

dpb 2019-7-12

"so i changed to sscanf( blocks, '%4d' )"

The problem is C -- its formatted input was not designed with fixed-width files in mind and simply can't handle them by default, because '%4d' does NOT mean what one would logically expect, namely "read four-character-wide fields starting at the beginning of the record". Instead it means "read no more than 4 characters", but C silently "eats" the white space first, so, as you noticed, by the time it gets to the fourth entry in your input record it begins with the 8 instead of the blank and reads "no more than" four characters. That's not the right answer. Fortran FORMAT gets it right, but unfortunately MathWorks chose the easy way out when it rewrote MATLAB in C and used the C runtime I/O library instead of building a FORMAT facility. Recent releases have (finally!! after 30 years) introduced a new fixed-width text import object, but that won't help you unless you can upgrade.

You simply have to count characters (including blanks) and process the resulting substrings. With the sample record you give (NB: you're missing the leading blank at the beginning of the record):

>> str2num(reshape(rec,4,[]).')
ans =
         559
         584
         656
         848
        1128
        1610
        1842
        1679
        1461
        1284
        1229
        1089
         667
         574
>>
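
To make the difference concrete, here is a small sketch that runs both approaches on the sample record from the comment above, with the leading blank restored. The variable names are made up for the example, and str2double on a cellstr is used in place of str2num; either should work here.

```matlab
% Sample record with its leading blank restored: 14 fields of 4 characters.
rec = ' 559 584 656 84811281610184216791461128412291089 667 574';

% sscanf skips white space first, then reads at most 4 characters,
% so it loses step as soon as a field has no leading blank.
viaSscanf = sscanf(rec, '%4d').';
% first six values: 559 584 656 8481 1281 6101

% Fixed-width parsing: cut the record into 4-character fields first,
% then convert each field on its own.
width         = 4;
fields        = cellstr(reshape(rec, width, []).');
viaFixedWidth = str2double(fields).';
% first six values: 559 584 656 848 1128 1610
```

Both calls return 14 numbers for this record, which is consistent with getting plausible output dimensions even though the values are wrong.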


Answer 2

dpb 2017-9-23

Votes: 2

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/358030-how-to-read-grid-data-from-text-file#answer_282828

Edited: dpb 2017-9-23

Read the file as a block of cellstr, convert to a character array
Convert the 12x75 char array to 1x900: line=reshape(blk.',1,[]);
Select the first 288*3 --> 864 characters: c=line(1:864);
Replace any blanks with '0': c=strrep(c,' ','0');
Convert the 3-digit fields: dat=sscanf(c,'%3d');
Go to the next block

Thanks to Cedric for pointing out my weak eyes... :)

file=textread('tropo.txt','%s','delimiter', '\n','whitespace', '','headerlines',3);  % file as cellstr array
L=length(file);     % number lines/records in file
data=zeros(L/12,288);  % preallocate for resulting data
j=0;                   % counter for data blocks
for i=1:12:L           % loop over blocks of 12 records
  blk=char(file(i:i+11));  % retrieve a block, convert to character array
  blk(:,1)='';    % remove leading blanks
  line=reshape(blk.',1,[]); line=line(1:864);  % recast as record;truncate
  line=strrep(line,' ','0');                   % replace blanks with leading 0
  j=j+1;                                       % increment counter
  data(j,:)=sscanf(line,'%3d');                % convert to numeric
end

results in a double array containing the data...

From the first block I tested at command line--

>> whos data
Name        Size            Bytes  Class     Attributes
data      288x1              2304  double              
>>

3 comments

dpb 2017-9-23

Old eyes failed me...I had mistakenly thought char() had gotten rid of the leading space but didn't...thanks.

Cedric 2017-9-23

My maybe younger eyes failed me too. I had to get tricked a couple times before I realized!


Answer 3

Guillaume 2017-9-23

Votes: 0

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/358030-how-to-read-grid-data-from-text-file#answer_282829

Whoever created that format should be very ashamed. It's a pain to parse.

This is a start. I still need to figure out why I've got 292 columns instead of 288, but I've got to go.

filecontent = fileread('L3_tropo_ozone_column_jan14.txt');  %read it all
filecontent(ismember(filecontent, [10, 13])) = []; %remove line returns
longdesc = regexp(filecontent, 'Longitudes:\s*(\d+)\D+(\d+(\.\d+)?)([EW])\D+(\d+(\.\d+)?)([EW])', 'tokens', 'once');  %longitude description
longnumbers = str2double(longdesc([1 2 4]));
longnumbers(2:3) = longnumbers(2:3) .* (-1).^ strcmp(longdesc([3 5]), 'W'); %change sign for W
longitudes = linspace(longnumbers(2), longnumbers(3), longnumbers(1));
pointlats = regexp(filecontent, '\s+([0-9 ]+)lat\s*=\s*(-?\d+(\.\d+)?)', 'tokens'); %extract point strings and latitude
pointlats = vertcat(pointlats{:});
latitudes = str2double(pointlats(:, 2));
points = regexprep(pointlats(:, 1), '\s', '0'); %replace spaces with 0
points = regexp(points, '\d{3}', 'match');  %split in group of three
points = str2double(vertcat(points{:}));
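
One plausible explanation for the 292 columns, offered as a guess rather than a confirmed diagnosis: the other answers note that every data row starts with one blank and that a latitude block spans 12 rows. If those 12 blanks are kept and turned into '0', the block grows from 864 to 876 characters, which is 4 extra 3-character groups. The sketch below checks that arithmetic on a synthetic block; the repeated '123' fields and the variable names are made up.

```matlab
% Synthetic latitude block: 288 three-character fields over 12 rows
% (25 fields on rows 1-11, 13 on row 12), each row starting with one blank.
fieldsPerRow = [repmat(25, 1, 11), 13];
rows = arrayfun(@(n) [' ', repmat('123', 1, n)], fieldsPerRow, ...
                'UniformOutput', false);

% Variant 1: keep the leading blanks and replace them by '0'.
withBlanks = strrep([rows{:}], ' ', '0');
nWith      = numel(regexp(withBlanks, '\d{3}', 'match'));      % 292

% Variant 2: drop the first character of every row before joining.
trimmed  = cellfun(@(r) r(2:end), rows, 'UniformOutput', false);
nWithout = numel(regexp([trimmed{:}], '\d{3}', 'match'));      % 288
```

If this is what happens in the real file, removing the first blank of each row before stripping the line returns should give 288 groups per latitude, but that still needs to be checked against the actual data.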

5 comments

Cedric 2017-9-23

The format is consistent (see my comment under your answer). What is annoying is that it is designed partly because of "machine" constraints, and partly for looking "cute" to a human eye when opened in a text editor.

dpb 2017-9-23

Wonder why put the leading blank in there, though...that really is the only really bad part; the rest is pretty easy to deal with but that makes for special-casing. Oh, the no leading zero in the format is also pretty ugly; almost forgot that! :)
