What is the fastest way to extract data from a huge text file?
I have a text file like this:
1.0 IONOSPHERE MAPS GNSS IONEX VERSION / TYPE
ADDNEQ2 V5.3 AIUB 03-JUL-14 20:57 PGM / RUN BY / DATE
CODE'S GLOBAL IONOSPHERE MAPS FOR DAY 180, 2014 COMMENT
Global ionosphere maps (GIM) are generated on a daily basis DESCRIPTION
(I don't want this part)
.
.
.
(skip 600 lines)
1 START OF TEC MAP
2014 6 29 0 0 0 EPOCH OF CURRENT MAP
87.5-180.0 180.0 5.0 450.0 LAT/LON1/LON2/DLON/H
154 154 155 155 155 156 156 156 156 155 155 155 154 154 153 153
152 151 150 149 148 147 146 145 145 144 143 142 141 140 139 139
138 138 137 137 137 137 136 136 137 137 137 137 137 138 138 139
139 139 140 140 141 142 142 143 143 144 145 145 146 147 147 148
149 149 150 151 152 152 153 153 154
85.0-180.0 180.0 5.0 450.0 LAT/LON1/LON2/DLON/H
160 161 162 163 164 164 165 165 165 164 164 163 163 162 161 159
158 157 155 153 151 149 147 145 143 141 139 138 136 134 133 132
131 130 130 129 129 129 130 130 131 131 132 133 134 135 136 136
137 138 139 139 140 140 141 142 142 143 144 145 146 146 148 149
150 151 153 154 155 157 158 159 160
.
.
.
I need to look up a specific value for a given latitude, longitude, and time.
I have a function that uses fopen and fgetl to do this. The data have fixed spacing, so I use strcmp and isequal to find the value I want. . . .
Let's say value = search(lat, lon, time), with
lat = 85.0; lon = -175; time (UT) = 0;
I first compare each line returned by fgetl with the string:
2014 6 29 0 0 0 EPOCH OF CURRENT MAP
If it matches, I then search for 85.0 in the following lines returned by fgetl:
85.0-180.0 180.0 5.0 450.0 LAT/LON1/LON2/DLON/H
If that matches, I store all the related data in a vector:
160 161 162 163 164 164 165 165 165 164 164 163 163 162 161 159 158 157 155 153 151 149 147 145 143 141 139 138 136 134 133 132 131 130 130 129 129 129 130 130 131 131 132 133 134 135 136 136 137 138 139 139 140 140 141 142 142 143 144 145 146 146 148 149 150 151 153 154 155 157 158 159 160 (in vector form)
and then pick the value at the vector index that corresponds to the longitude (index = 2 in this example).
. . .
But I have to call this search function 250,000 times, which will take over 24 hours!
What can I do? I cannot change my computer. Thanks, I need your help!
PS: the text file is about 12,000 rows by 80 columns.
Accepted Answer
per isakson
2015-2-5
Edited: per isakson
2015-2-5
Now I'm done:
- less than a tenth of a second to read and parse the sample file (with the file in the system cache)
- less than a tenth of a millisecond to retrieve one value
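- the array ION takes about half a megabyte. Convert ION to uint8 to save memory, if needed (uint8 cannot represent NaN, so missing entries would then need a sentinel value).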
- 62196 values retrieved from the sample file.
Adding tests and comments is left to you!
>> tic,ION = cssm();toc
Elapsed time is 0.074765 seconds.
>> sum(not(isnan(ION(:))))
ans =
62196
>> whos ION
  Name       Size                Bytes  Class     Attributes
  ION       73x71x12            497568  double
>> ION(lon2ix(0),lat2ix(85),ut2ix(20))
ans =
164
>> tic,ION(lon2ix(0),lat2ix(85),ut2ix(20));toc
Elapsed time is 0.000067 seconds.
compared to
>> tic, [gim_tec] = sample_search_function( 20, 85, 0 ), toc
gim_tec =
164
Elapsed time is 0.265756 seconds.
where
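function ION = cssm()
    % Read the whole IONEX file at once and parse every TEC map into a
    % longitude-by-latitude-by-epoch array.
    str = fileread( 'c:\m\cssm\CODG1520.txt' );
    % One cell per map: the text between the START and END markers
    ca1 = regexp( str, '(?<=START OF TEC MAP).+?(?=END OF TEC MAP)', 'match' );
    ION = nan( 73, 71, 12 );   % matches the 73x71x12 size reported by whos
    lat2ix = @(lat) round((lat+87.5)/2.5)+1;
    lon2ix = @(lon) round((lon+180)/5.0)+1; %#ok<NASGU>
    ut2ix = @(ut) round(ut/2)+1;
    for jj = 1 : length( ca1 )
        buf = regexp( ca1{jj}, '\n', 'split', 'once' );   % drop the rest of the START line
        buf = regexp( buf{2} , '\n', 'split', 'once' );   % buf{1} is the EPOCH line
        ut = textscan( buf{1}, '%*f%*f%*f%f%*[^\n]' );    % 4th field is the hour
        ut = ut{1};
        ca2 = regexp( buf{2}, 'LAT/LON1/LON2/DLON/H', 'split' );
        pos = ca2{1};   % header fields of the first latitude band
        for kk = 2 : length( ca2 )
            lat = textscan( pos,'%f%*[^\n]' );   % latitude of the current band
            lat = lat{1};
            % Each piece ends with the 60 header columns of the next line
            num = sscanf( ca2{kk}(1:end-60), '%f' );
            pos = strtrim( ca2{kk}(end-60+1:end) );
            ION(:,lat2ix(lat),ut2ix(ut)) = num;
        end
    end
end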
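Since the question needs about 250,000 lookups, it may help to fetch them all in one indexing operation instead of looping. The following is only a sketch under stated assumptions: the three handles are local to cssm, so they are repeated here for use outside the function; the query vectors are made-up examples; and a placeholder array stands in for the real ION so that the snippet runs without the data file.

```matlab
% Placeholder for the 73x71x12 array returned by cssm() (assumption:
% replace this line with ION = cssm(); when the data file is available)
ION = reshape( 1:73*71*12, 73, 71, 12 );

% Same grid-to-index mappings as inside cssm()
lat2ix = @(lat) round((lat + 87.5)/2.5) + 1;   % -87.5:2.5:87.5 -> 1..71
lon2ix = @(lon) round((lon + 180)/5.0) + 1;    % -180:5:180     -> 1..73
ut2ix  = @(ut)  round(ut/2) + 1;               % 0:2:22 hours   -> 1..12

% Hypothetical query vectors; the real ones would have 250,000 entries
lon = [ -175;  0;  180   ];
lat = [   85; 85;  -87.5 ];
ut  = [    0; 20;   22   ];

% Convert the three subscripts to linear indices and read all values at once
ix  = sub2ind( size(ION), lon2ix(lon), lat2ix(lat), ut2ix(ut) );
tec = ION( ix );
```

Note that round snaps each query to the nearest grid node, so coordinates that fall between nodes are not interpolated, and queries outside the grid make sub2ind raise an error.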
9 Comments
Dogan Deniz Karadeniz
2019-6-21
@per isakson: Is it possible to read many files with your cssm function, instead of only the example file you used (CODG1520)?
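One possible way to handle several files, sketched here and not taken from the answer, is to loop over a directory listing and call a single-file parser on each entry. The helper name read_many, the wildcard pattern, and the parser handle are all hypothetical; the sketch assumes a variant of cssm that takes the file name as an input argument instead of the hard-coded path.

```matlab
function ION_all = read_many( folder, pattern, parse_fn )
% READ_MANY  Apply a single-file parser to every matching file (sketch).
%   folder   - directory that holds the text files
%   pattern  - wildcard such as 'CODG*.txt' (hypothetical naming)
%   parse_fn - handle that takes a full file name and returns an array,
%              e.g. a variant of cssm accepting the file name as input
    listing = dir( fullfile( folder, pattern ) );
    ION_all = cell( numel( listing ), 1 );      % one result per file
    for ii = 1 : numel( listing )
        ION_all{ii} = parse_fn( fullfile( folder, listing(ii).name ) );
    end
end
```

It could then be called as maps = read_many( 'c:\m\cssm', 'CODG*.txt', @cssm_file ), where cssm_file is the assumed file-name-taking variant; keeping the results in a cell array avoids assuming that every file holds the same number of maps.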