How to read particular dates from an HTML script?

1 次查看(过去 30 天)
I have downloaded HTML script of of few webpages; one is uploaded for reference. My task is to extract some information from the this script. On line 851 latitude and longitude are given which I extracted using the following code:
filename=strcat(pwd,'/',num2str(site(i))); % file to be read, which is same as the file uploaded
fileID=fopen(filename); % fileID
open_file=textscan(fileID,'%s','%f'); % parsing the file
open_file=open_file{1,1};
lat_id=find(ismember(open_file,... % Finding the position of Latitude in text-file
'<dd>Latitude'));
long_id=find(ismember(open_file,... % Finding the position of Longitude in text-file
'Longitude'));
lat(i)=open_file(lat_id+1); % latitude
long(i)=open_file(long_id+1); % longitude
proj=open_file(long_id+3); % projection type, e.g., NAD27, NAD83
But I am not able to use similar code for reading the data in line 865, which contains the time-range of the some data. The problem is that the variable open_file do not seem to contain these values. Any suggestions will be helpful.

采纳的回答

Walter Roberson
Walter Roberson 2018-1-29
filename=strcat(pwd,'/',num2str(site(i))); % file to be read, which is same as the file uploaded
S = fileread(filename);
place_info = regexp(S, 'Latitude\s+(?<lat>[^ ,]+),\s*\S+\s*Longitude\s+(?<long>\S+)\s*\S+\s*(?<proj>\w+)', 'names', 'once');
periods_info = regexp(S, '''begin_date''[^\d]*(?<begin_date>\d+-\d+(-\d+)?).*?end_date[^\d]*(?<end_date>\d+-\d+(-\d+)?).*?sites_selection_links\W*(?<stats_type>[^<]+)', 'names');
other_info = regexp(S, 'site_no=\d+">(?<stats_type>.*?)</a>.*?''begin_date''[^d]*?(?<begin_date>\d+(-\d+(-\d+)?)?).*?end_date[^\d]*?(?<end_date>\d+(-\d+(-\d+)?)?)', 'names');
combined_info = [periods_info, other_info];
Now:
place_info is a struct with fields 'lat', 'long', and 'proj' reflecting latitude, longitude, and projection. The lat and long are in the form they were stored in the file, so they may have a ° in them, corresponding to the ° symbol.
combined_info is a struct with fields begin_date, end_date, and stats_type . stats_type is the information about what the period is describing. In the sample data file those are
'Daily Statistics' 'Monthly Statistics' 'Annual Statistics' 'Current / Historical Observations' 'Peak streamflow' 'Field measurements' 'Field/Lab water-quality samples' and 'Water-Year Summary'

更多回答(0 个)

类别

Help CenterFile Exchange 中查找有关 Mapping Toolbox 的更多信息

产品

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by