convert a html table to csv format
7 次查看(过去 30 天)
显示 更早的评论
I need to convert the table in the following url to csv format. Since I have to convert many tables, I can't use cope paste. http://climate.weatheroffice.gc.ca/climate_normals/results_e.html?stnID=2046&lang=e&dCode=0&province=ALTA&provBut=Search&month1=0&month2=12
1 个评论
Matt Kindig
2013-4-10
Do you have to use Matlab for this purpose? The reason I ask is because other languages that are more commonly used for website development have good HTML parsing capabilities, whereas such features are more limited in Matlab--in Matlab you'd basically have to resort to complex regexp statements.
I would recommend Python and the BeautifulSoup package to do this, actually.
回答(2 个)
Jan
2013-4-11
You can import the HTML table to Matlab at first by FEX: htmltableToCell or FEX: get-html-table-data-into-matlab. Then an export to CSV depends on the contents of the data.
0 个评论
Cedric
2013-4-10
编辑:Cedric
2013-4-10
As Matt mentions, Python + package would be perfect for this part. Here is one way to do it using REGEXP in MATLAB.. not the full stuff though, but enough to illustrate.
% - Get HTML page.
url = 'http://climate.weatheroffice.gc.ca/climate_normals/results_e.html?stnID=2046&lang=e&dCode=0&province=ALTA&provBut=Search&month1=0&month2=12' ;
buffer = urlread(url) ;
% - Extract horizontal header.
p = '(?<=<td class="dataTableColHeader">).*?(?=</td>)' ;
hheader = regexp(buffer, p, 'match') ;
% - Extract vertical header.
p = '(?<=<td class="dataTableRowHeader">).*?(?=</td>)' ;
vheader = regexp(buffer, p, 'match') ;
% - Extract/reshape data.
p = '(?<=<td class="dataTableRowData">).*?(?=</td>)' ;
data = regexp(buffer, p, 'match') ;
data = reshape(data, 12+2, []).' ;
% - Build and export the whole.
content = [vheader.',[hheader; data]] ;
xlswrite('example.xlsx', content) ;
Let me know if you want to go this way and I can improve a little this code. There would be still quite a bit of work to do on your side, e.g. to manage some inconsistency in the way they build the HTML table, to detect/manage failures in the processing, to export to CSV instead of XLSX, etc.
0 个评论
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!