convert a html table to csv format

7 次查看(过去 30 天)
FATEMEH
FATEMEH 2013-4-10
I need to convert the table in the following url to csv format. Since I have to convert many tables, I can't use cope paste. http://climate.weatheroffice.gc.ca/climate_normals/results_e.html?stnID=2046&lang=e&dCode=0&province=ALTA&provBut=Search&month1=0&month2=12
  1 个评论
Matt Kindig
Matt Kindig 2013-4-10
Do you have to use Matlab for this purpose? The reason I ask is because other languages that are more commonly used for website development have good HTML parsing capabilities, whereas such features are more limited in Matlab--in Matlab you'd basically have to resort to complex regexp statements.
I would recommend Python and the BeautifulSoup package to do this, actually.

请先登录,再进行评论。

回答(2 个)

Jan
Jan 2013-4-11
You can import the HTML table to Matlab at first by FEX: htmltableToCell or FEX: get-html-table-data-into-matlab. Then an export to CSV depends on the contents of the data.

Cedric
Cedric 2013-4-10
编辑:Cedric 2013-4-10
As Matt mentions, Python + package would be perfect for this part. Here is one way to do it using REGEXP in MATLAB.. not the full stuff though, but enough to illustrate.
% - Get HTML page.
url = 'http://climate.weatheroffice.gc.ca/climate_normals/results_e.html?stnID=2046&lang=e&dCode=0&province=ALTA&provBut=Search&month1=0&month2=12' ;
buffer = urlread(url) ;
% - Extract horizontal header.
p = '(?<=<td class="dataTableColHeader">).*?(?=</td>)' ;
hheader = regexp(buffer, p, 'match') ;
% - Extract vertical header.
p = '(?<=<td class="dataTableRowHeader">).*?(?=</td>)' ;
vheader = regexp(buffer, p, 'match') ;
% - Extract/reshape data.
p = '(?<=<td class="dataTableRowData">).*?(?=</td>)' ;
data = regexp(buffer, p, 'match') ;
data = reshape(data, 12+2, []).' ;
% - Build and export the whole.
content = [vheader.',[hheader; data]] ;
xlswrite('example.xlsx', content) ;
Let me know if you want to go this way and I can improve a little this code. There would be still quite a bit of work to do on your side, e.g. to manage some inconsistency in the way they build the HTML table, to detect/manage failures in the processing, to export to CSV instead of XLSX, etc.

类别

Help CenterFile Exchange 中查找有关 Tables 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by