Reading content from a web URL

PS 2024-8-29
Answered: PS 2024-8-30
I know how to read URLs and save the content for further analysis.
The issue I am facing is that I want to read certain content of a URL in a specific way.
For example, from this URL, https://www.gem.wiki/Almaty-2_power_station, I would like to read Table 2 in table format, or read only the tables that contain specific words.
Searching the internet, I found that I can read tables directly from URLs, but I am not sure whether the table I want on this page is an actual HTML table or just text content.
Any help would be great.

Accepted Answer

Rahul 2024-8-30
Hi @PS,
I understand that you are trying to read the content of 'Table 2' from the URL https://www.gem.wiki/Almaty-2_power_station.
You can achieve the desired result with the following code:
url = 'https://www.gem.wiki/Almaty-2_power_station';
htmlContent = webread(url);            % Read the page content from the URL
tree = htmlTree(htmlContent);          % Parse the HTML into a DOM tree
tables = findElement(tree, "table");   % Find all <table> elements in the DOM tree

% Index 4 is used here because some other elements of the HTML page
% are also matched as tables, so 'Table 2' is the fourth <table> element.
secondTableElement = tables(4);

% Find all rows in the second table
rows = findElement(secondTableElement, "tr");

% Initialize cell arrays to store the table data and column names
tableData = {};
columnNames = {};
headerCells = findElement(rows(1), "th");

% Extract header text
for j = 1:numel(headerCells)
    columnNames{j} = strtrim(extractHTMLText(headerCells(j)));
end

% Extract data rows
for i = 2:numel(rows)
    cells = findElement(rows(i), "td");
    % Extract text from each cell
    rowData = cell(1, numel(cells));
    for j = 1:numel(cells)
        rowData{j} = strtrim(extractHTMLText(cells(j)));
    end
    tableData = [tableData; rowData];
end

% Convert the header strings into a cell array of character vectors
headerCellstring = cell(size(columnNames));
for i = 1:numel(columnNames)
    headerCellstring{i} = columnNames{i}{1};
end

% Build the table using the 'cell2table' function
secondTable = cell2table(tableData, 'VariableNames', headerCellstring);
You can refer to the MATLAB documentation for the functions used above (webread, htmlTree, findElement, extractHTMLText, cell2table) for reference.
Hope this helps! Thanks.
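For the "tables containing specific words" part of the question, the same htmlTree approach can probably be adapted to pick tables by keyword instead of by a fixed index. The following is only a minimal sketch under stated assumptions: the inline HTML and the variable names (html, keyword, isMatch, matchIdx) are hypothetical stand-ins, not taken from the gem.wiki page, and Text Analytics Toolbox is assumed to be available.

```matlab
% Minimal sketch (assumption: Text Analytics Toolbox is installed).
% The HTML below is a hypothetical stand-in for a downloaded page.
html = "<html><body>" + ...
    "<table><tr><th>Name</th><th>Value</th></tr>" + ...
    "<tr><td>Owner</td><td>Example Co</td></tr></table>" + ...
    "<table><tr><th>Unit</th><th>CHP</th></tr>" + ...
    "<tr><td>Unit 1</td><td>yes</td></tr></table>" + ...
    "</body></html>";

keyword = "CHP";                        % word the wanted table must contain

tree = htmlTree(html);                  % parse the HTML into a DOM tree
tables = findElement(tree, "table");    % all <table> elements, in page order

% Flag every table whose visible text contains the keyword
isMatch = false(numel(tables), 1);
for k = 1:numel(tables)
    tableText = extractHTMLText(tables(k), "ExtractionMethod", "all-text");
    isMatch(k) = contains(tableText, keyword);
end

matchIdx = find(isMatch);               % indices of the matching tables
```

Each index in matchIdx could then be used in place of the hard-coded tables(4) above, which may be more robust when the number of tables differs from page to page.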
1 Comment
PS 2024-8-30
Edited: PS 2024-8-30
@Rahul I can't thank you enough. I was playing with htmlTree and findElement but somehow could not figure out how to go further with them.
Your code will save me a ton of time, as I have hundreds of URLs to scan through. My sincere gratitude.
Thanks!


More Answers (1)

PS 2024-8-30
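I figured out another solution using readtable:
url = "https://www.gem.wiki/Almaty-2_power_station";
% Select the table that has a header cell whose text is exactly 'CHP'
opts = htmlImportOptions('TableSelector', "//TABLE[.//TH='CHP']");
opts.VariableNamesRow = 1;   % first row holds the column names
opts.DataRows = [2 Inf];     % data starts at the second row
T = readtable(url, opts);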
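Since the same import options can be reused, this approach should extend to the hundreds of URLs mentioned in the comment above. The following is a rough sketch, not a tested scraper: it assumes readtable can read local HTML files the same way it reads URLs (so that it runs without network access), and the file names and the variables sources, results, and failed are hypothetical. In real use, sources would hold the page URLs instead.

```matlab
% Rough sketch: apply one set of import options to many sources and
% skip the ones that fail. The local file is a hypothetical stand-in
% for a web page; replace 'sources' with real URLs in practice.
pageFile = fullfile(tempdir, "example_station.html");
fid = fopen(pageFile, "w");
fprintf(fid, "%s", "<html><body><table>" + ...
    "<tr><th>Unit</th><th>CHP</th><th>Capacity</th></tr>" + ...
    "<tr><td>Unit 1</td><td>yes</td><td>100</td></tr>" + ...
    "</table></body></html>");
fclose(fid);

% Second entry deliberately does not exist, to exercise the error path
sources = [string(pageFile), fullfile(tempdir, "missing_page_xyz.html")];

opts = htmlImportOptions("TableSelector", "//TABLE[.//TH='CHP']");
opts.VariableNamesRow = 1;
opts.DataRows = [2 Inf];

results = cell(numel(sources), 1);   % one table per source, [] on failure
failed = false(numel(sources), 1);   % true where no table could be read
for k = 1:numel(sources)
    try
        results{k} = readtable(sources(k), opts);
    catch err
        % Typical causes: page unreachable, or no table matches the selector
        failed(k) = true;
        warning("Skipping %s: %s", sources(k), err.message);
    end
end
```

Wrapping each read in try/catch keeps one bad page from stopping the whole scan, and the failed mask gives a list of sources to inspect by hand afterwards.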
