convert html content to table
21 次查看(过去 30 天)
显示 更早的评论
Hi everyone! I have a file with this content :
<html xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas- microsoft-com:office:office" xmlns="http://www.w3.org/TR/REC-html40"> <head><META http-equiv="Content-Type" content="text/html; charset=UTF-8"> </head> <body> <table> <tr> <th>DATE</th><th>OPENING</th><th>MAXIMUM</th><th>MINIMUM</th><th>CLOSURE</th><th>VAR % CLOSE</th> <th>VOLUME</th> </tr> <tr> <td>10/24/22</td><td>12,900</td><td>13,042</td><td>12,761</td><td>12,964</td><td>0.00 </td><td>80421</td> </tr> <tr>.
this is a sample line of content. i would like to convert the file content into a table. thanks
7 个评论
Rik
2023-4-23
If you have trouble with Matlab basics you may consider doing the Onramp tutorial (which is provided for free by Mathworks).
采纳的回答
Star Strider
2023-4-22
编辑:Star Strider
2023-4-23
It appears to be an Excel file in a website. Perhaps using readtable would work. If not, then perhaps first using websave might.
EDIT — (22 Apr 2023 at 18:01)
Use the URL of the website as the argument to readtable.
Example from a recent post:
T1 = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1245967/PPG_sensor.csv', 'VariableNamingRule','preserve')
EDIT — (23 Apr 2023 at 14:20)
This is not the best or most efficient code I ever wrote, however it has the virtue of working —
str = fileread('volm.xls');
expression = '(\d+\/\d+\/\d+)|(\d*\.\d*)|\d*';
[tokens,matches] = regexp(str,expression,'tokens','match');
vmatches = matches(5:end);
arlim = numel(vmatches)
v = reshape(vmatches, 8, []).';
dnv = str2double(v(:,2:end));
dtv = datetime(v(:,1), 'InputFormat','dd/MM/yy');% + timeofday(datetime(v(:,2), 'InputFormat','HH:mm:ss'));
T1 = array2table(dnv);
T1 = addvars(T1, dtv, 'Before',1);
T1.Properties.VariableNames = {'DATA','APERTURA','MASSIMO','MINIMO','CHISURA','VAR','% CLOSE','VOLUME'}
VN = T1.Properties.VariableNames;
figure
plot(T1{:,1}, T1{:,[2 3 4]})
hold on
patch([T1{:,1}; flip(T1{:,1})], [T1{:,3}; flip(T1{:,4})], [1 1 1]*0.25, 'EdgeColor','none', 'FaceAlpha',0.25)
hold off
grid
axis('padded')
xlabel(VN{1})
ylabel('Value')
legend(VN{[2 3 4]}, 'Location','best')
EDIT — (23 Apr 2023 at 14:52)
Added plot.
EDIT — (23 Apr 2023 at 17:48)
Changed code to use reshape on the the original cell array ‘vmatches’ to eliminate the loop and make the code more efficient.
.
7 个评论
Star Strider
2023-4-24
I learned something new, that being that this file (and presumably the others) have a ‘,’ decimal separator and a ‘.’ thousands separator. I was not aware of that previously, and did not suspect it. So the ‘str1’ - ‘str4’ conversion steps will likely have to be added to all the previous code as well.
This version also detects the variable names and the edits them to fit the final table (and intermediate operations, specifically combining the dates and times into a single datetime array, although I cannot determine if the other combinations of variable names to make them fit the column-size of the numeric data are correct).
Unfortunatley what I previous believed was reasonably robust code is not as robust as I thought it would be, and gives me new respect for the MathWorks folks who had to code readtable and readmatrix and the rest!
The latest version —
str = fileread('russfut.xls');
str2 = strrep(str,',','#');
str3 = strrep(str2,'.','');
str4 = strrep(str3,'#','.');
vars = regexp(str4,'([A-Z]\w*)','match');
VN1 = vars(:,7:end);
str = str4;
expression = '(\d+\/\d+\/\d+)|(\d+\:\d+\:\d+)|(\d*\.\d*)|\d*';
[tokens,matches] = regexp(str,expression,'tokens','match');
% matches
vmatches = matches(4:end);
arlim = numel(vmatches)
cols = numel(VN1)-1;
v = reshape(vmatches, cols, []).';
dnv = str2double(v(:,3:end));
dtv = datetime(v(:,1), 'InputFormat','dd/MM/yy') + timeofday(datetime(v(:,2), 'InputFormat','HH:mm:ss'));
T1 = array2table(dnv);
T1 = addvars(T1, dtv, 'Before',1);
VN1{1} = [VN1{1} ' ' VN1{2}];
VN1{8} = [VN1{7} ' ' VN1{8}];
VN1([2 7]) = [];
T1.Properties.VariableNames = VN1
VN = T1.Properties.VariableNames;
figure
plot(T1{:,1}, T1{:,[2 3 4]})
hold on
patch([T1{:,1}; flip(T1{:,1})], [T1{:,3}; flip(T1{:,4})], [1 1 1]*0.25, 'EdgeColor','none', 'FaceAlpha',0.25)
hold off
grid
axis('padded')
xlabel(VN{1})
ylabel('Value €')
legend(VN{[2 3 4]}, 'Location','best')
There could be endless variations on the formats of these files, and I cannot envision a robust code to handle all of them. It would be quite helpful to find a way for them to be made compatible with readmatrix. That, unfortunately, is beyond me.
.
更多回答(0 个)
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 String Parsing 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!