With extractHTMLText I have harvested a news article. How can I write paragraph-long blocks to a text file?

The text analysis function created clean ASCII text out of a very complex newspaper article using the following code (which worked well!):
url = "https://www.staradvertiser.com/2021/08/22/editorial/on-politics/on-politics-gov-david-iges-handling-of-covid-19-hobbled-by-indecision-inadequate-staffers/";
code = webread(url);
str = extractHTMLText(code)
Each paragraph became a line of text. How can I write these to an ASCII file for import into a text processing program? One paragraph per line of the output file (txt or xlsx) would be best.

Answers (1)

Vatsal 2024-2-21
Hi,
To write the extracted text to an ASCII file with one paragraph per line, first divide the text into paragraphs. In MATLAB the "split" function does this: it divides a string at the designated delimiters and, for string input, returns a string array. Use the "newline" function as the delimiter, because split matches '\n' literally as a backslash followed by the letter n rather than as a line break.
Here is the modified code to write each paragraph to a text file:
url = "https://www.staradvertiser.com/2021/08/22/editorial/on-politics/on-politics-gov-david-iges-handling-of-covid-19-hobbled-by-indecision-inadequate-staffers/";
code = webread(url);
str = extractHTMLText(code);
str_split = split(str, newline); % Split the string into paragraphs at each line break
fileID = fopen('output.txt','w'); % Open a file named 'output.txt'. Change it as per your requirement.
for i = 1:numel(str_split)
    fprintf(fileID,'%s\n',str_split(i)); % Write each paragraph on a new line
end
fclose(fileID); % Don't forget to close the file after you're done
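Since the question also mentions xlsx, here is a minimal sketch of a variant that drops empty lines and writes both a text file and a spreadsheet. It assumes the paragraphs are separated by newline characters. The sample string stands in for the output of extractHTMLText, and the file names and the column name "Paragraph" are placeholders of my own, not anything from the original post.

```matlab
% Sample text standing in for the output of extractHTMLText (made-up content)
str = "First paragraph." + newline + newline + ...
      "Second paragraph." + newline + "Third paragraph.";

% One element per line, with surrounding whitespace removed
paragraphs = strtrim(split(str, newline));

% Drop empty lines so that every remaining element is a real paragraph
paragraphs = paragraphs(strlength(paragraphs) > 0);

% Text output: one paragraph per line (file name is a placeholder)
fileID = fopen('paragraphs.txt', 'w');
assert(fileID ~= -1, 'Could not open paragraphs.txt for writing.');
for i = 1:numel(paragraphs)
    fprintf(fileID, '%s\n', paragraphs(i));
end
fclose(fileID);

% Spreadsheet output: one paragraph per row in a single column
T = table(paragraphs, 'VariableNames', {'Paragraph'});
writetable(T, 'paragraphs.xlsx');
```

To use this on the real article, replace the sample string with str = extractHTMLText(webread(url)). Whether the blank-line filter is needed depends on how the page's paragraphs come out of extractHTMLText, so it is worth checking the split result first.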
I hope this helps!

Release

R2020a
