With extractHTMLText I have harvested a news article. How can I write paragraph-long blocks to a text file?
The text analysis function created a clean ASCII file out of a very complex newspaper article using the following code (which worked well!):
url = "https://www.staradvertiser.com/2021/08/22/editorial/on-politics/on-politics-gov-david-iges-handling-of-covid-19-hobbled-by-indecision-inadequate-staffers/";
code = webread(url);
str = extractHTMLText(code)
Each paragraph became a line of text. How can I write these to an ASCII file for import into a text processing program? One paragraph per line of the output file (txt or xlsx) would be best.
Answers (1)
Vatsal
2024-2-21
Hi,
To write the extracted text to an ASCII file with each paragraph on its own line, first divide the text into paragraphs. In MATLAB this can be done with the "split" function, which divides a string at the specified delimiter (here, the newline character) and, for string input, returns the pieces as a string array.
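One detail about the delimiter is easy to get wrong, so here is a small sketch of it. The sample string is made up for illustration and is not the article text. In MATLAB, '\n' in single quotes is the two literal characters backslash and n, not a line break, so it is safer to pass "newline" or to use "splitlines":

```matlab
% Made-up two-paragraph sample standing in for the extracted article text
sample = "First paragraph." + newline + "Second paragraph.";

a = split(sample, '\n');     % delimiter is the literal characters \ and n: nothing is split
b = split(sample, newline);  % splits at the actual newline character
c = splitlines(sample);      % same idea, also handles other line endings such as \r\n

disp(numel(a))               % number of pieces when splitting on '\n'
disp(numel(b))               % number of pieces when splitting on newline
```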
Here is the modified code to write each paragraph to a text file:
url = "https://www.staradvertiser.com/2021/08/22/editorial/on-politics/on-politics-gov-david-iges-handling-of-covid-19-hobbled-by-indecision-inadequate-staffers/";
code = webread(url);
str = extractHTMLText(code)
str_split = split(str, newline); % Split the text at newline characters ('\n' in single quotes would be the literal characters \ and n)
fileID = fopen('output.txt','w'); % Open 'output.txt' for writing; change the name as needed
for i = 1:numel(str_split)
    fprintf(fileID,'%s\n',str_split{i}); % Write each paragraph on its own line
end
fclose(fileID); % Close the file when done
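Since the question also mentions xlsx, here is a shorter alternative sketch without the loop. It assumes R2022a or later for "writelines", and it assumes that the extracted text may contain blank lines between paragraphs, which are dropped. The sample text and the file names "paragraphs.txt" and "paragraphs.xlsx" are placeholders; in practice "str" would be the output of extractHTMLText.

```matlab
% Stand-in for the output of extractHTMLText (made-up sample text)
str = "First paragraph." + newline + newline + "Second paragraph.";

paragraphs = splitlines(str);                        % one element per line of text
paragraphs = strtrim(paragraphs);                    % remove leading/trailing whitespace
paragraphs = paragraphs(strlength(paragraphs) > 0);  % drop empty lines

writelines(paragraphs, "paragraphs.txt");    % one paragraph per line (R2022a or later)
writematrix(paragraphs, "paragraphs.xlsx");  % one paragraph per row in the first column
```

On older releases, the fopen/fprintf/fclose loop does the same job for the text file.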
I hope this helps!