Convert a table in a pdf to a MATLAB cell structure

I have a PDF file that contains an Nx9 table of data that I need to turn into a MATLAB cell structure or an Excel file. Some of the (row,column) entries are blank.
So far, I have tried reading the pdf using:
txt = extractFileText('filename.pdf');
This produces a 1x1 string in which runs of spaces break up the rows in a seemingly random order. The (row,column) entries do not appear in a logical position in txt. Is there another command that can read a PDF table?
  4 comments
dpb 2021-1-30
Which is, it seems, what the scraping utilities do...get the boundaries of the table as rendered and then suck that area up.
Sim 2023-3-12
I have the same problem @Charles D'Onofrio @dpb @Stephen23...
The following function is not really helpful when a PDF contains tables with blank cells:
txt = extractFileText('filename.pdf');
Has a new tool been created in the meantime, i.e. between January 2021 and today (mid-March 2023)?


Answers (2)

the cyclist 2023-3-12
I can strongly recommend using Tabula to first extract the table from the PDF file. Then use a MATLAB function (e.g. readtable) to bring the Tabula output into MATLAB.
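As a rough sketch of the second step, assuming Tabula has already exported the table as a CSV file: the file name tabula_output.csv is hypothetical, and the snippet writes a tiny stand-in file first so that it runs on its own. readtable imports the file, table2cell converts it to a cell array, and writetable saves an Excel copy.

```matlab
% Stand-in for a CSV exported by Tabula (hypothetical file name and contents).
csvFile = fullfile(tempdir, 'tabula_output.csv');
fid = fopen(csvFile, 'w');
fprintf(fid, 'A,B,C\n1,,x\n2,5,\n');   % two data rows, with blank entries on purpose
fclose(fid);

% Import the CSV; the first row is treated as column names.
T = readtable(csvFile, 'Delimiter', ',', ...
    'ReadVariableNames', true, 'TextType', 'char');

% Convert the table body to a cell array (one cell per table entry).
C = table2cell(T);

% Optionally save an Excel copy (hypothetical output name).
xlsFile = fullfile(tempdir, 'tabula_output.xlsx');
writetable(T, xlsFile);
```

With this import, blank entries in numeric columns come through as NaN and blank entries in text columns as empty character vectors, so the blanks stay in their original (row,column) positions.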
  2 comments
Sim 2023-3-12
Edited: Sim 2023-3-13
Thanks a lot @the cyclist! Do you know if Tabula is safe in terms of privacy and confidentiality?
the cyclist 2023-3-13
I haven't used it for data that I would have privacy concerns about, but I think there are strong reasons to believe it is safe:
  • It's open-source, so you can see all the code on GitHub
  • It doesn't seem to send your data anywhere else. Although it might seem like it is sending your data to a web site, it looks to me like it only opens a local browser window.
  • It was first built by journalists, who tend to care about privacy (at least of their own data!)



Suraj 2023-3-29
Hi Charles
Your question seems very similar to one I've answered recently. Please have a look at this answer.
Hope this helps.
