-- requires the Text Analytics Toolbox
Reading data from PDF can be a technical challenge. PDF is not designed as "data containers" plus "commands to format the containers" the way HTML styled with CSS can be. PDF is a Page Description Language: it consists of commands to move to particular positions, draw this and that, and so on.
For example, 'fifth' might be stored in the file as a position for the leading 'f', then a position for the second 'f', then a position to draw a single symbol that is a 'th' ligature, and finally a position to draw the 'i' close to the first 'f'.
The number of symbols encoded is not necessarily the same as the number of characters, and the symbols are not generally stored in reading order. The command language also allows reuse (and PostScript, the page description language PDF descends from, even includes loops). For example, the two 'f's of 'fifth' might be done by preparing an 'f' symbol once and then compositing that one symbol at two different locations.
To extract text semi-reliably from a page description language, you have to execute the commands and figure out what the result was.
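To make that concrete, here is a toy sketch in Python. It is not a PDF parser and does not use any real PDF library: the `Draw` record, the glyph names, and the coordinates are all made up for illustration. It only shows why reading the commands in file order gives garbage, and why you have to "execute" them (work out where each symbol lands) before you can recover the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Draw:
    """Hypothetical 'draw this glyph at this position' command."""
    x: float     # horizontal position on the page
    y: float     # baseline position (larger y = higher on the page)
    glyph: str   # glyph name in an imaginary font


# Hypothetical mapping from glyph names to the characters they stand for.
# The 't_h' ligature is one symbol but two characters.
GLYPH_TO_TEXT = {"f": "f", "i": "i", "t_h": "th"}

# 'fifth' stored in the order described above:
# first 'f', second 'f', the 'th' ligature, and only then the 'i'.
COMMANDS = [
    Draw(0.0, 100.0, "f"),
    Draw(8.0, 100.0, "f"),
    Draw(12.0, 100.0, "t_h"),
    Draw(4.0, 100.0, "i"),
]


def naive_text(commands):
    """Read the glyphs in file order, ignoring positions."""
    return "".join(GLYPH_TO_TEXT[c.glyph] for c in commands)


def extract_text(commands):
    """'Execute' the commands, then read the page top-down, left-to-right."""
    placed = [(c.y, c.x, GLYPH_TO_TEXT[c.glyph]) for c in commands]
    # Higher baselines first, then increasing x within a line.
    placed.sort(key=lambda p: (-p[0], p[1]))
    return "".join(text for _, _, text in placed)


print(len(COMMANDS))           # 4 symbols for a 5-character word
print(naive_text(COMMANDS))    # ffthi
print(extract_text(COMMANDS))  # fifth
```

A real extractor has much more to deal with (font encodings, spacing that implies word breaks, rotated text, multiple columns), which is why the result is only ever semi-reliable.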