DOCTYPE error from xmlread

KAE
KAE 2019-11-14
Edited: KAE 2019-11-18
I have been given an XML file and am trying to read it with xmlread. If I call it either of the following ways,
DOMNode = xmlread(fileXml, 'AllowDoctype', true);
DOMNode = xmlread(fileXml, 'AllowDoctype', false);
xmlread crashes at the line indicated below,
try
    parseResult = p.parse(fileObj);
catch ME
    % If trying to parse an XML document containing a DOCTYPE declaration
    % with 'AllowDoctype' set to false, then throw an appropriate error
    % message.
    if isa(ME, 'matlab.exception.JavaException') && ...
            contains(char(ME.ExceptionObject.getLocalizedMessage), ...
            'http://apache.org/xml/features/disallow-doctype-decl')
        error(message('MATLAB:xmlread:DoctypeDisabled', filename));
    end
    rethrow(ME); % crashes here
end

Accepted Answer

KAE
KAE 2019-11-14
Edited: KAE 2019-11-18
It turns out this was not an xmlread issue and has nothing to do with 'AllowDoctype'. Instead, it is due to a problem with the XML file itself. Here is the info in case it helps someone.
The XML file contains international place names, so there are non-English characters, which appear as question marks and seem to mess up the adjacent closing tags. For example, xmlread crashes on this line
<field name="geocity">Matar?/field>
for Mataró (in Spain) but not if it is manually edited to
<field name="geocity">Mataro</field>
Incidentally, a good way to find problem lines in an XML file is to open it in a web browser, which will tell you which line it couldn't read. (If the XML file is long, scroll to the top once it has opened in the browser to see the error message.)
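As an alternative to the browser check, the same search can be sketched in MATLAB by scanning the raw bytes of the file. This is only a rough heuristic, not part of xmlread: the function name findSuspectXmlLines is hypothetical, and it assumes that problem lines contain either a non-ASCII byte or a '?' directly followed by '/', as in the Matar?/field> line above.

```matlab
% Hypothetical helper (not part of xmlread): return the numbers of lines that
% contain a non-ASCII byte or a '?' directly followed by '/'.
function suspects = findSuspectXmlLines(fileXml)
    fid = fopen(fileXml, 'r');
    if fid < 0
        error('Could not open %s', fileXml);
    end
    cleanup = onCleanup(@() fclose(fid));
    bytes = fread(fid, Inf, '*uint8')';   % raw bytes, no character decoding

    % Split on line feeds (byte 10) so line numbers match a text editor.
    lineEnds = [find(bytes == 10), numel(bytes) + 1];
    lineStart = 1;
    suspects = [];
    for k = 1:numel(lineEnds)
        lineBytes = bytes(lineStart:lineEnds(k) - 1);
        hasNonAscii = any(lineBytes > 127);
        hasBrokenClose = ~isempty(strfind(char(lineBytes), '?/'));
        if hasNonAscii || hasBrokenClose
            suspects(end + 1) = k; %#ok<AGROW>
            fprintf('line %d: %s\n', k, char(lineBytes));
        end
        lineStart = lineEnds(k) + 1;
    end
end
```

Calling suspects = findSuspectXmlLines(fileXml) prints each flagged line and returns the line numbers. The '?/' test can also flag legitimate text such as URLs, so treat the output as a list of lines to inspect rather than a list of confirmed errors.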
I will also mention that the first line of the XML file does not specify the encoding, which I believe can cause problems with non-English characters, but I was never able to find an encoding choice that eliminated the errors,
<?xml version="1.0"?>
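For anyone who wants to test an encoding guess without editing the file, one possible approach is to decode the bytes in MATLAB and hand the parser the decoded text. This is a sketch under assumptions, not a confirmed fix for the file above: the function name xmlreadWithEncoding is hypothetical, the encoding (for example 'ISO-8859-1') is a guess, and it can only help if the original bytes are still intact. If the characters were already replaced by literal question marks when the file was written, no encoding choice will recover them.

```matlab
% Hypothetical helper: decode the file with an assumed encoding, then parse
% the decoded text from memory instead of letting the parser guess.
function DOMNode = xmlreadWithEncoding(fileXml, encodingName)
    fid = fopen(fileXml, 'r');
    if fid < 0
        error('Could not open %s', fileXml);
    end
    cleanup = onCleanup(@() fclose(fid));
    bytes = fread(fid, Inf, '*uint8')';              % raw bytes

    % Convert the bytes to MATLAB characters using the assumed encoding.
    xmlText = native2unicode(bytes, encodingName);

    % A Java Reader supplies already-decoded characters, so the parser does
    % not need to determine the byte encoding itself.
    source = org.xml.sax.InputSource(java.io.StringReader(xmlText));
    DOMNode = xmlread(source);
end
```

A call might look like DOMNode = xmlreadWithEncoding(fileXml, 'ISO-8859-1'). If parsing still fails on the same line for every plausible encoding, the file content itself is probably damaged and needs to be repaired or re-exported.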
