detecting Excel files without Excel extension
5 次查看(过去 30 天)
显示 更早的评论
I need to be able to identify an Excel file without using xlsfinfo. It is a VERY slow function. Without it, my algorithm runs faster. Any ideas?
Background: Our test setup outputs text files that are named with the ".xls" extension. Back in Excel 2003 and before, if you named a tab delimited text file with "xls", Excel would open the file no problem and properly interpret it. Now, it is causing me headaches with importing data into MATLAB. MATLAB uses the ".xls" extension to check for wrong input to "textscan" and "importdata." Here is where it gets interesting. When you open a text file with the ".xls" extension and then save it, it is now an excel file and no longer tab delimited text file. Basically, I need to be able to tell the difference between text and Excel files without xlsfinfo because xlsfinfo is very slow.
0 个评论
采纳的回答
Iain
2013-5-23
You can try something like
fid = fopen(filename,'r');
fifty_bytes = fread(fid,50,'uint8'); fclose(fid);
Valid_characters = ['A':'Z' 'a':'z' ',./\ 0123456789' 9 10 13 '!"£$%^&*()[]{}-=_+ ... and any other valid characters you expect in your text files. ]
excel_file = false;
for i = 1:50
if any(fifty_bytes ~= Valid_characters)
excel_file = true;
end
end
3 个评论
更多回答(2 个)
José-Luis
2013-5-23
编辑:José-Luis
2013-5-23
You could always check the file signature. You would need to read the file as a binary file and look into specific locations, using hexadecimal values.
Here is a snippet that will give you a string of the hexadecimal values:
fid = fopen('/path/to/test.xls','r');
uid = fread(fid,4,'ushort','b');
fseek(fid,520,'bof');
ftype = fread(fid,4,'ushort','b');
fclose(fid);
s1 = sprintf('%X',uid)
s2 = sprintf('%X',ftype)
If the file is an office document, s1 should be:
D0 CF 11 E0 A1 B1 1A E1 without the spaces
It it is an excel spreasheet, then s2 should be:
FD FF FF FF nn xx where the last two values (nn and xx) can change within a certain range.
It would be a matter of using strcmp(). You could of also compare hexadecimal values directly with fi(). For your purpose, it sounds like it would be enough to check if it is an Office document.
Text files carry no file identifier.
4 个评论
José-Luis
2013-5-24
What are you getting? Maybe it is a little endian system? You could try with:
uid = fread(fid,4,'ushort','l');
Same thing for ftype.
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Spreadsheets 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!