importdata does not import the whole .txt file

I'm encountering a problem importing a .txt file containing mass spectrometry data. Somewhere along the way, importing simply stops, and the remaining part of my .txt file (over 777,000 lines in total) is not imported.
The data I'm trying to import describes scan events from the mass spectrometer. The header (from 'BEGIN IONS' to 'SCANS=scannumber') describes certain properties of the scan event. The numbers between 'SCANS=scannumber' and 'END IONS' describe the spectrum (the actual data, but worthless without the header); the first column holds m/z values, the second ion intensities.
My data looks like this (this is one scan event):
BEGIN IONS
TITLE=Spectrum2667 scans: 5993,
PEPMASS=897.52844 17418.17383
CHARGE=2+
RTINSECONDS=3127
SCANS=5993
176.86790 128.299
181.97141 139.498
221.90227 139.841
341.23862 982.842
END IONS
I want to extract certain scan events based on their scan number; another script tells me which ones to extract from this file. However, MATLAB does not import all of my scans, so I am missing some of the data (and my other script gives an error because the scan number it is looking for is not present).
I have just over 6000 of these scans in one .txt file. For some reason, MATLAB stops somewhere near the end of the file: at a certain scan event, it stops halfway through the list describing the spectrum. The code I use to import the data is:
List(:,1) = importdata('MyData.txt');
Because I just need a list of all the scan events, which I write to a new file after extracting the ones I want, it does not matter whether the file is imported as two columns or whether the header is split off; I just want the complete list, all the way to the end of my .txt file.
I've checked my .txt file, but the spacing and tab formatting at this particular line is no different from the rest of the file.
If someone could help me solve my problem, I would be very happy.
Here is a Dropbox link to the file, since it was too large to attach: https://www.dropbox.com/s/ijp5mvtvrm0ob9w/140708_LO_03_140710112412.txt
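The extraction step described in the question can be sketched as follows, assuming all lines of the file are already in a column cell array of character vectors. The small `fileLines` array, the `wanted` scan numbers and the output name `selected_scans.txt` are hypothetical stand-ins. The sketch pairs each BEGIN IONS with its END IONS, reads the SCANS= value inside that block, and keeps the whole block when the number is wanted.

```matlab
% Hypothetical input: one line per cell, e.g. as read by fgetl or textscan
fileLines = {'BEGIN IONS'; 'TITLE=Spectrum2667 scans: 5993,'; 'PEPMASS=897.52844 17418.17383'; ...
             'CHARGE=2+'; 'RTINSECONDS=3127'; 'SCANS=5993'; '176.86790 128.299'; ...
             '181.97141 139.498'; 'END IONS'; ...
             'BEGIN IONS'; 'TITLE=Spectrum2668 scans: 5994,'; 'SCANS=5994'; ...
             '221.90227 139.841'; 'END IONS'};
wanted = [5993 6001];   % hypothetical scan numbers supplied by the other script

trimmed = strtrim(fileLines);               % ignore stray spaces or '\r'
starts = find(strcmp(trimmed, 'BEGIN IONS'));
stops  = find(strcmp(trimmed, 'END IONS'));
assert(numel(starts) == numel(stops), 'Unbalanced BEGIN IONS / END IONS lines');

keep = false(size(fileLines));
for b = 1:numel(starts)
    idx = starts(b):stops(b);               % line indices of one scan event
    scanLine = trimmed(idx(strncmp(trimmed(idx), 'SCANS=', 6)));
    if ~isempty(scanLine) && any(ismember(sscanf(scanLine{1}(7:end), '%d'), wanted))
        keep(idx) = true;                   % keep the whole scan event
    end
end
selected = fileLines(keep);

% Write the selected scan events to a new file, one line per entry
fid = fopen('selected_scans.txt', 'w');
fprintf(fid, '%s\n', selected{:});
fclose(fid);
```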
Sara 2014-7-11
There is an error when clicking on the file. Have you tried cutting out only the last entry by itself and seeing whether the code fails? Maybe it's not about the number of elements, but I'm really just guessing here.
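One way to follow this suggestion, sketched under the assumption that the file fits in memory as a single character array: locate the last 'BEGIN IONS' with strfind, write everything from there to a separate file, and then run importdata on that small file. The demo `content` string and the file name `last_event.txt` are hypothetical.

```matlab
% Hypothetical demo content; in practice: content = fileread('MyData.txt');
content = sprintf(['BEGIN IONS\nSCANS=5992\n100.0 1.0\nEND IONS\n', ...
                   'BEGIN IONS\nSCANS=5993\n176.86790 128.299\nEND IONS\n']);

starts = strfind(content, 'BEGIN IONS');   % character offset of every scan event
lastEvent = content(starts(end):end);      % the final scan event only

fid = fopen('last_event.txt', 'w');        % hypothetical output file name
fwrite(fid, lastEvent);
fclose(fid);
% Next step: try importdata on the small file, e.g. List = importdata('last_event.txt');
```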
Luuk van Oosten 2014-7-11
https://www.dropbox.com/s/ijp5mvtvrm0ob9w/140708_LO_03_140710112412.txt
I don't know what went wrong, sorry. Here it is.


Accepted Answer

Sara 2014-7-11
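I don't know what is wrong with importdata, but this version works. The preallocated size of k was based on your file; it may need to be changed if you use a different file.
k = cell(1332160,1);   % preallocate a large cell array for speed
j = 0;                 % number of lines read so far
fid = fopen('140708_LO_03_140710112412.txt','r');
while 1
    t = fgetl(fid);                 % next line, without the newline
    if(~ischar(t)),break,end        % fgetl returns -1 at end of file
    j = j + 1;
    k{j} = t;
end
fclose(fid);
k = k(8:j-2);          % keep lines 8 through j-2 (drops the first 7 and last 2 lines)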
Luuk van Oosten 2014-7-15
Got it. By changing
k = k(8:j-2);
to
k = k(1:j);
everything works fine. Thanks again.
Sara 2014-7-15
I thought you didn't need that part :) And the number was completely arbitrary, just a big one.
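If guessing a "big enough" size is inconvenient, a common alternative (sketched here, not the code from the answer above) is to start with a small capacity, double it whenever it fills up, and trim the unused cells at the end. The demo writes a tiny hypothetical file to a temporary location so that it runs on its own; replace `filename` with your own file.

```matlab
% Self-contained demo: write a tiny hypothetical file first.
% In practice, set filename to your own data file instead.
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, '%s\n', 'BEGIN IONS', 'SCANS=5993', '176.86790 128.299', 'END IONS');
fclose(fid);

k = cell(2, 1);      % initial capacity, small on purpose; any positive size works
j = 0;               % number of lines stored so far
fid = fopen(filename, 'r');
while true
    t = fgetl(fid);
    if ~ischar(t), break, end          % fgetl returns -1 at end of file
    j = j + 1;
    if j > numel(k)
        k = [k; cell(numel(k), 1)];    %#ok<AGROW> double the capacity
    end
    k{j} = t;
end
fclose(fid);
delete(filename);
k = k(1:j);          % drop the unused cells
```

Doubling keeps the number of reallocations proportional to the logarithm of the line count, so no file-specific size is needed.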


More Answers (3)

per isakson 2014-7-15
Edited: per isakson 2014-7-15
"For some reason, MATLAB stops somewhere near the end of my file."
In MATLAB, there is no high-level function that reads and parses a text file like yours, i.e., a file with repeated headers and blocks of data.
"[...] the actual data, but worthless without the header"
I have a function, read_blocks_of_numerical_data, that reads only the actual data.
>> g=read_blocks_of_numerical_data('140708_LO_03_140710112412.txt',50);
>> whos g
  Name      Size                Bytes  Class    Attributes
  g         1x2142           21279808  cell
>> g{1234}
ans =
1.0e+06 *
0.0001 0.0023
0.0001 0.0022
0.0001 0.0022
.......
I attached the m-file. Somebody else might want to try it.
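The attached m-file is not reproduced here. As an independent sketch of the same idea (not per isakson's implementation), the numeric rows between each SCANS= line and the following END IONS can be collected into one N-by-2 matrix per scan event, assuming the lines are already in a cell array; the small `fileLines` array below is hypothetical.

```matlab
% Hypothetical input: the file's lines, one per cell
fileLines = {'BEGIN IONS'; 'TITLE=Spectrum2667 scans: 5993,'; 'SCANS=5993'; ...
             '176.86790 128.299'; '181.97141 139.498'; 'END IONS'; ...
             'BEGIN IONS'; 'SCANS=5994'; '221.90227 139.841'; 'END IONS'};

blocks = {};      % one N-by-2 matrix [m/z, intensity] per scan event
first = 0;        % index of the first spectrum line; 0 = not inside a spectrum
for n = 1:numel(fileLines)
    t = strtrim(fileLines{n});
    if strncmp(t, 'SCANS=', 6)
        first = n + 1;                                   % spectrum rows start after SCANS=
    elseif strcmp(t, 'END IONS') && first > 0
        txt = sprintf('%s ', fileLines{first:n-1});      % join the rows with spaces
        blocks{end + 1} = sscanf(txt, '%f', [2, Inf])';  %#ok<AGROW>
        first = 0;
    end
end
```

Parsing each block with a single sscanf call, rather than one call per line, keeps the loop cheap for files with hundreds of thousands of lines.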
Luuk van Oosten 2014-7-15
Thank you for clearing that up! At this point I do not want to extract only the actual data, but maybe your script will be useful later in my project!



Cedric 2014-7-15
Edited: Cedric 2014-7-15
Here is an alternative approach based on regular expressions:
content = fileread( '140708_LO_03_140710112412.txt' ) ;   % whole file as one char array
% One match per scan event; each named token becomes a struct field
pattern = ['TITLE=(?<title>[^\r\n]*)\s*', ...
           'PEPMASS=(?<pepmass>[^\r\n]*)\s*', ...
           'CHARGE=(?<charge>[^\r\n]*)\s*', ...
           'RTINSECONDS=(?<rtinseconds>\d*)\s*', ...
           'SCANS=(?<scans>\d*)\s*', ...
           '(?<spectrum>[^E]*)'] ;   % spectrum text runs up to the 'E' of 'END IONS'
data = regexp( content, pattern, 'names' ) ;
% Convert the captured text fields to numbers (charge stays text, e.g. '2+')
for k = 1 : numel( data )
    data(k).pepmass     = sscanf( data(k).pepmass, '%f' )' ;
    data(k).rtinseconds = sscanf( data(k).rtinseconds, '%d' ) ;
    data(k).scans       = sscanf( data(k).scans, '%d' ) ;
    data(k).spectrum    = sscanf( data(k).spectrum, '%f', [2, Inf] )' ;   % N-by-2: m/z, intensity
end
Running this gives, for example:
>> data
data =
1x2142 struct array with fields:
title
pepmass
charge
rtinseconds
scans
spectrum
>> data(1000)
ans =
title: 'Spectrum1000 scans: 3128,'
pepmass: [630.9374 2.4366e+05]
charge: '7+'
rtinseconds: 1987
scans: 3128
spectrum: [885x2 double]
>> select = [data.scans] > 5200 ;
>> data(select)
ans =
1x6 struct array with fields:
title
pepmass
charge
rtinseconds
scans
spectrum
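In the question, the scan numbers to extract come from another script rather than from a threshold. A sketch assuming the struct array built by the regexp code above: ismember selects the wanted scans, and setdiff lists requested scan numbers that are absent from the file, which is the situation that made the other script fail. The two-element `data` stand-in and the `wanted` list are hypothetical.

```matlab
% Hypothetical stand-in for the struct array built by the regexp code above
data = struct('scans', {3128, 5993}, ...
              'spectrum', {[885.1 10.5], [176.8679 128.299]});
wanted = [5993 6001];                      % hypothetical list from the other script

select = ismember([data.scans], wanted);   % true where the scan number is wanted
subset = data(select);

% Report requested scans that are absent from the file instead of failing later
missing = setdiff(wanted, [data.scans]);
for m = missing
    fprintf('Scan %d not found in the file\n', m);
end
```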

Sanket Mishra 2014-7-10
Put the importdata call inside a try/catch block and look at the exception that gets displayed. This might help you.
try
    List = importdata('MyData.txt');
catch ex
    disp(getReport(ex));   % show the full error report
end
I would suggest using textscan instead of importdata, as it is better suited to your workflow. Please refer to the MATLAB documentation of textscan.
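A minimal textscan sketch that reads every line of a file into a cell array, assuming one text field per line is all you need. It writes a small hypothetical file to a temporary location so that it runs on its own; replace `filename` with your own file.

```matlab
% Self-contained demo: write a tiny hypothetical file first.
% In practice, set filename to your own data file instead.
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, '%s\n', 'BEGIN IONS', 'SCANS=5993', '176.86790 128.299', 'END IONS');
fclose(fid);

fid = fopen(filename, 'r');
% '%s' with '\n' as the only delimiter returns whole lines;
% an empty Whitespace setting keeps each line exactly as written
C = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');
fclose(fid);
delete(filename);
List = C{1};      % column cell array, one character vector per line
```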
