large textfile 27580*1102 cell
fid = fopen('Cancer.txt','r');
out = fopen('Cancer_split.txt','w');    % assumed output name; the input is opened read-only
while ~feof(fid)
    l = fgetl(fid);                     % get the next line
    if ~ischar(l) || ~isempty(strfind(l,'NA'))
        continue                        % skip rows containing NA (also matches 'NA' inside longer names)
    end
    cols  = regexp(l,'\t','split');     % split this line into columns at the tabs
    cols  = strrep(cols,'"','');        % drop the quotes around entries such as "COX8C;KIAA1409"
    parts = regexp(cols,';','split');   % split every column at ';'
    nrow  = max(cellfun(@numel,parts)); % number of output rows needed for this line
    for k = 1:nrow
        row = cols;
        for i = 1:numel(cols)
            if numel(parts{i}) > 1      % this column had ';': take its k-th entry
                row{i} = parts{i}{min(k,numel(parts{i}))};
            end
        end
        fprintf(out,'%s\n',strjoin(row,sprintf('\t')));  % write one output row
    end
end
fclose(fid);
fclose(out);
inputs:
Hybridization REF  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05
Composite Element REF  Beta_value  Gene_Symbol  Chromosome  Genomic_Coordinate  Beta_value  Gene_Symbol
cg00000292  0.511852232819811  ATP2A1  16  28890100  0.787687855895422  ATP2A1
cg00003994  0.0341977140819682    MEOX2   15725862  0.334815614333325     MEOX2
cg00008493  0.987979722052904  "COX8C;KIAA1409"  14  93813777  0.986128428295584  "COX8C;KIAA1409"
cg00011459  0.922491239231445  "TMEM186;PMM2"  16  8890425  0.961124285303233  "TMEM186;PMM2"
output:
Hybridization REF  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05 ……
cg00000292  0.511852232819811  ATP2A1   0.787687855895422  
cg00003994  0.0341977140819682    MEOX2   0.334815614333325     
cg00008493  0.987979722052904  COX8C     0.986128428295584      
cg00008493  0.987979722052904  KIAA1409  0.986128428295584
Accepted Answer
  Rik
      
      
 2017-2-21
So essentially you have a tab-separated file from which you only want to keep specific columns.
You can read a file like this with readtable and then select the columns you want to keep by indexing into the resulting table; writetable will write the new file. If you really have to go through the file line by line, you can use a loop instead.
Note 1: You can set the 'Delimiter' parameter to a tab with '\t'.
Note 2: You'll need MATLAB R2013b or later. Otherwise you'll have to muck about with the textscan function.
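A minimal sketch of the readtable/writetable route, assuming a tab-separated file with two header lines as in the question. The file names, the made-up sample rows, and the column positions [1 2 3] are assumptions for illustration only; on the real file you would pass 'Cancer.txt' and list the positions of the columns you want.

```matlab
% Build a tiny tab-separated sample file (made-up values, two header lines).
inFile  = [tempname '.txt'];   % hypothetical input file
outFile = [tempname '.txt'];   % hypothetical output file
fid = fopen(inFile,'w');
fprintf(fid,'Hybridization REF\tS1\tS1\tS1\tS1\n');
fprintf(fid,'Composite Element REF\tBeta_value\tGene_Symbol\tChromosome\tGenomic_Coordinate\n');
fprintf(fid,'probeA\t0.25\tGENE1\t16\t1000\n');
fprintf(fid,'probeB\t0.75\tGENE2\t14\t2000\n');
fclose(fid);

% Skip both header lines and do not read variable names,
% because the header names repeat for every sample.
T = readtable(inFile,'Delimiter','\t','HeaderLines',2,'ReadVariableNames',false);

% Keep only the ID, beta value and gene symbol columns (positions assumed).
keep = T(:,[1 2 3]);

% Write the reduced table as tab-separated text without a header row.
writetable(keep,outFile,'Delimiter','\t','WriteVariableNames',false);
```

Selecting by position rather than by name sidesteps the duplicated 'Beta_value' and 'Gene_Symbol' headers.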
5 Comments
  Rik
      
      
 2017-2-21
If you have managed to convert your data to a matrix, you can use mean(data,2) to get the average along the 2nd dimension, that is, across the columns, which gives one mean per row.
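As a small illustration with made-up values (the variable names here are hypothetical), averaging along the 2nd dimension collapses the columns and leaves one value per row:

```matlab
% Two rows (probes) by two columns (samples) of made-up beta values.
data = [0.25 0.75;
        0.50 1.00];

% Average across the columns: the result is a 2-by-1 column vector.
rowMeans = mean(data,2);   % [0.5; 0.75]
```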