Hi guys, I want to read a text file line by line and remove the lines that contain NA and the duplicated columns

chocho 2017-2-15

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/325176-hi-guys-i-want-to-read-a-text-file-line-by-line-and-remove-the-lines-which-have-na-and-the-duplica

Edited: Walter Roberson 2017-2-20

Accepted answer: dpb

COADREAD_methylation.txt


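d = fopen('COADREAD_methylation.txt','r');
lines = {};                       % renamed from "all", which shadows the built-in function
this_line = fgetl(d);             % fgetl returns the numeric value -1 at end of file
while ischar(this_line)           % comparing a char row with ~= -1 gives a vector, so test the type
    % C = textscan(d, '%f%s');
    lines = [lines; {this_line}]; % wrap in a cell so lines of different lengths can be stacked
    this_line = fgetl(d);
end
fclose(d);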
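Since the loop above has a commented-out textscan call, here is a rough sketch of reading every line in one call instead of looping with fgetl. The temporary file and its two sample lines are made up for illustration only; in practice you would open COADREAD_methylation.txt directly.

```matlab
% Write a tiny stand-in file (hypothetical contents, just so the sketch runs).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'a\tb\nNA\tc\n');
fclose(fid);

% Read all lines at once: one '%s' field per line, with newline as the only delimiter.
fid = fopen(fname, 'r');
C = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');
fclose(fid);
lines = C{1};        % cell array of char vectors, one per line
delete(fname);       % clean up the stand-in file
```

Setting 'Whitespace' to an empty string keeps the tabs inside each line, so the columns can still be split afterwards.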


Stephen23 2017-2-17

Edited: Stephen23 2017-2-17

Accepted Answer

dpb 2017-2-15

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/325176-hi-guys-i-want-to-read-a-text-file-line-by-line-and-remove-the-lines-which-have-na-and-the-duplica#answer_254913

Edited: dpb 2017-2-16

Well, 'NA' is easy, not sure what defines the repeated columns; not enough time at present to try to parse that input file to figure out what is/isn't unique without a description being supplied...

fid = fopen('COADREAD_methylation.txt','r');
data = {};
while ~feof(fid)
  l = fgetl(fid);                                       % read one line, newline stripped
  if isempty(strfind(l,'NA')), data = [data;{l}]; end   % keep only lines with no 'NA' in them
end
fid = fclose(fid);

If the presence of 'NA' is all that's needed to get all the offending records, then you're done; otherwise need more details on how to tell so folks here don't have to try to work it out on their own.
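One thing to watch with the strfind test: it matches the letters NA anywhere in the line, so a record whose gene symbol merely contains those letters would be dropped too. A minimal sketch of a stricter check, assuming the file is tab-delimited and that a missing value is a field that is exactly NA; the third sample record below is hypothetical and is there only to show the difference.

```matlab
% Sample lines standing in for what fgetl would return (tab-delimited is an assumption).
tab = sprintf('\t');
lines = { ['cg00000292' tab '0.51' tab 'ATP2A1']; ...
          ['cg00006414' tab 'NA'   tab 'ZNF425']; ...
          ['cg00009999' tab '0.73' tab 'DNAJB1'] };   % hypothetical record, symbol contains "NA"

keep = false(size(lines));
for k = 1:numel(lines)
    fields  = strtrim(regexp(lines{k}, '\t', 'split'));   % split the line into columns
    keep(k) = ~any(strcmp(fields, 'NA'));                 % drop only if a whole field is NA
end
data = lines(keep);
```

With the plain strfind test the third line would also be removed; here only the second one is.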


chocho 2017-2-20

Edited: Walter Roberson 2017-2-20

Hi friend, I want to restructure the code along these lines.

Note: I want to read each line and check whether it contains NA. If it does, drop it and move on to the next line. If not, check the columns of that line; for any column that contains ';', split that column and make two rows.

fid = fopen('COADREAD_methylation.txt','r');
data = {};
tab = sprintf('\t');
while ~feof(fid)
    l = fgetl(fid);                                    % get the next line
    if ~ischar(l), break; end                          % fgetl returns -1 when nothing is left
    cols = strtrim(regexp(l,'\t','split'));            % split the line into its columns
    if any(strcmp(cols,'NA')), continue; end           % row has an NA field: skip it and read the next line
    cols = strrep(cols,'"','');                        % drop the quotes around 'A;B' entries
    hasSemi = ~cellfun(@isempty, strfind(cols,';'));   % look for the columns which have ';'
    nrows = 1;
    if any(hasSemi)
        nrows = numel(regexp(cols{find(hasSemi,1)},';','split'));
    end
    for r = 1:nrows                                    % one output row per ';'-separated entry
        row = cols;
        for c = find(hasSemi)
            parts = regexp(cols{c},';','split');       % split this column at ';'
            row{c} = parts{min(r,numel(parts))};
        end
        data = [data; {strjoin(row,tab)}];             % save this row: it has no NA and no ';'
    end
end
fid = fclose(fid);

chocho 2017-2-20

Edited: Walter Roberson 2017-2-20

inputs:

Hybridization REF  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05
Composite Element REF  Beta_value  Gene_Symbol  Chromosome  Genomic_Coordinate  Beta_value    Gene_Symbol
cg00000292  0.511852232819811  ATP2A1   16  28890100  0.787687855895422  ATP2A1
cg00002426  0.519102187746053  SLMAP    3  57743543  0.932889308560864  SLMAP
cg00006414  NA  "ZNF425;ZNF398"  7  148822837  NA  "ZNF425;ZNF398"  
cg00008493  0.987979722052904  "COX8C;KIAA1409"  14  93813777  0.986128428295584      "COX8C;KIAA1409"  
cg00011459  0.922491239231445  "TMEM186;PMM2"  16  8890425  0.961124285303233  "TMEM186;PMM2"

outputs:

Hybridization REF  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05  TCGA-A6-2672-11A-01D-1551-05
cg00000292  0.511852232819811  ATP2A1   0.787687855895422  
cg00002426  0.519102187746053  SLMAP       0.932889308560864  
cg00008493  0.987979722052904  COX8C     0.986128428295584      
cg00008493  0.987979722052904  KIAA1409  0.986128428295584        
cg00011459  0.922491239231445  TMEM186  0.961124285303233  
cg00011459  0.922491239231445  PMM2                0.961124285303233

appreciate your help !
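For what it's worth, here is a rough sketch of how the sample inputs above could be turned into the sample outputs. It works on three of the posted data rows held in memory rather than on the file, skips the two header lines, and assumes tab delimiters. The choice of columns to keep (1, 2, 3 and 6) is inferred from the posted output and is an assumption, as are the variable names.

```matlab
% Three data rows from the post, rebuilt with tab delimiters (an assumption).
tab = sprintf('\t');
raw = { ...
  {'cg00000292','0.511852232819811','ATP2A1','16','28890100','0.787687855895422','ATP2A1'}, ...
  {'cg00006414','NA','"ZNF425;ZNF398"','7','148822837','NA','"ZNF425;ZNF398"'}, ...
  {'cg00008493','0.987979722052904','"COX8C;KIAA1409"','14','93813777','0.986128428295584','"COX8C;KIAA1409"'}};
lines = cellfun(@(c) strjoin(c, tab), raw, 'UniformOutput', false);

keepCols = [1 2 3 6];   % assumption: probe ID, beta value, gene symbol, second beta value
geneCol  = 3;           % position of the gene symbol within keepCols
out = {};
for k = 1:numel(lines)
    cols = strtrim(regexp(lines{k}, '\t', 'split'));
    if any(strcmp(cols, 'NA')), continue; end        % drop rows that have an NA field
    cols  = strrep(cols(keepCols), '"', '');         % drop the repeated columns and the quotes
    genes = regexp(cols{geneCol}, ';', 'split');     % 'COX8C;KIAA1409' becomes two symbols
    for g = 1:numel(genes)
        row = cols;
        row{geneCol} = genes{g};
        out{end+1,1} = strjoin(row, tab);            % one output row per gene symbol
    end
end
fprintf('%s\n', out{:});
```

The cg00006414 row is dropped because of its NA fields, and the cg00008493 row is written twice, once per gene symbol. To run this on the real file, fill lines from the fgetl loop instead of the hard-coded sample.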
