large textfile 27580*1102 cell
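fin  = fopen('Cancer.txt','r');           % input file
fout = fopen('Cancer_split.txt','w');     % separate output file (name is an assumption)
while ~feof(fin)
    l = fgetl(fin);                       % get the next line
    if ~ischar(l), break, end             % fgetl returns -1 at end of file
    cols = regexp(l,'\t','split');        % split the line into its tab-separated columns
    if any(strcmp(cols,'NA'))             % remove NA rows (whole-field match, so gene
        continue                          % names that merely contain 'NA' are kept)
    end
    cols  = strrep(cols,'"','');          % drop the quotes around multi-gene symbols
    parts = regexp(cols,';','split');     % split every column on ';'
    n = max(cellfun(@numel,parts));       % number of output rows for this line
    for k = 1:n
        row = cols;
        for i = 1:numel(cols)
            if numel(parts{i}) > 1        % column holds several gene symbols
                row{i} = parts{i}{min(k,numel(parts{i}))};   % take the k-th symbol
            end
        end
        fprintf(fout,'%s\n',strjoin(row,'\t'));   % write one row per gene symbol
    end
end
fclose(fin);
fclose(fout);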
inputs:
Hybridization REF TCGA-A6-2672-11A-01D-1551-05 TCGA-A6-2672-11A-01D-1551-05 TCGA-A6-2672-11A-01D-1551-05
Composite Element REF Beta_value Gene_Symbol Chromosome Genomic_Coordinate Beta_value Gene_Symbol
cg00000292 0.511852232819811 ATP2A1 16 28890100 0.787687855895422 ATP2A1
cg00003994 0.0341977140819682 MEOX2 15725862 0.334815614333325 MEOX2
cg00008493 0.987979722052904 "COX8C;KIAA1409" 14 93813777 0.986128428295584 "COX8C;KIAA1409"
cg00011459 0.922491239231445 "TMEM186;PMM2" 16 8890425 0.961124285303233 "TMEM186;PMM2"
output:
Hybridization REF TCGA-A6-2672-11A-01D-1551-05 TCGA-A6-2672-11A-01D-1551-05 TCGA-A6-2672-11A-01D-1551-05 ……
cg00000292 0.511852232819811 ATP2A1 0.787687855895422
cg00003994 0.0341977140819682 MEOX2 0.334815614333325
cg00008493 0.987979722052904 COX8C 0.986128428295584
cg00008493 0.987979722052904 KIAA1409 0.986128428295584
Accepted Answer
Rik
2017-2-21
So essentially you have a tab-separated file from which you only want to keep specific columns.
You can read a file like this with readtable. If you really have to go through it line by line you can use a for loop, but with readtable you should be able to select the columns you want to keep, and with writetable you can write the result to a new file.
Note 1: You can set the 'Delimiter' parameter to a tab with '\t'.
Note 2: readtable needs MATLAB R2013b or later. On older releases you'll have to muck about with the textscan function.
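A minimal sketch of that readtable/writetable route, assuming a small made-up file. The file names, column names and values below are placeholders for illustration, not the real Cancer.txt layout, which has two header rows and repeated column names and would need extra handling.

```matlab
% Build a tiny tab-separated sample so the sketch runs on its own.
% File names, column names and values are made up for illustration.
inFile  = fullfile(tempdir,'sample_in.txt');
outFile = fullfile(tempdir,'sample_out.txt');
fid = fopen(inFile,'w');
fprintf(fid,'ID\tBeta1\tGene\tChrom\tBeta2\n');
fprintf(fid,'probeA\t0.51\tGENE1\t1\t0.79\n');
fprintf(fid,'probeB\t0.03\tGENE2\t2\t0.33\n');
fclose(fid);

% Read the file as a table, splitting columns on tabs.
T = readtable(inFile,'Delimiter','\t');

% Keep only the columns of interest, selected by name.
keep = T(:,{'ID','Beta1','Gene','Beta2'});

% Write the reduced table to a new tab-separated file.
writetable(keep,outFile,'Delimiter','\t');
```

Selecting by name keeps the code readable, but columns can also be picked by position, e.g. T(:,[1 2 3 5]), which may be more practical when the names repeat.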
5 comments
Rik
2017-2-21
If you have managed to convert your data to a matrix, you can use mean(data,2) to average along the 2nd dimension, i.e. across the columns, which gives one mean per row.
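A small illustration of the dimension argument, using a made-up matrix in which rows stand for probes and columns for samples:

```matlab
% Made-up 2-by-3 matrix: rows are probes, columns are samples.
data = [1 2 3;
        4 5 6];

rowMeans = mean(data,2);   % averages across columns: one value per row,    [2; 5]
colMeans = mean(data,1);   % averages down the rows:  one value per column, [2.5 3.5 4.5]
```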