split a row into 2 rows
cg00008493 0.987979722052904 "COX8C;KIAA1409" 14 93813777 0.986128428295584 "COX8C;KIAA1409" 14 93813777
cg00031162 0.378288688845672 "TNFSF12;TNFSF12-TNFSF13" 17 7453377 0.362510745266914 "TNFSF12;TNFSF12-TNFSF13" 17 7453377
Here are 2 lines, each with 9 columns. I want to split each line that contains a 2-item set like "COX8C;KIAA1409" into 2 rows and delete the duplicated columns. The output should look like this:
cg00008493 0.987979722052904 COX8C 0.986128428295584
cg00008493 0.987979722052904 KIAA1409 0.986128428295584
cg00031162 0.378288688845672 TNFSF12 0.362510745266914
cg00031162 0.378288688845672 TNFSF12-TNFSF13 0.362510745266914
fid = fopen('COADREAD_methylation.txt','r');
data = {};
while ~feof(fid)
    l = fgetl(fid);
    if isempty(strfind(l,'NA')), data = [data;{l}]; end % keep only lines without NA
    a = reshape(l, ',','""', [])'; % attempted split; this reshape call is not valid MATLAB
end
fid = fclose(fid);
Note: I test for NA in order to skip the lines that contain NA.
Accepted Answer
Stephen23
2017-2-16
Edited: Stephen23
2017-2-17
opt = {'CollectOutput',true};
inp = '%s%s%q%*d%*d%s%*q%*d%*d'; % keep fields 1, 2, 3 and 6; skip the rest
out = '%s\t%s\t%s\t%s\n';
f1d = fopen('temp1.txt','rt'); % the original file
f2d = fopen('temp2.txt','wt'); % the new file
while ~feof(f1d)
    C = textscan(f1d,inp,1,opt{:}); % read one line
    C = [C{:}];
    D = regexp(C{3},';','split'); % split the quoted field at each semicolon
    for k = 1:numel(D)
        fprintf(f2d,out,C{1:2},D{k},C{4}); % one output row per name
    end
end
fclose(f1d);
fclose(f2d);
Produces this output file:
cg00008493 0.987979722052904 COX8C 0.986128428295584
cg00008493 0.987979722052904 KIAA1409 0.986128428295584
cg00031162 0.378288688845672 TNFSF12 0.362510745266914
cg00031162 0.378288688845672 TNFSF12-TNFSF13 0.362510745266914
Tested on this input file:
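The attached input file is not reproduced here. As a rough in-memory sketch of what the loop does to a single record, the snippet below parses one line copied from the question with the same format string and splits the quoted field. The variable names `rec`, `names` and `rows` are illustrative assumptions, not part of the original answer.

```matlab
% One record copied from the question, held in memory instead of a file.
rec = 'cg00008493 0.987979722052904 "COX8C;KIAA1409" 14 93813777 0.986128428295584 "COX8C;KIAA1409" 14 93813777';
inp = '%s%s%q%*d%*d%s%*q%*d%*d';    % same format string as in the answer

% textscan also accepts a character vector as its first input.
C = textscan(rec, inp, 1, 'CollectOutput', true);
C = [C{:}];                         % 1x4 cell: ID, value 1, names, value 2

% %q strips the double quotes, so C{3} is COX8C;KIAA1409 without quotes.
names = regexp(C{3}, ';', 'split'); % cell array of the individual names

% Hypothetical container: one row per name, other fields repeated.
rows = cell(numel(names), 4);
for k = 1:numel(names)
    rows(k,:) = [C(1:2), names(k), C(4)];
end
disp(rows)
```

Collecting the rows in a cell array rather than writing them straight to a file makes it easy to inspect the result before committing to an output format.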
18 Comments
Stephen23
2017-2-22
If textscan has an empty output then you probably need to check the format string.
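One way to notice an empty result early is to test the parsed cell array before indexing into it. The sketch below is only an illustration under stated assumptions: it writes a single sample record plus a trailing blank line to a scratch file (the name comes from `tempname`, not from the thread), then runs the same kind of loop with an `isempty` guard. The warning text and the `rows` container are hypothetical.

```matlab
% Sample record from the question, written to a scratch file with a
% trailing blank line (an assumed cause of an empty read at the end).
rec = 'cg00008493 0.987979722052904 "COX8C;KIAA1409" 14 93813777 0.986128428295584 "COX8C;KIAA1409" 14 93813777';
tmp = [tempname '.txt'];            % hypothetical scratch file
fid = fopen(tmp, 'wt');
fprintf(fid, '%s\n\n', rec);
fclose(fid);

inp  = '%s%s%q%*d%*d%s%*q%*d%*d';   % same format string as in the answer
rows = {};                          % hypothetical container for the output rows
fid  = fopen(tmp, 'rt');
while ~feof(fid)
    C = textscan(fid, inp, 1, 'CollectOutput', true);
    C = [C{:}];
    if isempty(C)
        % Nothing was parsed: stop instead of failing on C{3}.
        warning('textscan returned no fields; check the format string and the input line.');
        break
    end
    names = regexp(C{3}, ';', 'split');
    for k = 1:numel(names)
        rows(end+1,:) = [C(1:2), names(k), C(4)]; %#ok<SAGROW>
    end
end
fclose(fid);
delete(tmp);                        % remove the scratch file
disp(rows)
```

If the guard fires on the first record rather than at the end of the file, the format string most likely does not match the layout of the lines, for example a different number of fields or a different delimiter.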