Split a row into 2 rows
cg00008493 0.987979722052904 "COX8C;KIAA1409" 14 93813777 0.986128428295584 "COX8C;KIAA1409" 14 93813777
cg00031162 0.378288688845672 "TNFSF12;TNFSF12-TNFSF13" 17 7453377 0.362510745266914 "TNFSF12;TNFSF12-TNFSF13" 17 7453377
Here are 2 lines; each line has an ID followed by 8 columns. I want to split each line whose gene field holds 2 names (like "COX8C;KIAA1409") into 2 rows and drop the duplicated columns, so that the output looks like this:
cg00008493 0.987979722052904 COX8C 0.986128428295584
cg00008493 0.987979722052904 KIAA1409 0.986128428295584
cg00031162 0.378288688845672 TNFSF12 0.362510745266914
cg00031162 0.378288688845672 TNFSF12-TNFSF13 0.362510745266914
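fid = fopen('COADREAD_methylation.txt','r');
data = {};
while ~feof(fid)
    l = fgetl(fid);
    % keep only the lines that do not contain NA
    if isempty(strfind(l,'NA'))
        data = [data;{l}];
    end
    % a = reshape(l, ',','""', [])';  % my attempt at the split: reshape only rearranges elements and cannot split text, so this errors
end
fclose(fid);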
Note: I search for 'NA' to skip the lines that contain NA values.
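A possible refinement of that NA filter, as a sketch: strfind(l,'NA') also matches NA inside a gene name such as SNAP25, so those rows would be dropped too. Matching NA only as a whole whitespace-separated field avoids that. The three sample lines below are hypothetical, written only to show the difference.

```matlab
% Hypothetical sample lines: the second has a real NA field,
% the third only has "NA" inside a gene name (NAT1).
rawLines = {
    'cg00008493 0.987979722052904 "COX8C;KIAA1409" 14 93813777 0.986128428295584 "COX8C;KIAA1409" 14 93813777'
    'cg00000001 NA "SNAP25" 20 10000000 0.5 "SNAP25" 20 10000000'
    'cg00000002 0.4 "NAT1" 8 18000000 0.3 "NAT1" 8 18000000'
};
% A line is dropped only if some field is exactly NA:
% NA preceded by line start or whitespace, followed by whitespace or line end.
hasNA = cellfun(@(s) ~isempty(regexp(s, '(^|\s)NA(\s|$)', 'once')), rawLines);
kept = rawLines(~hasNA);   % keeps the first and third lines
disp(kept)
```

Inside the original fgetl loop, the same test would read: if isempty(regexp(l, '(^|\s)NA(\s|$)', 'once')).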
Accepted Answer
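opt = {'CollectOutput',true};
inp = '%s%s%q%*d%*d%s%*q%*d%*d'; % keep fields 1, 2, 3 and 6; skip the others
out = '%s\t%s\t%s\t%s\n';
f1d = fopen('temp1.txt','rt'); % the original file
f2d = fopen('temp2.txt','wt'); % the new file
if f1d < 0 || f2d < 0
    error('Could not open temp1.txt or temp2.txt: check the file paths.');
end
while ~feof(f1d)
    C = textscan(f1d,inp,1,opt{:}); % read one line
    C = [C{:}];                     % 1x4 cell: ID, value, gene list, value
    if isempty(C)                   % no more data, e.g. a trailing newline
        break
    end
    D = regexp(C{3},';','split');   % split the gene list on semicolons
    for k = 1:numel(D)
        fprintf(f2d,out,C{1:2},D{k},C{4}); % one output row per gene
    end
end
fclose(f1d);
fclose(f2d);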
Produces this output file:
cg00008493 0.987979722052904 COX8C 0.986128428295584
cg00008493 0.987979722052904 KIAA1409 0.986128428295584
cg00031162 0.378288688845672 TNFSF12 0.362510745266914
cg00031162 0.378288688845672 TNFSF12-TNFSF13 0.362510745266914
Tested on an input file containing the two sample lines from the question.
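To see the split step on its own, here is a minimal in-memory sketch for the first sample line. It assumes no field contains embedded spaces (true for the sample), so splitting on whitespace is enough and the quotes can simply be stripped; unlike textscan with %q, it would not cope with quoted fields that contain spaces.

```matlab
% First sample line from the question (ID followed by 8 columns).
str = 'cg00008493 0.987979722052904 "COX8C;KIAA1409" 14 93813777 0.986128428295584 "COX8C;KIAA1409" 14 93813777';
% Assumption: no field contains spaces, so whitespace splitting is safe here.
fields = regexp(strtrim(str), '\s+', 'split');
geneList = strrep(fields{3}, '"', '');   % strip the quotes from the gene list
genes = regexp(geneList, ';', 'split');  % one cell per gene name
rows = cell(numel(genes), 1);
for k = 1:numel(genes)
    % Keep the ID, the first value, one gene, and the second value (field 6).
    rows{k} = sprintf('%s\t%s\t%s\t%s', fields{1}, fields{2}, genes{k}, fields{6});
end
fprintf('%s\n', rows{:});
```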
18 Comments
Hi friend, I have many columns (around 547), not only the ones in my example. How could I update this format: inp = '%s%s%q%*d%*d%s%*q%*d%*d'; ?
@chocho phD: you could:
- read the textscan documentation and learn how to specify the format yourself.
- upload a sample file and get some help (click the paperclip button).
- if the unwanted columns are all trailing, then try something like this:
inp = '%s%s%q%*d%*d%s%*[^\n]';
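As a sketch of that suggestion, textscan can read a single in-memory line that has extra trailing columns; the extra fields below are hypothetical, appended to the question's first sample line. The %*[^\n] at the end discards everything after the last kept field.

```matlab
% First sample line plus hypothetical extra trailing columns.
str = ['cg00008493 0.987979722052904 "COX8C;KIAA1409" 14 93813777 0.986128428295584 ' ...
       '"COX8C;KIAA1409" 14 93813777 0.5 "EXTRA" 1 2'];
inp = '%s%s%q%*d%*d%s%*[^\n]';  % keep fields 1, 2, 3 and 6; skip the rest of the line
C = textscan(str, inp, 'CollectOutput', true);
C = C{1};                        % 1x4 cell: ID, value, gene list, value
disp(C)
```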
Thanks a lot, I will take your remarks into consideration! Great job.
Sorry for being slow, but I got this error in fprintf:
Error using fprintf
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in splitremove (line 11) fprintf(f2d,out,C{1:2},D{k},C{4});
@chocho phD: you need to provide the correct filepath to fopen.
i already opened so many files as i put all of them in one file to matlab path
It is not clear what "i already opened so many files as i put all of them in one file to matlab path" means. Can you please explain that clearly?
I mean: why does the Command Window show me the error "Use fopen to generate a valid file identifier"? Your code looks clear to me, but it doesn't work; see the second error in line 11: "fprintf(fid2,out,C{1:2},D{k},C{4});". Please tell me what I should do. C:\Program Files (x86)\MATLAB\R2012a this is the file path in which i'm working on..
He used f1d (eff one dee) and f2d (eff two dee), not fid2 (eff eye dee two).
Yes, I see. So?
Hi friend, any updates?
@chocho phD: do not work in that directory.
That is the installation directory of MATLAB. It is not intended for you to use any installation directory for working in. NEVER use any of the Program Files folders for your MATLAB current directory.
You should be using a subdirectory of your user directory, e.g.:
C:\Users\<your user name>\Documents\MATLAB\Working
"C:\Program Files (x86)\MATLAB\R2012a this is the file path in which i'm working on"
You cannot write to any directory under "C:\Program Files (x86)" because MS Windows will not allow that. You need to cd to a different directory and work there.
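A sketch of a defensive check: fopen returns -1 when it cannot open a file, plus an explanatory message as its second output, so checking it right away gives a clearer error than the later fprintf failure. tempdir is used below only as an example of a folder that is normally writable; in practice you would cd to your own working folder (such as the Documents\MATLAB\Working example above) and use plain file names.

```matlab
% Hypothetical output location: tempdir is normally writable by the user.
outFile = fullfile(tempdir, 'temp2.txt');
[fid, msg] = fopen(outFile, 'wt');   % fid is -1 on failure, msg says why
if fid < 0
    error('Could not open %s for writing: %s', outFile, msg);
end
fprintf(fid, '%s\t%s\n', 'cg00008493', 'COX8C');
fclose(fid);
```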
Got it! Thank you so much.
Stephen Cobeldick, your code works very well, but could you change it to build the format with a loop? I tried but failed with this error: Index exceeds matrix dimensions. Error in splitremove (line 13) D = regexp(C{3},';','split');
inp = '%s %f1%s%d%d %f2%s%d%d %f3%s%d%d ...........';
If textscan has an empty output then you probably need to check the format string.
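On the loop question: a format string is just text, so it can be built by concatenation, and the specifiers need no numbering. A minimal sketch, assuming (hypothetically) that each line is an ID followed by nGroups repeats of (value, "gene list", two integer fields), as in the question's sample. As in the accepted answer, only the first gene list is kept; for nGroups = 2 the loop reproduces the accepted answer's format exactly.

```matlab
% Hypothetical layout: ID, then nGroups x (value, "gene list", two integer fields).
nGroups = 3;
inp = '%s';                          % ID column
for g = 1:nGroups
    if g == 1
        inp = [inp '%s%q%*d%*d'];    % keep value and gene list of the first group
    else
        inp = [inp '%s%*q%*d%*d'];   % keep value, skip the repeated gene list
    end
end
disp(inp)
```

The same string can be built without a loop as ['%s%s%q%*d%*d' repmat('%s%*q%*d%*d',1,nGroups-1)]. The out format and the fprintf call then need one %s per kept value, and if the format does not match the file, textscan returns empty output and C{3} fails with exactly the "Index exceeds matrix dimensions" error.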
Could you tell me how to write the format for this line? cg00000292 0.511852232819811 ATP2A1 0.787687855895422 0.51208122605745 0.599610258157912 0.568034757766559
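One possible format for that line, as a sketch: it has an ID, a value, a gene name and then four numbers, so one specifier per whitespace-separated column works. This only covers the line shown; if the number of trailing values varies between lines, a fixed format will not fit.

```matlab
% The line from the comment: ID, value, gene name, then four numeric values.
str = 'cg00000292 0.511852232819811 ATP2A1 0.787687855895422 0.51208122605745 0.599610258157912 0.568034757766559';
fmt = ['%s%f%s' repmat('%f', 1, 4)];   % same as '%s%f%s%f%f%f%f'
C = textscan(str, fmt);
id   = C{1}{1};                         % 'cg00000292'
gene = C{3}{1};                         % 'ATP2A1'
vals = cell2mat(C(4:7));                % 1x4 double row of the trailing values
```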
