Reading in specific column and plotting bar chart

I have a text file as:
Heading A
------------------------
Heading B
GA008246-0_B_F_1852967891 X 7117
GA011810-0_B_F_1852968731 14 7380
GA017861-0_B_F_1852970072 22 7749
GA017864-0_T_R_1853027526 22 7751
GA017866-0_T_R_1853027527 22 7753
GA017875-0_B_R_1852970076 22 7755
I want to be able to plot a histogram of the 2nd column under the title Heading B. Sometimes there are additional lines under Heading A.
This is what I have so far.
%Read in data file
fid = fopen('c:\myfile.txt','rt');
C = textscan (fid, '%s %s s', 'delimiter', '\t','headerlines', 1)
while (strcmp(C{1}{1}, 'Heading B') == 0)
C = textscan (fid, '%s %s %s', 'delimiter', '\t')
end
fclose(fid);
C{:,2}
But I'm picking up one item too early, i.e.:
ans =
''
'X'
'14'
'22'
'22'
'22'
'22'
Once the additional '' item is removed, how can I plot a bar chart showing the number of occurrences of each of these in the list? I.e., in this example:
X = 1 repetition, 14 = 1 repetition, 22 = 4 repetitions
Thanks for any help. Jason

Accepted Answer

Guillaume
Guillaume 2015-4-14
Edited: Guillaume 2015-4-14
I would use fgetl instead of textscan to find the start of the heading B section, then use textscan to read it.
fid = fopen('c:\myfile.txt','rt');
tline = fgetl(fid);
while ~isnumeric(tline) && ~strcmp(tline, 'Heading B')
tline = fgetl(fid);
end
if isnumeric(tline) % fgetl returns -1 at end of file, so EOF was reached before Heading B
error('End of file reached prematurely');
end
C = textscan (fid, '%s %s %s', 'delimiter', '\t');
fclose(fid);
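For anyone who wants to try this without the original file, here is a minimal sketch of the same fgetl-then-textscan idea. It writes a small hypothetical sample file to a temporary location (the file contents, the extra line under Heading A, and the variable names `fname` and `col2` are assumptions for illustration, not the poster's actual data) and then reads the second column of the Heading B section.

```matlab
% Hypothetical sample file, written to a temporary location for illustration
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'Heading A\n');
fprintf(fid, 'an extra line under Heading A\n');
fprintf(fid, '------------------------\n');
fprintf(fid, 'Heading B\n');
fprintf(fid, 'GA008246-0_B_F_1852967891\tX\t7117\n');
fprintf(fid, 'GA011810-0_B_F_1852968731\t14\t7380\n');
fprintf(fid, 'GA017861-0_B_F_1852970072\t22\t7749\n');
fclose(fid);

% Skip lines until 'Heading B' is found (fgetl returns -1 at end of file)
fid = fopen(fname, 'rt');
tline = fgetl(fid);
while ischar(tline) && ~strcmp(strtrim(tline), 'Heading B')
    tline = fgetl(fid);
end
if ~ischar(tline)
    fclose(fid);
    error('Heading B not found before end of file');
end

% Read the tab-delimited rows that follow the heading
C = textscan(fid, '%s %s %s', 'Delimiter', '\t');
fclose(fid);
delete(fname);   % remove the temporary sample file

col2 = C{2};     % second column as a cell array of strings
```

Because the loop stops on the heading text rather than on a fixed line count, the number of lines under Heading A should not matter.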
To find the number of repetitions in a column of C, use the third return value of unique together with histc:
[names, ~, position] = unique(C{2})
repetitions = histc(position, 1:numel(names))
%useful for seeing the result:
table(names, repetitions)
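As a quick check against the values in the question, the counting step can be sketched as below. Note that this swaps in accumarray for histc (newer MATLAB documentation marks histc as not recommended); that substitution is my own choice here, not part of the answer above, and `col2` is just the example column typed in by hand.

```matlab
% Second-column values from the question, entered by hand for illustration
col2 = {'X'; '14'; '22'; '22'; '22'; '22'};

% unique returns the sorted distinct strings and, as third output,
% the index of each element of col2 within that sorted list
[names, ~, position] = unique(col2);

% Count how many times each index occurs (alternative to histc)
repetitions = accumarray(position(:), 1);
```

Since unique sorts the strings, the digits come before 'X', so the counts are ordered as 14, 22, X rather than in file order.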
5 Comments
Guillaume
Guillaume 2015-4-14
Oh, sorry, I misunderstood. You also need to set the position and number of ticks (the XTick property), so that there is exactly one tick per label:
set(gca, 'XTickLabel', names, 'XTick', 1:numel(names))
should work.
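Putting the plotting step together, a minimal sketch might look like this. The `names` and `repetitions` values are typed in by hand from the example in the question, and the axis label text is an assumption.

```matlab
% Distinct values and their counts, taken from the example in the question
names = {'14'; '22'; 'X'};
repetitions = [1; 4; 1];

figure;
bar(repetitions);

% One tick per bar, labelled with the corresponding string
set(gca, 'XTick', 1:numel(names), 'XTickLabel', names);
xlabel('Value in column 2');
ylabel('Number of repetitions');
```

Setting XTick explicitly matters when there are many bars, because the default ticks may not land on every bar and the labels would then be assigned to the wrong positions.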


More Answers (1)

Star Strider
Star Strider 2015-4-14
I don’t have your file, but I would change the textscan call to:
C = textscan (fid, '%s %f %f', 'delimiter', '\t','headerlines', 3)
The initial ‘X’ in column #2 will then show up as either '' or NaN, so you can eliminate it by using isempty or isnan, as appropriate.
2 Comments
Jason
Jason 2015-4-14
Edited: Jason 2015-4-14
The problem is that there are sometimes lines under "Heading A", so the number of lines until I find "Heading B" is variable.
I actually want the X as well as the numbers (it's to do with chromosomes). It's this mixture of text and numbers in the cell array that makes it hard for me to plot a bar chart showing the frequency of each string.
I've included the txt file. Thanks
Star Strider
Star Strider 2015-4-14
Edited: Star Strider 2015-4-14
This works for the current file:
fidi = fopen('test1.txt');
C = textscan (fidi, '%s %s %s', 'delimiter', '\t','headerlines', 2);
fclose(fidi);
C2 = C{2};
Ix = cellfun(@isempty,C2);
[C2u,ia,ic] = unique(C2(~Ix));
cnts = hist(ic,length(C2u));
figure(1)
bar(cnts)
xt = get(gca, 'XTick');
set(gca, 'XTick', xt, 'XTickLabel',C2u)
EDIT —
Added plot ...

