Reading in specific column and plotting bar chart

Question

I have a text file like this:

Heading A
------------------------
Heading B
GA008246-0_B_F_1852967891  X  7117
GA011810-0_B_F_1852968731  14  7380
GA017861-0_B_F_1852970072  22  7749
GA017864-0_T_R_1853027526  22  7751
GA017866-0_T_R_1853027527  22  7753
GA017875-0_B_R_1852970076  22  7755

I want to be able to plot a histogram of the 2nd column under the title Heading B. Sometimes there are additional lines under Heading A.

This is what I have so far.

% Read in data file
fid = fopen('c:\myfile.txt','rt');
C = textscan (fid, '%s %s %s', 'delimiter', '\t','headerlines', 1)
while (strcmp(C{1}{1}, 'Heading B') == 0)
     C = textscan (fid, '%s %s %s', 'delimiter', '\t')
end
fclose(fid);
C{:,2}

But I'm picking up one item too early, i.e.:

ans =

    ''
    'X'
    '14'
    '22'
    '22'
    '22'
    '22'

Once the extra '' item is removed, how can I plot a bar chart showing the number of occurrences of each of these in the list? I.e., in this example:

X = 1 repetition
14 = 1 repetition
22 = 4 repetitions

Thanks for any help. Jason

Answer 1 (accepted)

Guillaume 2015-4-14

Edited: Guillaume 2015-4-14

I would use fgetl instead of textscan to find the start of the Heading B section, then use textscan to read it.

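fid = fopen('c:\myfile.txt','rt');
% Read line by line until 'Heading B' (fgetl returns -1 at end of file)
tline = fgetl(fid);
while ~isnumeric(tline) && ~strcmp(tline, 'Heading B')
   tline = fgetl(fid);
end
if isnumeric(tline) % EOF reached before Heading B
   fclose(fid);
   error('End of file reached prematurely');
end
% The file position is now just after 'Heading B'; read the remaining rows
C = textscan (fid, '%s %s %s', 'delimiter', '\t');
fclose(fid);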

To find the number of repetitions in a column of C, use the third return value of unique together with histc:

[names, ~, position] = unique(C{2})
repetitions = histc(position, 1:numel(names))
% Useful for seeing the result:
table(names, repetitions)
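In newer MATLAB releases histc is listed as not recommended, and the same counts can also be obtained with accumarray. A minimal sketch, assuming the second column has already been read into a cell array of strings; it is hard-coded here as col2 (a hypothetical name) with the values from the question's example so the snippet runs on its own:

```matlab
% Hypothetical stand-in for C{2}: second column read with '%s'
col2 = {'X'; '14'; '22'; '22'; '22'; '22'};

% names: sorted unique labels; position: index into names for each entry
[names, ~, position] = unique(col2);

% Add 1 at each entry's index, giving one count per unique label
repetitions = accumarray(position(:), 1);

% Print each label with its count
for k = 1:numel(names)
    fprintf('%s = %d\n', names{k}, repetitions(k));
end
% Prints, one per line: 14 = 1, 22 = 4, X = 1
```

Because the labels stay as strings throughout, 'X' is counted alongside the numeric chromosome labels.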

5 comments (3 earlier comments not shown)

Guillaume 2015-4-14

Oh, sorry, I misunderstood. You also need to change the position and number of the ticks (the XTick property):

set(gca, 'XTickLabel', names, 'XTick', 1:numel(names))

should work.
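Putting the counting and the tick fix together, a minimal sketch of the complete bar chart. The names and repetitions values are hard-coded here with what the question's example data would give, so the snippet runs without the file; the axis label texts are illustrative choices, not from the thread:

```matlab
% Unique labels and their counts (hard-coded from the question's example)
names = {'14'; '22'; 'X'};
repetitions = [1; 4; 1];

figure;
bar(repetitions);

% One tick per bar, each labelled with its string
set(gca, 'XTick', 1:numel(names), 'XTickLabel', names);
xlabel('Second column value');
ylabel('Number of occurrences');
```

Setting XTick explicitly to 1:numel(names) keeps each label under its own bar, instead of relying on the automatic tick positions.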

Jason 2015-4-15

Perfect, thank you.

Answer 2

Star Strider 2015-4-14

I don’t have your file, but I would change the textscan call to:

C = textscan (fid, '%s %f %f', 'delimiter', '\t','headerlines', 3)

The initial 'X' in column 2 will then show up as either '' or NaN, so you can eliminate it by using isempty or isnan, as appropriate.
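A minimal sketch of the NaN route, using str2double on a column already read as text rather than textscan's %f conversion (a swapped-in technique, not the exact call suggested above). The col2 values are hard-coded from the question's output, including the stray empty entry:

```matlab
% Hypothetical stand-in for the second column read with '%s'
col2 = {''; 'X'; '14'; '22'; '22'; '22'; '22'};

% Non-numeric text ('' and 'X') converts to NaN
vals = str2double(col2);

% Keep only the numeric entries
vals = vals(~isnan(vals))   % returns [14; 22; 22; 22; 22]
```

Note that this filters out 'X' as well as the empty entry, so it only suits the case where the non-numeric labels are not needed.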

2 comments

Jason 2015-4-14

Edited: Jason 2015-4-14

test1.txt

The problem is that there are sometimes lines under "Heading A", so the number of lines until I find "Heading B" is variable.

I actually want the X as well as the numbers (it's to do with chromosomes). It's this mixture of text and numbers in the cell array that makes it hard for me to plot a bar chart showing the frequency of each string.

I've attached the text file. Thanks.

Star Strider 2015-4-14

Edited: Star Strider 2015-4-14

This works for the current file:

fidi = fopen('test1.txt');
C = textscan (fidi, '%s %s %s', 'delimiter', '\t','headerlines', 2);
fclose(fidi);
C2 = C{2};
Ix = cellfun(@isempty,C2);              % flag empty entries
[C2u,ia,ic] = unique(C2(~Ix));          % unique labels, and index of each entry into C2u
cnts = hist(ic,length(C2u));            % one bin per unique label
figure(1)
bar(cnts)
set(gca, 'XTick', 1:numel(C2u), 'XTickLabel',C2u)   % one tick per bar

EDIT: Added plot.
