Counting the unique values

Pat 2012-7-18
I have the following values:
out=
c1 c2,,,,,,,,,,,,,,,,,,,,,c5
'gene1' 'd' 'u' 'd' 'u' 'd'
'gene2' 'u' 'u' 'u' 'u' 'd'
'gene3' 'u' 'u' 'd' 'u' 'u'
'gene4' 'd' 'u' 'u' 'd' 'd'
'gene5' 'u' 'u' 'u' 'u' 'd'
'gene6' 'd' 'u' 'u' 'u' 'u'
'gene7' 'd' 'u' 'd' 'u' 'u'
Take the value in the first column, 'c1': for gene1 it is 'd'. This value must be compared with all the other columns in the same row, and if there are more than 3 identical values the row should be displayed. Only the first column is compared with the other columns.
In the 1st row there are 3 d's, in the 2nd row 4 u's (since its 1st column is 'u'), in the 3rd row 4 u's, and so on. For the 6th and 7th genes there are only 1 and 2 d's respectively, so they should be deleted.
So I need the output as:
'gene1' 'd' 'u' 'd' 'u' 'd'
'gene2' 'u' 'u' 'u' 'u' 'd'
'gene3' 'u' 'u' 'd' 'u' 'u'
'gene4' 'd' 'u' 'u' 'd' 'd'
'gene5' 'u' 'u' 'u' 'u' 'd'
Please help
  1 comment
Jan 2012-7-18
Edited: Jan 2012-7-18
@Pat: Please post the input data in valid Matlab syntax, so that we can try our suggestions by copy&paste. The time required to guess what "c1 c2,,,,,c5" should mean is wasted, because you know this detail.

Accepted Answer

Freddy 2012-7-18
Hello Pat,
the first idea I came up with:
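% Input data as a cell array; the first row holds the column headers.
A = {'','c1','c2','c3','c4','c5';...
'gene1' 'd' 'u' 'd' 'u' 'd';...
'gene2' 'u' 'u' 'u' 'u' 'd';...
'gene3' 'u' 'u' 'd' 'u' 'u';...
'gene4' 'd' 'u' 'u' 'd' 'd';...
'gene5' 'u' 'u' 'u' 'u' 'd';...
'gene6' 'd' 'u' 'u' 'u' 'u';...
'gene7' 'd' 'u' 'd' 'u' 'u'};
limit = 3; % minimum number of entries equal to the c1 value (c1 itself included)
F = cell2mat(A(2:end,2:end)); % char matrix of the 'd'/'u' entries
% Keep the rows in which at least "limit" entries equal the first data column;
% the leading 0 drops the header row. No trailing semicolon, so the result is displayed.
A(logical([0;sum(bsxfun(@eq,F,F(:,1)),2)>=limit]),:)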
Hopefully it will help you.
Freddy
  1 comment
Jan 2012-7-18
You can omit the "logical", if you use:
A([true; sum(bsxfun(@eq, F, F(:,1)), 2) >= limit], :)
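
To see what the one-liner does, the expression can be unpacked into named steps. This is only a sketch: the names sameAsFirst, counts, keep, withHeader and withoutHeader are introduced here for illustration and do not appear in the answers. It also shows that a leading true keeps the header row of A, whereas a leading 0 (or false) drops it.

```matlab
A = {'',      'c1','c2','c3','c4','c5'; ...
     'gene1', 'd', 'u', 'd', 'u', 'd'; ...
     'gene2', 'u', 'u', 'u', 'u', 'd'; ...
     'gene3', 'u', 'u', 'd', 'u', 'u'; ...
     'gene4', 'd', 'u', 'u', 'd', 'd'; ...
     'gene5', 'u', 'u', 'u', 'u', 'd'; ...
     'gene6', 'd', 'u', 'u', 'u', 'u'; ...
     'gene7', 'd', 'u', 'd', 'u', 'u'};
limit = 3;

F = cell2mat(A(2:end, 2:end));          % 7x5 char matrix of 'd'/'u'
sameAsFirst = bsxfun(@eq, F, F(:, 1));  % true where an entry equals its row's c1 value
counts = sum(sameAsFirst, 2);           % matches per row, c1 itself included
keep = counts >= limit;                 % rows that reach the limit

withHeader    = A([true;  keep], :);    % header row retained
withoutHeader = A([false; keep], :);    % header row dropped

disp(counts.');                         % 3 4 4 3 4 1 2
disp(withoutHeader);
```

For the sample data only gene6 and gene7 fall below the limit, so both variants contain gene1 to gene5.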

More Answers (2)

Walter Roberson 2012-7-18
c1_column = 2; %looks like column 2 to me, since column 1 has gene name
match_count = arrayfun(@(K) sum(out{K,cl_column} == [out{K,c1_column+1:end}]), 1:size(out,1));
out(match_count > 3, :)
Your problem description is inconsistent about what to do if the number of matches is exactly 3. You wrote that it has to be more than 3, but your sample output includes the case where it is exactly 3.
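
The two readings can be compared by counting the matches both ways. This is a rough sketch, assuming out holds only the seven data rows from the question (no header row); the names first, others, nOthers, nAll, strictRows and looseRows are made up for this illustration.

```matlab
out = {'gene1', 'd', 'u', 'd', 'u', 'd'; ...
       'gene2', 'u', 'u', 'u', 'u', 'd'; ...
       'gene3', 'u', 'u', 'd', 'u', 'u'; ...
       'gene4', 'd', 'u', 'u', 'd', 'd'; ...
       'gene5', 'u', 'u', 'u', 'u', 'd'; ...
       'gene6', 'd', 'u', 'u', 'u', 'u'; ...
       'gene7', 'd', 'u', 'd', 'u', 'u'};
c1_column = 2;                                     % column 1 holds the gene name

first  = cell2mat(out(:, c1_column));              % 7x1 char: the c1 values
others = cell2mat(out(:, c1_column+1:end));        % 7x4 char: c2..c5

nOthers = sum(bsxfun(@eq, others, first), 2);      % matches among c2..c5 only
nAll    = nOthers + 1;                             % c1 counted as matching itself

strictRows = out(nOthers > 3, :);                  % "more than 3" other columns match
looseRows  = out(nAll >= 3, :);                    % at least 3 equal values, c1 included

disp(size(strictRows, 1));                         % 0 rows under the strict reading
disp(looseRows(:, 1).');                           % gene1 ... gene5
```

Only the second reading reproduces the sample output in the question.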
  2 comments
Pat 2012-7-18
Walter, that's the sample program. In your code I get the error:
Undefined function or variable 'cl_column'.
Jan 2012-7-18
@Pat: Come on, I'm sure you are able to fix this typo on your own. The "1" (one) looks very similar to the "l" (lowercase L). It is your job to participate as far as possible in solving your problems.

Bjorn Gustavsson 2012-7-18
Pat,
I strongly suggest you change the encoding of your data! Make your variable an integer array with for example 0 for 'd' and 1 for 'u'. Then you could do something like this:
genes = [0,1,0,1,0
1 1 1 1 0
1 1 0 1 1
0 1 1 0 0];
lim4disp = 3;
genes(sum(repmat(genes(:,1),[1,size(genes,2)]) == genes,2)>=lim4disp,:)
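
One possible way to get from the cell array in the question to such a numeric encoding is sketched below. It assumes every data entry is a single character, either 'd' or 'u', and that out has no header row; the names names, nSame and kept are illustrative only.

```matlab
out = {'gene1', 'd', 'u', 'd', 'u', 'd'; ...
       'gene2', 'u', 'u', 'u', 'u', 'd'; ...
       'gene3', 'u', 'u', 'd', 'u', 'u'; ...
       'gene4', 'd', 'u', 'u', 'd', 'd'; ...
       'gene5', 'u', 'u', 'u', 'u', 'd'; ...
       'gene6', 'd', 'u', 'u', 'u', 'u'; ...
       'gene7', 'd', 'u', 'd', 'u', 'u'};

names = out(:, 1);                                    % gene labels kept separately
genes = double(cell2mat(out(:, 2:end)) == 'u');       % 0 for 'd', 1 for 'u'

lim4disp = 3;
nSame = sum(bsxfun(@eq, genes, genes(:, 1)), 2);      % entries equal to column 1, itself included
kept  = names(nSame >= lim4disp);                     % labels of the rows to display

disp(genes(nSame >= lim4disp, :));
disp(kept.');
```

Keeping the labels in a separate cell array means the numeric matrix can be filtered with plain logical indexing, and the same mask selects the matching gene names.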
