How can I replace an index in a matrix with a vector?
I have sequences stored as character arrays. I need to find particular characters and replace each one with a vector (its Boolean representation).
So in the end I need a 3D matrix.
It worked for one sequence, but I have 96000 more. I tried to do it with loops, but I get an error.
This is my code for one sequence, but I need to do the same for 96000 sequences.
I need your help with this issue. Thanks in advance.
p1_1=sequences;
% Select the first sequence and convert it to a character array
Chp1_1=char(p1_1(1,:));
% For every character of the sequence, look up its Boolean representation
SeqL = length(Chp1_1);
for i=1:SeqL
X = Chp1_1(1,i);
switch X
case 'A'
M(i,:) = A1;
case 'C'
M(i,:) = C1;
case 'D'
M(i,:) = D1;
case 'E'
M(i,:) = E1;
case 'F'
M(i,:) = F1;
case 'G'
M(i,:) = G1;
case 'H'
M(i,:) = H1;
case 'I'
M(i,:) = I1;
case 'K'
M(i,:) = K1;
case 'L'
M(i,:) = L1;
case 'M'
M(i,:) = M1;
case 'N'
M(i,:) = N1;
case 'P'
M(i,:) = P1;
case 'Q'
M(i,:) = Q1;
case 'R'
M(i,:) = R1;
case 'S'
M(i,:) = S1;
case 'T'
M(i,:) = T1;
case 'V'
M(i,:) = V1;
case 'W'
M(i,:) = W1;
case 'Y'
M(i,:) = Y1;
end
end
4 comments
Guillaume
2019-11-26
Edited: Guillaume
2019-11-26
It's important to use notation that actually reflects your data; otherwise, the code we give you might not work. It's also important to use the proper notation, because right now we're left wondering:
- Do you have numbered variables, as per your Protein_1, Protein_2, etc.?
- Do you have a cell array of char vectors, as per your "{1,96000}", which is cell array notation?
- Do you have a string array, as per your "in the [...] string array"?
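One way to settle the question is to inspect the class of the variable and normalise it to a cell array of char vectors. The sketch below assumes the data lives in a variable called sequences, as in the question; the two example entries are made up.

```matlab
% Hypothetical stand-in for the real data; replace with your own variable.
sequences = ["ACDK"; "LMEG"];          % here: a string array

disp(class(sequences));                % shows 'string', 'cell' or 'char'

% cellstr converts a string array or a char matrix to a cell array of
% char vectors, and leaves a cell array of char vectors unchanged.
protein = cellstr(sequences);

% A 3D result requires every sequence to have the same length.
lens = cellfun(@numel, protein);
sameLength = all(lens == lens(1));
```

After this step, protein{p} is a char vector whichever of the three storage types the data started in.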
Answers (3)
Guillaume
2019-11-25
First, and probably most important: numbered or sequentially named variables are always a very bad idea. They always make the code more complicated to write, not easier. For example, with your protein_1, protein_2, ... protein_96000 you cannot easily apply the same code to each variable, whereas if you had just one variable, for example a cell array called protein, you could use a loop to apply the same code to each element:
for p = 1:numel(protein)
dosomethingwith(protein{p});
end
Same with your horrible switch...case and your A1, C1, etc. You end up rewriting the same thing many times with only one variation, which increases the risk of making a mistake on one line. Computers are very good at doing repetitive things, so why do the repetition yourself?
Anything that is numbered or sequentially named should be just one variable that you index instead.
So, with regard to your transformation, first create two variables: the first is the list of letters to transform and the second is what they need to be transformed into, e.g.:
letters = 'ACDEFGHIKLMNPQRSTVWY'.'; %column vector of letters ('R' included, as in the question's switch)
acid = [1 0 0 0 0;
0 1 0 0 0;
0 0 1 0 0;
0 0 0 1 0;
% ...etc., one row per letter
];
For pretty display we could even put them into a table:
map = table(letters, acid);
Now that we have that, transforming a sequence of letters into a 2D matrix is trivial:
prot = 'ACDKLMEGAC'; %content and length don't matter
[found, whichrow] = ismember(prot, map.letters); %find which row of letters corresponds to each letter of prot
assert(all(found), 'some letters of the input are invalid');
transformed = map.acid(whichrow, :); %and use the corresponding rows of acid instead
%all done!
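Since the acid matrix above is elided, here is a minimal runnable sketch of the same lookup. It assumes a one-hot encoding built with eye over the 20 standard amino-acid letters; the encoding actually wanted in the question may differ, so treat acid here as a placeholder.

```matlab
% Assumption: 20 standard amino-acid letters, one-hot encoded.
letters = 'ACDEFGHIKLMNPQRSTVWY';
acid    = eye(numel(letters));         % row k encodes letters(k)

prot = 'ACDKLMEGAC';
[found, whichrow] = ismember(prot, letters);
assert(all(found), 'some letters of the input are invalid');

% Each row of transformed is the code of the matching letter of prot.
transformed = acid(whichrow, :);       % numel(prot)-by-20 matrix
```

Swapping in a different encoding only means replacing acid with another matrix that has one row per letter.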
And assuming protein is the above-mentioned cell array, where all the sequences are the same length, then:
transformed = zeros(numel(protein{1}), size(map.acid, 2), numel(protein)); %preallocated 3D array
for p = 1:numel(protein)
[found, whichrow] = ismember(protein{p}, map.letters); %find which row of letters corresponds to each letter of protein{p}
assert(all(found), 'some letters of protein %d are invalid', p);
transformed(:, :, p) = map.acid(whichrow, :); %and use the corresponding rows of acid instead
end
See how short the code can be once you don't have numbered variables and use indexing instead?
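With 96000 sequences, the loop can also be replaced by a single lookup on a char matrix. This is only a sketch under the same assumptions as before: equal-length sequences, a hypothetical three-entry protein cell array, and an assumed one-hot acid matrix.

```matlab
letters = 'ACDEFGHIKLMNPQRSTVWY';
acid    = eye(numel(letters));             % assumed one-hot encoding

protein = {'ACDK', 'LMEG', 'YWVT'};        % hypothetical equal-length sequences

seqs = char(protein);                      % nSeq-by-seqLen char matrix
[found, whichrow] = ismember(seqs, letters);
assert(all(found(:)), 'some letters are invalid');

% Look up every letter at once. whichrow(:) runs in column-major order,
% so the sequence index varies fastest in the rows of flat.
[nSeq, seqLen] = size(seqs);
flat = acid(whichrow(:), :);               % (nSeq*seqLen)-by-nCode

% Reshape to nSeq-by-seqLen-by-nCode, then reorder the dimensions to
% seqLen-by-nCode-by-nSeq, the layout produced by the loop version.
transformed = permute(reshape(flat, nSeq, seqLen, []), [2 3 1]);
```

transformed(:, :, p) then holds the 2D matrix for the p-th sequence, exactly as in the loop.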
0 comments
Philippe Lebel
2019-11-25
I am not sure what you are trying to do as a whole, but if you want to quickly find the occurrences of a certain string, use strfind().
a = 'aasdasffwfdasda';
your_sequence_of_bools_for_letter_a = [true false true];
idx = strfind(a,'a')
idx =
1 2 5 12 15
M=cell(1,length(a));
for i=1:length(idx)
M{idx(i)} = your_sequence_of_bools_for_letter_a;
end
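For a single character, a logical comparison gives the same positions without a loop. This is a sketch of an alternative to the code above, not part of the original answer; bools_for_a is a hypothetical placeholder for the real encoding.

```matlab
a = 'aasdasffwfdasda';
bools_for_a = [true false true];       % hypothetical encoding of 'a'

isA = (a == 'a');                      % logical mask, same size as a
M = cell(1, numel(a));
M(isA) = {bools_for_a};                % same vector assigned to every match
```

Cells of M at positions where a is not 'a' stay empty, as in the loop version.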
Philippe Lebel
2019-11-25
Now I understand.
Here is a solution that you can easily expand.
clear
% one struct array holds both the letter and its Boolean vector
protein(1).name = 'A';
protein(1).bool_value = [1 0 0];
protein(2).name = 'B';
protein(2).bool_value = [0 1 0];
protein(3).name = 'C';
protein(3).bool_value = [0 0 1];
protein_name_list = [protein.name];
sequences = ['ABC';'CCC';'CAB']; % one sequence per row
n_sequences = size(sequences, 1); % number of rows, not the longest dimension
M=cell(1,n_sequences);
for i=1:n_sequences
resulting_bool = [];
sequence = sequences(i,:);
for j = 1:length(sequence)
idx = strfind(protein_name_list, sequence(j));
resulting_bool = [resulting_bool ;protein(idx).bool_value];
end
M{i} = resulting_bool;
end
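Since the question asks for a 3D matrix, the cell array can be stacked along the third dimension once every cell has the same size. A minimal sketch, with M filled in by hand to match the three example sequences above:

```matlab
% Hypothetical cell array in the shape produced by the loop above:
% one 3-by-3 matrix per sequence ('ABC', 'CCC', 'CAB').
M = {[1 0 0; 0 1 0; 0 0 1], ...
     [0 0 1; 0 0 1; 0 0 1], ...
     [0 0 1; 1 0 0; 0 1 0]};

% Stack the matrices along dimension 3; all cells must have equal size.
M3 = cat(3, M{:});                     % 3-by-3-by-3 array
```

M3(:, :, k) is then the Boolean matrix of the k-th sequence.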
0 comments