Finding cell array row indices based on numeric column values
I have a large cell array keystrokes of approximately 20000x4 elements. Columns 1 and 3 each contain a char, while columns 2 and 4 each contain a double. For example:
>> keystrokes(378:380,:)
ans =
3×4 cell array
{'l' } {[ 180]} {'e' } {[ 69]}
{'e' } {[300664]} {'|space|'} {[ 125]}
{'|space|'} {[ 62]} {'n' } {[2500]}
I want to find the row indices in keystrokes of occurrences of every unique combination of columns 1 and 3, where the value in column 2 is less than 100000 and the value in column 4 is less than 2000. My current code gives me the error "Undefined operator '<' for input arguments of type 'cell'.", and is shown below.
% Temporarily convert keystroke structure to a table due to unique() apparently not supporting combinations of cellarray columns.
uniqueDigraphsTable = unique(cell2table(keystrokes(:,[1 3])), 'rows');
uniqueDigraphs = table2cell(uniqueDigraphsTable);
for ii = 1:length(uniqueDigraphs)
% Find rows containing the current unique digraph
occurrenceIndices = find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & keystrokes(:,2)<100000 & keystrokes(:,4)<2000);
...
end
Using keystrokes{:,4}<2000 gives me this error: "Error using <. Too many input arguments." Is there a simple (and perhaps prettier) way to find the indices?
1 Comment
Jan
2018-1-9
Please post the input data in a form that can be used by copy and paste. Is keystrokes a nested cell array:
keystrokes = { ...
{'l' } {[ 180]} {'e' } {[ 69]}; ...
{'e' } {[300664]} {'|space|'} {[ 125]}; ...
{'|space|'} {[ 62]} {'n' } {[2500]}}
or a plain cell array:
keystrokes = { ...
'l', 180, 'e', 69; ...
'e', 300664, '|space|', 125; ...
'|space|', 62, 'n', 2500}
? Even asking this question took a lot of typing.
Answers (2)
Guillaume
2018-1-9
find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ... split over several lines for readability
strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
vertcat(keystrokes{:,2}) < 100000 & ... vertcat yields a column vector that matches the strcmp output
vertcat(keystrokes{:,4}) < 2000)
or
find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ... split over several lines for readability
strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
cell2mat(keystrokes(:,2)) < 100000 & ...
cell2mat(keystrokes(:,4)) < 2000)
In essence you have to transform your cell columns into numeric matrices.
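A minimal, self-contained sketch of that conversion, using the three sample rows from the question. The names col2, col4, isFast and rowIdx are illustrative and not from the original post, and the sketch assumes every cell in columns 2 and 4 holds a scalar double. Note that [keystrokes{:,2}] concatenates horizontally and produces a row vector, so vertcat or cell2mat on the cell column is the safer way to get a column that lines up with the output of strcmp:

```matlab
% Sample data copied from the question (3x4 cell array)
keystrokes = { ...
    'l',       180,    'e',       69; ...
    'e',       300664, '|space|', 125; ...
    '|space|', 62,     'n',       2500};

% keystrokes(:,2) is still a cell array, so "<" is undefined for it.
% Extract the contents into numeric column vectors first.
col2 = cell2mat(keystrokes(:, 2));   % 3x1 double
col4 = vertcat(keystrokes{:, 4});    % 3x1 double, same idea without cell2mat

% Logical column vector: true where both thresholds are satisfied
isFast = col2 < 100000 & col4 < 2000;

% Combine with the character comparisons for one digraph, here 'l' then 'e'
rowIdx = find(strcmp(keystrokes(:, 1), 'l') & ...
              strcmp(keystrokes(:, 3), 'e') & ...
              isFast);
```

With this sample data only the first row passes both thresholds, so rowIdx is 1.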
Jan
2018-1-9
Edited: Jan
2018-1-9
A cell array is not useful for these comparisons, and converting it to a table only adds another level of indirection. Easier:
% Store strings in one cell string:
Strings = keystrokes(:, [1, 3]);
uStrings = unique(string(Strings), 'rows'); % 'rows' is not supported for cell arrays, so convert to a string array (R2016b or later)
% Store numbers in a numerical array:
Values = cell2mat(keystrokes(:, [2, 4]));
% Move the check of the values out of the loop for performance:
match = (Values(:, 1) < 100000 & Values(:, 2) < 2000);
for ii = 1:size(uStrings, 1) % number of rows; length() would return 2 if only one unique pair exists
occurrenceIndices = find(strcmp(Strings(:,1), uStrings{ii, 1}) & ...
strcmp(Strings(:,2), uStrings{ii, 2}) & ...
match);
...
end
This would be faster if you also use the additional outputs of unique(), in particular the third one:
[uStrings, iString, iUniq] = unique(string(Strings), 'rows'); % string array needed for 'rows' (R2016b or later)
match = (Values(:, 1) < 100000 & Values(:, 2) < 2000);
for ii = 1:size(uStrings, 1)
occurrenceIndices = find(iUniq == ii & match);
...
end
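If the loop body only needs the list of row indices per digraph, the grouping can also be done without a loop. The following is a sketch under assumptions, not part of the original answer: the fourth data row is made up so that one digraph repeats, indicesPerDigraph is an illustrative name, and string arrays require R2016b or later.

```matlab
keystrokes = { ...
    'l',       180,    'e',       69; ...
    'e',       300664, '|space|', 125; ...
    '|space|', 62,     'n',       2500; ...
    'l',       95,     'e',       110};   % made-up extra row so that 'l','e' occurs twice

Strings = string(keystrokes(:, [1, 3]));      % string array, so that 'rows' is accepted
Values  = cell2mat(keystrokes(:, [2, 4]));    % numeric matrix of the two value columns

% iUniq(k) is the row of uStrings that keystroke row k belongs to
[uStrings, ~, iUniq] = unique(Strings, 'rows');
match = Values(:, 1) < 100000 & Values(:, 2) < 2000;

% Collect the matching row numbers of each digraph into one cell per digraph.
% Digraphs without any matching row end up with an empty cell.
rows = find(match);
indicesPerDigraph = accumarray(iUniq(rows), rows, ...
    [size(uStrings, 1), 1], @(x) {sort(x)});
```

indicesPerDigraph{k} then holds the row indices for the digraph in uStrings(k, :). The explicit sort is there because accumarray does not promise the order of values within a group.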
2 Comments
Guillaume
2018-1-10
Annoyingly, unique (and ismember) do not support the 'rows' option with cell arrays, even when it is a cell array of char arrays. If you have MATLAB R2016b or later, you can convert the cell array of char arrays into a string array, which can be used with unique and the 'rows' option:
unique(string(keystrokes(:, [1 3])), 'rows')
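For releases older than R2016b, where string arrays are not available, one possible workaround (not proposed in this thread) is to merge the two character columns into a single key per row and call unique on the resulting cell array of char vectors, which needs no 'rows' option. The sketch below assumes that the separator, a tab, never occurs in the key labels. The names sep, keys and uKeys are illustrative, and the fourth data row is made up so that one digraph repeats.

```matlab
keystrokes = { ...
    'l',       180,    'e',       69; ...
    'e',       300664, '|space|', 125; ...
    '|space|', 62,     'n',       2500; ...
    'l',       95,     'e',       110};   % made-up extra row so that 'l','e' occurs twice

sep = char(9);  % tab character, assumed not to appear in any key label

% Build one combined key per row, e.g. ['l' sep 'e']
keys = cellfun(@(a, b) [a, sep, b], ...
    keystrokes(:, 1), keystrokes(:, 3), 'UniformOutput', false);

% unique works on a cell array of char vectors without the 'rows' option
[uKeys, ~, iUniq] = unique(keys);
```

As with the string array version, iUniq(k) gives the group number of row k, so iUniq == ii selects all rows of the ii-th unique digraph.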