How to improve efficiency when checking for unique combinations of table values?

2 次查看(过去 30 天)
Hello, I am trying to check each row in a sheet of data to see if it contains a combination of 2 values. For example, there might be a table like
Fruit | Color | related value
apple | blue | 3
apple | red | 8
banana | blue | 5
banana | yellow | 6
apple | blue | 16
pear | yellow | 7
I want to create a table with the first column as the top row and the second column as the first column, and with the values indicated by a corrdinate pair being the number of times that that particular pair appears.
For example:
apple | banana | pear
blue 2 1 0
red 1 0 1
yellow 0 1 1
I'm currently using code that looks like this
fileContents = zeros(size((colorNames,1),size(fruitNames,1));
for p = 1:size(uniqueFruitNames,1)
for q = 1:size(uniqueColorNames,1)
for k = 1:size(rawData,1)
if ((uniqueFruitNames(p) == rawData(k,1)) && (all((uniqueColorNames(q,:) == rawData(k,2))))
fileContents(q,p) = fileContents(q,p) + 1;
end
end
end
end
My problem is that the table that is referenced by my code contains upwards of 10 thousand values, and I have found that the above code take approximately 15 minutes to complete. The rawData sheet contains around 12 thousand rows and 70 columns. The uniqueFruitNames array is 8 x 1, and the uniqueColorNames array is aproximately 2000 x 1.
What can I do to make this more efficient?

采纳的回答

dpb
dpb 2021-6-15
tFruit=readtable('fruit.txt');
tFruit=tFruit(:,[1:2:end]);
tFruit.Properties.VariableNames={'Fruit','Color','Data'};
tFruit.Fruit=categorical(tFruit.Fruit);
tFruit.Color=categorical(tFruit.Color);
tFruit =
6×3 table
Fruit Color Data
______ ______ ____
apple blue 3
apple red 8
banana blue 5
banana yellow 6
apple blue 16
pear yellow 7
% The engine
>> histcounts2(double(tFruit.Color),double(tFruit.Fruit))
ans =
2 1 0
1 0 0
0 1 1
>>

更多回答(1 个)

Walter Roberson
Walter Roberson 2021-6-14
findgroups the two columns individually. Concatenate the groups into one array of two columns. Use those as the first parameter to accumarray and use 1 as the second parameter. The result will be a count array.

产品


版本

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by