Strategies for reducing calculation time: Finding values in a large array

Question

Joe Vinciguerra 2019-6-11

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/466585-strategies-for-reducing-calculation-time-finding-values-in-a-large-array

Commented: Joe Vinciguerra 2019-6-12

I have multiple large arrays (each with as many as 1 million rows) that together make up a "complete dataset". Each array has two columns: column 1 holds identifying values (IDs) and column 2 holds measurement values (Data). Each ID may be repeated an unknown number of times. I need to find every instance of each ID and calculate the mean of the Data values for that ID.

This code provides an example of the raw data format for one such array and outputs the expected results. However, speed is the issue, as the loop in Step 3 may run for 600,000 or more iterations for each array in the complete dataset.

% Step 1: representation of data format
RawData = [randi([1,3],10,1)/10,rand(10,1)];
% Step 2: Preallocate array with the unique IDs, sorted by default
UniqueData(:,1) = unique(RawData(:,1));
% Step 3: for each unique ID find the mean of the values matching that ID, storing results
for i = 1:length(UniqueData(:,1))
	UniqueData(i,2) = mean(RawData(RawData(:,1) == UniqueData(i,1),2));
end

Using timeit(), I found the above to be the fastest of the methods I tried, but it still takes around 2 hours to calculate Step 3 for one complete dataset (consisting of 10 such arrays).

I also tried replacing Step 3 with this:

UniqueData(:,2) = arrayfun(@(x) mean(RawData((RawData(:,1) == x),2)),UniqueData(:,1));

and this:

for i = 1:length(UniqueData(:,1))
    foo = RawData(RawData(:,1) == UniqueData(i,1),2);
    UniqueData(i,2) = mean(foo);
end

... without improved performance.

Is there a faster method for completing this calculation? I can't think of a method besides using a loop or arrayfun. Thanks.

Answer 1

Jan 2019-6-11

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/466585-strategies-for-reducing-calculation-time-finding-values-in-a-large-array#answer_378767

Edited: Jan 2019-6-11

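% index(k) is the position of RawData(k, 1) in the sorted list uniqID
[uniqID, ~, index] = unique(RawData(:, 1));
% Group the Data values by index and take the mean of each group
avg    = accumarray(index, RawData(:, 2), [], @mean);
% Column 1: unique IDs, column 2: mean of the Data for each ID
result = [uniqID, avg];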

Do you have a C compiler installed? If so, a small piece of C code might be even faster. I can post it on request.
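
A possible variation on the code above, not part of the original answer: accumarray's default reduction is a sum, so the per-ID mean can also be written as a per-ID sum divided by a per-ID count, without passing a function handle. Whether this is actually faster on a given dataset is something to check with timeit(). This is a minimal sketch that assumes RawData has the two-column layout from the question; the names sums, counts, avgFast and avgRef are made up for illustration.

```matlab
% Illustrative data in the same format as the question (IDs in column 1).
RawData = [randi([1,3],10,1)/10, rand(10,1)];

% Map every row to the position of its ID in the sorted list of unique IDs.
[uniqID, ~, index] = unique(RawData(:, 1));

% Per-ID sum and per-ID count, both using accumarray's default (sum).
sums   = accumarray(index, RawData(:, 2));
counts = accumarray(index, 1);

% Mean per ID = sum / count. Every ID occurs at least once, so counts > 0.
avgFast = sums ./ counts;
result  = [uniqID, avgFast];

% Sanity check against the @mean version from the answer.
avgRef = accumarray(index, RawData(:, 2), [], @mean);
assert(max(abs(avgFast - avgRef)) < 1e-12);
```

The final assert only confirms that both variants agree to within rounding error; it says nothing about speed.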

Joe Vinciguerra 2019-6-12

WOW! What took 2 hours yesterday runs in 21 seconds. Thank you!

That looks like a useful little function; I'll have to read up more on it.

Please share your C code with me if you don't mind. I don't have any experience with C, but once I'm done prototyping in MATLAB the code will likely be migrated to C by a colleague.

Cheers.
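
For anyone reading up on the two functions used in the answer, here is a small example that can be checked by hand. It is only a sketch with made-up values; the variable names ids and data are hypothetical and do not come from the thread.

```matlab
% Tiny hand-checkable example: three distinct IDs, five measurements.
ids  = [0.2; 0.1; 0.2; 0.3; 0.1];
data = [10;  1;   20;  5;   3];

% uniqID is sorted; index(k) says which entry of uniqID row k belongs to.
[uniqID, ~, index] = unique(ids);
% uniqID = [0.1; 0.2; 0.3], index = [2; 1; 2; 3; 1]

% accumarray collects the data values that share an index and applies
% the given function (here @mean) to each group.
avg = accumarray(index, data, [], @mean);
% avg = [2; 15; 5]   (mean of [1 3], mean of [10 20], mean of [5])
```

The third output of unique is what removes the need for a loop: it labels every row with its group in a single pass, instead of comparing the whole ID column against one ID per iteration.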
