Find columns of a Matrix (n x m) that fit best to a Vector (n x 1)

Hello,
I have a matrix of 24 x 365 values (the hourly electricity consumption over one year). I want to find the m days whose data best fit my forecast vector (24 x 1), and collect them in a 24 x m matrix. For example, if I want the 6 best-fitting days of the year, the new matrix should have 24 x 6 values.
How could that be implemented? Are there any predefined functions?
greetings
Alex

Accepted Answer

John D'Errico 2019-3-28
Edited: John D'Errico 2019-3-28
You need first to decide what it means to "fit best". For example, I'll just make up some data here.
data = rand(3,20);
data'
ans =
0.075854 0.05395 0.5308
0.77917 0.93401 0.12991
0.56882 0.46939 0.011902
0.33712 0.16218 0.79428
0.31122 0.52853 0.16565
0.60198 0.26297 0.65408
0.68921 0.74815 0.45054
0.083821 0.22898 0.91334
0.15238 0.82582 0.53834
0.99613 0.078176 0.44268
0.10665 0.9619 0.0046342
0.77491 0.8173 0.86869
0.084436 0.39978 0.25987
0.80007 0.43141 0.91065
0.18185 0.2638 0.14554
0.13607 0.86929 0.5797
0.54986 0.14495 0.85303
0.62206 0.35095 0.51325
0.40181 0.075967 0.23992
0.12332 0.18391 0.23995
As you can see, I transposed data to display it, so it will be easier to read.
Now let us pretend that I have 20 days' worth of data, sampled 3 times per day, and I want to find the subset of 3 days among those 20 that are the best fit to some prototypical day, stored in the variable target:
target = [0.4; 0.5; 0.6]
target =
0.4
0.5
0.6
I'll use some tools below that come from more recent MATLAB releases. So if the code I wrote does not work for you, then tell me what release you have, and I'll explain how to fix it to work in older releases. (Best is if you always say what release you are using.)
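As a minimal sketch of how a sum-of-absolute-differences ranking could be written without those newer tools: as far as I know, mink was introduced in R2017b, and subtracting a column vector from a matrix directly (implicit expansion) in R2016b. Here bsxfun and sort stand in for them. The data values are made up purely for illustration.

```matlab
% Made-up example: 3 samples per day (rows), 5 days (columns)
data   = [0.1 0.4 0.9 0.5 0.3;
          0.2 0.5 0.1 0.4 0.8;
          0.3 0.7 0.5 0.6 0.6];
target = [0.4; 0.5; 0.6];
nbest  = 2;

% bsxfun replaces implicit expansion (data - target) in older releases
d = bsxfun(@minus, data, target);

% sort replaces mink: sort all column errors, then keep the first nbest
[errSorted, order] = sort(sum(abs(d), 1), 'ascend');
err = errSorted(1:nbest);
ind = order(1:nbest);
```

With these made-up numbers, ind comes out as [2 4], since columns 2 and 4 have the smallest total absolute difference from target.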
What we need to decide now is how to measure how different any specific day's data is from your target. The simplest might be to compute the sum of the absolute values of the differences. Then find the three smallest such sums, and report which days they were.
[err,ind] = mink(sum(abs(data - target),1),3)
err =
0.45785 0.49309 0.55167
ind =
18 6 5
So, if you look at the one-liner above, it takes the difference, then the absolute value, adds them all up for each column of data, and finally looks to see which were the 3 smallest such sums. Days 18, then 6, then 5 were the closest by that measure.
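If it helps, the same chain of operations can be unrolled into separate steps. This is only a sketch on a small made-up matrix (not the random data above), and the intermediate variable names are my own. It assumes a release with implicit expansion and mink.

```matlab
% Made-up data: 3 samples per day (rows), 4 days (columns)
data   = [0.4 0.9 0.5 0.1;
          0.5 0.1 0.3 0.5;
          0.7 0.6 0.6 0.2];
target = [0.4; 0.5; 0.6];

delta    = data - target;        % 3x4: target is subtracted from every column
absDelta = abs(delta);           % elementwise absolute differences
colErr   = sum(absDelta, 1);     % 1x4: one total error per day
[err, ind] = mink(colErr, 3);    % the 3 smallest errors and their column indices
```

Here colErr is roughly [0.1 0.9 0.3 0.7], so ind is [1 3 4]: mink returns the indices ordered from the smallest error upward.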
[target, data(:,ind)]
ans =
0.4 0.62206 0.60198 0.31122
0.5 0.35095 0.26297 0.52853
0.6 0.51325 0.65408 0.16565
To be honest, the target in the first column is not matched all that well by the others. But by the above measure, those 3 days were the best "fit".
Alternatively, you might care only about the square root of the sum of squares of those differences. This tends to emphasize the larger differences as important, and it is a rather classic way to look at such a fit.
[err,ind] = mink(sqrt(sum((data - target).^2,1)),3)
err =
0.28116 0.31608 0.39474
ind =
18 6 4
As you can see, the 3rd best such day is now a different choice.
[target, data(:,ind)]
ans =
0.4 0.62206 0.60198 0.33712
0.5 0.35095 0.26297 0.16218
0.6 0.51325 0.65408 0.79428
Finally, what we might care about could just be the largest difference. That too is easy to locate.
[err,ind] = mink(max(abs(data - target),[],1),3)
err =
0.22206 0.23703 0.28921
ind =
18 6 7
[target, data(:,ind)]
ans =
0.4 0.62206 0.60198 0.68921
0.5 0.35095 0.26297 0.74815
0.6 0.51325 0.65408 0.45054
Again, the first two days found were the very best days, but the 3rd best day again changed.
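For reference, the three measures used so far can be written side by side. The notation is my own: x_{ij} is sample i of day j (an element of data), t_i is sample i of target, and n is the number of samples per day.

```latex
e_1(j) = \sum_{i=1}^{n} \lvert x_{ij} - t_i \rvert, \qquad
e_2(j) = \sqrt{\sum_{i=1}^{n} (x_{ij} - t_i)^2}, \qquad
e_\infty(j) = \max_{1 \le i \le n} \lvert x_{ij} - t_i \rvert
```

For any single day these satisfy e_\infty(j) <= e_2(j) <= e_1(j), which agrees with the errors reported for day 18: 0.22206, 0.28116 and 0.45785.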
So, if your data is really 24x365, and your forecast is 24x1, and you want to find the best 6 days, then the solution just depends on which metric you need to use.
The 1-norm (sum of absolute values)
nbest = 6;
[err,ind] = mink(sum(abs(data - target),1),nbest);
The 2-norm (square root of the sum of squares)
nbest = 6;
[err,ind] = mink(sqrt(sum((data - target).^2,1)),nbest);
The infinity-norm (maximum of absolute values)
nbest = 6;
[err,ind] = mink(max(abs(data - target),[],1),nbest);
Your choice.
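Putting it together for the 24 x 365 case, here is one possible sketch that also builds the 24 x m matrix asked for in the question. It assumes vecnorm is available (R2017b or later, as far as I know) so that a single parameter p selects the norm. The random data and the forecast built from day 100 are stand-ins, not real consumption values.

```matlab
% Stand-in data: random values in place of the real consumption matrix
rng(0);                          % fixed seed so the run is repeatable
data   = rand(24, 365);          % 24 hourly values for each of 365 days
target = data(:, 100) + 0.001;   % hypothetical forecast, deliberately close to day 100
nbest  = 6;
p      = 2;                      % 1, 2, or Inf selects the norm

% vecnorm(..., p, 1) computes the p-norm of each column of the difference
[err, ind] = mink(vecnorm(data - target, p, 1), nbest);

% The 24 x nbest matrix of best-fitting days, closest first
best = data(:, ind);
```

Because the stand-in forecast was constructed from day 100, ind(1) is 100 here. With real data, replace data and target with your own matrix and forecast vector.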

More Answers (2)

Andrei Bobrov 2019-3-28
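% A - your array (24 x 365)
% B - your forecast vector (24 x 1)
m = 6;                            % number of best-fitting days wanted
ii = sqrt(sum((A - B).^2, 1));    % Euclidean distance of each day (column) to B
[~,ij] = mink(ii,m);              % column indices of the m smallest distances
out = A(:,ij);                    % 24 x m matrix of the best-fitting days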

KSSV 2019-3-28
Edited: KSSV 2019-3-28
Read about ismember.
