How to remove outliers from exponentially weighted moving mean in real-time?

Question

Cai Chin 2020-11-9

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/642245-how-to-remove-outliers-from-exponentially-weighted-moving-mean-in-real-time

评论： Cai Chin 2020-11-24

I am using MATLAB R2020a on a MacOS. I am calculating an exponentially weighted moving mean using the dsp.MovingAverage function and am trying to remove vector elements in real-time based on 2 conditions - if the new element causes the mean to exceed 1.5 times the 'overall' mean so far, or if it is below 0.5 times the 'overall' mean so far.

In other words, the weighted mean with the current element is compared to the previous weighted mean, and if the current element causes the weighted mean to increase above 1.5 times the previous mean or go below 0.5 times the previous mean, then it should be ignored and the recursive equation is instead applied to the next element, and so on. In the end, I'd like to have a vector containing the outliers removed.

This is the function I am using to calculate the exponentially weighted moving mean:

movavgExp = dsp.MovingAverage('Method', 'Exponential weighting',    'ForgettingFactor', 0.4);
mean_cycle_period_exp = movavgExp(cycle_period_step_change);

I would very much appreciate any suggestions on how to tackle this, thanks in advance!

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

riccardo 2020-11-9

I haven't had time to check the calcs, but the error is likely due to a zero-indexing conditions in the loop:

>> for i = 2:lenght....

.....

if x(i) > 1.5*(1 - 1/w(i - 1))* >>x(i - 2)<< + (1/w(i - 1))*x(i - 1)

i-2 = 0 at the start

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

riccardo 2020-11-9

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/642245-how-to-remove-outliers-from-exponentially-weighted-moving-mean-in-real-time#answer_539260

I am not sure what you ar eaiming at, here:

>>

% Calculate moving mean with weights manually

x = zeros(length(cycle_period_step_change), 1);

x(1) = 2;

>>

"x" is an array of zeros, with a leading "2".

The plot you've shown is just the consequence of it, with a smooth transition from 2 -> 0, due to the averaging. Am I missing something ?

24 个评论
显示 22更早的评论隐藏 22更早的评论

riccardo 2020-11-10

编辑：riccardo 2020-11-10

Actually "mavCnd" was meant to be a scalar - no need for an array of temporary values (unless you wanted those as well).

As to the result, in case of outliers being removed this code would keep the moving average constant, even in the case of 2-3 successive values.

This would create plateaus of constant values, corresponding to the removed outliers.

From what you say, it seems you actually want to keep only the moving average points corresponding to acceptable data.

You could either post-process the moving average and "pluck out" the unwanted values (using the indexes of the removed outliers) or use a "while" cycle with a separate index "j" for the moving average array and discard the "tail" of leftovers once the cycle ends.

Just be careful that your time-axis must be sync'd - presumably.

i=2;

j=2;

while i<=Max

%%<code here> .....

if <OK>

%% <mavCnd is fine -> update x>

x(j) = mavCnd;

time(j) = <time of point "i">;

%% <next available moving average slot>

j = j+1;

else

%% <outlier>

%% <save outlier info out>

%% <ignore x and j>

end

i=i+1

end

Cai Chin 2020-11-10

Hi, thanks so much again for explaining. However, when I implemented the code, I encountered 2 issues:

The number of elements in the 'time' variable is not equal to that in the 'x' vector so the graph is not produced. When I use a 'pseudo' time variable with the same number of elements as x that simply counts up in equal increments, there are visible outliers still remaining in the weighted mean.
x still seems to have outliers in it even when the code to remove them is applied. For instance, this is the vector I get when I have implemented the code. I have underlined the values that seem erroneous

x = [0.0040, 1.8060, 0.5871, 0.8476, 0.9794, 1.8796, 2.5095, 0.6071, 0.7588, 0.7499, 0.6041, 0.6300, 0.7719, 0.7576, 0.7688, 0.6300, 0.6512, 0.8080, 0.7752, 0.7614, 0.6270, 0.7952, 0.7688, 0.7608, 0.6249, 0.7936, 0.7792, 0.7618, 0.616, 0.6496, 0.6882, 1.1490, 2.0880, 0.6568, 0.6522, 0.8050, 0.7860, 0.6198, 0.7468, 0.7466, 0.7742, 0.6496, 0.6688, 0.6672, 0.6632, 3.1392, 0.6720, 0.8366, 0.8098, 0.8052, 0.8094, 0.6592, 0.6616, 0.6672, 0.8186, 0.7906, 0.7816, 0.6282, 0.6402, 0.6514, 0.7918]

I just can't seem to work out what the problem is, thanks again!

Cai Chin 2020-11-11

Hi, thanks again for all you help. I have implemented the changes you suggested but now it gives a very strange distribution.

I'm also still a little confused over what the difference between what 'x' and 'j' is.

Also, by assigning 'cycle_periods' to x, has that affected the algorithm used to calculate the moving mean? The algorithm is this:

wN,λ = λwN−1,λ + 1

‾xN,λ =( 1−1wN,λ)‾xN−1,λ + (1wN,λ)xN

‾xN,λ — Moving average at the current sample
xN — Current data input sample
‾xN−1,λ — Moving average at the previous sample
λ — Forgetting factor
wN,λ — Weighting factor applied to the current data sample
(1−1wN,λ)‾xN−1,λ — Effect of the previous data on the average

The vector 'x' now seems to contain a lot of the first element of 'cycle_periods' as shown below:

x = [0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0050, 0.0040, 0.0040, 0.0050, 0.0040, 0.0050, 0.0030, 0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0050, 0.0050, 0.0050, 0.7830, 0.0040, 0.8010, 0.7720, 0.7560, 0.7800, 0.0050, 0.7960, 0.7840, 0.7600, 0.7690, 0.0040, 0.8110, 0.0040, 0.8590, 0.0050, 0.8350, 2.4050, 0.8200, 0.0040, 0.8140, 0.0050, 0.8090, 0.7890, 0.7740, 0.0030, 0.7480, 0.7420, 0.7650, 0.8110, 0.0040, 0.8350, 0.0040, 0.8330, 0.0040, 0.8280, 0.0040, 3.9230, 0.0040, 0.8390, 0.0040, 0.8430, 0.8110, 0.8050, 0.8060, 0.8230, 0.0040, 0.8260, 0.0040, 0.8330, 0.0040, 0.8250, 0.7930, 0.7810, 0.7840, 0.0050, 0.7990, 0.0050, 0.8130, 0.0050, 0.7910, 0.7950]

Thanks again

Cai Chin 2020-11-12

在 MATLAB Online 中打开

Hi, thanks again. I have executed the code again using your suggestion but again, the

% Calculate exponentially weighted moving mean manually whilst removing
% outliers in real-time
x = zeros(length(cycle_periods),1);  % array for exponentially weighted means
x (1) = cycle_periods(1);
i = 2; % index for counting through input signal
j = 2; % index for counting through accepted exponential mean values 
while i <= length(cycle_periods)
    mavCnd = (1 - 1/w(j))*x(j - 1) + (1/w(j))*cycle_periods(i);
      if mavCnd < 1.5*(x(j - 1)) && mavCnd > 0.5*(x(j - 1))  % Identify high and low outliers
        x(j) = mavCnd;
        cycle_index(j) = i;
        j = j + 1;
      else
        x(i) = x(i - 1);
      end
    i = i + 1;
end
% Scrap the tail
x(j - 1:end)=[];
x = [0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0042, 0.0042, 0.0041, 0.0043, 0.0043, 0.0044, 0.0041, 0.0041, 0.0041, 0.0041, 0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0042, 0.0044]

I just can't seem to understand why there are so many repeats of the first element..

Thanks again

Cai Chin 2020-11-19

在 MATLAB Online 中打开

Hi riccardo, I was wondering if you wouldn't mind helping me with another issue I am having? Instead of removing the outliers, I would like them to be replaced by the mean of the framing 2 accepted values (1 from the past and one from the future). I have tried to do this retrospectively by creating a new array whereby the unaccepted values are added as 'zero's which are then replaced by the next new accepted value comes in. If the previous value is zero, this is now replaced by the framing value mean. However, this does not account for the fact that there may be multiple unacceptable values in a row - it only accouunts for the zero of the previous value but not others before it. Any advice would be greatly appreciated, thanks!

% Calculate exponentially weighted moving mean and tau without outliers
accepted_means = zeros(length(cycle_periods_filtered),1); % array for accepted exponentially weighted means
accepted_means(1) = cycle_periods_filtered(1);
k = zeros(length(cycle_periods_filtered),1); % array for accepted raw cycle periods
m = zeros(length(cycle_periods_filtered), 1); % array for raw periods for all cycles with outliers replaced by mean of framing values
k(1) = cycle_periods_filtered(1); 
m(1) = cycle_periods_filtered(1);
tau = m/3; % pre-allocation for efficiency
i = 2; % index for counting through input signal
j = 2; % index for counting through accepted exponential mean values
n = 2; % index for counting through raw periods of all cycles
while i <= length(cycle_periods_filtered)
    mavCurrent = (1 - 1/w(j))*accepted_means(j - 1) + (1/w(j))*cycle_periods_filtered(i);
      if cycle_periods_filtered(i) < 1.5*(accepted_means(j - 1)) && cycle_periods_filtered(i) > 0.5*(accepted_means(j - 1))  % Identify high and low outliers
        accepted_means(j) = mavCurrent;
        k(j) = cycle_periods_filtered(i);
        m(n) = cycle_periods_filtered(i);
        cycle_index3(j) = i;
        tau(n) = m(n)/3;
        
        if m(n - 1) == 0
            m(n - 1) = (k(j) + k(j - 1))/2;
            tau(n - 1) = m(n)/3;
        end
        
        j = j + 1;
        n = n + 1; 
        
      else
        m(n) = 0;
        n = n + 1;     
      end   
    i = i + 1;
end

riccardo 2020-11-23

You should be clear with what your objective is.

Getting the raw data and obtaining the moving average array is data analysis/signal processing, because you do not "change" the data, although you may discard a subset of it. Here you look at the data to understand its behaviour.

Replacing any data point is not "analysis" any longer, it's "modeling": in this case you must be careful to avoid any unwanted artifacts/side-effects caused (potentially) when you introduce a "different" subset, in place of the outliers.

Unless you have a specific and well based reason (say a causal relationship, like in a first principle equation) to say "this data point should be corrected in this way", any other type of model can only be statistical in nature, that's why a post-processing approach would be safer (e.g. if you had reason to believe the data should be continuous/smooth/etc., you could apply a certain type of interpolator/spline to the moving average result array, in order to produce the "missing points" with the desired property).

riccardo 2020-11-24

In RT you cannot interpolate, but you could try extrapolating a "candidate outlier" point using the gradient based on the last 2 accepted average values.

The problem arise when you find several adjacent/consecutive outliers: in this case the latest 2 acceptable values may be "too far back".

Cai Chin 2020-11-24

Hi riccardo, I was wondering how this extrapolation could be done in code?

请先登录，再进行评论。

How to remove outliers from exponentially weighted moving mean in real-time?

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

采纳的回答

24 个评论
显示 22更早的评论隐藏 22更早的评论

更多回答（0 个）

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

How to remove outliers from exponentially weighted moving mean in real-time?

1 个评论 显示 -1更早的评论隐藏 -1更早的评论

采纳的回答

24 个评论 显示 22更早的评论隐藏 22更早的评论

更多回答（0 个）

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

24 个评论
显示 22更早的评论隐藏 22更早的评论