Removing instantaneous jumps (outliers) from a time series data set

Hi All,
I have an array of time series data containing instantaneous jumps that need removing. The time series represents cliff failures, so I need code that removes the jumps that are outliers while keeping the jumps that represent real cliff failures, which are not anomalies. The actual matrix size is 60x262. I have included a snippet of the time series showing a 'jump'. Since the code needs to check across rows, as each row is associated with a separate transect, I assume this is best done using a 'for' loop; however, I am not entirely sure how to do this. In the example below, I would need to remove the 10.9 and replace it with the mean of the values either side, but there could be cases where 2-3 consecutive outliers all need replacing. Any help would be greatly appreciated.
2.41595489732326 2.41031483654907 2.40801209924756 2.40993913128027 2.41582625997938
2.87743028179327 2.88058793933828 2.88454870497978 2.87492517755818 2.87925902569740
6.40769380488418 6.54033729047571 10.9192698256242 6.42335491352382 6.53564543320352
13.9915744275613 13.9347860460628 13.9070741204200 13.9481973397569 13.9297364372147
6.65790304043271 6.68776078855576 6.69467488849872 6.65418705323087 6.68234121374043

Accepted Answer

Star Strider 2024-3-22
There are several functions to detect and remove outliers, depending on how you want to define them and deal with them.
Here is an example using the isoutlier function with your posted data —
A = [2.41595489732326 2.41031483654907 2.40801209924756 2.40993913128027 2.41582625997938
2.87743028179327 2.88058793933828 2.88454870497978 2.87492517755818 2.87925902569740
6.40769380488418 6.54033729047571 10.9192698256242 6.42335491352382 6.53564543320352
13.9915744275613 13.9347860460628 13.9070741204200 13.9481973397569 13.9297364372147
6.65790304043271 6.68776078855576 6.69467488849872 6.65418705323087 6.68234121374043];
x = 1:size(A,2);
Lm = isoutlier(A, 'median', 2) % Logical MAtrix
Lm = 5x5 logical array
   0   0   0   0   0
   0   0   0   0   0
   0   0   1   0   0
   0   0   0   0   0
   0   0   0   0   0
showOutliers = A.*Lm % Return Detected Outliers
showOutliers = 5x5
         0         0         0         0         0
         0         0         0         0         0
         0         0   10.9193         0         0
         0         0         0         0         0
         0         0         0         0         0
colsum = sum(showOutliers); % Create Vector By 'sum' Over Columns
Lv = colsum ~= 0; % Logical VEctor
figure
plot(x, A, 'DisplayName','Data')
hold on
plot(x(Lv), colsum(Lv), 'sg', 'DisplayName','Outliers')
hold off
grid
xlabel('x')
ylabel('y')
legend('Location','best')
axis('padded')
The function documentation has links to the other outlier functions within and at the end of the page.
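Since the question asks for the jump to be replaced rather than only flagged, a minimal sketch using filloutliers may help here. It assumes the same 'median' detection along rows (dimension 2) as the isoutlier call above, and it picks 'linear' as the fill method, which for evenly spaced samples gives the mean of the two neighbouring values. The variable names B and TF are only illustrative.

```matlab
% Posted sample data: one transect per row
A = [2.41595489732326 2.41031483654907 2.40801209924756 2.40993913128027 2.41582625997938
     2.87743028179327 2.88058793933828 2.88454870497978 2.87492517755818 2.87925902569740
     6.40769380488418 6.54033729047571 10.9192698256242 6.42335491352382 6.53564543320352
     13.9915744275613 13.9347860460628 13.9070741204200 13.9481973397569 13.9297364372147
     6.65790304043271 6.68776078855576 6.69467488849872 6.65418705323087 6.68234121374043];

% Detect outliers along each row (dim 2) with the 'median' method and
% replace them by linear interpolation between the neighbouring values.
[B, TF] = filloutliers(A, 'linear', 'median', 2);

% TF is a logical matrix marking the elements that were replaced.
replacedValue = B(3,3)                    % value now stored in place of the jump
neighbourMean = mean([A(3,2) A(3,4)])     % mean of the values either side
```

A 'for' loop over the rows should not be needed, because the dimension argument applies the detection to every row separately. Runs of 2-3 consecutive outliers are interpolated across only if the detection method flags all of them, which depends on the chosen method, window and threshold.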
11 Comments
luke 2024-3-26
I have gone through and used what you suggested, and it works great. One last thing: how do I return the row with the replaced values? When I use the code below, it returns first the row with the outliers removed and then the logical array. I need to obtain the row with the outliers replaced. Sorry, I am sure this is easy, but I can't get the correct result. A provides the row with the outliers removed and CD provides the logical row.
[A,CD] = rmoutliers(transect_above, 'movmedian', 50, 'ThresholdFactor', 6);
Star Strider 2024-3-26
Thank you! I was hoping it would work on your other data sets, however I couldn’t be certain.
Use the second output, similar to what I used in the plot —
figure
plot(days(~CD), transect_25(~CD))
xlabel 'Time (Days)'
ylabel 'Recession of cliff'
title('Transect 25 With Outliers Removed')
axis('padded')
The ‘CD’ result is a logical vector that works like any other subscript, and has the positions of the outliers as true, so use the negated version (~CD, the ~ is the logical ‘not’ operator) to return the corrected vectors without the outliers.
If your data have each transect in its own table (called ‘transect_25’ here), the addressing would be:
days = transect_25.days(~CD);
transect = transect_25.transect(~CD);
or equivalently:
transect_25_corrected = transect_25(~CD,:)
days = transect_25_corrected.days;
transect = transect_25_corrected.transect;
The second approach would also work if ‘transect_25’ is an (Nx2) array instead of a table:
transect_25_corrected = transect_25(~CD,:)
If ‘transect_25’ is instead a (2xN) array, the order of the subscripts is reversed:
transect_25_corrected = transect_25(:,~CD);
That should work.
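If the goal is a row of the same length with the outliers replaced, rather than a shorter row with them removed, one option is to swap rmoutliers for filloutliers while keeping the same detection settings. The sketch below is only an illustration under stated assumptions: ‘transect_above’ is stood in for by a synthetic ramp with an injected three-point jump (not real cliff data), and 'linear' is just one of the available fill methods.

```matlab
% Hypothetical stand-in for one transect (row vector of 262 samples)
clean = linspace(0, 5, 262);
transect_above = clean;
transect_above(100:102) = transect_above(100:102) + 5;   % injected artificial jump

% Same detection settings as the rmoutliers call, but the flagged points
% are replaced by linear interpolation instead of being deleted.
[filled, CD] = filloutliers(transect_above, 'linear', ...
    'movmedian', 50, 'ThresholdFactor', 6);

numel(filled)   % same length as the input row
find(CD)        % positions of the samples that were replaced
```

Here ‘filled’ is the row with the outliers replaced and ‘CD’ is still the logical vector of outlier positions, so the plotting code above can use ‘filled’ directly without the ~CD indexing.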
