How to use isoutlier based on part of the data?

Good morning everybody,
I have a data vector like this:
a = [0;0.0028;0.0002;0.0039;0.0061];
As you can see, from the 4th element on, the values keep growing until the end.
I was trying to determine a threshold that marks the 4th and 5th elements as outliers using MATLAB's 'isoutlier' function. It worked, but I had to set a fixed 'ThresholdFactor' value for one of the function's methods.
I would like the 4th and 5th values to be identified as outliers, not relative to the whole vector, but because they are larger than the 1st, 2nd and 3rd elements. In other words, I would like to find the outliers based on the preceding data [0;0.0028;0.0002].
The vector I posted is just an example. The solution must work for any size.
Can you help me?
P.S. (Updated): As I said, depending on the input data, my vectors will have different sizes. But in all cases, the phenomenon they represent makes the values larger toward the end.
I can't find a way to define when the data become outliers, since the vector will not always be the same. I need to generalize. So what I really need is to identify where the values start growing and keep growing until the end. In my example, that happens from the 4th position.
I hope this explains it better.
  5 Comments
Mariana 2023-3-7
Thank you, Antonios
I'm going to try it and come back here to report how it goes.
Mariana 2023-3-7
Edited: Mariana 2023-3-7
The method does not work for the following vector
c=[0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028;0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113].
But it helped me to solve other problems.
Thanks a lot.


Accepted Answer

Mathieu NOE 2023-3-7
Hello,
Why not use islocalmin? It seems to me that what you want is to keep the first 3 points (up to and including the local minimum).
a = [0;0.0028;0.0002;0.0039;0.0061];
id = find(islocalmin(a)); % index of the local minimum (here: 3)
a_keep = a(1:id)
a_keep = 3×1
         0
    0.0028
    0.0002
plot(a)
hold on
plot(a_keep,'dr') % kept points shown as red diamonds
  2 Comments
Mariana 2023-3-7
Edited: Mariana 2023-3-7
This worked.
I also tried it with another vector:
c=[0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028;0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];
d=islocalmin(c);
resul=[c d] % shows the local minima; I'm interested in the last one
d=find(d); % positions of the local minima
d=d(end); % position of the last local minimum
threshold=max(c(1:d)); % the threshold I was looking for, in a general way
Thank you all very much.
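For reference, the steps in this comment can be collected into one short script that also flags the elements above the threshold. This is a minimal sketch, assuming the tail of interest starts after the last local minimum; the names `idxMin` and `isOut` and the fallback for a vector with no local minimum are illustrative assumptions, not part of the thread.

```matlab
% Example vector from the comment above
c = [0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028; ...
     0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];

% Position of the last local minimum (islocalmin never marks the endpoints)
idxMin = find(islocalmin(c), 1, 'last');

% Assumption: if the vector has no local minimum at all, use the first
% element as the whole baseline
if isempty(idxMin)
    idxMin = 1;
end

% Data-dependent threshold: largest value seen up to the last local minimum
threshold = max(c(1:idxMin));

% Elements strictly above the threshold are treated as the growing tail
isOut = c > threshold;
outIdx = find(isOut).';
```

For this `c`, the last local minimum is at position 13, so `threshold` is 0.0047 and `outIdx` is 14:18. The value 0.0047 at position 12 is not flagged because the comparison is strict.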


More Answers (4)

Antonios Dougalis
Hi,
I am not sure I understood correctly. You can simply index the region of your array 'a' that you are interested in when calling isoutlier:
A = 1:100; % make example array A
A(5) = 1000; % put the value 1000 at index 5
A(50) = 1000; % put the value 1000 at index 50
B = isoutlier(A); % logical array flagging both outliers, at positions 5 and 50
C = isoutlier(A(1:10)) % flags only the first outlier, at position 5
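A related idea is to estimate the "normal" level from a leading part of the data only and then test the whole vector against it. The sketch below does this by hand with the median and the scaled median absolute deviation. It assumes the number of baseline samples is known; `nBase`, `isOut` and `firstOut` are hypothetical names, and the vector is the one the asker posted in the comments.

```matlab
c = [0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028; ...
     0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];

nBase = 11;                 % assumption: the first 11 samples are "normal"
base  = c(1:nBase);         % baseline segment used to build the statistics

med       = median(base);                        % baseline median
scaledMAD = 1.4826 * median(abs(base - med));    % 1.4826 approximates the usual MAD scaling constant

% Test every element of the full vector against the baseline statistics
isOut    = abs(c - med) > 3 * scaledMAD;
firstOut = find(isOut, 1);  % first position that departs from the baseline
```

With these assumptions, none of the 11 baseline samples is flagged and `firstOut` is 12, so positions 12 to 18 are marked. The result depends on the choice of `nBase`, which has to come from knowledge of the data.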
  1 Comment
Mariana 2023-3-7
Hi Antonio,
Thank you for your answer.
I just explained the problem better in the first comment above.



Fifteen12 2023-3-7
Your question is a little complex, because what counts as an outlier is not well defined. For instance, in your vector, the second element a(2) is more than 10x larger than the following element a(3). Is it an outlier? Only you can really tell. To decide whether any given element of a vector is an outlier, you need to establish a clear definition of what you consider an outlier. By default, isoutlier flags an element if it is more than 3 scaled median absolute deviations (MAD) away from the median of the set, but you can change this definition through the method argument.
It's relatively simple to reproduce what isoutlier does, which might help you customize your outlier approach.
a = [0;0.0028;0.0002;0.0039;0.0061]; % Sample vector
med = median(a); % Find the median
k = -1/(sqrt(2)*erfcinv(3/2)); % Scaling factor (about 1.4826) used for the scaled MAD
MAD = k*median(abs(a - med)); % Scaled Median Absolute Deviation: https://www.mathworks.com/help/matlab/ref/filloutliers.html#bvml247
dist = abs(a - med); % Distance of each element of a from the median
outliers = dist > 3*MAD; % Logical array where 1 marks an element more than 3 scaled MADs away from the median
Using this method, none of the elements are outliers. But you can adjust the cutoff for an outlier to make it more sensitive. Hope this helps!
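To experiment with the cutoff, the rule can be wrapped in a small helper that takes the factor as a parameter. This is only a sketch of the median-based rule described above, not MATLAB's own implementation; `flagOutliers` and `k` are hypothetical names, and 1.4826 is the usual approximation of the MAD scaling constant.

```matlab
% Hypothetical helper: true where x is more than k scaled MADs from its median
flagOutliers = @(x, k) abs(x - median(x)) > k * 1.4826 * median(abs(x - median(x)));

a = [0;0.0028;0.0002;0.0039;0.0061];
c = [0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028; ...
     0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];

outA = flagOutliers(a, 3);   % short vector from the question
outC = flagOutliers(c, 3);   % longer vector posted in the comments
idxC = find(outC).';         % positions flagged in c
```

With a factor of 3, nothing in `a` is flagged, while in `c` positions 12 to 18 are flagged. The same rule therefore behaves very differently depending on how much of the vector belongs to the stable part.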
  1 Comment
Mariana 2023-3-7
Jhon,
Thank you very much for your help. I had already read the definition of the 'isoutlier' function. My problem is that my vector changes whenever I change the system I'm studying. But this generic vector a will always have increasing values at the end. So I can't define a single threshold; I need the threshold to change as the input data change.
I gave more detail in the first comment above.



Les Beckham 2023-3-7
Edited: Les Beckham 2023-3-7
It seems like what you want to do is chop off the "increase at the end".
Here is one way to do that, by searching backwards through a to find where it starts increasing.
a = [0; 0.0028; 0.0002; 0.0039; 0.0061; 0.0062]; % added an extra point to verify the logic
last_index = 1 + numel(a) - find(diff(flip(a)) > 0, 1) % working backwards, the first rise marks where the final increase begins
last_index = 3
plot(a)
hold on
plot(a(1:last_index),'r*')
grid on
  2 Comments
Mariana 2023-3-7
Hi Les,
I think this could solve my problem too. I'm going to try it and come back here!
Thank you so much.
Mariana 2023-3-7
Les,
It does not work with the following vector,
c=[0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028;0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];



Bruno Luong 2023-3-8
Edited: Bruno Luong 2023-3-8
Not sure; your description of what you want is not very clear. What you call an outlier seems to be a point that violates the increasing trend:
c=[0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028;0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];
d=diff(c); % differences between consecutive elements
i=find(d<0); % indices where the next value is smaller (trend violated)
close all
plot(c); hold on; plot(i,c(i),'or',i+1,c(i+1),'*r') % mark both points of each decreasing pair
