How to use isoutlier based in a part of the data?
9 次查看(过去 30 天)
显示 更早的评论
Good morning everybody,
I have a vector of datas. Like this,
a =[0;0.0028;0.0002;0.0039;0.0061].
As you see, since the 4° element, the values start growing more until the end.
I was trying to determine a threshold to define the 4° and 5° elements as ouliers using 'isoutlier' function from Matlab. I did it. But I had to define a fixed 'ThresholdFactor'value using one of the methods the function has.
I would like the 4° and 5° values to being identified as outliers. Not based with all the vector datas, but because they are bigger than the 1°, 2° and 3° elements. I mean, I would like to find the outliers based on the backforward datas [0;0.0028;0.0002].
The vector I posted is an example. The size must be generic.
Can you help me?
P.S. (Actualized): As I said, depending of the data entries, my vectors gonna have different sizes. But in all cases, the phenomenum they represent, makes the vector values would be bigger at the end.
I can't find a way to define when the datas gonna be outliers since the vector will not always be the same. I need to generalize. So what I really need is to identify when the values start growing until reach the end. For instance, for my example, it would happen from the 4° position.
I hope I could explain better here.
采纳的回答
Mathieu NOE
2023-3-7
hello
why not using islocalmin ? seems to me what you want is to keep the first 3 points (corresponding to a local min)
a =[0;0.0028;0.0002;0.0039;0.0061];
id = find(islocalmin(a(1:end)));
a_keep = a(1:id)
plot(a)
hold on
plot(a_keep,'dr')
更多回答(4 个)
Antonios Dougalis
2023-3-7
Hi,
I am not sure if i got it right. You can simply index in the region you are intersted in your array 'a' when using isoutlier
A = [1:100] % make example array A
A(5) = 1000; % put at 5th index the value 1000
A(50) = 1000; % put at 50th index the value 1000
B = isoutlier(A); % will return both outliers in logical array at positions 5 to 50
C = isoutlier(A(1:10)) % will return the first outlier only at position 5
Fifteen12
2023-3-7
Your question is a little complex, as the definition of an outlier is not very well defined. For instance, in your vector, the second element a(2) is more than 10x larger than the following element a(3). Is it an outlier? Only you can really tell that. To tell if any generic element in a vector is an outlier you need to establish a clear definition of what you consider to be an outlier. The definition MATLAB uses for isoutlier (as the default option) is if the element is 3 standard deviations away from the median of the set, but you can change this definition using the method call.
It's a relatively simple task to deconstruct how isoutlier does this, which might help you in customizing your outlier approach.
a = [0;0.0028;0.0002;0.0039;0.0061]; %Sample vector
med = median(a); %Find the median
MAD = median(abs(a - med)); %Median Absolute Deviation: https://www.mathworks.com/help/matlab/ref/filloutliers.html#bvml247
dist = abs(a - MAD); %Distance from each element in a from the MAD
outliers = dist > 3*MAD; %boolean array where 1's indicate a number that was 3 MAD's away from the median
Using this method, none of the elements are outliers. But you can adjust the cutoff for a outlier and make it more sensitive. Hope this helps!
Les Beckham
2023-3-7
编辑:Les Beckham
2023-3-7
It seems like what you are wanting to do is to chop off the "increase at the end".
Here is one way to do that by searching backwards through a to find where it starts increasing.
a = [0; 0.0028; 0.0002; 0.0039; 0.0061; 0.0062]; % added an extra point to verify logic
last_index = 1 + numel(a) - find(diff(flip(a)) > 0) % find where a stops increasing at the end (working backwards)
plot(a)
hold on
plot(a(1:last_index),'r*')
grid on
Bruno Luong
2023-3-8
编辑:Bruno Luong
2023-3-8
Not sure, you are not better to describe what you want than most people; what you cann outlier seems to be point that violate the increasing trend:
c=[0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028;0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];
d=diff(c);
i=find(d<0);
close all
plot(c); hold on; plot(i,c(i),'or',i+1,c(i+1),'*r')
0 个评论
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Data Preprocessing 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!