How to remove the outliers
I have a data sequence that I believe contains some outliers, which I have plotted in Excel with red shading. I have attached the Excel file containing my data.
My question is which MATLAB function can detect and delete the data in the red shading.
If anyone can help, I would appreciate it.
Thanks
Accepted Answer
Steven Lord
2019-7-11
3 Comments
Jon
2019-7-11
Maybe you are running an older version of MATLAB that does not have the filloutliers function.
filloutliers was introduced in MATLAB R2017a.
What version of MATLAB are you running? To find out, you can type the ver command.
In the future, it is good to use the code button in the MATLAB Answers toolbar when inserting code. That way it comes out nicely formatted and is easier to read, use, and copy.
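As a rough sketch of that check, the snippet below asks MATLAB whether the two outlier functions mentioned in this thread can be found on the path. The variable names hasFill and hasRm are made up for illustration.

```matlab
% exist(name, 'file') returns a nonzero code when a matching file is found on the path.
hasFill = exist('filloutliers', 'file') > 0;
hasRm   = exist('rmoutliers', 'file') > 0;

if hasFill && hasRm
    disp('filloutliers and rmoutliers are available.')
else
    disp('One or both functions are missing; a manual outlier test is needed.')
end
```

Typing ver on its own additionally lists the MATLAB release and the installed toolboxes.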
More Answers (1)
Jon
2019-7-11
Edited: Jon
2019-7-11
Since you do not have filloutliers and rmoutliers in your version of MATLAB, I would first recommend updating to a more recent version if possible, as there have been many advances since 2013.
If that is not possible, you can look at the documentation in the link that Steven provided.
It gives MATLAB's default definition of an outlier as:
Outliers are defined as elements more than three scaled MAD from the median. The scaled MAD is defined as c*median(abs(A-median(A))), where c=-1/(sqrt(2)*erfcinv(3/2)).
So you could easily implement this in your code. For example, if you had vectors x and y and wanted to make a plot with the outliers removed, you could do the following:
% Flag points whose distance from the median exceeds three scaled MAD
isOutlier = abs(y - median(y)) > -3/(sqrt(2)*erfcinv(3/2))*median(abs(y - median(y)));
plot(x(~isOutlier),y(~isOutlier))
I would recommend, though, implementing isOutlier as a small function so you don't have to keep repeating this code.
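A minimal sketch of such a helper is shown here as an anonymous function, which should also work in older releases. The name isOutlierMAD and the sample data are made up for illustration; the threshold follows the three scaled MAD definition quoted above.

```matlab
% Scale factor from the quoted definition; evaluates to about 1.4826.
c = -1 / (sqrt(2) * erfcinv(3/2));

% Hypothetical helper: true for elements more than three scaled MAD from the median.
isOutlierMAD = @(v) abs(v - median(v)) > 3 * c * median(abs(v - median(v)));

% Made-up sample data with one obvious spike at index 5.
x = 1:10;
y = [10 11 9 10 50 11 10 9 10 11];

bad = isOutlierMAD(y);   % logical mask, true only at the spike
xClean = x(~bad);
yClean = y(~bad);
```

Calling plot(xClean, yClean) afterwards draws the series without the flagged point.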
Another simple way to remove outliers is to sort your data with the sort command and then remove the first and last n values from the sorted list, where you choose n according to how conservative you want to be with the outlier removal. For example, given vectors x and y and n = 5, you could implement this with something like
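n = 5;
[ySrt,iSrt] = sort(y);
% Drop the n smallest and n largest values; sort the kept indices to restore the original order
iKeep = sort(iSrt(n+1:length(y)-n));
plot(x(iKeep),y(iKeep))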
Note that n/length(y) is the fraction of data that you are discarding as outliers at each end of the sorted list. So you might choose n so that n/length(y) is approximately 0.025, in which case you would keep 100*(1 - 2*0.025) = 95% of your data and treat the extremes as outliers.
This method, although simple, assumes that you actually have outliers at both extremes; otherwise you are just throwing away good data that happens to sit at the lower and upper ends of the sorted list.
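The trimming approach can be sketched with n derived from a chosen fraction instead of being fixed by hand. The data, the variable names, and the 5% fraction below are assumptions picked only so the example is easy to follow.

```matlab
% Made-up sample data: 40 points with two low and two high extremes.
x = 1:40;
y = 10 + sin(x / 3);
y([7 18])  = [-20 -15];
y([25 33]) = [45 60];

frac = 0.05;                    % assumed fraction to trim from each end
n = round(frac * numel(y));     % number of points dropped at each end

[~, iSrt] = sort(y);            % indices of y in ascending order
iKeep = sort(iSrt(n+1:end-n));  % drop n smallest and n largest, keep original order
xTrim = x(iKeep);
yTrim = y(iKeep);
```

Sorting iKeep before indexing keeps the remaining points in their original x order, so plot(xTrim, yTrim) draws a sensible line rather than one ordered by y value.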
2 Comments
Jon
2019-7-12
Edited: Jon
2019-7-12
Glad to hear it is working now. If you feel the question is answered, it would be good to "accept" the answer so that anyone else with the same issue can see that a solution is available. If you are still waiting to see whether there are other approaches, then you should leave it open.