rmoutliers function on quartiles method?

Hi everybody. I wrote the following code:
%%% SECOND CLEANING by QUARTILES
%
cleanDatac2 = rmoutliers(cleanData(:,[1 3 5]), "quartiles"); % QUARTILES
The size of cleanData is 709847.
The size of cleanDatac2 is 576736.
So I lost
% 709847-576736 = 133111 elements.
Is there a way to remove fewer elements?
It seems that, used this way, the function is removing more than I need.
  1 Comment
Rik 2024-1-24
Email message:
%{
Hi!
Thanks for reading.
I'm an Italian student in MSc TLC Engineering and I am struggling with my MATLAB code (rmoutliers function) for removing outliers by quartiles, since I am not getting the right graph.
I have a photo of the graph I should get, but I am not able to reproduce it from my data.
Could someone help me?
Thanks in advance.
I would also share my knowledge if someone needs it.
Hope to hear from you soon!
I have been struggling with this problem for many days!
Thanks very much in advance
%}
Since you accepted the answer below, I'm presuming your question is solved. If not, have a read here and here. It will greatly improve your chances of getting an answer. It is fine to send people a link asking them to have a look at a specific thread, but direct contact out of the blue via email is generally not welcome.


Accepted Answer

Steven Lord 2024-1-20
You could try to use a different method of determining what is an outlier for your particular data set rather than the "quartiles" method. The section of the rmoutliers documentation page describing the method input argument lists a number of different options that use different criteria for what is an outlier.
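As a rough illustration of comparing methods, the sketch below runs rmoutliers with several of its documented method options on synthetic data. The matrix X is a made-up stand-in for cleanData(:, [1 3 5]), since the real data is not available in this thread, so the counts it prints say nothing about the original data set.

```matlab
% Synthetic stand-in for cleanData(:, [1 3 5]) (assumption: the real data is not shown)
rng(0);
X = randn(1000, 3);
X(5, 2) = 50;                          % plant one obvious outlier in row 5

methods = ["median" "mean" "quartiles" "grubbs" "gesd"];
for m = methods
    % Second output is a logical vector marking the rows that were removed
    [~, removed] = rmoutliers(X, m);
    fprintf("%-10s removed %d of %d rows\n", m, nnz(removed), size(X, 1));
end
```

Which method is appropriate depends on how the data is distributed, so the comparison is only a starting point.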
Or if you are required by some constraint (homework assignment, your customers and/or management staff insisting on it, etc.) to use the "quartiles" method and/or think that it's in principle the right approach, you could specify the ThresholdFactor name-value argument to determine how many interquartile ranges data has to be above the upper quartile or below the lower quartile before the data points are considered outliers. The default is 1.5.
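A minimal sketch of the ThresholdFactor approach, again on synthetic data (X is a hypothetical stand-in for cleanData(:, [1 3 5]), and the factors 3 and 5 are arbitrary values chosen for illustration):

```matlab
% Synthetic stand-in for cleanData(:, [1 3 5]) (assumption)
rng(0);
X = randn(10000, 3);

factors = [1.5 3 5];                   % 1.5 is the default for "quartiles"
kept = zeros(size(factors));
for i = 1:numel(factors)
    cleaned = rmoutliers(X, "quartiles", "ThresholdFactor", factors(i));
    kept(i) = size(cleaned, 1);
    fprintf("ThresholdFactor = %.1f: kept %d of %d rows\n", ...
        factors(i), kept(i), size(X, 1));
end
```

A larger factor widens the accepted band around the quartiles, so the number of rows kept can only stay the same or grow.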
Though you're only operating on three of the (at least) five columns of cleanData -- are you sure you're not overestimating the number of elements removed? What sizes (not number of elements) are cleanData and cleanDatac2? How many elements does the original data on which you operated have? In this case that would be cleanData(:, [1 3 5]).
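To make the distinction between sizes and element counts concrete, here is a small sketch. The 200-by-5 matrix is hypothetical; only the column selection [1 3 5] comes from the question.

```matlab
% Hypothetical 5-column data set; the real cleanData is not shown in the thread
rng(0);
cleanData = randn(200, 5);

subset  = cleanData(:, [1 3 5]);       % the data rmoutliers actually sees
cleaned = rmoutliers(subset, "quartiles");

disp(size(subset));                    % rows-by-columns before cleaning
disp(size(cleaned));                   % rows-by-columns after cleaning

rowsRemoved     = size(subset, 1) - size(cleaned, 1);
elementsRemoved = numel(subset) - numel(cleaned);
fprintf("Removed %d rows, i.e. %d elements\n", rowsRemoved, elementsRemoved);
```

Because rmoutliers drops whole rows of a matrix by default, the element count falls by three for every row removed from a three-column input.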
  4 Comments
Steven Lord 2024-1-24
No, please don't contact me directly. If you need official Technical Support help you could contact them directly using this link.
But only you have your data. Only you know where it came from. That's information that is likely useful or necessary to determine why so much of your data seems to fall outside 1.5 IQRs from the first or third quartile.
517541/709847
ans = 0.7291
You kept about 73% of the rows in your data, meaning about 27% of the rows contained one or more outliers.
What happens if you run isoutlier or rmoutliers on each column in cleanData individually? Perhaps two of the columns that you processed have no outliers and something happened while collecting the third column of data that rendered it complete junk / noise / garbage. Or perhaps each column had 9% outliers and there happened to be no overlaps, meaning each column caused a different 9% of the rows to be removed.
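One way to run that per-column check is sketched below. X is a synthetic stand-in for cleanData(:, [1 3 5]), so the printed counts are illustrative only.

```matlab
% Synthetic stand-in for cleanData(:, [1 3 5]) (assumption)
rng(0);
X = randn(5000, 3);

% For a matrix, isoutlier works on each column separately and
% returns a logical matrix of the same size
tf = isoutlier(X, "quartiles");

perColumn   = sum(tf, 1);              % outliers found in each column
rowsFlagged = nnz(any(tf, 2));         % rows with an outlier in at least one column

fprintf("Outliers per column: %s\n", mat2str(perColumn));
fprintf("Rows with at least one outlier: %d of %d\n", rowsFlagged, size(X, 1));
```

If rowsFlagged is close to the sum of perColumn, the columns flag mostly different rows; if it is close to the largest per-column count, the outliers overlap.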
Giuseppe Zumbo 2024-1-24
@Steven Lord Yes, of course: running isoutlier on each column of cleanData individually, I would lose less data, but then the individual vectors would have different lengths, so I couldn't plot the variables against each other, could I? (I mean, I can use plot or scatter only if the vectors have the same length.)
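One possible way around the equal-length concern, offered only as a sketch and not something proposed in the thread: mark per-column outliers as NaN instead of deleting them, so every column keeps its original length. X is again a synthetic stand-in for the real data.

```matlab
% Synthetic stand-in for cleanData(:, [1 3 5]) (assumption)
rng(0);
X = randn(500, 3);

tf = isoutlier(X, "quartiles");        % column-wise outlier mask
Xmasked = X;
Xmasked(tf) = NaN;                     % blank out only the offending elements

% All columns still have 500 entries; plot skips points containing NaN
plot(Xmasked(:, 1), Xmasked(:, 2), ".");
xlabel("column 1"); ylabel("column 2");
```

Whether blanking single elements is acceptable, rather than discarding the whole row, depends on what the rows represent.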


More Answers (0)

Categories

Find more on Descriptive Statistics in Help Center and File Exchange

Release

R2023b
