Problem 42485. Eliminate Outliers Using Interquartile Range

Given a vector of data, find the outliers and remove them.

To determine whether data contains an outlier:

  1. Identify the point furthest from the mean of the data.
  2. Determine whether that point is further than 1.5*IQR away from the mean.
  3. If so, that point is an outlier; remove it from the data to produce a new data set.
  4. Repeat steps 1-3 on the new data set until it no longer contains an outlier.

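IQR: The interquartile range is the difference between the median of the upper half and the median of the lower half of the sorted data: http://www.wikihow.com/Find-the-IQR. When the number of points is odd, the middle value belongs to neither half (the second pass of the worked example below uses 54 - 50.5).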
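The IQR definition above can be sketched in a few lines. This is a minimal Python illustration, not the problem's reference solution: the function name `iqr` is hypothetical, the choice to leave the middle value out of both halves for odd-length data is inferred from the worked example, and at least two data points are assumed. A built-in quartile routine (such as MATLAB's `iqr`) may follow a different convention and give different values.

```python
from statistics import median

def iqr(data):
    """Interquartile range as median(upper half) - median(lower half).

    Assumption: for an odd number of points the middle value is
    excluded from both halves. Requires at least two points.
    """
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]            # the `half` smallest values
    upper = s[len(s) - half:]   # the `half` largest values
    return median(upper) - median(lower)

print(iqr([53, 55, 51, 50, 60, 52]))  # 4   (55 - 51)
print(iqr([53, 55, 51, 50, 52]))      # 3.5 (54 - 50.5)
```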

To find an outlier by hand:

Data: [ 53 55 51 50 60 52 ]. We will check this data for outliers.

Sorted: [ 50 51 52 53 55 60 ] where the mean is 53.5 and 60 is the furthest away (60-53.5 > 53.5-50).

1.5 * IQR = 1.5 * (55-51) = 6

Since 60-53.5 = 6.5 > 6, 60 is an outlier.

New Data: [ 53 55 51 50 52 ]. We will check this data for outliers.

New Data Sorted: [ 50 51 52 53 55 ] where the mean is 52.2 and 55 is the furthest away.

1.5 * IQR = 1.5 * (54-50.5) = 5.25

Since 55-52.2 = 2.8 < 5.25, 55 is NOT an outlier.

Our original data had one outlier, which was 60.
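The hand calculation above can be reproduced with a short loop. The following Python sketch is one possible reading of the four steps, not an official solution: the names `iqr` and `remove_outliers` are hypothetical, "further than" is taken as a strict comparison, and the median-of-halves IQR (middle value excluded for odd lengths) is inferred from the worked example.

```python
from statistics import mean, median

def iqr(data):
    # Median of the upper half minus median of the lower half;
    # the middle value is skipped when the length is odd (assumption).
    s = sorted(data)
    half = len(s) // 2
    return median(s[len(s) - half:]) - median(s[:half])

def remove_outliers(data):
    """Repeatedly drop the point furthest from the mean while it lies
    more than 1.5 * IQR away from the mean."""
    data = list(data)  # work on a copy, keep the original order
    while len(data) >= 2:
        center = mean(data)
        # Index of the first point with the largest distance from the mean
        idx = max(range(len(data)), key=lambda i: abs(data[i] - center))
        if abs(data[idx] - center) > 1.5 * iqr(data):
            del data[idx]  # outlier: remove it and check the new data set
        else:
            break          # furthest point is within range: done
    return data

print(remove_outliers([53, 55, 51, 50, 60, 52]))  # [53, 55, 51, 50, 52]
```

On the example data the loop removes 60 in the first pass and stops in the second pass, because 55 is within 1.5 * IQR of the new mean.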

Example:

Input data = [53 55 51 50 60 52]
Output new_data = [53 55 51 50 52]

Since 60 is an outlier, it is removed.

*Note: A value that is an outlier may appear more than once in the dataset. Do not remove all instances at once; remove only the first instance, then check the new dataset to determine whether that value is still an outlier (see 5th test suite).*
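To illustrate the note, the sketch below records each removal so the one-instance-at-a-time behaviour is visible. It is a Python illustration under assumptions: the function names and the sample data are made up for this example (they are not taken from the test suite), and the IQR convention is the median-of-halves method inferred from the worked example.

```python
from statistics import mean, median

def iqr(data):
    # Median of upper half minus median of lower half (middle value
    # skipped for odd lengths; an assumption based on the worked example).
    s = sorted(data)
    half = len(s) // 2
    return median(s[len(s) - half:]) - median(s[:half])

def remove_outliers_traced(data):
    """Return (cleaned_data, removed), where removed lists the
    (index, value) of each point at the moment it was dropped."""
    data, removed = list(data), []
    while len(data) >= 2:
        center = mean(data)
        # max() returns the first index among ties, so only the first
        # instance of a repeated value is considered in each pass.
        idx = max(range(len(data)), key=lambda i: abs(data[i] - center))
        if abs(data[idx] - center) <= 1.5 * iqr(data):
            break
        removed.append((idx, data[idx]))
        del data[idx]
    return data, removed

cleaned, removed = remove_outliers_traced([40, 10, 11, 12, 40, 13, 14, 15, 16, 17])
print(removed)  # [(0, 40), (3, 40)]
print(cleaned)  # [10, 11, 12, 13, 14, 15, 16, 17]
```

Here the value 40 appears twice. The first pass removes only the leading 40; the second 40 is removed in a separate pass, after the mean and IQR have been recomputed on the shorter data set.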

Solution Stats

35.4% Correct | 64.6% Incorrect
Last Solution submitted on Oct 24, 2024
