
How and why is the outlier function used in Excel?

An outlier is a value that is significantly higher or lower than most of the values in your data. When analyzing data in Excel, outliers can skew the results. For example, the mean of a data set may no longer reflect your typical values. Excel provides a few useful functions to help manage outliers. Let's take a look.

A quick example

In the figure below, the outliers are relatively easy to spot: the value of 2 assigned to Eric and the value of 173 assigned to Ryan. In a data set this small, it is easy enough to detect and handle these outliers manually.

  Range of values with outliers

This is not the case for larger data sets. Being able to identify the outliers and remove them from statistical calculations is important – and that's what this article describes.

How to find outliers in your data

To find the outliers in a data set, we use the following steps:

  1. Calculate the 1st and 3rd quartiles (we'll explain what these are in a moment).
  2. Calculate the interquartile range (we'll also explain this a bit further down).
  3. Calculate the lower and upper bounds of our data range.
  4. Use these bounds to identify the outlying data points.

The cell range to the right of the data set, shown in the figure below, is used to store these values.

  Range for quartiles

Let's start.

First step: Calculate the quartiles

If you divide your data into quarters, each of these groups is called a quartile. The lowest 25% of the numbers in the range make up the 1st quartile, the next 25% the 2nd quartile, and so on. We take this step first because the most widely used definition of an outlier is a data point that lies more than 1.5 interquartile ranges (IQRs) below the 1st quartile or more than 1.5 interquartile ranges above the 3rd quartile. To determine these values, we first have to find the quartiles.

Excel provides a QUARTILE function for calculating quartiles. It takes two arguments: array and quart.

  =QUARTILE(array, quart)

The array is the range of values that you are evaluating. The quart is a number representing the quartile you wish to return (e.g. 1 for the 1st quartile, 2 for the 2nd quartile, and so on).

Note: In Excel 2010, Microsoft introduced the QUARTILE.INC and QUARTILE.EXC functions as improvements to the QUARTILE function. QUARTILE is the more backward-compatible choice if you work across multiple versions of Excel.

Let us return to our example table.

  Range for Quartiles

To calculate the 1st quartile, we can use the following formula in cell F2.

  =QUARTILE(B2:B14,1)

When you enter the formula, Excel provides a list of options for the quart argument.

  Using the QUARTILE function

To calculate the 3rd quartile, we can enter a formula like the previous one in cell F3, but with a three instead of a one.

  =QUARTILE(B2:B14,3)

Now we have the quartile data points displayed in the cells.

  1st and 3rd quartile values
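To cross-check the two quartile cells outside Excel, here is a minimal Python sketch. It assumes QUARTILE interpolates linearly between the closest ranks, the way QUARTILE.INC does. The `scores` list is hypothetical stand-in data for B2:B14 (only the values 2 and 173 come from the example), and `quartile_inc` is a helper name made up for this illustration.

```python
import math

def quartile_inc(values, quart):
    """Quartile by linear interpolation between closest ranks (inclusive method)."""
    if quart not in (0, 1, 2, 3, 4):
        raise ValueError("quart must be 0, 1, 2, 3 or 4")
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    pos = (len(data) - 1) * quart / 4  # fractional 0-based position in the sorted data
    lo = math.floor(pos)
    hi = math.ceil(pos)
    # Interpolate between the two neighbouring values
    return data[lo] + (pos - lo) * (data[hi] - data[lo])

# Hypothetical scores standing in for B2:B14 (13 values, including 2 and 173)
scores = [35, 2, 31, 38, 28, 173, 33, 40, 30, 36, 41, 32, 37]

q1 = quartile_inc(scores, 1)  # counterpart of =QUARTILE(B2:B14,1)
q3 = quartile_inc(scores, 3)  # counterpart of =QUARTILE(B2:B14,3)
print(q1, q3)  # 31.0 38.0
```

With 13 values the quartile positions fall exactly on data points, so no interpolation is needed here; a four-value list such as `[1, 2, 3, 4]` gives 1.75 for the 1st quartile.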

Second step: Calculate the interquartile range

The interquartile range (or IQR) is the middle 50% of the values in your data. It is calculated as the difference between the 3rd quartile value and the 1st quartile value.

We use a simple formula in cell F4 that subtracts the 1st quartile from the 3rd quartile:

  =F3-F2

Now we see our interquartile range.

 Interquartile value

Third step: Calculate the lower and upper bounds

The lower and upper bounds are the smallest and largest values of the data range we want to use. Any values smaller than the lower bound or larger than the upper bound are the outliers.

We calculate the lower bound in cell F5 by multiplying the IQR value by 1.5 and then subtracting it from the Q1 data point:

  =F2-(1.5*F4)

 Excel formula for lower value

Note: The parentheses in this formula are not required, because the multiplication is calculated before the subtraction. However, they make the formula easier to read.

To calculate the upper bound in cell F6, we multiply the IQR by 1.5 again, but this time add it to the Q3 data point:

  =F3+(1.5*F4)

  Values for the lower and upper bounds
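The arithmetic behind cells F4, F5 and F6 can be sketched in a few lines of Python. The quartile values 31 and 38 are hypothetical, chosen only to make the numbers easy to follow, and `iqr_bounds` is an illustrative helper name rather than anything Excel provides.

```python
def iqr_bounds(q1, q3, k=1.5):
    """Return (lower, upper) bounds placed k interquartile ranges outside Q1 and Q3."""
    iqr = q3 - q1            # counterpart of =F3-F2
    lower = q1 - (k * iqr)   # counterpart of =F2-(1.5*F4)
    upper = q3 + (k * iqr)   # counterpart of =F3+(1.5*F4)
    return lower, upper

# Hypothetical quartile values standing in for cells F2 and F3
lower, upper = iqr_bounds(31, 38)
print(lower, upper)  # 20.5 48.5
```

Here the IQR is 7, so each bound sits 10.5 away from its quartile.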

Fourth step: Identify the outliers

Now that all of our underlying data is set up, we can identify the outlying data points: those lower than the lower bound or higher than the upper bound.

We will use the OR function to perform this logical test and flag the values that meet these criteria by entering the following formula in cell C2:

  =OR(B2<$F$5,B2>$F$6)

 OR function to identify outliers

We then copy this formula down into cells C3 to C14. A TRUE value indicates an outlier, and in our data there are two.
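The same TRUE/FALSE test can be sketched in Python as a sanity check. The `scores` list and the bounds 20.5 and 48.5 are hypothetical stand-ins for column B and cells F5 and F6, and `flag_outliers` is a name invented for this example.

```python
def flag_outliers(values, lower, upper):
    """Return one boolean per value: True if it lies outside [lower, upper]."""
    # Same test as =OR(B2<$F$5,B2>$F$6), applied to every value
    return [v < lower or v > upper for v in values]

# Hypothetical scores standing in for B2:B14, with hypothetical bounds for F5 and F6
scores = [35, 2, 31, 38, 28, 173, 33, 40, 30, 36, 41, 32, 37]
flags = flag_outliers(scores, 20.5, 48.5)

# Keep only the values whose flag is True
outliers = [v for v, is_outlier in zip(scores, flags) if is_outlier]
print(outliers)  # [2, 173]
```

A value exactly equal to a bound is not flagged, matching the strict `<` and `>` comparisons in the Excel formula.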

Ignore the outliers when calculating the average

Using the QUARTILE function lets us calculate the IQR and work with the most widely used definition of an outlier. However, if you only want to calculate the mean of a range of values while ignoring outliers, there is a quicker and easier function to use. This technique will not identify an outlier as before, but it allows us to be flexible about what we treat as the outlying portion of the data.

The function we need is called TRIMMEAN, and you can see the syntax for it below:

  =TRIMMEAN(array, percent)

The array is the range of values you want to average. The percent is the fraction of data points to exclude, split evenly between the top and bottom of the data set (you can enter it as a percentage or as a decimal value). Excel rounds the number of excluded points down to the nearest multiple of two, so 20% of 13 values removes one value from each end.

We entered the following formula in cell D3 in our example to calculate the average while excluding 20% of the data points.

  =TRIMMEAN(B2:B14, 20%)

 TRIMMEAN formula for the average without outliers
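To see what the trimmed mean does to the numbers, here is a minimal Python sketch. It assumes the excluded count is rounded down to a multiple of two and split evenly between both ends, as Microsoft documents for TRIMMEAN. The `scores` list is hypothetical stand-in data for B2:B14, and `trimmean` is an illustrative helper, not a library function.

```python
import math

def trimmean(values, percent):
    """Mean after dropping an equal number of points from both ends of the sorted data."""
    if not 0 <= percent < 1:
        raise ValueError("percent must be at least 0 and less than 1")
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    n = len(data)
    k = math.floor(n * percent / 2)  # points dropped from each end
    trimmed = data[k:n - k]
    return sum(trimmed) / len(trimmed)

# Hypothetical scores standing in for B2:B14
scores = [35, 2, 31, 38, 28, 173, 33, 40, 30, 36, 41, 32, 37]

plain = sum(scores) / len(scores)
trimmed = trimmean(scores, 0.2)  # counterpart of =TRIMMEAN(B2:B14, 20%)
print(round(plain, 2), round(trimmed, 2))  # 42.77 34.64
```

With 13 values and 20%, one value is dropped from each end (here 2 and 173), which pulls the average back toward the bulk of the data.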

That gives you two different functions for handling outliers. Whether you want to identify them for specific reporting requirements or exclude them from calculations such as averages, Excel has a function that meets your needs.
