
An outlier is a value or point that differs substantially from the rest of the data.

Sometimes outliers are errors that we want to exclude, or anomalies that we don't want to include in our analysis. At other times they can reveal insights into special cases in our data that we might not otherwise notice. For example, in our names data above, perhaps the reason that Jane is found so many more times than all the other names is that it has been used to capture missing values (i.e., Jane Doe).

There is no hard and fast rule about how much a data point needs to differ to be considered an outlier. As a result, there are a number of different methods that we can use to identify them. Sometimes the typical range of a value is known. For example, when measuring blood pressure, your doctor likely has a good idea of what is considered to be within the normal blood pressure range.

Why is Finding Outliers Important?

Ensure Data Quality

One of the reasons we want to check for outliers is to confirm the quality of our data. One potential source of outliers in our data is values that are not correct. There are different potential sources for these "incorrect values". Two of them are missing data and errors in data entry or recording.

Code for missing data

At times, when values are unknown, the person entering the data might use a value to indicate this.

Numeric values: If there are values that are known to be outside of the expected range, these can be used to indicate missing values.
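
A minimal Python sketch of one simple check: when the expected range of a value is known, anything outside that range can be flagged for review, whether it turns out to be a genuine outlier, an entry error, or a placeholder for missing data. The function name, the sample readings, and the range limits are assumptions made for illustration; they do not come from the data discussed here.

```python
def flag_out_of_range(values, low, high):
    """Return (index, value) pairs for values outside the range [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]


# Hypothetical systolic blood pressure readings (mmHg). The values 999 and -1
# stand in for placeholder codes that might have been typed for "unknown".
readings = [118, 125, 999, 132, -1, 121]

# Assumed plausible range, chosen for illustration only (not a clinical reference).
flagged = flag_out_of_range(readings, low=50, high=300)
print(flagged)  # [(2, 999), (4, -1)]
```

A flagged value is only a candidate for review. Deciding whether it is an error, a missing-data code, or a real but unusual measurement still requires knowing how the data was collected.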
