leaguekvm.blogg.se

Sparknotes outliers

Why is Finding Outliers Important?

Ensure Data Quality

One of the reasons we want to check for outliers is to confirm the quality of our data. One potential source of outliers is values that are not correct. These incorrect values can arise in different ways; two potential sources are missing data and errors in data entry or recording.

Code for missing data

At times, when a value is unknown, the person entering the data might use a placeholder value to indicate this.

Numeric values: If some values are known to be outside the expected range, these can be used to indicate missing values.
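
As a minimal sketch of how such placeholder codes could be separated from real observations, the snippet below assumes the missing-data codes are already known. The function name `split_missing`, the sample ages, and the codes -1 and 999 are all hypothetical and chosen only for illustration.

```python
def split_missing(values, missing_codes):
    """Separate real observations from known missing-data codes.

    Returns the observed values (in their original order) and the
    number of entries that matched one of the missing-data codes.
    """
    observed = [v for v in values if v not in missing_codes]
    n_missing = len(values) - len(observed)
    return observed, n_missing


# Hypothetical ages where -1 and 999 were typed in for "unknown".
ages = [34, 27, -1, 45, 999, 52]
observed, n_missing = split_missing(ages, missing_codes={-1, 999})
print(observed)   # [34, 27, 45, 52]
print(n_missing)  # 2
```

Removing the codes before looking for outliers keeps a value such as 999 from being mistaken for a genuine extreme observation.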

If a doctor were looking at the blood pressure values above, they would identify that all of the values highlighted in orange indicate high blood pressure. As a result, they may advise some course of action. In this case, “outliers”, or important variations, are defined by existing knowledge that establishes the normal range.

It might be the case that you know the ranges that you are expecting from your data. If you identify points that fall outside this range, these may be worth additional investigation.

When using statistical indicators, we typically define outliers in reference to the data we are using. We define a measurement for the “center” of the data and then determine how far away a point needs to be to be considered an outlier. There are two common statistical indicators that can be used:

  • Distance from the mean in standard deviations.
  • Distance from the interquartile range (below the first quartile or above the third quartile), measured in multiples of the interquartile range.

For the purposes of our exploration, we’re going to use the interquartile range, but for more information about using the mean and the standard deviation, you can check out this article.
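
The two indicators listed above can be sketched with Python's standard library. This is an illustration under stated assumptions rather than a prescribed method: the multiplier `k=1.5` and the threshold of 3 standard deviations are common conventions, not values given in this article, and the function names and sample readings are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values more than k * IQR below Q1 or above Q3."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def zscore_outliers(values, threshold=3.0):
    """Return values more than `threshold` sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / sd > threshold]


# Hypothetical readings with one value far from the rest.
readings = [10, 12, 11, 13, 12, 11, 14, 13, 12, 95]
print(iqr_outliers(readings))  # [95]
```

On a sample this small, the extreme value inflates the standard deviation enough that `zscore_outliers(readings)` flags nothing at the default threshold, while the interquartile range rule still flags 95. That sensitivity is one reason the quartile-based rule is often preferred for small or skewed data.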

    Sometimes outliers might be errors that we want to exclude or anomalies that we don’t want to include in our analysis. But at other times they can reveal insights into special cases in our data that we may not otherwise notice. For example, in our names data above, perhaps the reason that Jane is found so many more times than all the other names is that it has been used to capture missing values (i.e., Jane Doe).

    There is no hard and fast rule about how much a data point needs to differ to be considered an outlier. As a result, there are a number of different methods that we can use to identify them.

    Sometimes, the typical ranges of a value are known. For example, when measuring blood pressure, your doctor likely has a good idea of what is considered to be within the normal blood pressure range.
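
When the typical range is already known, the check reduces to a simple comparison against fixed bounds. The sketch below assumes this setup; the function name `outside_known_range`, the sample readings, and the bounds 90 and 120 are placeholders for illustration only and are not taken from this article or from any clinical guideline.

```python
def outside_known_range(values, low, high):
    """Return (index, value) pairs for values outside the range [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]


# Hypothetical systolic readings; the bounds are assumed for illustration.
systolic = [118, 135, 110, 142, 121]
flagged = outside_known_range(systolic, low=90, high=120)
print(flagged)  # [(1, 135), (3, 142), (4, 121)]
```

Returning the index alongside each value makes it easier to trace a flagged reading back to its original record for further investigation.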

    An outlier is a value or point that differs substantially from the rest of the data.














