PROWAREtech

.NET: Remove Data Outliers for Better Data

Remove data outliers using the Interquartile Range (IQR) method with a logarithmic transformation; written in C#.
This code removes outliers from a dataset by computing the IQR bounds on the log10 of each value rather than on the raw values.
  1. The function takes two parameters:
    • A list of records, where each record has some data and an associated numeric metric
    • An iqrMultiplier (defaulting to 1.5) which controls how aggressive the outlier removal is
  2. The code applies a log10 transformation to all metric values. This is useful when dealing with data that:
    • Has a wide range of values
    • Follows a log-normal distribution
    • Contains only positive numbers (a requirement, because log10 is undefined for zero and negative values)
  3. It then calculates three important statistical measures in log space:
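    • Q1 (First quartile): The value 25% of the way through the sorted data (the element at index count / 4, with no interpolation)
    • Q3 (Third quartile): The value 75% of the way through the sorted data (the element at index 3 × count / 4, with no interpolation)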
    • IQR (Interquartile Range): The difference between Q3 and Q1
  4. Using these measures, it calculates bounds for what constitutes an outlier:
    • Lower bound = Q1 - (iqrMultiplier × IQR)
    • Upper bound = Q3 + (iqrMultiplier × IQR)
  5. These bounds are then transformed back from log space to normal space using 10^x.
  6. Finally, it filters the original dataset to keep only records whose metrics fall within these bounds (inclusive).
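
To see why the bounds are computed in log space, the following sketch compares them with bounds computed directly on the raw values. The sample values and the Quartiles helper are made up for illustration and are not part of the original code; the sketch assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample: mostly small values plus one very large one.
var metrics = new List<double> { 10, 12, 15, 18, 20, 25, 30, 1000 };
const double iqrMultiplier = 1.5;

// Hypothetical helper: the same index-based quartile picks as step 3.
static (double Q1, double Q3) Quartiles(List<double> sorted) =>
    (sorted[sorted.Count / 4], sorted[(3 * sorted.Count) / 4]);

// Bounds computed directly on the raw values.
var (rawQ1, rawQ3) = Quartiles(metrics.OrderBy(v => v).ToList());
double linearLower = rawQ1 - iqrMultiplier * (rawQ3 - rawQ1);
double linearUpper = rawQ3 + iqrMultiplier * (rawQ3 - rawQ1);

// Bounds computed in log10 space, then mapped back with 10^x.
var (logQ1, logQ3) = Quartiles(metrics.Select(v => Math.Log10(v)).OrderBy(v => v).ToList());
double logLower = Math.Pow(10, logQ1 - iqrMultiplier * (logQ3 - logQ1));
double logUpper = Math.Pow(10, logQ3 + iqrMultiplier * (logQ3 - logQ1));

Console.WriteLine($"raw-value bounds: [{linearLower}, {linearUpper}]");
Console.WriteLine($"log-space bounds: [{logLower:F2}, {logUpper:F2}]");
```

With these sample values the raw-value lower bound is negative (-7.5), so it can never exclude a suspiciously small positive value. The log-space bounds come out to roughly 5.30 and 84.85, and the lower one is always positive because 10^x is positive for any x.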

// A larger iqrMultiplier widens the bounds, so fewer records are removed; a smaller one removes more.
// Requires: using System; using System.Collections.Generic; using System.Linq;
// All metrics must be positive, because Math.Log10 is undefined for zero and negative values.
public static List<(object data, double metric)> RemoveOutliers(List<(object data, double metric)> records, float iqrMultiplier = 1.5f)
{
	// Nothing to filter; this also avoids indexing into an empty list below
	if (records.Count == 0)
		return new List<(object data, double metric)>();

	// Calculate Q1, Q3 and IQR for the log-transformed metrics
	var logs = records.Select(r => Math.Log10(r.metric)).OrderBy(p => p).ToList();
	int q1Index = logs.Count / 4;
	int q3Index = (3 * logs.Count) / 4;
	double logQ1 = logs[q1Index];
	double logQ3 = logs[q3Index];
	double logIqr = logQ3 - logQ1;

	// Compute bounds [iqrMultiplier] IQRs below Q1 and above Q3 in log space
	double logLowerBound = logQ1 - (iqrMultiplier * logIqr);
	double logUpperBound = logQ3 + (iqrMultiplier * logIqr);

	// Convert bounds back to normal space
	double lowerBound = Math.Pow(10, logLowerBound);
	double upperBound = Math.Pow(10, logUpperBound);

	// Keep only records whose metric lies within the bounds (inclusive)
	var filtered = records.Where(r => r.metric >= lowerBound && r.metric <= upperBound).ToList();

	return filtered;
}
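
A minimal usage sketch follows, assuming C# 9 or later for top-level statements. The sample records are hypothetical, and RemoveOutliers is restated in condensed form as a local function only so the sketch runs on its own. Dropping non-positive metrics before the call is an assumption added here, not something the original function does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Condensed restatement of RemoveOutliers so this sketch is self-contained.
static List<(object data, double metric)> RemoveOutliers(
    List<(object data, double metric)> records, float iqrMultiplier = 1.5f)
{
    if (records.Count == 0) return new List<(object data, double metric)>();
    var logs = records.Select(r => Math.Log10(r.metric)).OrderBy(p => p).ToList();
    double logQ1 = logs[logs.Count / 4];
    double logQ3 = logs[(3 * logs.Count) / 4];
    double logIqr = logQ3 - logQ1;
    double lowerBound = Math.Pow(10, logQ1 - (iqrMultiplier * logIqr));
    double upperBound = Math.Pow(10, logQ3 + (iqrMultiplier * logIqr));
    return records.Where(r => r.metric >= lowerBound && r.metric <= upperBound).ToList();
}

// Hypothetical records: a label as the data and a numeric metric.
var sample = new List<(object data, double metric)>
{
    ("a", 1), ("b", 10), ("c", 12), ("d", 15), ("e", 18), ("f", 20),
    ("g", 25), ("h", 30), ("i", 1000), ("j", 0), ("k", -5)
};

// Assumption: drop non-positive metrics first, since Math.Log10 returns
// negative infinity for 0 and NaN for negative input.
var positive = sample.Where(r => r.metric > 0).ToList();

var kept = RemoveOutliers(positive);
foreach (var (data, metric) in kept)
    Console.WriteLine($"{data}: {metric}");
```

For this sample the log-space bounds work out to roughly 3.99 and 75.18, so the records with metrics 1 and 1000 are removed and the seven records from 10 through 30 are kept.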
