.NET: Remove Data Outliers for Better Data

Remove data outliers using the Interquartile Range (IQR) method with a logarithmic transformation, written in C#.
This code removes outliers from a dataset with the IQR method, but it computes the quartiles and bounds on log10-transformed values rather than on the raw values.
  1. The function takes two parameters:
    • A list of records, where each record has some data and an associated numeric metric
    • An iqrMultiplier (defaulting to 1.5) which controls how aggressive the outlier removal is
  2. The code applies a log10 transformation to all metric values. This suits data that:
    • Has a wide range of values
    • Follows a log-normal distribution
    • Contains only positive numbers (a requirement rather than a convenience, since log10 is undefined for zero and negative values)
  3. It then calculates three important statistical measures in log space:
    • Q1 (First quartile): The value 25% of the way through the sorted data (the element at index Count / 4)
    • Q3 (Third quartile): The value 75% of the way through the sorted data (the element at index 3 × Count / 4)
    • IQR (Interquartile Range): The difference between Q3 and Q1
  4. Using these measures, it calculates bounds for what constitutes an outlier:
    • Lower bound = Q1 - (iqrMultiplier × IQR)
    • Upper bound = Q3 + (iqrMultiplier × IQR)
  5. These bounds are then transformed back from log space to normal space using 10^x
  6. Finally, it filters the original dataset to keep only records whose metrics fall within these bounds
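
The six steps above can be traced on a small dataset. The sketch below is illustrative only: the class name LogIqrExample, the helper LogIqrBounds, and the sample values are assumptions made for this example, not part of the original code. It uses the same index-based quartile lookup as the function in this article and assumes every metric is positive.

```csharp
using System;
using System.Linq;

public static class LogIqrExample
{
    // Hypothetical helper: returns the (lower, upper) outlier bounds in normal space.
    public static (double Lower, double Upper) LogIqrBounds(double[] metrics, double iqrMultiplier = 1.5)
    {
        // Step 2: log-transform and sort (assumes every metric is > 0).
        double[] logs = metrics.Select(m => Math.Log10(m)).OrderBy(x => x).ToArray();

        // Step 3: quartiles by simple index lookup, then the IQR in log space.
        double logQ1 = logs[logs.Length / 4];
        double logQ3 = logs[(3 * logs.Length) / 4];
        double logIqr = logQ3 - logQ1;

        // Steps 4 and 5: bounds in log space, converted back with 10^x.
        double lower = Math.Pow(10, logQ1 - iqrMultiplier * logIqr);
        double upper = Math.Pow(10, logQ3 + iqrMultiplier * logIqr);
        return (lower, upper);
    }

    public static void Main()
    {
        // Made-up sample: seven ordinary values and one extreme value.
        double[] metrics = { 10, 12, 15, 18, 20, 25, 30, 10000 };
        var (lower, upper) = LogIqrBounds(metrics);

        // Here Q1 = 15 and Q3 = 30, so the bounds are roughly 5.30 and 84.85.
        Console.WriteLine($"Bounds: {lower:F2} to {upper:F2}");

        // Step 6: keep only the values inside the bounds (10000 is dropped).
        double[] kept = metrics.Where(m => m >= lower && m <= upper).ToArray();
        Console.WriteLine(string.Join(", ", kept));
    }
}
```

With Q1 = 15 and Q3 = 30, the log-space IQR is log10(2), so the bounds in normal space work out to Q1 / 2^1.5 and Q3 × 2^1.5. The log transformation therefore turns the usual additive IQR fences into multiplicative ones, which is why a value such as 10000 is removed without distorting the bounds for the rest of the data.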

// Requires: using System; using System.Collections.Generic; using System.Linq;
// A larger iqrMultiplier widens the bounds (fewer records removed); a smaller one narrows them (more records removed).
// Assumes every metric is positive, because log10 is undefined for zero and negative values.
public static List<(object data, double metric)> RemoveOutliers(List<(object data, double metric)> records, float iqrMultiplier = 1.5f)
{
	// Nothing to filter; this also avoids indexing into an empty list below
	if (records.Count == 0) return new List<(object data, double metric)>();

	// Calculate Q1, Q3 and IQR for the log-transformed metrics
	var logs = records.Select(r => Math.Log10(r.metric)).OrderBy(p => p).ToList();
	int q1Index = logs.Count / 4;
	int q3Index = (3 * logs.Count) / 4;
	double logQ1 = logs[q1Index];
	double logQ3 = logs[q3Index];
	double logIqr = logQ3 - logQ1;

	// Compute bounds [iqrMultiplier] IQRs below Q1 and above Q3 in log space
	double logLowerBound = logQ1 - (iqrMultiplier * logIqr);
	double logUpperBound = logQ3 + (iqrMultiplier * logIqr);

	// Convert bounds back to normal space
	double lowerBound = Math.Pow(10, logLowerBound);
	double upperBound = Math.Pow(10, logUpperBound);

	// Keep only the records whose metric falls within the bounds (inclusive)
	var filtered = records.Where(r => r.metric >= lowerBound && r.metric <= upperBound).ToList();

	return filtered;
}
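
The function above stores each record's payload as object, so callers have to cast it back after filtering. One possible variation, sketched here under stated assumptions, is a generic version that takes a selector for the metric. The names OutlierFilter and RemoveOutliersBy, the sample product list, and the decision to drop records with a non-positive metric before the log transform are all assumptions of this sketch, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OutlierFilter
{
    // Hypothetical generic variant: the caller supplies a selector that extracts the metric.
    public static List<T> RemoveOutliersBy<T>(
        IEnumerable<T> records, Func<T, double> metricSelector, double iqrMultiplier = 1.5)
    {
        // Assumption: records whose metric is zero, negative, or NaN are dropped up front,
        // because log10 is undefined for them.
        var valid = records.Where(r => metricSelector(r) > 0).ToList();
        if (valid.Count == 0) return valid;

        // Same index-based quartiles as the original function, computed in log space.
        var logs = valid.Select(r => Math.Log10(metricSelector(r))).OrderBy(x => x).ToList();
        double logQ1 = logs[logs.Count / 4];
        double logQ3 = logs[(3 * logs.Count) / 4];
        double logIqr = logQ3 - logQ1;

        // Bounds in log space, converted back to normal space.
        double lowerBound = Math.Pow(10, logQ1 - iqrMultiplier * logIqr);
        double upperBound = Math.Pow(10, logQ3 + iqrMultiplier * logIqr);

        return valid.Where(r =>
        {
            double m = metricSelector(r);
            return m >= lowerBound && m <= upperBound;
        }).ToList();
    }

    public static void Main()
    {
        // Made-up sample data: one zero price and one extreme price.
        var products = new List<(string Name, double Price)>
        {
            ("Cable", 9.99), ("Mouse", 19.5), ("Keyboard", 45), ("Headset", 60),
            ("Monitor", 180), ("Typo", 0), ("Yacht", 2500000)
        };

        var kept = RemoveOutliersBy(products, p => p.Price);

        // "Typo" is dropped for its non-positive price, "Yacht" as an outlier.
        foreach (var p in kept) Console.WriteLine(p.Name);
    }
}
```

The selector keeps the records strongly typed, so the filtered list can be used without casting. Whether non-positive metrics should be silently dropped, as here, or rejected with an exception depends on the data and is a choice left to the caller.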


