PROWAREtech

.NET: K-Means Clustering Algorithm

Clustering code designed to group numerical data into bands or ranges; written in C#.

This code implements a K-means clustering algorithm specifically designed to group numerical data (like prices) into bands or ranges. It clusters the logarithms of the values rather than the values themselves. This suits data that spans several orders of magnitude or is heavily right-skewed, as prices and other financial data often are.
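
To see why working in log space helps, compare absolute distances before and after taking logarithms. The sample values in this sketch are illustrative and not from the article, and the helper LogDistance is a hypothetical name. The gap between 10 and 20 and the gap between 1,000 and 2,000 differ a hundredfold on a linear scale, but they are identical in log space because ln(b) - ln(a) = ln(b / a).

```csharp
using System;

// Illustrative values: two pairs of prices that differ by the same ratio (2x).
double cheapA = 10, cheapB = 20;
double dearA = 1000, dearB = 2000;

// On a linear scale the expensive pair looks 100 times farther apart.
double linearCheap = Math.Abs(cheapB - cheapA);   // 10
double linearDear = Math.Abs(dearB - dearA);      // 1000

// In log space the distance depends only on the ratio: ln(b) - ln(a) = ln(b / a).
double logCheap = LogDistance(cheapA, cheapB);
double logDear = LogDistance(dearA, dearB);

Console.WriteLine($"Linear distances: {linearCheap} vs {linearDear}");
Console.WriteLine($"Log distances: {logCheap:F4} vs {logDear:F4}"); // both about 0.6931, i.e. ln(2)

// Hypothetical helper: absolute distance between two positive values after a
// natural-log transform, the same measure used between a point and a centroid.
static double LogDistance(double a, double b) => Math.Abs(Math.Log(b) - Math.Log(a));
```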

The function takes a list of records (each pairing an arbitrary data object with a numeric metric) and a parameter kBands (default 5) that determines how many bands, or clusters, to create. It first transforms every metric into logarithmic space, which compresses a wide range of values: 10, 100 and 1,000, for example, become equally spaced. Because of this transform, every metric must be positive. The algorithm then places the initial cluster centers (centroids) evenly between the minimum and maximum log values. Since that spacing divides by kBands - 1, kBands must be at least 2.
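
The initialization step can be reproduced in isolation. This sketch uses a hypothetical helper, InitialCentroids, and illustrative sample values. It spaces kBands centroids evenly between the minimum and maximum log values in the same way as the listing, then maps them back with Math.Exp. Evenly spaced points in log space become a geometric sequence in the original units. For values from 1 to 10,000 and five bands, the initial centers are roughly 1, 10, 100, 1,000 and 10,000.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative metrics spanning four orders of magnitude.
var metrics = new List<double> { 1, 3, 25, 400, 10000 };

List<double> centroids = InitialCentroids(metrics, kBands: 5);

// Convert the log-space centroids back to original units for display.
foreach (double c in centroids)
    Console.WriteLine(Math.Exp(c).ToString("F2")); // approximately 1, 10, 100, 1000 and 10000

// Hypothetical helper mirroring the initialization step: kBands centroids spaced
// evenly between the smallest and largest log values (requires kBands >= 2).
static List<double> InitialCentroids(IReadOnlyList<double> values, int kBands)
{
    if (kBands < 2)
        throw new ArgumentOutOfRangeException(nameof(kBands), "kBands must be at least 2.");

    List<double> logs = values.Select(v => Math.Log(v)).ToList();
    double min = logs.Min();
    double max = logs.Max();
    double step = (max - min) / (kBands - 1);

    var result = new List<double>();
    for (int i = 0; i < kBands; i++)
        result.Add(min + i * step);
    return result;
}
```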

The core of the algorithm is an iterative process that runs up to 100 times or until convergence. Each iteration performs two steps. First, it assigns each data point to its nearest centroid by absolute distance in log space. Then, it moves each centroid to the average of the points assigned to it; a centroid with no points keeps its previous position. The process stops when no centroid moves by more than 0.0001 or when the iteration limit is reached. Because the centroids are initialized at evenly spaced positions rather than at random, the result is deterministic: the same input always produces the same bands.
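
A single pass of the assign-then-update loop is easier to follow on a handful of numbers. This sketch uses a hypothetical helper, Iterate, with illustrative log-space data. It performs one assignment step by absolute distance and one update step that moves each non-empty cluster's centroid to the mean of its members. Empty clusters keep their centroid, matching the listing. The helper reports whether any centroid moved by more than the 0.0001 tolerance, and the second call shows convergence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative log-space values and starting centroids (not taken from the article).
var logs = new List<double> { 0.0, 0.2, 1.0, 1.1, 3.0 };
var centroids = new List<double> { 0.0, 1.5, 3.0 };

bool firstChanged = Iterate(logs, centroids);   // centroids become 0.1, 1.05, 3.0
bool secondChanged = Iterate(logs, centroids);  // assignments repeat, so nothing moves

Console.WriteLine($"Pass 1 changed: {firstChanged}, pass 2 changed: {secondChanged}");
Console.WriteLine(string.Join(", ", centroids.Select(c => c.ToString("F2"))));

// Hypothetical helper: one K-means pass in one dimension, updating centers in place.
// Returns true if any centroid moved by more than the convergence tolerance.
static bool Iterate(List<double> points, List<double> centers, double tolerance = 1e-4)
{
    int k = centers.Count;
    var clusters = new List<List<double>>();
    for (int i = 0; i < k; i++)
        clusters.Add(new List<double>());

    // Assignment step: each point joins the centroid with the smallest absolute distance
    // (ties go to the lower index because the comparison is strict).
    foreach (double p in points)
    {
        int nearest = 0;
        for (int i = 1; i < k; i++)
            if (Math.Abs(p - centers[i]) < Math.Abs(p - centers[nearest]))
                nearest = i;
        clusters[nearest].Add(p);
    }

    // Update step: move each non-empty cluster's centroid to the mean of its members;
    // an empty cluster keeps its previous centroid.
    bool changed = false;
    for (int i = 0; i < k; i++)
    {
        if (clusters[i].Count == 0)
            continue;
        double mean = clusters[i].Average();
        if (Math.Abs(mean - centers[i]) > tolerance)
        {
            centers[i] = mean;
            changed = true;
        }
    }
    return changed;
}
```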

After the iterations finish, the code assigns each original value to its nearest final centroid. It then takes the smallest and largest original (not log) value in each cluster as that band's range. Clusters that end up empty produce no band, so the function can return fewer than kBands bands. The bands are sorted by minimum value and returned as a list of (min, max) tuples.

This kind of clustering is useful for creating price brackets, salary bands, or any other numerical categorization where similar values should be grouped together on a scale where ratios matter more than absolute differences.
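
The band-building step can be sketched on its own. The helper name BuildBands and the sample data are assumptions for illustration. The sketch maps each positive value to its nearest log-space centroid and reports each non-empty cluster's range in original units, sorted by minimum. One of the centroids deliberately attracts no values, which shows why fewer bands than centroids can come back.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative data: final centroids are given in log space, deliberately unsorted,
// and the centroid near 1,000,000 attracts no values.
var values = new List<double> { 5, 8, 12, 90, 110, 150, 2000, 2500 };
var centroids = new List<double> { Math.Log(2000), Math.Log(10), Math.Log(1_000_000), Math.Log(100) };

List<(double min, double max)> bands = BuildBands(values, centroids);
foreach (var (min, max) in bands)
    Console.WriteLine($"{min} - {max}");   // 5 - 12, 90 - 150, 2000 - 2500

// Hypothetical helper: map each value to its nearest centroid in log space, then
// report each non-empty cluster's range in original units, sorted by minimum.
static List<(double min, double max)> BuildBands(List<double> data, List<double> logCentroids)
{
    int NearestCentroid(double value)
    {
        double logValue = Math.Log(value);
        int nearest = 0;
        for (int j = 1; j < logCentroids.Count; j++)
            if (Math.Abs(logValue - logCentroids[j]) < Math.Abs(logValue - logCentroids[nearest]))
                nearest = j;
        return nearest;
    }

    return data
        .GroupBy(v => NearestCentroid(v))     // empty clusters simply produce no group
        .Select(g => (min: g.Min(), max: g.Max()))
        .OrderBy(b => b.min)
        .ToList();
}
```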


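using System;
using System.Collections.Generic;
using System.Linq;

public static List<(double min, double max)> FindClustersKMeans(List<(object data, double metric)> records, int kBands = 5)
{
	// The log transform needs strictly positive, finite metrics, and spacing the
	// initial centroids evenly needs at least two bands (step divides by kBands - 1).
	if (records == null || records.Count == 0)
		throw new ArgumentException("At least one record is required.", nameof(records));
	if (kBands < 2)
		throw new ArgumentOutOfRangeException(nameof(kBands), "kBands must be at least 2.");

	var metrics = records.Select(r => r.metric).ToList();
	if (metrics.Any(m => m <= 0 || double.IsNaN(m) || double.IsInfinity(m)))
		throw new ArgumentException("Every metric must be positive and finite.", nameof(records));
	var logs = metrics.Select(p => Math.Log(p)).ToList();

	// Initialize centroids evenly spaced between the smallest and largest log values.
	// This is deterministic, so no random number generator is needed.
	var centroids = new List<double>();
	double min = logs.Min();
	double max = logs.Max();
	double step = (max - min) / (kBands - 1);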

	for (int i = 0; i < kBands; i++)
		centroids.Add(min + i * step);

	// K-means iterations
	for (int iter = 0; iter < 100; iter++)
	{
		// Assign points to nearest centroid
		var clusters = new List<List<double>>();
		for (int i = 0; i < kBands; i++)
			clusters.Add(new List<double>());

		foreach (var price in logs)
		{
			int nearestCentroid = 0;
			double minDist = double.MaxValue;

			for (int i = 0; i < kBands; i++)
			{
				double dist = Math.Abs(price - centroids[i]);
				if (dist < minDist)
				{
					minDist = dist;
					nearestCentroid = i;
				}
			}

			clusters[nearestCentroid].Add(price);
		}

		// Update centroids
		bool changed = false;
		for (int i = 0; i < kBands; i++)
		{
			if (clusters[i].Any())
			{
				double newCentroid = clusters[i].Average();
				if (Math.Abs(newCentroid - centroids[i]) > 1e-4) // convergence tolerance in log units
				{
					changed = true;
					centroids[i] = newCentroid;
				}
			}
		}

		if (!changed)
			break;
	}

	// Assign every original value to its nearest final centroid, then find min/max for each band
	var bands = new List<(double min, double max)>();
	var assignments = new int[metrics.Count];

	for (int i = 0; i < metrics.Count; i++)
	{
		double logPrice = logs[i]; // reuse the log value computed earlier
		int nearestCentroid = 0;
		double minDist = double.MaxValue;

		for (int j = 0; j < kBands; j++)
		{
			double dist = Math.Abs(logPrice - centroids[j]);
			if (dist < minDist)
			{
				minDist = dist;
				nearestCentroid = j;
			}
		}
		assignments[i] = nearestCentroid;
	}

	// Create bands based on actual price ranges in each cluster
	for (int i = 0; i < kBands; i++)
	{
		// Materialize once so Any/Min/Max do not each re-scan the full list
		var clusterPrices = metrics.Where((p, idx) => assignments[idx] == i).ToList();
		if (clusterPrices.Count > 0)
		{
			bands.Add((
				min: clusterPrices.Min(),
				max: clusterPrices.Max()
			));
		}
	}

	// Sort bands by min price
	bands = bands.OrderBy(b => b.min).ToList();

	return bands;
}
