# K-MEANS CLUSTERING (kmeans)

## Introduction

The files for this section are located here (as zip file). A PDF copy of this text is here.

### Aside: Clustering

1. Find the closest cluster. (smallest distance to its center of mass)

2. Increment the count of pixels in the cluster for this pass. (npix)

3. Increment a sum of color coordinates for the next center of mass. (sumc1, sumc2, sumc3)

1. The new center of mass (c1cm = sumc1/npix, c2cm = sumc2/npix, c3cm = sumc3/npix)

2. The difference in npix between previous pass and current pass. (dnpix)

1. Seed clusters, ensuring some minimum distance between

2a. Assign pixels to clusters, tracking population and color plane sums

2c. Place empty clusters between biggest.

2d. Calculate cluster population change as fraction of

until population change is small enough, or maximum number of passes

3. Convert cluster colors back to RGB.

4. Create greyscale image holding cluster assignments.

## Implementation (kmeans_v1)

It takes four command-line arguments

--outname file to create with cluster ID assignments

--space LAB, YUV, HSV, RGB - color space to use

bin/kmeans_v1 --inname=field.png --outname=ids.png

proc iterate_pass(clr : clrimage, const ref kset : [] cluster) {

/* intermediate storage, one per row */

var ksplit : [clr.rows][kset.domain] cluster;

var knew : [kset.domain] cluster;

/* closest_cluster returns index of best cluster fit. */

const closest = closest_cluster(clr.c1(x,y), clr.c2(x,y), clr.c3(x,y),

ksplit(y)(closest).c1 += clr.c1(x,y);

ksplit(y)(closest).c2 += clr.c2(x,y);

ksplit(y)(closest).c3 += clr.c3(x,y);

knew(k).npix += ksplit(y)(k).npix;

knew(k).c1 += ksplit(y)(k).c1;

knew(k).c2 += ksplit(y)(k).c2;

knew(k).c3 += ksplit(y)(k).c3;

var skip : bool; /* true to ignore cluster in set */

var npix : int; /* number pixels assigned to cluster */

var c1, c2, c3 : real; /* center of mass of each color plane */

var c1tmp : real; /* temp needed to sum H */

var r, g, b : c_uchar; /* RGB equivalent of c1, c2, c3 */

var xy = (y * quant.ncol) + x; /* flat pixel index */

/* closest_cluster returns index of best cluster fit. */

quant.r(xy) = closest_cluster(clr.c1(x,y), clr.c2(x,y), clr.c3(x,y),

### Module: Random

proc random_ranged(rand : RandomStream, rng : range) : rng.idxType {

const elt = rng.length * rand.getNext();

return rng.first + rng.stride * (nearbyint(elt - 0.5) : rng.idxType);

proc random_domain(rand : RandomStream, dom : domain) : dom.rank*dom.idxType {

var pt : dom.rank * dom.idxType;

for i in 1..dom.rank do pt(i) = random_ranged(rand, com.dim(i));

### Module: Sort

proc fillin_empties(kset : [] cluster, space : clrspace) {

proc >=(c1 : cluster, c2 : cluster) : bool {

proc >(c1 : cluster, c2 : cluster) : bool {

/* Provide <=, < if reverse=false */

QuickSort(kset, reverse=true);

Here the reverse argument to QuickSort puts the list in decreasing population.

## Atomic Variables (kmeans_v2)

Although you cannot treat structural types as atomic, you can declare their members to be so.

pixsum.read() | |

| returns the variable's value |

pixsum.write(val) | |

| places a new value in the variable |

pixsum.compareExchange(orig, new) | |

| stores new as the value only if orig is the current value, otherwise does nothing returns true if the value was changed, false if not |

pixsum.add(val) | |

| equivalent to pixsum += val; not supported for bools; returns nothing |

pixsum.sub(val) | |

| equivalent to pixsum -= val; not supported for bools; returns nothing |

pixsum.or(val), pixsum.and(val), pixsum.xor(val) | |

| equivalent to |=, &=, ^=; only supported for integers; returns nothing |

pixsum.fetchAdd(val), pixsum.fetchSub(val) | |

| as pixsum.[add|sub](val) but returns the original value; not supported for bools |

pixsum.fetchOr(val), pixsum.fetchAnd(val), pixsum.fetchXor(val) | |

| as pixsum.[or|and|xor](val) but returns the original value; only integers |

pixsum.testAndSet | |

| sets the value to true and returns the old value; only supported for bools |

pixsum.clear | |

| sets the value to false; only supported for bools |

pixsum.waitFor(val) | |

| blocks the task until the variable has the given value |

var skip : bool; /* true to ignore cluster */

var npix : int; /* number pixels in cluster */

var c1, c2, c3 : real; /* center of mass */

var r, g, b : c_uchar; /* back-converted RGB value */

var sumpix : atomic int; /* incremental sum for npix */

var sum1, sum2, sum3 : atomic real; /* incremental sum for c[1-3] */

var sumh : atomic real; /* second sum needed for H wrap */

The iteration pass is now simpler, using only the new clusters.

var closest = closest_cluster(..);

knew(closest).sum1.add(clr.c1(x,y));

knew(closest).sum2.add(clr.c2(x,y));

knew(closest).sum3.add(clr.c3(x,y));

k.c1 = k.sum1.read() / k.npix;

k.c2 = k.sum2.read() / k.npix;

k.c3 = k.sum3.read() / k.npix;

Again we've omitted the special case for the H plane; see the source for details.

bin/kmeans_v2 --inname=field.png --outname=ids2.png

## Performance

set xlabel “R” ; set ylabel “G” ; set zlabel “B”

splot 'datafile' u 6:7:8:($2/basenpix):($6*65336+$7*256+$8) with points pt 7

bin/kmeans_v[12] --inname=colors.png --outname=ids_clr.png

## Wrap-Up

## Exercises

## Files

The files needed for this section can be found here. A zip file is here.

A PDF version of this text is here.