Aug 05 2006
Kelly Norton

About a year ago, I was interested in mining RSS content. At the same time, I was taking a mathematical modeling class and opted to spend my final project playing with Spectral Clustering. Which is just a colorful name for taking some data and breaking it up into groups. With summer raging outdoors, I took the opportunity last night to revisit that concept in a more flippant way.

If you want to actually learn more about spectral clustering, my original class project page has a very brief overview and a set of links to some relevant papers. My source code for the RSS clusterer is also linked in at the bottom of the page along with an experiment I did to better understand some commonly used kernel functions.

The funny thing is that the whole time I was playing with the
concept of clustering, I kept thinking about clusters of grapes and
patches of wildflowers. I don't know if it was really that helpful
to be thinking about flowers and grapes, but it did tend to annoy
the math-heads in the class who casually use names like kernel
k-means as if they were simply telling you their favorite brand of
microwave popcorn. Those guys hate it when you are really imprecise
with analogies and generalizations, and make up words like
“eigen-vines” to describe how things *really*
work. Which is reason enough to break out the flowers once again for
this latest round of spectral clustering fun. When I first did the RSS
clustering project, time was short and there was so much work in
crunching the article text into numeric matrices that I didn't
really get to explore as much as I would have liked. This time, I
kept things simple doing the clustering in two dimensional space
(instead of the hundred element vectors you get when dealing with
text).

So long as you have a relatively modern Java runtime, you should be able to click on the graphic below to start the application. Clicking anywhere on the little floor will place a flower and it's color will change to reflect the group into which it was placed. You can also tweak two parameters, radius on the left and delta on the right. Look at my class project page to see how those actually affect the results. Or you can just change them and see first hand. And if you really get curious, grab the source code below and laugh at how messy it is.

0