Spectral Clustering
About a year ago, I was interested in mining RSS content. At the same time, I was taking a mathematical modeling class and opted to spend my final project playing with Spectral Clustering. Which is just a colorful name for taking some data and breaking it up into groups. With summer raging outdoors, I took the opportunity last night to revisit that concept in a more flippant way.
If you want to actually learn more about spectral clustering, my original class project page has a very brief overview and a set of links to some relevant papers. My source code for the RSS clusterer is also linked in at the bottom of the page along with an experiment I did to better understand some commonly used kernel functions.
The funny thing is that the whole time I was playing with the concept of clustering, I kept thinking about clusters of grapes and patches of wildflowers. I don't know if it was really that helpful to be thinking about flowers and grapes, but it did tend to annoy the math-heads in the class who casually use names like kernel k-means as if they were simply telling you their favorite brand of microwave popcorn. Those guys hate it when you are really imprecise with analogies and generalizations, and make up words like “eigen-vines” to describe how things really work. Which is reason enough to break out the flowers once again for this latest round of spectral clustering fun. When I first did the RSS clustering project, time was short and there was so much work in crunching the article text into numeric matrices that I didn't really get to explore as much as I would have liked. This time, I kept things simple doing the clustering in two dimensional space (instead of the hundred element vectors you get when dealing with text).
So long as you have a relatively modern Java runtime, you should be able to click on the graphic below to start the application. Clicking anywhere on the little floor will place a flower and it's color will change to reflect the group into which it was placed. You can also tweak two parameters, radius on the left and delta on the right. Look at my class project page to see how those actually affect the results. Or you can just change them and see first hand. And if you really get curious, grab the source code below and laugh at how messy it is.