Possible Mentors

Jamie Bullock
Description
Clustering is the assignment of objects into groups (called clusters) so that objects in the same cluster are more similar to each other than to objects in other clusters. Similarity is often assessed with a distance measure. Clustering is a common technique for statistical data analysis, used in many fields including machine learning, data mining, pattern recognition, image analysis and bioinformatics. The purpose of this project is to develop a set of tools, in the form of Pd patches and abstractions, that make various clustering techniques available in Pd. This could include, but is not limited to, k-means clustering, Principal Component Analysis and Multidimensional Scaling.
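To make the idea of distance-based clustering concrete, here is a minimal k-means sketch in plain Python (one of the languages listed under Required Skills). It is an illustration only, not a proposed Pd implementation: the function names `euclidean` and `kmeans`, the random initialisation and the fixed iteration cap are all assumptions made for this example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=100, seed=0):
    """Hypothetical helper: basic k-means (Lloyd's algorithm).

    Returns (centroids, labels), where labels[i] is the index of the
    cluster that points[i] was assigned to.
    """
    rng = random.Random(seed)
    # Assumption: initialise centroids from k randomly chosen input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels
```

Calling `kmeans(points, 2)` on a list of 2-D tuples returns the two centroids and one cluster index per point. In a Pd library, the same assignment and update steps would presumably operate on lists or arrays inside an abstraction or an external.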
Related projects

Pd DataViz: a library of Gem, Pd and Pd data structures abstractions for data visualization

ArtificialNeuralNetworksLibrary: extending Pd's neural network externals with new algorithms and recurrent networks
Resources to start:

Wikipedia entry on data clustering: http://en.wikipedia.org/wiki/Data_clustering
Required Skills

knowledge and interest in data clustering approaches

decent mathematical skills (geometry)

reasonable Pd patching skills

possibly C, Python or Lua skills
Possible Breakdown of Steps

Investigate existing collections of abstractions/patches that fit the requirements

Identify the set of abstractions/patches that will form the basis of the library

Create new abstractions/patches as necessary

Package up the abstractions for distribution