Unsupervised evaluation

You will use two unsupervised methods to build clusters on the Cora dataset:

  • PLSA (content only)
  • Louvain (graph based)

Then, measure the similarity between the two results. You will use purity and the Rand index.


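NB: to perform the evaluation (purity), in each cluster:

  • retrieve the classes of the samples in this cluster
  • find the most frequent (majority) class
  • consider all samples from this class as correctly classified and all other samples as misclassified.

Purity is then the number of correctly classified samples, summed over all clusters, divided by the total number of samples.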
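The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `purity` and its two arguments (one cluster id and one class label per sample, in the same order) are assumptions made for the example.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, class_labels):
    """Fraction of samples that belong to the majority class of their cluster.

    cluster_ids and class_labels are hypothetical inputs: two sequences of
    equal length, aligned sample by sample.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    if len(cluster_ids) == 0:
        raise ValueError("purity is undefined for an empty dataset")

    # Step 1: retrieve the classes of the samples in each cluster.
    classes_by_cluster = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        classes_by_cluster[cluster].append(label)

    # Steps 2 and 3: the size of the majority class is the number of
    # correctly classified samples in that cluster.
    correct = sum(
        Counter(labels).most_common(1)[0][1]
        for labels in classes_by_cluster.values()
    )
    return correct / len(cluster_ids)
```

For example, `purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "c"])` counts 2 correct samples in each cluster, so 4 out of 6. Note that purity tends to increase with the number of clusters (it reaches 1 when every sample is its own cluster), so it is best compared between clusterings with a similar number of clusters.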

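NB2: to compare the two clustering approaches (Rand index), refer to Wikipedia: http://en.wikipedia.org/wiki/Rand_index. Note that the Rand index is a similarity in [0, 1], not a distance: it equals 1 when the two partitions are identical.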
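As an illustration, the Rand index can be computed directly from its definition: the fraction of sample pairs on which the two clusterings agree (both put the pair in the same cluster, or both separate it). The sketch below assumes a hypothetical function `rand_index` taking two aligned label sequences, for instance the PLSA and Louvain assignments.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of sample pairs on which two clusterings agree.

    labels_a and labels_b are hypothetical inputs: one cluster id per
    sample, aligned sample by sample.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same samples")
    if len(labels_a) < 2:
        raise ValueError("the Rand index needs at least two samples")

    agreements = 0
    n_pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        # A pair counts as an agreement when it is grouped together in both
        # clusterings or separated in both.
        if same_in_a == same_in_b:
            agreements += 1
        n_pairs += 1
    return agreements / n_pairs
```

The loop is quadratic in the number of samples, which should remain manageable for a dataset of a few thousand nodes. The result does not depend on how clusters are numbered: `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` gives 1.0 because the two partitions are the same up to relabeling.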


You can also work on methods that combine content and graph structure to perform the clustering.