Teaching - DMKM


Opinion Mining - Master DMKM

Schedule:

All programs will be written in Octave. A quick tutorial (in French) is available at link

1. PageRank and authority measure

The data and a basic algorithm are available on Langville's webpage:
http://www.limfinity.com/ir/

Aim of the practical session: rank web pages with the PageRank and HITS algorithms, then compare the results.
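
As a rough illustration of the ranking step, here is a minimal power-iteration sketch of PageRank. It is written in Python for illustration only (the session itself uses Octave), and the function name `pagerank`, the damping value 0.85, the stopping tolerance and the uniform treatment of dangling pages are all assumptions, not the code distributed with the session data.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration on a dict mapping each page to its list of out-links.

    Hypothetical helper: name, defaults and data layout are assumptions.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        # Each page shares its current rank equally among its out-links.
        for u in nodes:
            outs = links.get(u, [])
            for v in outs:
                new_rank[v] += damping * rank[u] / len(outs)
        # L1 change between two iterations, used as a convergence check.
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

Tracking `delta` across iterations gives a simple convergence curve, which can be compared with the behaviour of HITS on the same graph.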

Session details, data and questions

2. Classification in Graphs, collective classification

This is the largest part of the course. We will study three approaches:

  • Semi-supervised approaches, which aim to build a model that minimizes the misclassification cost together with a regularization cost on the graph.
  • The second kind of algorithm is very close: the formulation is nearly the same, but we will use transductive formulations, which are more efficient.
  • Finally, we will study algorithms derived from ICA (the iterative classification algorithm).
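
To make the idea of a regularization cost on the graph concrete, the sketch below shows one simple transductive instance: label propagation with clamped labels, where every unlabeled node repeatedly takes the mean score of its neighbours. This drives down the sum of squared score differences across edges while keeping the known labels fixed. It is in Python for illustration only (the session uses Octave); the function name `propagate_labels`, the ±1 label encoding and the fixed iteration count are assumptions, and this is not necessarily the formulation studied in the course.

```python
def propagate_labels(edges, labels, n_iter=200):
    """Transductive label propagation on an undirected graph.

    edges: list of (u, v) pairs; labels: dict node -> +1.0 or -1.0 for the
    labeled nodes only. Hypothetical helper, names are assumptions.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    # Unlabeled nodes start at 0 (no opinion); labeled nodes start at their label.
    scores = {u: float(labels.get(u, 0.0)) for u in neighbors}
    for _ in range(n_iter):
        updated = {}
        for u in neighbors:
            if u in labels:
                # Known labels are clamped: the misclassification cost stays zero.
                updated[u] = float(labels[u])
            else:
                # Averaging the neighbours smooths the scores along the edges.
                updated[u] = sum(scores[v] for v in neighbors[u]) / len(neighbors[u])
        scores = updated
    return scores


if __name__ == "__main__":
    path = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
    print(propagate_labels(path, {"a": 1.0, "e": -1.0}))
```

The sign of each final score gives the predicted class of an unlabeled node. A content-only baseline would ignore `edges` entirely, which is the comparison the practical session asks for.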

Resources: http://www.cs.umd.edu/~getoor/
http://www.cs.umass.edu/~mccallum/ with special attention to the data section of his website.

Aim of the practical session: implement node classification algorithms on relational datasets, then compare baseline results (content only) with improved approaches (content + relations).

Session details, data and questions

3. Clustering and community detection

Resources: http://lear.inrialpes.fr/~verbeek/software.php [PLSA]
http://www.inma.ucl.ac.be/~blondel/research/louvain.html [Community]

Mixing PLSA and LDA approaches with sentiment classification.
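
The Louvain method listed in the resources above works by greedily increasing the modularity of a partition. As a minimal sketch of that quality measure, the function below computes the modularity of a given community assignment. It is in Python for illustration only (the session uses Octave); the function name `modularity` and the input layout are assumptions, and the graph is assumed to be simple, undirected and unweighted, with at least one edge.

```python
def modularity(edges, community):
    """Modularity Q of a partition of a simple undirected, unweighted graph.

    edges: list of (u, v) pairs, each edge listed once;
    community: dict node -> community id. Hypothetical helper.
    """
    m = len(edges)
    intra = {}   # number of edges with both ends inside each community
    degree = {}  # sum of node degrees inside each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    # Q = sum over communities of (intra-edge fraction - expected fraction).
    return sum(intra.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree)


if __name__ == "__main__":
    # Two triangles joined by a single edge.
    two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "left", 1: "left", 2: "left", 3: "right", 4: "right", 5: "right"}
    print(modularity(two_triangles, split))
```

Comparing the value for several candidate partitions of the same graph is a quick way to check the output of a community detection run.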

Session details, data and questions

4. Sentiment classification

Resources: http://www.cs.cornell.edu/home/llee/opinion-mining-sentiment-analysis-survey.html
http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
http://www.cs.jhu.edu/~mdredze/datasets/sentiment/

There is also an interesting data section at this address.

Aim of the practical session: compare thematic and sentiment classification. Introduction to transfer learning and multi-topic sentiment classification.

Session details, data and questions

Report specifications

The report will briefly present (between half a page and 2 pages at most) the graph ranking algorithms (i.e. PageRank). You will focus on:

  • The motivation for these algorithms and a comparison with previous approaches
  • How they work (main part, with some illustrations, convergence checks...)
  • Are these algorithms robust? How can they be circumvented?

The report will then present the graph classification algorithms (main part, no particular length limit). You will focus on the following points:

  • Description of the algorithms and how they work. Give plenty of illustrations of the optimization process (evolution of the cost, evolution of the classification accuracy on the training and test sets).
  • All descriptions and curves must be commented: you have to show that you understood what you did. Are the results logical? What is really being optimized? What does the concept of generalization mean?...
  • All algorithms should be compared, and you may propose ways to improve the results (without implementing every solution). Make sure to discuss the advantages and drawbacks of each method.

The report should also include a short part (same size as the first one) on the clustering task (i.e. community detection) and sentiment classification.

  • brief algorithmic description
  • description of the issues around this task
  • brief analysis of the results