Opinion Mining - Master DMKM
All programs will be written in Octave. A quick tutorial (in French) is available at link
1. PageRank and authority measure
Data and a basic algorithm are available on Langville's webpage:
Aim of the practical session: ranking webpages with the PageRank and HITS algorithms and comparing the results.
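As a starting point for this session, here is a minimal Octave sketch of both rankings on a toy graph. The 4-page adjacency matrix, the damping factor 0.85 and the stopping threshold are assumptions chosen for illustration; they do not come from the Langville data or code.

```octave
% Toy web graph (hypothetical data): A(i,j) = 1 if page i links to page j.
A = [0 1 1 0;
     0 0 1 0;
     1 0 0 1;
     0 0 0 0];            % page 4 has no outlinks (dangling node)
n = size(A, 1);
alpha = 0.85;             % damping factor, an assumed value

% Row-stochastic transition matrix; dangling rows are replaced by uniform rows.
outdeg = sum(A, 2);
P = zeros(n, n);
for i = 1:n
  if outdeg(i) > 0
    P(i, :) = A(i, :) / outdeg(i);
  else
    P(i, :) = ones(1, n) / n;
  end
end

% PageRank by power iteration, stopping when the L1 change is negligible.
r = ones(1, n) / n;
for it = 1:1000
  r_new = alpha * r * P + (1 - alpha) / n;
  delta = norm(r_new - r, 1);
  r = r_new;
  if delta < 1e-10
    break;
  end
end

% HITS: alternate authority and hub updates, normalizing at each step.
h = ones(n, 1);
for it = 1:100
  a = A' * h;  a = a / norm(a);
  h = A * a;   h = h / norm(h);
end

% Compare the two orderings (most important page first).
[~, pr_order]   = sort(r, 'descend');
[~, auth_order] = sort(a, 'descend');
disp(pr_order);
disp(auth_order');
```

Recording delta at each iteration gives a simple convergence curve, and comparing pr_order with auth_order shows where the two rankings disagree.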
2. Classification in Graphs, collective classification
This is the biggest part of the course. We will study 3 approaches:
- Semi-supervised approaches, which aim at building a model that minimizes both the misclassification cost and a regularization cost on the graph.
- The second kind of algorithm is very close: the formulation is nearly the same, but we will use a transductive formulation, which is more efficient.
- Finally, we study algorithms derived from ICA (Iterative Classification Algorithm).
Aim of the practical session: implementing node classification algorithms on relational datasets. Comparing baseline results (content only) with improved approaches (content + relations).
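To make the idea of a misclassification cost plus a graph regularization cost concrete, here is a minimal transductive sketch in Octave. It is one generic formulation, not necessarily the one used in the course: the cost J(f) = sum over labeled nodes of (f_i - y_i)^2 + lambda * f' * L * f, with L the graph Laplacian, is an assumption, as are the toy graph, the labels and lambda = 1. Setting the gradient of J to zero gives the linear system (S + lambda * L) f = S y, where S is the diagonal indicator of labeled nodes.

```octave
% Toy graph (hypothetical data): triangles {1,2,3} and {4,5,6} joined by edge 3-4.
edges = [1 2; 1 3; 2 3; 3 4; 4 5; 4 6; 5 6];
n = 6;
W = zeros(n, n);
for k = 1:size(edges, 1)
  W(edges(k, 1), edges(k, 2)) = 1;
  W(edges(k, 2), edges(k, 1)) = 1;
end

% Only nodes 1 and 6 are labeled (+1 and -1); 0 marks an unknown label.
y       = [1; 0; 0; 0; 0; -1];
labeled = [1; 0; 0; 0; 0; 1];
lambda  = 1;                      % regularization weight, an assumed value

L = diag(sum(W, 2)) - W;          % unnormalized graph Laplacian
S = diag(labeled);                % selects the labeled nodes

% Minimizer of J(f): solve (S + lambda * L) f = S y.
f = (S + lambda * L) \ (S * y);
pred = sign(f);                   % predicted class of every node

% Value of the cost at the solution: fit term plus smoothness term.
cost = sum(labeled .* (f - y).^2) + lambda * f' * L * f;
disp(f');
disp(pred');
```

On this toy graph each triangle takes the label of its single labeled node. The scores of unlabeled nodes are smoothed averages of their neighbours, which is how the relations compensate for missing labels.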
3. Clustering and community detection
Mixing PLSA and LDA approaches with sentiment classification.
4. Sentiment classification
There is also an interesting data section at this address.
Aim of the practical session: comparing thematic and sentiment classification. Introduction to transfer learning and multi-topic sentiment classification.
The report will briefly present (between half a page and 2 pages at most) the graph ranking algorithms (i.e. PageRank). You will focus on:
- Motivation of those algorithms / Comparison with previous approaches
- How it works (main part, with some illustration, convergence checking...)
- Are those algorithms robust? How can they be circumvented?
The report will then present graph classification algorithms (main part, no particular limit of length). You will focus on the following points:
- Description of the algorithms and how they work, with many illustrations of the optimization process (evolution of the cost, evolution of the classification accuracy on the training and test sets).
- All descriptions and curves must be commented: you have to show that you understand what you have done. Are the results logical? What is really being optimized? What does the concept of generalization mean?
- All algorithms should be compared, and you can propose ways to improve the results (without implementing every solution). You should emphasize the advantages and drawbacks of each method.
The report should also include a short part (same size as the first one) on the clustering task (i.e. community detection) and sentiment classification.
- brief algorithmic description
- description of the issues around this task
- brief analysis of the results