#### Teaching - DMKM

## Opinion Mining - Master DMKM

Download the (small) cora corpus (provided by Lise Getoor) cora.dat and a small related bibliography biblioGraph.zip.

### Content only approach

First, we propose to learn a perceptron on the content data. This algorithm will be taken as baseline for further experiments.

NB: we use the one-against-all technique to learn the perceptron in multi-class context.

NB2: in the following, you will distinguish a learning set and a test set.

The optimization of the perceptron can be viewed as a gradient descent of the following cost function:

Cost = \sum_i (x_i w y_i)_+

tips: the most efficient way to implement the multi-class perceptron (according to me) is the following one.

- transform the yi into vectors (1 for the class, -1 otherwise)
- rely on a matrix form of w (nb features x nb classes)
- initialize w to zero
- in the for loop
- randomly pick a sample
- make a (vectorial) prediction using w
- compute the gradient of C (wrt w)
- update only the part of w that implies some errors

Useful function to split a dataset into learning set and test set:

To perform cross validation:

### Smoothing by random walks -> To be done at the end

Then, we propose a simple heuristic method based on random walks: for each node in the graph, we have as many scores as categories (content only approach). Then, we assume that 2 nodes that are linked probably belong to the same category. According to this assumption, let implement a smoothing strategy of the scores.

Example:

- Transform scores into probabilities
- For i=1:niteration
- begin on a random labeled node x
- move randomly from x following the edge and propagate the label of x

### Regularized Approaches

Graph regularized approaches are very similar to random walks but they provide a clear framework: 2 linked nodes with different labels become penalized and a global cost function has to be minimized.

You will implement the method describe in the joined article:*Graph Regularization Methods for Web Spam Detection **Jacob Abernethy, Olivier Chapelle and Carlos Castillo*

Cost = \sum_i (x_i w y_i)_+ + \lambda \sum_{i,j} (x_i w - x_j w)^2 \delta_{ij}

The first part of the cost is minimized first, using the predefined perceptron and you should assert that the cost is well minimized by plotting the cost function with respect to the iteration.

Then, you will minimize the second part of the cost using a gradient descent. You should assert that the cost is well minimized by plotting the cost function with respect to the iteration.

I do not manage to obtain good performance with this strategy... And this is confirmed by the bibliography: we should implement transductive approach to get better results.

### Transductive regularized Approaches

Our aim is to compare classical classifiers and transductive approaches respectively denoted method 2 and 3 in the article)

First, you will learn w^\star with a perceptron and then, you will learn some slack variables \xi_i corresponding to the transductive formulation:

Cost = \sum_i (x_i w + \xi_i - y_i)^2 + \alpha \sum_{i,j} (x_i w +\xi_i - x_j w - \xi_j)^2 \delta_{ij} + \lambda ||\xi||^2

The slack variables correspond to the differences between our predictions and the exact labels. We denote by z_i=x_i w + \xi_i the predicted label. We can compute z only for known points: our model can not be used on new data, it is transductive.

NB: \xi_i from the learning set will be taken to minimize the previous equation, \xi_i from the test set will only minimize the second part of the equation.

NB2: in the multi-class case, the easiest way to implement the method consists in taking x_i w \in \mathbb{R}^C, \xi_i\in \mathbb{R}^C, y_i \in \mathbb{R}^C

In order to optimize this problem, we will use the following algorithm called coordinate descent:

- pick 2 random samples x_i, x_j
- compute the analytical solution of \frac{\partial C}{\partial \xi_i}=0 and \frac{\partial C}{\partial \xi_j}=0
- set \xi_i and \xi_j to their new values

### Iterative classification

Another popular method to tackle collective classification problem is ICA: Iterative Classification Algorithm. This approach is well defined in the joined article:*Collective Classification in Network Data Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor*

We propose to implements this strategy on cora and to compare the results with previous approaches.

Basically, the idea consists in learning a perceptron, then we make a prediction for all the data and we build a new representation of the data (a new matrix V). The columns that we add in V could be:

- the number/percentage of neighbours in each category (i.e., we add C features)
- the average representation of neighbours in each category (i.e., we add D*C features)

Once the new representation is computed, we learn a new perceptron w_2.

After that, we enter the iteration classificaiton loop:

- classfying the data using w_2
- using the classification to update the representation of the points
- loop until convergence

Another interesting experiment consists in using only the graph features in the iterative schedule. It will enable you to measure the gain related to the graph structure.

A quick function to compute enriched representation: enhancedDescr.m