Opinion Mining - Master DMKM
Useful link: Patrick's slides, pages 25-27 in particular.
What was the new idea in PageRank compared with the other systems available in 1998? How is it implemented?
Download a small web graph such as hollins.dat and load the data in Octave format using
Data and algorithms are here
Check the loaded data using
Then carry out a quick analysis of the data: compute the top 10 pages by number of incoming links, the mean number of incoming links per page…
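The preliminary analysis can be sketched as follows, in Python rather than Octave. This is a minimal sketch under one loud assumption: that the file has a header line `N E`, followed by N lines `id url`, followed by E lines `src dst`. Check this against the file you actually downloaded. The names `parse_graph`, `in_degree_stats`, and the `SAMPLE` data are hypothetical and only serve the illustration.

```python
from collections import Counter


def parse_graph(text):
    """Parse the ASSUMED layout: 'N E', then N 'id url' lines, then E 'src dst' lines."""
    lines = [ln.split(maxsplit=1) for ln in text.strip().splitlines() if ln.strip()]
    n_nodes, n_edges = int(lines[0][0]), int(lines[0][1])
    urls = {int(i): url.strip() for i, url in lines[1:1 + n_nodes]}
    edges = [(int(s), int(d)) for s, d in lines[1 + n_nodes:1 + n_nodes + n_edges]]
    return urls, edges


def in_degree_stats(urls, edges, k=10):
    """Return the k pages with the most incoming links and the mean in-degree."""
    indeg = Counter(dst for _, dst in edges)
    # Sort by decreasing in-degree; ties are broken by page id.
    top = sorted(urls, key=lambda i: (-indeg[i], i))[:k]
    mean = len(edges) / len(urls)
    return [(i, indeg[i]) for i in top], mean


# Tiny made-up graph in the assumed layout (not real data).
SAMPLE = """4 5
1 http://a.example/
2 http://b.example/
3 http://c.example/
4 http://d.example/
1 2
1 3
2 3
4 3
3 1
"""

urls, edges = parse_graph(SAMPLE)
top, mean = in_degree_stats(urls, edges, k=2)
for page, count in top:
    print(page, urls[page], count)
print("mean incoming links per page:", mean)
```

On the real file, `text` would come from reading the downloaded file, and `k=10` gives the requested top 10.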
Once this preliminary work is done, apply PageRank to this data. Compare the resulting authority scores with your preliminary work.
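For reference, PageRank by power iteration can be sketched as below. This is one common formulation, not necessarily the one provided with the course material: the damping factor 0.85, the uniform redistribution of rank from pages without outgoing links, and the stopping tolerance are all assumptions, and the function name `pagerank` and the toy graph are hypothetical. Nodes are numbered from 0 here.

```python
def pagerank(n, edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration on a directed graph with nodes 0..n-1.

    Repeated edges are counted as separate links (an assumption).
    """
    out = [[] for _ in range(n)]
    for src, dst in edges:
        out[src].append(dst)

    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if not out[i])
        base = (1.0 - damping) / n + damping * dangling / n
        new = [base] * n
        for i in range(n):
            if out[i]:
                share = damping * rank[i] / len(out[i])
                for j in out[i]:
                    new[j] += share
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank


# Toy graph (made up): page 2 receives links from pages 0, 1 and 3.
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 2), (2, 0)]
ranks = pagerank(4, toy_edges)
order = sorted(range(4), key=lambda i: -ranks[i])
print(order, [round(r, 4) for r in ranks])
```

In this toy graph page 0 has a single incoming link, yet it ranks second because that link comes from the top-ranked page. This is the kind of difference from a plain in-degree count that the comparison is meant to bring out.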
Give a quick implementation of the HITS algorithm (slide 48 in P. Gallinari’s slides here). Explain the differences between the rankings produced by the different methods.
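A minimal sketch of HITS is given below as a point of comparison. It follows the usual textbook iteration (authority = sum of the hub scores of the pages linking in, hub = sum of the authority scores of the pages linked to, each followed by L2 normalisation), which may differ in detail from the version on the slide. The function name `hits`, the fixed iteration count, and the toy graph are assumptions made for the illustration.

```python
import math


def hits(n, edges, n_iter=50):
    """Return (hub, authority) scores for a directed graph with nodes 0..n-1."""
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(n_iter):
        # Authority update: sum of hub scores of the pages linking to each page.
        auth = [0.0] * n
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(a * a for a in auth)) or 1.0
        auth = [a / norm for a in auth]

        # Hub update: sum of authority scores of the pages each page links to.
        new_hub = [0.0] * n
        for src, dst in edges:
            new_hub[src] += auth[dst]
        norm = math.sqrt(sum(h * h for h in new_hub)) or 1.0
        hub = [h / norm for h in new_hub]
    return hub, auth


# Same made-up toy graph as a 0-based edge list.
toy_graph = [(0, 1), (0, 2), (1, 2), (3, 2), (2, 0)]
hub_scores, auth_scores = hits(4, toy_graph)
best_auth = max(range(4), key=lambda i: auth_scores[i])
best_hub = max(range(4), key=lambda i: hub_scores[i])
print("best authority:", best_auth, "best hub:", best_hub)
```

On this toy graph the authority score of page 0 shrinks towards zero, because its only incoming link comes from a page that is a weak hub. A page that is endorsed by one important page can therefore be treated very differently by the two methods, which is worth checking on the real data.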
- Define and understand the term webspam. Propose some basic algorithms to eliminate webspam pages.
- Find a new dataset on the web and apply these algorithms to it to extract specific information.
- Download Gephi and visualize some graph data.