slides - Luca Maria Aiello
Workshop on Data Driven Dynamical Networks
A glimpse on social influence and link prediction in OSNs
Speaker: Luca Maria Aiello, PhD student
Università degli Studi di Torino, Computer Science Department
[email protected]
Keywords: link creation, link prediction, homophily, social influence, aNobii

Acknowledgments
People:
◦ Università degli Studi di Torino: Giancarlo Ruffo, Rossano Schifanella
◦ ISI Foundation: Alain Barrat, Ciro Cattuto
◦ School of Informatics and Computing, Indiana University: Filippo Menczer

Dynamics leading to link creation
Several theories from sociology
◦ Self-interest
◦ Mutual-interest
◦ Exchange
◦ Contagion (influence)
◦ Balance
◦ Homophily
◦ Proximity
Food networks, collaboration networks, social media
2nd part: exploit the observations on these phenomena to predict future links
28/09/2010 Les Houches 2010 - Luca Maria Aiello, Università degli Studi di Torino

Social network for bookworms
Data-driven analysis on anobii.com
Profile features
◦ Library and wishlist
◦ Groups
◦ Tags
Social network
◦ Directed
◦ Friendship + neighborhood
4th snapshot:
         Friendship   Neighborhood   Union
Nodes    74,908       54,590         86,800
Links    268,655      429,482        697,910
6 snapshots, 15 days apart
Full giant connected component

Basic statistics
[Figure: log-log plot of ng(kout), nb(kout) and nw(kout) versus kout, both axes spanning 10^0 to 10^3]
Broad distributions
Positive correlations between connectivity and activity
Assortativity

Triadic closure
Classification of new links at time t+1 between nodes already present at time t (t ∈ {1,…,5})
[Figure: link classification diagram. Categories: Double closure, Closure, Direct, Reciprocated, Bidirectional. Percentage labels as extracted: 75%, 20%, 30%, 25%, 10%]
Reciprocation is strong (exchange)
Users tend to choose “friends of their friends” as new friends (balance)

Profile similarity vs. social distance
Does similarity between user profiles depend on the social distance?
Topical overlap: $\sigma_b(u,v) = \frac{\sum_b \delta_u^b \delta_v^b}{\sqrt{n_b(u)\, n_b(v)}}$, where $\delta_u^b = 1$ if item $b$ is in the profile of $u$ and 0 otherwise, and $n_b(u)$ is the number of items of $u$
Statistical correlation because of assortative biases?
Null model to discern real overlap from purely statistical effects
◦ No topical overlap other than that caused by statistical mixing patterns

Geographical overlap
Null model test with random link rewiring
Country-level overlap due to language barriers
City-level overlap
22/08/2010 SocialCom 2010 - Luca Maria Aiello, Università degli Studi di Torino

Causality between similarity and link creation
Topical overlap is observed for all profile features
What is the cause of topical overlap? Three possible explanations:
1. Homophily (people connect with similar people)
2. Social influence (social connection conveys similarity)
3. Mixture of the two
Explore the causality relationship between profile similarity and social linking

Similarity → link creation (homophily)
               〈ncb〉   σb     〈ncg〉   σg
duv = 2        9.5     0.02   1.12    0.05
u→v            12.9    0.04   1.10    0.08
u↔v            18.5    0.04   1.67    0.11
Closure        18.2    0.04   1.81    0.10
Dbl closure    23.4    0.05   1.20    0.12
Average similarity of pairs forming new links between t and t+1 (t=4), compared with the average similarity of all pairs at distance 2 at time t
Pairs that are going to get connected show a substantially higher similarity

Link creation → similarity (influence)
[Figure panels: Books, Groups]
Evolution of the similarity between pairs linking together at different times

Summary
Theories to explain link creation
◦ Self-interest
◦ Mutual-interest
◦ Exchange: reciprocity in linking
◦ Contagion: social influence
◦ Balance: triangle closure
◦ Homophily: for all profile features
◦ Proximity: geographical and on the social graph
Can we exploit the observations on these phenomena to predict future links?

Link prediction
Snapshots at time t and t+1
Predict links created between t and t+1 given the whole information at time t
Supervised learning approach to combine profile and structural features
Learning set example:
Pair Id   Library sim.   Common neighbors   Will be connected?
1         0.56           18                 1
2         0.11           5                  0
3         0.71           36                 1

Features
Profile
◦ Library (cosine)
◦ Groups (cosine)
◦ Groups (size): $s_{xy} = \sum_{g \in G(x) \cap G(y)} \frac{1}{|g|}$
◦ Gender {0,1}
◦ Town {0,1}
◦ Age (|age1 - age2|)
◦ Country {0,1}
◦ Vocabulary (cosine)
◦ Wishlists (cosine)
◦ Tagging behavior
Structural
◦ Common neighbors
◦ Distance on graph
◦ Preferential attachment: $s_{xy} = k(x) \cdot k(y)$
◦ Resource allocation: $s_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k(z)}$
◦ Local path: $S = A^2 + \epsilon A^3$, $\epsilon \in [0,1]$

Link prediction: preliminary results
Rotation forest, 10-fold cross-validation, balanced sets
Feature set   Precision   Recall   F-measure   AUC
Structural    0.782       0.778    0.777       0.838
Topical       0.746       0.746    0.746       0.82
Complete      0.827       0.826    0.826       0.9
Rotation forest, 10-fold cross-validation, unbalanced sets (complete feature set)
K-ratio   Precision   Recall   F-measure   AUC
1:1       0.827       0.826    0.826       0.9
1:10      0.934       0.94     0.933       0.897
1:100     0.988       0.991    0.987       0.86

Conclusions and future work
Theories on social network growth are verified
Causality between similarity and social connection
Effective link detection/prediction
◦ Topical information seems to be predictive as well as structural information
RFC:
◦ Link prediction sampling/evaluation procedure
◦ New challenges in prediction

Workshop on Data Driven Dynamical Networks
Thank you for your attention!
Speaker: Luca Maria Aiello
[email protected]
www.di.unito.it/~aiello
Reference: L. M. Aiello, A. Barrat, C. Cattuto, G. Ruffo, R. Schifanella, "Link creation and profile alignment in the aNobii social network". In SocialCom'10: Proceedings of the 2nd IEEE International Conference on Social Computing, Minneapolis, MN, USA, August 2010
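
Illustration (structural features). The structural features named on the Features slide (common neighbors, preferential attachment, resource allocation, local path) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the talk: it treats the graph as undirected (the aNobii network is directed), and the function names, the toy edge list and the value eps = 0.5 are all hypothetical.

```python
# Toy undirected graph; the edge list is invented for illustration only.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected edge list."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    return adj


def common_neighbors(adj, x, y):
    """Number of nodes adjacent to both x and y."""
    return len(adj[x] & adj[y])


def preferential_attachment(adj, x, y):
    """s_xy = k(x) * k(y), the product of the two degrees."""
    return len(adj[x]) * len(adj[y])


def resource_allocation(adj, x, y):
    """s_xy = sum over common neighbors z of 1 / k(z)."""
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])


def local_path(adj, x, y, eps=0.5):
    """Entry (x, y) of S = A^2 + eps * A^3 for a simple undirected graph.

    (A^2)_xy counts length-2 paths, i.e. common neighbors;
    (A^3)_xy counts length-3 walks, summed over the neighbors u of x.
    """
    a2 = len(adj[x] & adj[y])
    a3 = sum(len(adj[u] & adj[y]) for u in adj[x])
    return a2 + eps * a3


ADJ = build_adjacency(EDGES)
for name, score in [
    ("common neighbors", common_neighbors(ADJ, "a", "d")),
    ("preferential attachment", preferential_attachment(ADJ, "a", "d")),
    ("resource allocation", resource_allocation(ADJ, "a", "d")),
    ("local path", local_path(ADJ, "a", "d")),
]:
    print(f"{name}: {score}")
```

Each score would become one column of a learning-set row for the pair (a, d), next to the profile features. Sets of neighbors keep every index to a couple of set intersections, which is enough for a sketch but not tuned for graphs of aNobii's size.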
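
Illustration (profile features). Two of the profile features on the Features slide, the cosine similarity between libraries and the size-weighted group overlap, can be sketched as follows. This is a hedged example that assumes profiles are plain sets of item identifiers (so the cosine reduces to shared items over the geometric mean of the two profile sizes); the function names and the toy data are hypothetical and do not come from the aNobii dataset.

```python
import math


def cosine_sets(items_u, items_v):
    """Cosine similarity of two binary profiles stored as sets."""
    if not items_u or not items_v:
        return 0.0  # assumption: an empty profile has zero similarity
    return len(items_u & items_v) / math.sqrt(len(items_u) * len(items_v))


def group_size_overlap(groups_x, groups_y, group_size):
    """s_xy = sum over shared groups g of 1 / |g|.

    Small shared groups contribute more than large ones.
    """
    return sum(1.0 / group_size[g] for g in groups_x & groups_y)


# Invented toy profiles.
LIBRARY_U = {"book1", "book2", "book3", "book4"}
LIBRARY_V = {"book3", "book4", "book5", "book6"}
GROUPS_X = {"g1", "g2", "g3"}
GROUPS_Y = {"g1", "g2", "g4"}
GROUP_SIZE = {"g1": 10, "g2": 100, "g3": 5, "g4": 7}

print("library cosine:", cosine_sets(LIBRARY_U, LIBRARY_V))
print("group size overlap:", group_size_overlap(GROUPS_X, GROUPS_Y, GROUP_SIZE))
```

In the toy data the two users share two of four books each, and they share one small and one large group; the small group dominates the size-weighted score.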