slides - Luca Maria Aiello
Workshop on Data Driven Dynamical Networks
A glimpse on social influence and link prediction in OSNs
Speaker: Luca Maria Aiello, PhD student
Università degli Studi di Torino
Computer Science Department
[email protected]
Keywords: link creation, link prediction, homophily, social influence, aNobii
Acknowledgments
People:
Università degli Studi di Torino: Giancarlo Ruffo, Rossano Schifanella
ISI Foundation: Alain Barrat, Ciro Cattuto
School of Informatics and Computing, Indiana University: Filippo Menczer
Dynamics leading to link creation
• Several theories from sociology
◦ Self-interest
◦ Mutual-interest
◦ Exchange
◦ Contagion (influence)
◦ Balance
◦ Homophily
◦ Proximity
• Food networks, collaboration networks, social media
• 2nd part: exploit the observations on these phenomena to predict future links
28/09/2010
Les Houches 2010 - Luca Maria Aiello, Università degli Studi di Torino
3
Outline
Social network for bookworms
Data-driven analysis on anobii.com
• Profile features
◦ Library and wishlist
◦ Groups
◦ Tags
• Social network
◦ Directed
◦ Friendship + neighborhood
• 6 snapshots, 15 days apart
• Full giant connected component

4th snapshot:
         Friendship   Neighborhood   Union
Nodes    74,908       54,590         86,800
Links    268,655      429,482        697,910

Basic statistics
[Figure: ng(kout), nb(kout) and nw(kout) plotted against kout on log-log axes, 10^0 to 10^3]
• Broad distributions
• Positive correlations between connectivity and activity
• Assortativity
Triadic closure

• Classification of new links at time t+1 between nodes already present at time t (t ∈ {1,…,5})
[Figure: breakdown of new links into Direct, Reciprocated, Bidirectional, Closure and Double closure; percentages shown in the diagram: 75%, 20%, 30%, 25%, 10%]
• Reciprocation is strong (exchange)
• Users tend to choose “friends of their friends” as new friends (balance)
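A minimal sketch of how new links could be labelled from two consecutive snapshots. The definitions used here are assumptions, since the transcript does not spell them out: a new link u→v counts as "reciprocated" if v→u already existed at time t, and as a "closure" if some w had u→w and w→v at time t. Double closure is left out because its definition is not recoverable from the slide. The function name and the edge-list input format are hypothetical.

```python
def classify_new_links(edges_t, edges_t1):
    """Label each directed link that is new at t+1 between nodes present at t."""
    old = set(edges_t)
    nodes_t = {n for e in old for n in e}
    succ = {}  # node -> set of nodes it points to at time t
    for u, v in old:
        succ.setdefault(u, set()).add(v)
    labels = {}
    for u, v in set(edges_t1) - old:
        if u not in nodes_t or v not in nodes_t:
            continue  # skip links involving nodes that arrived after t
        # assumed definition: the reverse link v -> u already existed at t
        reciprocated = (v, u) in old
        # assumed definition: some w with u -> w and w -> v at time t
        closure = any(v in succ.get(w, ()) for w in succ.get(u, ()))
        labels[(u, v)] = {"reciprocated": reciprocated, "closure": closure}
    return labels
```

Counting the labels over all new links of a snapshot pair gives shares comparable in spirit to the percentages on the slide.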

Profile similarity vs. social distance
Does similarity between user profiles depend on the social distance?
$\sigma_b(u,v) = \frac{\sum_b \delta_u^b \, \delta_v^b}{\sqrt{n_b(u) \, n_b(v)}}$
Topical overlap
• Statistical correlation because of assortative biases?
• Null model to discern real overlap from purely statistical effects
◦ No topical overlap other than that caused by statistical mixing patterns
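A minimal sketch of a profile similarity measure, assuming each library is a set of book ids treated as a binary vector: the cosine similarity is then the number of common books divided by the geometric mean of the two library sizes. The function name and the set-based input are illustrative choices, not taken from the slides.

```python
import math

def cosine_similarity(books_u, books_v):
    """Cosine similarity between two binary profiles given as sets of item ids."""
    books_u, books_v = set(books_u), set(books_v)
    if not books_u or not books_v:
        return 0.0  # convention chosen here for empty profiles
    common = len(books_u & books_v)
    return common / math.sqrt(len(books_u) * len(books_v))
```

The same function applies unchanged to groups, tags or wishlists, as long as each profile feature is represented as a set.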
Geographical overlap
• Null model test with random link rewire
• Country-level overlap due to language barriers
• City-level overlap
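One common way to build a rewired null model is a degree-preserving edge swap, sketched below for a directed edge list. Whether the talk used exactly this variant is an assumption; the slide only says "random link rewire". The function name, the swap count and the seed parameter are illustrative.

```python
import random

def rewire(edges, n_swaps, seed=0):
    """Randomize a directed edge list by swapping link targets.

    Each accepted swap turns (a -> b, c -> d) into (a -> d, c -> b), so every
    node keeps its in-degree and out-degree.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        # reject swaps that would create self-loops or duplicate links
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
        done += 1
    return edges
```

Overlap measured on the rewired network can then be compared with overlap on the real one: whatever survives the rewiring is attributable to degree mixing alone.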
22/08/2010
SocialCom 2010 - Luca Maria Aiello, Università degli Studi di Torino
11
Causality between similarity and link creation
• Topical overlap is observed for all profile features
• What is the cause of topical overlap?
• Three possible explanations:
1. Homophily (people connect with similar people)
2. Social influence (social connection conveys similarity)
3. Mixture of the two
• Explore the causality relationship between profile similarity and social linking
Similarity → link creation (homophily)

              〈ncb〉   σb     〈ncg〉   σg
duv = 2       9.5     0.02   1.12    0.05
u→v           12.9    0.04   1.10    0.08
u↔v           18.5    0.04   1.67    0.11
Closure       18.2    0.04   1.81    0.10
Dbl closure   23.4    0.05   1.20    0.12

• Average similarity of pairs forming new links between t and t+1 (t=4), compared with average similarity of all the pairs at distance 2 at time t
• Pairs that are going to get connected show a substantially higher similarity
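The comparison described here can be sketched as follows: collect all pairs at distance 2 at time t, then average the number of common items once over all of them and once over the pairs that become linked by t+1. This sketch simplifies by treating the graph as undirected, which is an assumption; the function names and the dict-of-sets profile format are hypothetical.

```python
from itertools import combinations

def distance_two_pairs(edges):
    """Unordered pairs at distance exactly 2, ignoring link direction."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    pairs = set()
    for ns in nbrs.values():
        # any two neighbours of the same node are at distance <= 2
        for u, v in combinations(sorted(ns), 2):
            if v not in nbrs[u]:  # drop pairs that are already linked
                pairs.add((u, v))
    return pairs

def mean_common_items(pairs, items):
    """Average number of shared items (e.g. books) over a collection of pairs."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(len(items[u] & items[v]) for u, v in pairs) / len(pairs)
```

Calling `mean_common_items` on `distance_two_pairs(edges_t)` gives the baseline, and calling it on the subset of those pairs that appear as links at t+1 gives the value to compare against it.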
Link creation → similarity (influence)
• Evolution of the similarity between pairs linking together at different times
[Figure: two panels, Books and Groups]
Summary

• Theories to explain link creation
◦ Self-interest
◦ Mutual-interest
◦ Exchange → Reciprocity in linking
◦ Contagion → Social influence
◦ Balance → Triangle closure
◦ Homophily → For all profile features
◦ Proximity → Geographical and on social graph
• Can we exploit the observations on these phenomena to predict future links?
Link prediction
• Snapshots at time t and t+1
• Predict links created between t and t+1 given the whole information at time t
• Supervised learning approach to combine profile and structural features

Learning set example:
Pair Id   Library sim.   Common neighbors   Will be connected?
1         0.56           18                 1
2         0.11           5                  0
3         0.71           36                 1
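A learning set of this shape could be assembled roughly as below: features come only from the snapshot at time t, and the label records whether the pair is linked at t+1. The function name, the argument layout and the choice to treat a link in either direction as a positive are assumptions made for illustration.

```python
import math

def build_learning_set(pairs, libraries, neighbors_t, links_t1):
    """One row per candidate pair: two features from time t, label from t+1."""
    rows = []
    for u, v in pairs:
        lib_u, lib_v = libraries[u], libraries[v]
        denom = math.sqrt(len(lib_u) * len(lib_v))
        rows.append({
            "pair": (u, v),
            # cosine similarity between the two libraries (sets of book ids)
            "library_sim": len(lib_u & lib_v) / denom if denom else 0.0,
            "common_neighbors": len(neighbors_t[u] & neighbors_t[v]),
            # assumption: a link in either direction counts as connected
            "label": int((u, v) in links_t1 or (v, u) in links_t1),
        })
    return rows
```

The resulting rows can be fed to any supervised classifier; further profile or structural features would simply add columns.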
Features
• Profile
◦ Library (cosine)
◦ Groups (cosine)
◦ Groups (size): $s_{xy} = \sum_{g \in G(x) \cap G(y)} \frac{1}{|g|}$
◦ Gender {0,1}
◦ Town {0,1}
◦ Age (|age1 - age2|)
◦ Country {0,1}
◦ Vocabulary (cosine)
◦ Wishlists (cosine)
◦ Tagging behavior
• Structural
◦ Common neighbors
◦ Distance on graph
◦ Preferential attachment: $s_{xy} = k(x) \cdot k(y)$
◦ Resource allocation: $s_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k(z)}$
◦ Local path: $S = A^2 + \varepsilon A^3$, $\varepsilon \in [0,1]$
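A minimal sketch of four of the structural scores for a single pair (x, y), assuming an undirected graph stored as a dict mapping each node to its set of neighbours. The local path score uses the fact that entry (x, y) of the squared adjacency matrix counts common neighbours and entry (x, y) of the cubed matrix counts walks of length 3. The function name and the default weight are illustrative.

```python
def structural_features(nbrs, x, y, eps=0.01):
    """Structural link-prediction scores for the pair (x, y).

    nbrs: dict mapping each node to the set of its neighbours (undirected).
    eps:  weight of length-3 walks in the local path score, in [0, 1].
    """
    common = nbrs[x] & nbrs[y]
    # walks x - a - b - y: for each neighbour a of x, count the b that
    # are adjacent to both a and y
    walks3 = sum(len(nbrs[a] & nbrs[y]) for a in nbrs[x])
    return {
        "common_neighbors": len(common),
        "preferential_attachment": len(nbrs[x]) * len(nbrs[y]),
        "resource_allocation": sum(1.0 / len(nbrs[z]) for z in common),
        "local_path": len(common) + eps * walks3,
    }
```

Working pair by pair avoids building dense matrix powers, which matters on a network with tens of thousands of nodes.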
Link prediction: preliminary results

• Rotation forest, 10-fold cross-validation, balanced sets

Features     Precision   Recall   F-measure   AUC
Structural   0.782       0.778    0.777       0.838
Topical      0.746       0.746    0.746       0.82
Complete     0.827       0.826    0.826       0.9

• Rotation forest, 10-fold cross-validation, unbalanced sets (complete feature set)

K-ratio   Precision   Recall   F-measure   AUC
1:1       0.827       0.826    0.826       0.9
1:10      0.934       0.94     0.933       0.897
1:100     0.988       0.991    0.987       0.86
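The slides do not state how the balanced and unbalanced sets were drawn. One simple way to obtain a 1:k ratio, sketched here purely as an assumption, is to keep every positive pair and draw k times as many negatives uniformly at random. The function name and parameters are hypothetical.

```python
import random

def sample_learning_pairs(positives, negatives, k, seed=0):
    """Return (pair, label) tuples with roughly k negatives per positive."""
    rng = random.Random(seed)
    positives = list(positives)
    negatives = list(negatives)
    # cap the request when there are fewer negatives than k * positives
    n_neg = min(len(negatives), k * len(positives))
    sample = [(p, 1) for p in positives]
    sample += [(n, 0) for n in rng.sample(negatives, n_neg)]
    rng.shuffle(sample)
    return sample
```

With k = 1 this gives a balanced set; k = 10 and k = 100 correspond to the other two rows of the table. Scores averaged over both classes tend to rise as the majority class grows, so results across ratios are not directly comparable.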
Conclusions and future work
• Theories on social network growth are verified
• Causality between similarity and social connection
• Effective link detection/prediction
◦ Topical information seems to be predictive as well as structural information
• RFC:
◦ Link prediction sampling/evaluation procedure
◦ New challenges in prediction
Workshop on Data Driven Dynamical Networks
Thank you for your attention!
Speaker: Luca Maria Aiello
[email protected]
www.di.unito.it/~aiello
Reference:
L. M. Aiello, A. Barrat, C. Cattuto, G. Ruffo, R. Schifanella
"Link creation and profile alignment in the aNobii social network"
In SocialCom'10: Proceedings of the 2nd IEEE International
Conference on Social Computing, Minneapolis, MN, USA, August 2010