Tractable and Consistent Random Graph Models September 27, 2014 Arun G. Chandrasekhar
by user
Comments
Transcript
Tractable and Consistent Random Graph Models September 27, 2014 Arun G. Chandrasekhar
Introduction Simple example SERGMs SUGMs Illustrations Conclusion Tractable and Consistent Random Graph Models Arun G. Chandrasekhar‡ ‡ ? Matthew O. Jackson? Stanford, Dept. of Economics Stanford, Dept. of Economics, Santa Fe Institute, CIFAR September 27, 2014 Motivation Introduction Simple example SERGMs SUGMs Illustrations Conclusion Network formation models: Risk-sharing relationships typically function when embedded in grouped relationships? Do cross race/ethnicity/caste friendships typically occur in private? Social learning, etc. Core motivations for models: 1. Can test economic hypotheses from theory 2. Counterfactuals/policy Desiderata Introduction Simple example SERGMs SUGMs Illustrations Estimable network formation models Conclusion Link dependencies More likely to link to friends of friends? Do relationships need to be embedded in others to function? (e.g., need for support, cliques, etc.?) Link to others who are well-connected? Good statistical properties (i.e., consistency) Random utility foundations Difficulties Introduction Simple example SERGMs SUGMs Illustrations Conclusion Typical dataset: Observe one draw g and covariates x = (x1 , ..., xn ) Extremes: Extreme 1: Full independence Each decision is independent =⇒ n 2 independent obs. Extreme 2: Full dependence e.g., Nodes 1, 2 linked ⇐⇒ all nodes linked. 0 26 87 93 1 2 4 5 28 6 Introduction 7 3 Simple example 16 8 9 10 SERGMs 13 12 11 14 SUGMs 15 Illustrations 18 33 17 34 19 Conclusion 35 37 20 36 38 39 22 40 24 23 41 25 27 42 21 43 29 45 31 30 98 44 32 46 97 48 47 49 52 53 75 62 50 60 54 51 61 67 55 63 66 56 64 57 72 65 58 84 68 59 71 86 70 69 73 85 88 74 89 76 90 91 77 92 78 94 79 95 80 96 81 82 83 0 26 87 93 1 2 4 5 28 6 Introduction 7 3 Simple example 16 8 9 10 SERGMs 13 12 11 14 SUGMs 15 Illustrations 18 33 17 34 19 Conclusion 35 37 20 36 38 39 22 40 24 23 41 25 27 42 21 43 29 45 31 30 98 44 32 46 97 48 47 49 52 53 75 62 50 60 54 51 61 67 55 63 66 56 64 57 72 65 58 84 68 59 71 86 70 69 73 85 88 74 89 76 90 91 77 92 78 94 79 95 80 96 81 82 83 Framework Introduction Simple example SERGMs SUGMs Illustrations Triangular array of random graphs: {gn : n ∈ N} drawn Conclusion g ∼ Pnβn (·) Want to consistently estimate β n as n → ∞. Without any restrictions on preferences and action space ui (g, X; β n ) impossible to proceed. Literature Introduction Simple example SERGMs SUGMs Illustrations Conclusion Most models used are not consistent with one draw g ERGM literature (see Shalizi and Rinaldo ‘13); Christakis, Fowler, Imbens and Kalyanaraman (‘10); Mele (‘13); Koenig (‘13); Badev (‘14); Sheng (‘14) Exponential Random Graphs Introduction Simple example SERGMs SUGMs Illustrations Conclusion Pβ (g) = P exp(β·S(g)) 0 g 0 exp(β·S(g )) S(g) a vector of sufficient statistics, usually subgraph counts e.g., # links, # triangles, # k-stars, ... Literature Introduction Simple example SERGMs SUGMs Illustrations Conclusion Most models used are not consistent with one draw g ERGM literature (see Shalizi and Rinaldo ‘13); Christakis, Fowler, Imbens and Kalyanaraman (‘10); Mele (‘13); Koenig (‘13); Badev (‘14); Sheng (‘14) Infeasible or difficult estimation ERGM literature; Christakis, Fowler, Imbens and Kalyanaraman (‘10); Mele (‘13); Koenig (‘13); Badev (‘14); Sheng (‘14) Literature Introduction Simple example SERGMs SUGMs Illustrations Conclusion Most models used are not consistent with one draw g ERGM literature (see Shalizi and Rinaldo ‘13); Christakis, Fowler, Imbens and Kalyanaraman (‘10); Mele (‘13); Koenig (‘13); Badev (‘14); Sheng (‘14) Infeasible or difficult estimation ERGM literature; Christakis, Fowler, Imbens and Kalyanaraman (‘10); Mele (‘13); Koenig (‘13); Badev (‘14); Sheng (‘14) Techniques that apply for dense graphs Bickel and Chen (‘09) Chatterjee, Diaconis and Sly (‘11); Graham (‘14) Chatterjee, Diaconis and Sly (2011) Introduction Simple example SERGMs SUGMs Illustrations Conclusion Unknown β := (β1 , ..., βn ). pij = exp(βi + βj ) . 1 + exp(βi + βj ) Show with probability 1 − O(n−2 ), ! r log n b max βi − βi ≤ O . i n Literature Introduction Simple example SERGMs SUGMs Illustrations Conclusion Most models used are not consistent with one draw g ERGM literature (see Shalizi and Rinaldo ‘13); Christakis, Fowler, Imbens and Kalyanaraman (‘10); Mele (‘13); Koenig (‘13); Badev (‘14); Sheng (‘14) Infeasible or difficult estimation ERGM literature; Christakis, Fowler, Imbens and Kalyanaraman (‘10); Mele (‘13); Koenig (‘13); Badev (‘14); Sheng (‘14) Techniques that apply for dense graphs Bickel and Chen (‘09) Chatterjee, Diaconis and Sly (‘11); Graham (‘14) Literature Introduction Simple example SERGMs SUGMs Illustrations Conclusion Most models used are not consistent with one draw g ERGM literature (see Shalizi and Rinaldo ‘13); Christakis, Fowler, Imbens and Kalyanaraman (‘10); Mele (‘13); Koenig (‘13); Badev (‘14); Sheng (‘14) Infeasible or difficult estimation ERGM literature; Christakis, Fowler, Imbens and Kalyanaraman (‘10); Mele (‘13); Koenig (‘13); Badev (‘14); Sheng (‘14) Techniques that apply for dense graphs Bickel and Chen (‘09) Chatterjee, Diaconis and Sly (‘11); Graham (‘14) Techniques that apply for near block-diagonal graphs Random Geometric Graph Approach Introduction Simple example SERGMs SUGMs Illustrations Conclusion Random Geometric Graph Approach Introduction Simple example SERGMs SUGMs Illustrations Conclusion =⇒ Random Geometric Graph Approach Introduction Simple example SERGMs SUGMs Illustrations Conclusion =⇒ Random Geometric Graph Approach Introduction Simple example SERGMs SUGMs Illustrations Conclusion =⇒ =⇒ Random Geometric Graph Approach Introduction Simple example SERGMs SUGMs Illustrations Conclusion Representation of n × n adjacency matrix Contributions of this paper Introduction Simple example SERGMs SUGMs Illustrations Conclusion Develop new models that: extend the workhorse model in the literature emphasize subgraphs as objects of interest; resulting network is a projection under certain assumptions, have estimators that are consistent and asymptotically normally distributed are feasible to estimate generate graphs consistent with data have microfoundations Summary of Approach Introduction Simple example SERGMs SUGMs Illustrations Conclusion Summary of Approach Introduction Simple example SERGMs SUGMs Illustrations Conclusion =⇒ Microfoundation Examples Introduction Simple example SERGMs SUGMs Illustrations Conclusion 1. Dynamic (myopic) revision 2. Mutual consent 3. Strategic search Microfoundation Examples Dynamic Revision Introduction Simple example SERGMs SUGMs Illustrations Conclusion A. Payoffs: Additively separable in subgraphs X v(g` , X` ; β` ). ui (g, X; β) = g` ,i∈g` Microfoundation Examples Dynamic Revision Introduction Simple example SERGMs SUGMs Illustrations Conclusion A. Payoffs: Additively separable in subgraphs X v(g` , X` ; β` ). ui (g, X; β) = g` ,i∈g` B. Stability concept: Pairwise stability with transfers ij ∈ g implies that ui (g) + uj (g) ≥ ui (g − ij) + uj (g − ij), and ij ∈ / g implies that ui (g) + uj (g) ≥ ui (g + ij) + uj (g + ij). Microfoundation Examples Dynamic Revision Introduction Simple example SERGMs SUGMs Illustrations Conclusion A. Payoffs: Additively separable in subgraphs X v(g` , X` ; β` ). ui (g, X; β) = g` ,i∈g` B. Stability concept: Pairwise stability with transfers ij ∈ g implies that ui (g) + uj (g) ≥ ui (g − ij) + uj (g − ij), and ij ∈ / g implies that ui (g) + uj (g) ≥ ui (g + ij) + uj (g + ij). Remark: A+B =⇒ existence of a potential function: f (·). Microfoundation Examples Dynamic Revision Introduction Simple example SERGMs SUGMs Illustrations Conclusion A. Payoffs: Additively separable in subgraphs X v(g` , X` ; β` ). ui (g, X; β) = g` ,i∈g` B. Stability concept: Pairwise stability with transfers ij ∈ g implies that ui (g) + uj (g) ≥ ui (g − ij) + uj (g − ij), and ij ∈ / g implies that ui (g) + uj (g) ≥ ui (g + ij) + uj (g + ij). Remark: A+B =⇒ existence of a potential function: f (·). C. Meeting process and shocks: ∀t, at most one pair meets; every ij have some positive probability of being selected. Microfoundation Examples Dynamic Revision Introduction Simple example SERGMs SUGMs Illustrations Conclusion A. Payoffs: Additively separable in subgraphs X v(g` , X` ; β` ). ui (g, X; β) = g` ,i∈g` B. Stability concept: Pairwise stability with transfers ij ∈ g implies that ui (g) + uj (g) ≥ ui (g − ij) + uj (g − ij), and ij ∈ / g implies that ui (g) + uj (g) ≥ ui (g + ij) + uj (g + ij). Remark: A+B =⇒ existence of a potential function: f (·). C. Meeting process and shocks: ∀t, at most one pair meets; every ij have some positive probability of being selected. Result [Butts ‘09, Mele ‘13, This paper]: Invariant distribution is exp(f (g; β)) 0 g 0 exp(f (g ; β)) g ∼ Pβ (g) := P Microfoundations Static Mutual Consent Introduction Simple example A. Payoffs: Payoffs by subgraph SERGMs SUGMs Illustrations Conclusion ui (g` , X` ; β` ) = β` hi (g` , X` ) − i,` Microfoundations Static Mutual Consent Introduction Simple example A. Payoffs: Payoffs by subgraph SERGMs SUGMs ui (g` , X` ; β` ) = β` hi (g` , X` ) − i,` Illustrations Conclusion B. Stability concept: Mutual consent g` ⇐⇒ ui (g` , X` ; β) ≥∗ 0, ∀i ∈ g` . Microfoundations Static Mutual Consent Introduction Simple example A. Payoffs: Payoffs by subgraph SERGMs SUGMs ui (g` , X` ; β` ) = β` hi (g` , X` ) − i,` Illustrations Conclusion B. Stability concept: Mutual consent g` ⇐⇒ ui (g` , X` ; β) ≥∗ 0, ∀i ∈ g` . C. Meeting process and shocks: Subgroups of size m` with with probability π` and shocks i,` ∼ F` (·): Y p` (X` ; β) = π` · F` (β` h). i∈g` Microfoundations Static Mutual Consent Introduction Simple example A. Payoffs: Payoffs by subgraph SERGMs SUGMs ui (g` , X` ; β` ) = β` hi (g` , X` ) − i,` Illustrations Conclusion B. Stability concept: Mutual consent g` ⇐⇒ ui (g` , X` ; β) ≥∗ 0, ∀i ∈ g` . C. Meeting process and shocks: Subgroups of size m` with with probability π` and shocks i,` ∼ F` (·): Y p` (X` ; β) = π` · F` (β` h). i∈g` Result [This paper]: Under some assumptions β consistently estimable distrubtion is g ∼ Pβ (g) := P exp(f (g;β)) 0 g 0 exp(f (g ;β)) Microfoundations Introduction Static Search Intensity Simple example SERGMs SUGMs Illustrations Conclusion Agents put in (costly) search effort to form cliques for a payoff Whether a given clique forms is stochastic, depending on efforts of all involved Resulting graph reflects subgraphs (cliques) of various sizes occuring with various frequencies Outline Introduction Simple example SERGMs SUGMs Illustrations Conclusion (i) Simple example (ii) Subgraph Generated Models (iii) Illustrations ERGMs Introduction Simple example SERGMs SUGMs Illustrations Conclusion Workhorse: Exponential Random Graph Models Example: studied extensively by Strauss 86, Wasserman and Pattison 96, Park and Newman 04, Butts 06, Chatterjee and Diaconis 11, Mele 14,... e.g., probability of a graph depends on how many links present how many triangles present - e.g., Coleman’s closure, support in Jackson et al., etc. ERGMs: An Example Introduction Simple example SERGMs SUGMs Illustrations Conclusion Want probability of graph g to depend on θI #isolates(g) + θL #links(g) + θT #triangles(g). ERGMs: An Example Introduction Simple example SERGMs SUGMs Illustrations Conclusion Want probability of graph g to depend on θI #isolates(g) + θL #links(g) + θT #triangles(g). Set Pθ (g) ∝ exp (θI #isolates(g) + θL #links(g) + θT #triangles(g)) ERGMs Introduction Simple example SERGMs SUGMs More generally Illustrations Conclusion exp (θ1 S1 (g) + ... + θk Sk (g)) 0 0 g 0 ∈G n exp (θ1 S1 (g ) + ... + θk Sk (g )) Pθ (g) = P where: g is a graph Sl (g) is a “statistic” - often the number of copies of some subgraph within the parent graph ERGMs Introduction Simple example SERGMs SUGMs More generally Illustrations Conclusion exp (θ1 S1 (g) + ... + θk Sk (g)) 0 0 g 0 ∈G n exp (θ1 S1 (g ) + ... + θk Sk (g )) Pθ (g) = P where: g is a graph Sl (g) is a “statistic” - often the number of copies of some subgraph within the parent graph MCMC techniques for estimation have led to these becoming standard (Snijders 02, Handcock 03, ...) Estimating ERGMs Introduction Simple example SERGMs SUGMs Illustrations Bhamidi, Bresler and Sly (2008), Chatterjee and Diaconis (2011) Conclusion For “large classes” (dense enough) ERGMs... MCMC: (Gibbs sampling) Mixing time is less than exponential only if networks have asymptotically approximately independent links (ER-like) So when ERGMs are interesting, then MCMC estimation will be exponentially slow Simulation evidence of instability even in the sparse case... Simple example Introduction Simple example SERGMs SUGMs Illustrations Conclusion Subgroups of nodes sometimes meet Make structures at random (e.g., cliques) Here: triangles, links, isolates Example: Isolates, Links and Triangles Introduction Simple example SERGMs SUGMs Illustrations Conclusion exp (θI SI (g) + θL SL (g) + θT ST (g)) . 0 0 0 g 0 exp (θI SI (g ) + θL SL (g ) + θT ST (g )) Pθ (g) = P Example: Isolates, Links and Triangles Introduction Simple example SERGMs SUGMs Illustrations exp (θI SI (g) + θL SL (g) + θT ST (g)) . 0 0 0 g 0 exp (θI SI (g ) + θL SL (g ) + θT ST (g )) Pθ (g) = P Conclusion ST (g): numer of triangles. SL (g): number of edges. SI (g): number of isolated nodes. Example: Isolates, Links and Triangles Introduction Simple example SERGMs SUGMs exp (θI SI (g) + θL SL (g) + θT ST (g)) . 0 0 0 g 0 exp (θI SI (g ) + θL SL (g ) + θT ST (g )) Pθ (g) = P Illustrations Conclusion ST (g): numer of triangles. SL (g): number of edges. SI (g): number of isolated nodes. n = 50 nodes Example: Isolates, Links and Triangles Introduction Simple example SERGMs SUGMs exp (θI SI (g) + θL SL (g) + θT ST (g)) . 0 0 0 g 0 exp (θI SI (g ) + θL SL (g ) + θT ST (g )) Pθ (g) = P Illustrations Conclusion ST (g): numer of triangles. SL (g): number of edges. SI (g): number of isolated nodes. n = 50 nodes 20 isolates, 45 links, 10 triangles on avg. Example: Isolates, Links and Triangles Introduction Simple example SERGMs SUGMs exp (θI SI (g) + θL SL (g) + θT ST (g)) . 0 0 0 g 0 exp (θI SI (g ) + θL SL (g ) + θT ST (g )) Pθ (g) = P Illustrations Conclusion ST (g): numer of triangles. SL (g): number of edges. SI (g): number of isolated nodes. n = 50 nodes 20 isolates, 45 links, 10 triangles on avg. Corresponds to pI = 0.33, pL = 0.042, pT = 0.0014 Isolates: θbI Introduction Simple example SERGMs SUGMs Illustrations Conclusion Isolates: usually not awful (sometimes) Introduction Simple example SERGMs SUGMs Illustrations Conclusion Links: θbL Introduction Simple example SERGMs SUGMs Illustrations Conclusion Links: usually too dense Introduction Simple example SERGMs SUGMs Illustrations Conclusion Triangles: θbT Introduction Simple example SERGMs SUGMs Illustrations Conclusion Triangles: too much closure or empty Introduction Simple example SERGMs SUGMs Illustrations Conclusion Take-away for this example Introduction Simple example SERGMs SUGMs Illustrations Conclusion Conclusions: Even fixed value of S leads to unstable estimates of θ, with unreasonable chains of graphs being drawn, and unreasonable inference suggested. Take-away for this example Introduction Simple example SERGMs SUGMs Illustrations Conclusion Conclusions: Even fixed value of S leads to unstable estimates of θ, with unreasonable chains of graphs being drawn, and unreasonable inference suggested. But the model was so easy to describe... Idea Here Introduction Simple example SERGMs SUGMs Illustrations Conclusion Many g’s but fewer possible statistics (values for S) Many networks lead to the same statistics Probabilities depend on statistics Networks with same statistics are equally likely Can collapse all equivalent networks to sufficient statistics What did we do? Introduction Simple example SERGMs SUGMs Illustrations Conclusion Exponential random graph model exp (β · S(g)) 0 g 0 ∈G n exp (β · S(g )) Pβ (g) = P with N (s) = |{g : S(g) = s}|. What did we do? Introduction Simple example SERGMs SUGMs Illustrations Conclusion Rewrite denominator as sum over equivalence classes exp (β · S(g)) 0 0 s0 ∈S n NS (s ) exp (β · s ) Pβ (g) = P with N (s) = |{g : S(g) = s}|. What did we do? Introduction Simple example SERGMs SUGMs Illustrations Conclusion Rewrite denominator as sum over equivalence classes exp (β · S(g)) 0 0 s0 ∈S n NS (s ) exp (β · s ) Pβ (g) = P with N (s) = |{g : S(g) = s}|. What did we do? Introduction Simple example SERGMs SUGMs Illustrations Conclusion Sum LHS and numerator over equivalence classes NS (s) · exp (β · s) 0 0 s0 ∈S n NS (s ) exp (β · s ) Pβ (s) = P with N (s) = |{g : S(g) = s}|. Statistical form Introduction Simple example SERGMs SUGMs Illustrations Conclusion Sum LHS and numerator over equivalence classes NS (s) · exp (β · s) 0 0 s0 ∈S n NS (s ) exp (β · s ) Pβ (s) = P with N (s) = |{g : S(g) = s}|. SERGMs Introduction Simple example SERGMs SUGMs Illustrations Conclusion NS (s) · exp (β · s) 0 0 s0 ∈S n NS (s ) exp (β · s ) Pβ (s) = P Nothing special about NS (s) - just the counting measure. But typically impossible to compute... SERGMs Introduction Simple example SERGMs SUGMs Illustrations NS (s) · exp (β · s) 0 0 s0 ∈S n NS (s ) exp (β · s ) Conclusion Pβ (s) = P Nothing special about NS (s) - just the counting measure. But typically impossible to compute... KS (s) · exp (β · s) 0 0 s0 ∈S n KS (s ) exp (β · s ) Pβ (s) = P for other KS (·) 6= NS (·). SERGMs Introduction Simple example SERGMs SUGMs Illustrations Conclusion Pβ (s) = P KS (s) · exp (β · s) KS (s0 ) exp (β · s0 ) s0 ∈S n SERGMs Introduction Simple example SERGMs SUGMs Illustrations Conclusion Pβ (s) = P KS (s) · exp (β · s) KS (s0 ) exp (β · s0 ) s0 ∈S n Natural micro-foundations for other K(·) 6= N (·). K(·) 6= N (·) may recover desirable statistical properties. SERGMs Introduction Simple example SERGMs SUGMs Illustrations Conclusion KS (s) · exp (β · s) 0 0 s0 ∈An KS (s ) exp (β · s ) Pβ (s) = P S can encode many things: Links, cliques, k-stars, other subgraphs, friends in common per link, multigraphs, etc. Can do preference based models Allows for node characteristics Outline Introduction Simple example SERGMs SUGMs Illustrations Conclusion (i) Simple example (ii) Subgraph Generated Models (iii) Illustrations Economic Networks Introduction Simple example SERGMs SUGMs Illustrations Conclusion What do many economic networks look like? (i) Interdependency between links: more than random (conditional on covariates) chance of one’s friends being friends themselves (ii) Mostly empty. SUGMs Introduction Simple example SERGMs SUGMs Illustrations Conclusion Subgraph Generation Models Subnetworks generated, network is a by-product/projection People form links, triangles; some folk are anti-social (isolates), etc... SUGM Idea Introduction Simple example SERGMs SUGMs Illustrations Conclusion Nature forms subgraphs of each type w/ some probability May intersect, may overlap Observe resulting graph, infer probabilities Nodes Introduction Simple example SERGMs SUGMs Illustrations Conclusion Triangles Form Introduction Simple example SERGMs SUGMs Illustrations Conclusion Links Form Introduction Simple example SERGMs SUGMs Illustrations Conclusion Desired Network Introduction Simple example SERGMs SUGMs Illustrations Conclusion Model Introduction Simple example SERGMs SUGMs Illustrations Conclusion subgraphs: G = (G1 , ..., Gk ) Model Introduction Simple example SERGMs subgraphs: G = (G1 , ..., Gk ) SUGMs Illustrations Conclusion def. nicely ordered: G` 6⊂ G`0 , `0 > ` g`0 ∈ G0` ⇐⇒ g`0 6⊂ g` , g` ∈ G` , `0 > ` Model Introduction Simple example SERGMs subgraphs: G = (G1 , ..., Gk ) SUGMs Illustrations Conclusion def. nicely ordered: G` 6⊂ G`0 , `0 > ` g`0 ∈ G0` ⇐⇒ g`0 6⊂ g` , g` ∈ G` , `0 > ` e.g., G1 = triangles and G2 = {links 6⊂ triangles} Model Introduction Simple example SERGMs subgraphs: G = (G1 , ..., Gk ) SUGMs Illustrations Conclusion def. nicely ordered: G` 6⊂ G`0 , `0 > ` g`0 ∈ G0` ⇐⇒ g`0 6⊂ g` , g` ∈ G` , `0 > ` e.g., G1 = triangles and G2 = {links 6⊂ triangles} parameters: pn = (pn1 , ..., pnk ) k or w/ covariates pn (x; γ) = (pn ` (x` ; γ` ))`=1 Model Introduction Simple example SERGMs subgraphs: G = (G1 , ..., Gk ) SUGMs Illustrations Conclusion def. nicely ordered: G` 6⊂ G`0 , `0 > ` g`0 ∈ G0` ⇐⇒ g`0 6⊂ g` , g` ∈ G` , `0 > ` e.g., G1 = triangles and G2 = {links 6⊂ triangles} parameters: pn = (pn1 , ..., pnk ) k or w/ covariates pn (x; γ) = (pn ` (x` ; γ` ))`=1 formation: Type 1 subgraphs form with probability pn 1. Type 2 subgraphs form with probability pn 2. ... Type ` subgraphs form with probability pn `. ... Problem: Incidental Generation Introduction Simple example SERGMs SUGMs Illustrations Conclusion def. generating class: for every G` there are classes J of subgraphs that can generate g` ∈ G` . e.g., {Triangles, Links}. Triangles have 4 generating classes: (T, T, T ); (T, L, L); (T, T, L); (L, L, L). def. relatively sparse: a sequence of models w/ nicely ordered subgraphs and parameters is relatively sparse if the expected share of incidentally generated subgraphs to truly generated subgraph vanishes Q n j∈J Ep [S`j (g)] →0 M n J Epn [S` (g)] Relative Sparsity Reasonable? Introduction Simple example SERGMs SUGMs Illustrations Conclusion Example {Triangles, Links} (T,T,T): n3 p3T = p2T n3 → 0 =⇒ pT = o(n−3/2 ) pT √ Remark 1: a given node can be in n2 pT = o( n) triangles. Relative Sparsity Reasonable? Introduction Simple example SERGMs SUGMs Illustrations Conclusion Example {Triangles, Links} (T,T,T): n3 p3T = p2T n3 → 0 =⇒ pT = o(n−3/2 ) pT √ Remark 1: a given node can be in n2 pT = o( n) triangles. (L,L,L): p3L → 0 =⇒ pL = o(n−1/2 ) pT √ Remark 2: a given node can be in (n − 1)pL = o( n) links. Relative Sparsity Reasonable? Introduction Simple example SERGMs SUGMs Illustrations Conclusion Example: Leider, Mobius, Rosenblatt, Do (‘09) Harvard social network: n ≈ 4000 Avg number of links per person: 7 Avg % of one’s friends that are friends themselves: 8% Observe that pT = 3/n2 , pL = 2/n yields Avg number of links per person: 8 Avg % of one’s friends that are friends themselves: 10.71% SUGMs behave well Introduction Simple example SERGMs SUGMs Illustrations Conclusion Theorem: Consider a nicely ordered and relatively sparse SUGM with statistics on m1 , ..., mk nodes. k Letting Dn = diag {p` nm` }`=1 : 0 0 Dn1/2 (b pn1 , ..., pbnk ) − (pn1 , ..., pnk ) √ N 0, Ik 2 . SUGMs behave well Introduction Simple example SERGMs SUGMs Illustrations Conclusion Theorem: Consider a nicely ordered and relatively sparse SUGM with statistics on m1 , ..., mk nodes. k Letting Dn = diag {p` nm` }`=1 : 0 0 Dn1/2 (b pn1 , ..., pbnk ) − (pn1 , ..., pnk ) √ N 0, Ik 2 . Covariates: Trivial to add discrete covariates. Subgraphs tagged by covariates. [curse of dimensionality] Straightforward to add continuous covariates: pn` (x` ; γ` ). √ cf. Hjort and Pollard (‘93), sub- nm` rates due to sparsity. Relation: SERGM/SUGM Introduction Simple example SERGMs SUGMs Illustrations Conclusion Q. Can we view SUGMs as SERGMs? Relation: SERGM/SUGM Introduction Simple example SERGMs SUGMs Illustrations Q. Can we view SUGMs as SERGMs? Conclusion Ans. Yes, and it motivates specific K(·)’s. Relation: SERGM/SUGM Introduction Simple example SERGMs SUGMs Illustrations Q. Can we view SUGMs as SERGMs? Conclusion Ans. Yes, and it motivates specific K(·)’s. Theorem: Suppose that the probability that subgraph of type ` exp β` forms is given by p` = 1+exp β` . This form of SUGM can be represented in a SERGM form: e exp Se · β K n (S) e =P , Pnβ (S) n 0 0 s0 K (s ) exp (s · β) where K`n (s` ) = n S` s` and K n (s) = Q ` K`n (s` ). Easy to implement Introduction Simple example SERGMs SUGMs Illustrations Conclusion Example: Researcher has graph g and covariates Xi for all i Easy to implement Introduction Simple example SERGMs SUGMs Illustrations Conclusion Example: Researcher has graph g and covariates Xi for all i (i) Pick a nicely ordered set of statistics (e.g., 4cliques, triangles, and links) Easy to implement Introduction Simple example SERGMs SUGMs Illustrations Conclusion Example: Researcher has graph g and covariates Xi for all i (i) Pick a nicely ordered set of statistics (e.g., 4cliques, triangles, and links) (ii) Run a regression on all* 4-tuples of whether it is a 4clique, conditional on covariates P 0 P(ijkl ∈ K4 |Xijkl ) = Λ(Xijkl γ4 ) =⇒ γ b4 −→ γ4 . Easy to implement Introduction Simple example SERGMs SUGMs Illustrations Conclusion Example: Researcher has graph g and covariates Xi for all i (i) Pick a nicely ordered set of statistics (e.g., 4cliques, triangles, and links) (ii) Run a regression on all* 4-tuples of whether it is a 4clique, conditional on covariates P 0 P(ijkl ∈ K4 |Xijkl ) = Λ(Xijkl γ4 ) =⇒ γ b4 −→ γ4 . (iii) Remove links used in (ii). Easy to implement Introduction Simple example SERGMs SUGMs Illustrations Conclusion Example: Researcher has graph g and covariates Xi for all i (i) Pick a nicely ordered set of statistics (e.g., 4cliques, triangles, and links) (ii) Run a regression on all* 4-tuples of whether it is a 4clique, conditional on covariates P 0 P(ijkl ∈ K4 |Xijkl ) = Λ(Xijkl γ4 ) =⇒ γ b4 −→ γ4 . (iii) Remove links used in (ii). (iv) Run a regression on all* triples of whether it is a triangle, conditional on covariates. P 0 P(ijk ∈ K3 |Xijk ) = Λ(Xijk γ3 ) =⇒ γ b3 −→ γ3 . Easy to implement Introduction Simple example SERGMs SUGMs Illustrations Conclusion Example: Researcher has graph g and covariates Xi for all i (i) Pick a nicely ordered set of statistics (e.g., 4cliques, triangles, and links) (ii) Run a regression on all* 4-tuples of whether it is a 4clique, conditional on covariates P 0 P(ijkl ∈ K4 |Xijkl ) = Λ(Xijkl γ4 ) =⇒ γ b4 −→ γ4 . (iii) Remove links used in (ii). (iv) Run a regression on all* triples of whether it is a triangle, conditional on covariates. P 0 P(ijk ∈ K3 |Xijk ) = Λ(Xijk γ3 ) =⇒ γ b3 −→ γ3 . (v) Remove links used in (iv). Easy to implement Introduction Simple example SERGMs SUGMs Illustrations Conclusion Example: Researcher has graph g and covariates Xi for all i (i) Pick a nicely ordered set of statistics (e.g., 4cliques, triangles, and links) (ii) Run a regression on all* 4-tuples of whether it is a 4clique, conditional on covariates P 0 P(ijkl ∈ K4 |Xijkl ) = Λ(Xijkl γ4 ) =⇒ γ b4 −→ γ4 . (iii) Remove links used in (ii). (iv) Run a regression on all* triples of whether it is a triangle, conditional on covariates. P 0 P(ijk ∈ K3 |Xijk ) = Λ(Xijk γ3 ) =⇒ γ b3 −→ γ3 . (v) Remove links used in (iv). (vi) Run a regression on all* pairs of whether it is an edge, conditional on covariates. P 0 P(ij ∈ K2 |Xij ) = Λ(Xij γ2 ) =⇒ γ b2 −→ γ2 . Outline Introduction Simple example SERGMs SUGMs Illustrations Conclusion (i) Simple example (ii) Subgraph Generated Models (iii) Illustrations Illustration Introduction Simple example SERGMs SUGMs Illustrations Conclusion 1. Why need SUGMs/SERGMs and not simply dyadic with rich covariates? 2. Social pressure and caste example Need for SUGMs/SERGMs Introduction Simple example SERGMs SUGMs Illustrations Conclusion Examine data from 75 Indian villages from Banerjee, Chandrasekhar, Duflo, Jackson ‘13 Estimate a model and then use it to generate networks How well do the model-recreated networks match real networks on non-modeled characteristics Need for SUGMs/SERGMs Introduction Simple example SERGMs SUGMs Illustrations Conclusion Step 1: Estimate models Dyadic: estimate pL (covariates) Geographic distance, caste, amenities (housing material, roofing material, number of beds), etc. SUGM: estimate pclose , pfLar , pclose , pfTar L T Step 2: Randomly generate networks Dyadic: generate links given pL (covariates) , pfTar SUGM: generate subgraphs w/ pclose , pfLar , pclose L T Recreate Networks Introduction Simple example SERGMs SUGMs Illustrations Conclusion Data Data Data Link-based Link-based Link-based model model Link-based model SUGM SUGM model with SUGM Link-based with links links with SUGM SUGM model links SUGM with with SUGM links with SUG iso Data Data with with covariates covariates with covariates with covariates andand triangles triangles with and covariates triangles and links triangles links andand links trian an t [1] [1] [1] [1] [2] [2] [2] [1] [2] [3] [3] [3] [2] [3] [4] [4] Unsupported of mber Unsupported of Number Unsupported Unsupported Number Links of Links Unsupported 160.8 160.8 Links160.8 160.8236.2 236.2 236.2 160.8 236.2161.2 161.2 161.2 236.2 161.2161.8 161. Models to are fitLinks toofLinks Triangles of mber Triangles of Number Triangles of Triangles Number of Triangles 39.2 39.2 39.2 39.2 3.13.1 39.2 3.1 3.1 39.7 39.7 39.7 3.1 39.7 39.5 39.5 ifferent of ombinations gree erage Degree Degree AverageofDegree Average Degree 2.3243 2.32432.3243 2.3243 2.3260 2.32602.3260 2.3243 2.3260 2.5916 2.59162.5916 2.3260 2.5916 2.5219 2.521 hese statistics. of mber solates Isolates of Number Isolates of Isolates Number of Isolates 54.9722 54.9722 54.9722 54.9722 25.7222 25.7222 25.7222 54.9722 25.7222 31.4444 31.4444 31.4444 25.7222 31.4444 65.9167 65.91 erage Clustering stering Clustering Average Clustering Average Clustering 0.0895 0.08950.0895 0.0895 0.0105 0.01050.0105 0.0895 0.0105 0.1268 0.12680.1268 0.0105 0.1268 0.0829 0.082 Giant in ction Giant Component in Fraction Giant Component Component in Giant Fraction Component in Giant Component 0.7061 0.70610.7061 0.7061 0.8315 0.83150.8315 0.7061 0.8315 0.7982 0.79820.7982 0.8315 0.7982 0.6718 0.671 None odels of the models envalue st alue First fit Eigenvalue First Eigenvalue 5.5446 5.54465.5446 5.5446 3.8578 3.85783.8578 5.5446 3.8578 4.6762 4.67624.6762 3.8578 4.6762 5.3025 5.302 to reEigenvalue directly to ny of Gap these Gap Spectral Gap ectral pGap Spectral 0.9550 0.95500.9550 0.9550 0.3354 0.33540.3354 0.9550 0.3354 0.6684 0.66840.6684 0.3354 0.6684 1.0617 1.061 tatistics. Eigenvalue cond nvalue Eigenvalue Second of Stochastized of Stochastized Eigenvalue of Stochastized Second Matrix ofMatrix Eigenvalue Stochastized Matrix of 0.9573 Matrix Stochastized 0.95730.9573Matrix 0.9573 0.9632 0.96320.9632 0.9573 0.9632 0.9559 0.95590.9559 0.9632 0.9559 0.9069 0.906 erage Path h Length Length Path Average Length Path Average Length Path Length4.6921 4.69214.6921 4.6921 5.6565 5.65655.6565 4.6921 5.6565 5.1215 5.12155.1215 5.6565 5.1215 4.1180 4.118 Notes: nts erage [1] average the presents Column value average value of the[1] various of value average presents various ofnetwork various value network the average ofcharacteristics network various characteristics value characteristics network ofacross various across characteristics theacross network the 36 36 villages. the villages. characteristics across 36 Columns villages. the Columns 36 Columns [2], villages. across [2], [3] the [3] and Columns [2], and 36 [4] [3] villages. [4] present and [2], present [4] [3] Columns simulation present and simulation [4][2], simulat presen results [3] res te meters imulation rst ersparameters estimate of aofgiven awe given parameters first of model a model given estimate forof model for a agiven parameters given a given for village model a village given ofand for a village given and then a given then model randomly and village randomly then for randomly draw aand given draw then a graph village arandomly draw graph from and a from graph the draw then the model from randomly amodel graph the with model with from the draw the estimated the with a estimated graph model the estimated from parameters. with parameters. the the model estimate paramet WewW r each eimulations es or villages for each ofeach the offor villages the for ofeach the each models models offor ofthe the and each models and villages average ofaverage the and models for across average each across the and ofacross the simulations, the average simulations, models theacross simulations, and andaverage the and thesimulations, the entries and entries across the report entries report the and these simulations, the these report averaged entries averaged these report and across averaged the across these entries theacross averaged the villages. report villages. thethese across villages. avera the v Introduction Simple example SERGMs SUGMs Illustrations Conclusion Introduction Simple example SERGMs SUGMs Illustrations Conclusion Introduction Simple example SERGMs SUGMs Illustrations Conclusion Introduction Simple example SERGMs SUGMs Illustrations Conclusion Illustration: Social Pressure Introduction Simple example SERGMs SUGMs Illustrations Conclusion Cross-caste relationships more likely to occur : “in private” with no friends in common? or with the same frequency in embedded relationships? B B C A A Rela%vely less likely? Social Pressure Introduction Simple example SERGMs SUGMs Illustrations Conclusion Need more consent to form triad than dyad. Need to account for these preferences, otherwise mechanically find less desired dyad less desired triad > . more desired dyad more desired triad Social Pressure Introduction Simple example SERGMs SUGMs Illustrations preferences: Conclusion ui (j) = αL + βL 1{caste(i) 6= caste(j)} − ij,L ui (jk) = αT + βT (1 − 1{caste(i) = caste(j) = caste(k)}) − ijk,T frequencies: qL (x), qT (x) x ∈ {Same(1), Dif f (0)} question: qL (0) qT (0) > qL (1) qT (1) Introduction Simple example SERGMs question: qT (0) qL (0) > qL (1) qT (1) SUGMs Illustrations Conclusion issues: heterogenous meeting probabilities: πL (x), πT (x) a link requires 2 consents; a triangle 3 consents we observe g and therefore: pL (x) = πL (x) · qL (x)2 and pT (x) = πT (x) · qT (x)3 . Introduction we observe g and therefore: Simple example SERGMs SUGMs pL (x) = πL (x) · qL (x)2 and pT (x) = πT (x) · qT (x)3 . Illustrations Conclusion assume: people spend f > 1/2 share of time mixing with own caste; 1 − f with other caste Introduction we observe g and therefore: Simple example SERGMs SUGMs pL (x) = πL (x) · qL (x)2 and pT (x) = πT (x) · qT (x)3 . Illustrations Conclusion assume: people spend f > 1/2 share of time mixing with own caste; 1 − f with other caste, implying πT (0) πL (0) > . πT (1) πL (1) Introduction we observe g and therefore: Simple example SERGMs SUGMs pL (x) = πL (x) · qL (x)2 and pT (x) = πT (x) · qT (x)3 . Illustrations Conclusion assume: people spend f > 1/2 share of time mixing with own caste; 1 − f with other caste, implying πT (0) πL (0) > . πT (1) πL (1) Then, a sufficient condition for pT (0) < pT (1) qL (0) qL (1) > pL (0) pL (1) qT (0) qT (1) 3/2 . is 1.2 Introduction Simple example SERGMs 1 SUGMs Illustrations Conclusion pT(different)/pT(same) 0.8 0.6 0.4 0.2 0 −0.2 −0.2 0 0.2 0.4 0.6 (pL(different)/pL(same))3/2 0.8 1 1.2 1.2 Introduction Below Median Above Median Simple example SERGMs 1 SUGMs Illustrations Conclusion (pT(different)/pT(same)) 0.8 0.6 0.4 0.2 0 −0.2 −0.2 0 0.2 0.4 0.6 (pL(different)/pL(same))3/2 0.8 1 1.2 Social pressure conclusion Introduction Simple example SERGMs SUGMs Illustrations Conclusion People show significantly stronger preference for cross-caste relationships when links are in isolation Suggestive that it is especially true with more symmetrically sized castes Conclusion Introduction Simple example SERGMs SUGMs Illustrations Conclusion ERGMs face estimation challenges Computational difficulties Poor asymptotic properties Work with subgraphs directly to simplify Consistency, fast estimation Natural (though limited) microfoundations Reflects patterns in empirical data Applications Recreate networks, how relationship types are sensitive to context, easy to extend to multigraphs