Discovering Communities through Friendship Greg Morrison * , L. Mahadevan
by user
Comments
Transcript
Discovering Communities through Friendship Greg Morrison * , L. Mahadevan
Discovering Communities through Friendship Greg Morrison1*, L. Mahadevan1,2,3 1 School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America, 2 Wyss Institute of Biological Engineering, Harvard University, Cambridge, Massachusetts, United States of America, 3 Department of Physics, Harvard University, Cambridge, Massachusetts, United States of America Abstract We introduce a new method for detecting communities of arbitrary size in an undirected weighted network. Our approach is based on tracing the path of closest-friendship between nodes in the network using the recently proposed Generalized Erdös Numbers. This method does not require the choice of any arbitrary parameters or null models, and does not suffer from a system-size resolution limit. Our closest-friend community detection is able to accurately reconstruct the true network structure for a large number of real world and artificial benchmarks, and can be adapted to study the multi-level structure of hierarchical communities as well. We also use the closeness between nodes to develop a degree of robustness for each node, which can assess how robustly that node is assigned to its community. To test the efficacy of these methods, we deploy them on a variety of well known benchmarks, a hierarchal structured artificial benchmark with a known community and robustness structure, as well as real-world networks of coauthorships between the faculty at a major university and the network of citations of articles published in Physical Review. In all cases, microcommunities, hierarchy of the communities, and variable node robustness are all observed, providing insights into the structure of the network. Citation: Morrison G, Mahadevan L (2012) Discovering Communities through Friendship. PLoS ONE 7(7): e38704. doi:10.1371/journal.pone.0038704 Editor: Yamir Moreno, University of Zaragoza, Spain Received February 13, 2012; Accepted May 13, 2012; Published July 20, 2012 Copyright: ß 2012 Morrison, Mahadevan. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: The authors have no support or funding to report. Competing Interests: The authors have declared that no competing interests exist. * E-mail: [email protected] partition into communities fci g that have a higher intra-community weight than would be expected randomly. This choice of a random network acts as a null model, although other choices are possible [13], and a wide variety of numerical approaches for efficiently computing the maximal partition exist, including statistical mechanical methods [14], bisection algorithms [11], and other greedy searches [15,16]. While modularity maximization is both intuitive and accurate in a variety of settings, Q has a natural system-size resolution limit [17,13]: if the number of nodes becomes large (N??), but the typical strength Wi of all nodes remains finite, the total strength W ?? and the second term in the sum in Eq. 1 becomes small (since Wi and Wj do not diverge). Thus, modularity maximization may not detect small communities in large networks due to this resolution limit. Simple methods to overcome this limitation include the introduction of a resolution parameter [14,13] c, with the redefinition of Q~(2W ){1 P (w {cW ij i Wj =2W )d(ci ,cj ), or multiresolution methods [18] ij which impose a self-loop of strength r on the network (i.e. wij ?wij zrdij ) in Eq. 1. Both of these approaches overcome the problem of a resolution limit by introducing an arbitrary parameter in detecting community structure that must be tuned. Alternate approaches to community detection avoid a resolution limit through other means, such as thresholding the resistance distance between nodes, with nodes having low resistance distance between each other belonging to the same community [19], maximizing the ‘fitness’ of each node in a greedy fashion [20], creating block models to detect communities if the number of expected communities is exactly known [21], or refining communities by finding ‘statistically significant’ nodes [22]. In all these approaches, at least one free parameter is required to detect the communities, which may be useful in giving the ability to tune the resolution at which Introduction The topology of networks occurring in biological or chemical [1,2], social [3,4], political [5], or technological [6] systems can give profound insights into a variety of important aspects of these systems, such as the processes that generated the network [7], the stability of the system [8] or the properties of processes occurring on it [9]. An important aspect of common real-world networks is that of community structure [10], where subsets of the network are densely connected internally and weakly connected externally. Nodes in the same community have more in common than those in distinct communities, reflected in the topology of denser intracommunity edges than inter-community edges. However, the detection of communities in networks without apriori knowledge of their structure is highly nontrivial, and methods for community detection have recently attracted a great deal of interest. Perhaps the most common approach for community detection in networks is based on modularity maximization [11,12]. Each node i in a network of N nodes and M edges is assigned to a single community, ci , with the partition chosen to maximize Q~ Wi Wj 1 X (wij { )d(ci ,cj ), 2W 2W ij ð1Þ where wij is the weight of the edge between nodes i and j, P 1X Wi ~ j wij is the strength of node i, W ~ Wi , and i 2 d(ci ,cj )~1 if ci ~cj and 0 otherwise. For an unweighted network, wij :aij = 0 or 1, where aij is the adjacency matrix, and thus Wi ~ki is the degree of the node. Modularity compares the network in question to a randomly generated network with each node constrained to have the same strength, and is maximized by a PLoS ONE | www.plosone.org 1 July 2012 | Volume 7 | Issue 7 | e38704 Discovering Communities through Friendship algorithms) and without a system-size resolution limit [17,13], and therefore unambiguously chooses a ‘natural’ partition of the network. Despite the simplicity of our method, there exist pathological network topologies may require modification of the algorithm in order to accurately detect the community structure. As a simple example, a node that is connected to every other node in the network will be everyone’s closest friend, regardless of the topology of the rest of the network, and only one community will be detected using our approach (see Supplementary Information S1 for further discussion). Failure of the detection algorithm in this case can be avoided by searching for the closest unpopular friend (CUF), where the CUF is detected by sorting the closest friends of node i in descending order of node degree, and choosing the first node fu (i) who has degree less than or equal to the next-closest node. This ensures that we avoid nodes with extremely high degree (the popular close friends), who may have many out-of-community connections, and choose fu (i) to be a node that is simultaneously (a) a close friend (but not necessarily the closest) and (b) less likely to have out-of-community edges. The path of closest friendship is modified to be piu ~fi,fu (i),fu (fu (i)), . . .g, and community detection proceeds as described above. We note that neither the CF nor CUF approaches depend on the graph being Hamiltonian: the particular path pi or piu need not span the entire graph for any starting node i (and must not, if there is to be more than one community). Additional modifications to both the CF and CUF methods are required due to community fracture: communities may be split into two or more disjoint pieces due to the random fluctuations of the edges [25] (see Supplementary Information S1 for further discussion). Fractured communities may occur for any community detection algorithms, and a greedy approach to detect and merge fractured communities is described in Supplementary Information S1. communities are detected, but with no a-priori method for determining the ‘correct’ value that leads to a meaningful partition. In this paper, we develop a new parameter-free, resolutionlimit-free method for community detection, most easily understood intuitively in the context of a social network: a person belongs in the same community as his or her ‘closest friend’ (the node to which he or she has the greatest measure of ‘closeness,’ discussed below). Our method requires a way to measure closeness (or friendship) between nodes in a network, and a variety of such measures are available [23]. We will focus primarily on a recently proposed non-metric measure of closeness [24], the Generalized Erdös Numbers (GENs), which have been found useful in a variety of contexts in understanding the structure of network topology. This closest-friend community detection method is shown to be able to accurately detect communities in a variety of widely used benchmarks, in some cases outperforming some modularitymaximizing detection schemes in real world networks with a known ‘correct’ partition. We also extend the method to detect community structure at a lower resolution (macrocommunities formed from higher resolution microcommunities) without appealing to a free parameter. Our approach has the advantages of being intuitively accessible, free of arbitrary parameters, and able to accurately find communities in complex networks. We leverage our chosen measure of closeness between nodes in determining the robustness of assignment of each node into its community (rather than a global measure of the quality of the partition using modularity). Finally, our approach is applied to a citation network and a coauthorship network, and the complex hierarchical structure of each network is examined in detail. Methods Communities from Closeness In a network with community structure, nodes in a community have a higher density of edges internally (to other nodes in their community) than they do externally. While one approach to community detection maximizes global quality functions that depend on the density of edges [10], we could alternatively search for high densities of edges locally to find communities. Such a local method may use an appropriate measure of closeness between nodes, with ‘close’ nodes having multiple short-length paths between one another (implying a locally high density of edges; see below for examples). In the context of a social network, for example, it is natural to expect that closest friends (those who feel closest to one to another given a measure of ‘closeness’) should be found in the same community. Such an expectation can be enforced by determining the closest friend (CF) of each node i, denoted f (i), and requiring them to be in the same community. In other words, node i is assigned to the same community as the node to which it is topologically closest. The closest friend of f (i) (denoted f (f (i))) is also found in this community, and we generate a path of closest friendship pi ~fi,f (i),f (f (i)), . . .g (halting when a self-intersection occurs after which the cycle would repeat). Nodes i and j that share elements of their closest friend paths (i.e. Dpi \pj D=0) will all trace to the same central loop, and each of the elements of pi and pj are placed in the same community. If the closeness measure is well chosen (such that a higher density of edges implies a stronger feeling of ‘closeness’), the closest friend paths for nodes in each community will remain within the correct communities, allowing for an accurate partition of the network (discussed further in Supplementary Information S1). This approach has the advantage of generating a single partition (rather than a tree of many possible partitions from which the ‘correct’ partition must be chosen, commonly used in clustering PLoS ONE | www.plosone.org Choosing a Closeness Measure Before we apply the CF or CUF method for community detection, we must choose a measure of closeness between nodes in that network, with the only requirement being that nodes i and j are ‘closer’ if there is a higher density of edges (multiple shortranged paths) between them. We focus on the use of a recently developed closeness measure, the Generalized Erdös numbers [24] (GENs), created with two simple principles in mind: (i) connections from node j to nodes that feel close to a specified node (nodes {k} with low Eik ) are more important than connections to other nodes, and (ii) a connection of high weight from j to some node k should make node j feel more close to node k and less close to node i. This second expectation is natural if closeness is defined with a limited resource in mind, such as the time spent between people in a social or coauthorship network [24]. These expectations naturally lead to a weighted harmonic mean [24], with Eii ~0 and Wj X wjk ~ : Eij k[C Eik zw{1 jk j with Cj the set of nodes that are connected to j. Eij is not a distance metric (as Eij =Eji ), a desirable property because unpopular (low degree or low weight) individuals may feel close to popular (high weight) nodes, but not vice-versa. The GENs are computed numerically by setting Eij(0) ~(1{dij ) and iteratively computing P (t) (tz1) zw{1 Eij(tz1) ~Wj = k wjk =(Eik jk ), halting when maxij DEij {Eij(t) Dƒd for some tolerance d (we used 5|10{3 . Computing the closeness between all pairs of nodes i and j will scale as N|M, 2 July 2012 | Volume 7 | Issue 7 | e38704 Discovering Communities through Friendship hierarchical community structure. Instead of using a tunable resolution parameter whose ‘correct’ value(s) are unknown apriori, the CF/CUF method naturally suggests a simpler approach: to iteratively coarse grain the network using a highresolution partition (detected as described above) and then reapply our detection method on the lower resolution network. Communities in the high-resolution partition act as coarse grained nodes, and the average closeness felt between communities serves to determine closest friends. If the GENs are chosen as the Pmeasure c {1 of closeness, the averages are taken as (Ehg ) ~ i[g,j[h Eij{1 =ng nh , where ng is the number of nodes in g. While the choice of a method of coarse graining the network implies an additional degree of freedom in our algorithm, it is important to note the differences between the CUF method and modularity maximization with a variable resolution parameter. In the CF/CUF method, the resolution can not be tuned continuously by choosing different closeness measures or methods of coarse graining. Rather, the choice of measure and method set an optimal apriori resolution for hierarchical community detection, which is likely to be robust to changes in the method if the closeness measure and the coarse graining method are well chosen. The accuracy of our hierarchical detection method on a commonly used artificial benchmark, implemented in Ref. [18], is shown in Fig. 1(g), with additional benchmarks discussed further in Supplementary Information S1. A network of 256 nodes is formed from 16 communities of 16 nodes each, in turn composed of 4 macrocommunities containing 4 communities each. Each node has on average 13 edges within its community and 4 edges outside of its community but within its macrocommunity, and 1 edge outside of its macrocommunity. This is similar to the Reichardt and Bornholdt [14,20] benchmark discussed in Supplementary Information S1 and adapted in the next section. We compare the partitions detected using the CUF algorithm with a simulated annealing maximization of the multiresolution modularity (that is, Eq. 1 with wij ?wij zrdij , where r is a resolution parameter ranging from rmin ~{W =N to ?). The average modularity Qr for the modularity maximizing partition is shown by the red points in Fig. 1(g), and this modularity maximizing partition transitions smoothly between the high-resolution communities detected using our CUF algorithm for large r and the low-resolution coarse grained using our hierarchical algorithm for small r. Additional analysis of a similar benchmark for our hierarchical detection algorithm can be found in Supplementary Information S1. and is the slowest step in detecting communities using the CF or CUF approaches. To see how our closeness measure works in detecting communities in a network with known community structure, we examine the Girvan-Newman benchmark [1,12] in Fig. 1(a), which consists of four equal-sized communities of 32 nodes, each with kout edges leading out of the community and 16{kout edges within the community. The connectivity between communities can also be described by the mixing parameter m~kout = (kin zkout )~kout =16, with detection of the correct communities > 0:5. The level of > 8 or m * becoming difficult when kout * agreement between the detected and correct partition is quantified using the normalized mutual information [10]: ! Nnij nij log t 0 ni nj i[Pt ,j[P0 P 0 I~2 P t ni log (nti =N)z nj log (n0j =N) P i[Pt ð2Þ j[P0 with nti the number of nodes in community i of the trial partition (Pt ), n0j is the number in community j of the true partition (P0 ), and nij is the number simultaneously occurring in i and j of Pt and P0 . In Fig. 1(a), we see that the accuracy of the CUF approach does depend on the choice of closeness measure, where we compare the performance of the GEN measure with others [23] such as the overlap measure (Oij ~DCi \Cj D with Cj the set of neighbors of j) and the Jacard coefficient (Jij ~DCi \Cj D=DCi |Cj D ). Similarly, in real-world networks with an apriori known community structure (shown in Fig. 1(b)) such as the Football network [1], the Political Blogs network [26], and the Political Books network [27] (see Supplementary Information S1), both the GENs and overlap are consistently more accurate in community detection than greedy modularity maximization. Because the GENs are the most accurate on both real world and artificial networks of all of the closeness measures attempted, we choose to focus on them as our measure of closeness in the rest of the paper. Additional Benchmarks of Community Detection As a systematic test of the method on a more complex benchmark, apply our detection method to the benchmark of Lancichinetti, Fortunato, and Radicchi [28]. Communities are of variable size (with the size s of each drawn from a power law distribution, P(s)*s{b ) and the degree of each node is drawn from a scale free distribution as well (P(k)*k{c ). Each node has on average a fraction m of its edges within its assigned community and 1{m edges outside of its community. The complex structure of this network makes community detection non-trivial, but as seen in Fig. 1(c-f) our method is accurately able to reconstruct the correct partition for various values of b, c, and m (for N~1000 and 50 realizations of the network for each data point). So long as mƒ0:5, > 0:9, we typically find the normalized mutual information I * indicating a good agreement with the correct partition. Our approach produces partitions that are less accurate than the results reported in Fig. 5 of Ref. [28], in accordance with the observations in Fig. 1(a) that the method underperforms modularity maximization when the correct partition is also modularity maximizing. However, the CUF method still performs admirably, with the additional benefits of no fitting parameters or resolution limits. Robustness of Individual Nodes It is desirable that any method for community detection be relatively robust to small changes in network connectivity. Modularity may be used to assess the quality of a partition on a global level at a particular resolution, but not the robustness of a individual node. The assignment of node i to a particular community may be fragile (non-robust) if it P (a) has few edges within its assigned community (i.e. small kiin ~ j[ci aij ) or (b) has a small ratio of in-community and out-of-community edges (i.e. small kiin =(ki {kiin )~kiin =kiout ). It is useful to incorporate both of these elements into a single measure, which we call the degree of robustness: di(1) is the number of the kiin nodes to which i feels closest that are in i’s microcommunity. Nodes with high robustness can be considered the ‘core’ of their community, since of all of the nodes in the community they have the largest number of close friends amongst the other community members. In networks with a hierarchical community structure, nodes may have varying robustness at each resolution. Nodes that are robustly assigned to a microcommunity may have a fragile assignment to its macro- Hierarchical Communities In many cases [29,20] networks have community structure at multiple resolutions, begging the question of how to detect such a PLoS ONE | www.plosone.org 3 July 2012 | Volume 7 | Issue 7 | e38704 Discovering Communities through Friendship Figure 1. Benchmarks of the community detection algorithm. (a) shows the mutual information between the detected and true partitions for varying kout and for different closeness measures on the Girvan-Newman benchmark [1,12]. Up and down triangles show modularity maximization using a greedy [16] (implemented in Mathematica) and Potts model [14,32] for comparison with the CUF method implemented using the Jacard Coefficients (black circles), GENs (red squares) and overlap (blue stars) as closeness measures. (b) Percent improvement of the CUF approach over a greedy modularity maximization [16] using the GENs (red), overlap (blue), and Jacard Coefficients (black) as a closeness measure for real world networks with a ‘correct’ partition known apriori. Taken together, (a) and (b) suggest the GENs are typically more accurate measure of closeness. (c-f) show the CUF method implemented on the benchmark of Lancichinetti, Fortunato and Radicchi for varying k, b and c (compare to Fig. 5 and 7 of Ref. [28]). The CUF method performs well for mƒ0:5, although modularity maximization is more accurate (as is the case in (a)), and beings to fail significantly for mw0:5 as expected. (g) shows the multiresolution modularity [18] Qr of the high (solid black line) and low (dashed blue line) resolution partitions using our CUF algorithm, alongside the maximum modularity determined via simulated annealing. The modularity maximizing solutions transition smoothly between the coarser partition for small r and the finer partition for larger r as expected, indicating that our CUF method does indeed detect the two levels of hierarchy accurately without appealing to arbitrary parameters. doi:10.1371/journal.pone.0038704.g001 Supplementary Information S1). This modified benchmark allows us to examine the effectiveness of the multi-level hierarchical community detection as well as the utility of the degree of robustness D(j) i . An example of the benchmark is shown explicitly in Fig. 2(a) for a~8, for which the in-, out-, and mix-degrees of nodes vary significantly with i (see the caption of Fig. 2). Fig. 2(b-c) show the in-degrees and in-out ratios for the highest resolution of the hierarchy and (e-f) for the coarsest resolution, with a decrease in kiin implying a node is less connected to its community and a in out mix decrease in r(1) i ~ki =(ki zki ) indicating a node is highly connected to nodes outside of its community. When we apply our community detection algorithm, the CUF approach recovers the correct partition with a mutual information of SImicro T~0:95 on the micro-scale and SImacro T~0:85 on the macro-scale (see eq. 2) at a~8. The mutual information at each scale increases for for decreasing a, but begins to drop rapidly > 10. The high value of the mutual information shows near a * that the CUF algorithm accurately detects the intended communities for reasonably large asymmetry in the community structure (see Supplementary Information S1 for further hierarchical benchmarking). The benchmark shows that the degree of robustness D(j) i accurately determines nodes that are less robustly assigned to their intended community at both levels of resolution (shown in Fig. 2(d) and (g)). Nodes in macrocommunity A are less connected to the network overall (and are less robustly assigned at all scales), with and D(2) are decreasing with and unsurprisingly both D(1) i i i ~½(i{1) mod 32=31 as expected. In macrocommunity B, nodes have a constant in-community degree and a decreasing ratio of in- to out-of-community degree at each scale, so nodes should community, and vice versa. To assess the robustness at each level (j) (j{1) , where di(j) is of the hierarchy, we can compute D(j) i ~di {di the robustness of a node i at the j th resolution in the hierarchy, (1) setting di(0) ~0 for notational convenience so that D(1) i ~di . (j) Nodes with small Di are weakly connected to the other nodes in their community (i.e. their assignment to the micro- or macrocommunity is fragile, regardless of the robustness in communities of other resolutions). Note that the normalized degree of robustness D(j) i =ki is useful in detecting nodes on the boundary between communities (having many edges, but few close friends in their assigned community), but that D(j) i more directly indicates robustness as the number of strong in-community edges. At each level of resolution, the average robustness of any community can P (j) (j) {1 be estimated as r(j) c ~SDi Ti[c ~nc i[c Di . An Artificial Benchmark with Variable Robustness In order to introduce variable node robustness into an artificial benchmark, we modify the benchmark of Reichardt and Bornholdt [14,20] (similar to that in Fig. 1(g)) which includes 512 nodes, 16 microcommunities of 32 nodes, and 4 macrocommunities of 128 nodes (see Supplementary Information S1 for more details). Each node i has on average kiin edges connecting it to its microcommunity, kiout zkiin edges in its macrocommunity, and kimix edges outside of its macrocommunity. In order to modify the benchmark to allow for variable node robustness, we choose kiin , kiout , and kimix to depend on i in a simple fashion, depending on the macrocommunity it is assigned to (labelled A– D in Fig. 2(a)) and an asymmetry parameter a§0, with a~0 corresponding to the standard Reichardt-Bornholdt benchmark [14] (see the table in the caption of Fig. 2 and discussion in PLoS ONE | www.plosone.org 4 July 2012 | Volume 7 | Issue 7 | e38704 Discovering Communities through Friendship Figure 2. Benchmarks with variable node robustness. (a) A snapshot of the benchmark with hierarchical community structure and variable node robustness at a~8. The behavior of the nodes as a function of a and i ~½(i{1) mod 32=31 is described in the table, with (2) in out mix in out mix r(1) is the in-out ratio at the i ~ki =(ki zki ) the average in-out ratio at the microcommunity resolution, and ri ~(ki zki )=ki macrocommunity resolution. In the table, down arrows, up arrows, and dashes denote increasing, decreasing, and constant values (respectively) of the quantities on average. (b) and (e) show the in-degrees at each resolution, kiin for microcommunities and kiin zkiout for macrocommunities. (2) (1) Likewise, (c) and (f) show the ratio of in- and out-degrees at each resolution, r(1) i and ri . (d) shows the degrees of robustness Di at the micro-scale on the macro-scale. The behavior of the degrees of robustness at both resolutions agrees with the expectations in and (g) shows the robustness D(2) i most cases: if the in-degrees or in- to out-degrees decrease, the nodes become less robust. doi:10.1371/journal.pone.0038704.g002 Table 1. Macrocom. kiin kiout kimix kiin r(1) i Behavior kiin zkiout r(2) i k0mix (1{ai =k0in ) ; – Less robust ; – Less robust – ; Less robust Behavior A k0in {ai k0out (1{ai =k0in ) B k0in k0out zai =2 k0mix zai =2 – ; Less robust C k0in k0out {ai =2 k0mix zai =2 – – Constant ; ; Less robust k0in zai k0out z2ai k0mix z3ai =r(2) i : ; More robust{ : – More robust D The robustness with increasing i depends on how slowly kiin increases. doi:10.1371/journal.pone.0038704.t001 { be less robust with increasing i . While the expected decrease in robustness is clearly observed for D(1) i , at the macro-scale there is a slight (but unexpected) increase in the robustness of each node as i increases. This is due to errors in the macro-scale community detection, with macrocommunity B being the most difficult to detect of all of them. Nodes in macrocommunity C have constant in-degree and in-out ratio at the micro-scale (with the corresponding robustness D(1) i nearly constant), but at the macro-scale are less robust with both the in-degree and inout ratio decreasing (leading to an expected decrease in D(2) i with increasing i ). Finally, the nodes on the micro-scale in macrocommunity D simultaneously have increasing in-degree but decreasing in-out ratio with increasing i . While we find the (1) degree of robustness D(1) i increasing, the rate of increase of Di depends on the interplay between the increased robustness due to more in-community edges and the decreased robustness due in macrocommunity B to more out-of-community edges. D(1) i in macrocommunity D are both clear examples and D and D(2) s in of the dependence of the rate of increase in D(j) s on both k and (j) r . The successes in correctly determining not only the hierarchical community structure but also node robustness of this simple benchmark suggest that our approach may be PLoS ONE | www.plosone.org fruitfully applied to complex real world networks with hierarchical structure. Results and Discussion The Harvard Coauthorship Network Turning now to real examples, we look at the network of scientific journals which we expect can be divided into sub-fields at varying resolutions. We construct a network from publications found in the Digital Access to Scholarship at Harvard (DASH) repository, a database of journals, book chapters, and conference proceedings uploaded by Harvard faculty. The available metadata includes the authors and the journal of publication, which we use to generate a weighted network with each journal as a node. The weight of the edge between nodes i and j, wij , is the number of article pairs that have at least one author in common, with one article published in journal i and the other in journal j. The largest connected component of this network (comprising N~779 journals as nodes, shown in Fig. 3(a)) has a complex structure: while the degree of each node (the number of edges with non-zero weight) is exponentially distributed, P(ki ~k)*e{k=15:1 , the strength of each node is log-normally distributed, with a good fit 2 given by P(Wi ~W )*W {1 e{0:24½log (W ){5:3 (see Fig. 3(b-c)). It is 5 July 2012 | Volume 7 | Issue 7 | e38704 Discovering Communities through Friendship nity (so more edges are directed towards Philosophy and History). The degree of robustness is thus able to home in on microcommunities that may be on the boundary between macrocommunities and identifying particularly complex topologies. interesting to note that an exponentially distributed degree sequence is indicative of network growth without preferential attachment [30], while log-normally distributed strengths may indicate growth with a localized preferential attachment in the weight (see ref. [31] and below for further discussion). This may illuminate some of the details of how a publication network grows: while authors preferentially publish in high-profile journals or proceedings (leading to the fat tail on the strength distribution), they may choose to publish in new or lower profile journals if necessary (leading to the exponential, non-preferential attachment distribution of the degree sequence). In Fig. 3(a), 36 microcommunities in the DASH network are found, and in most cases an inspection of the group memberships showed the members of each community were related (a full list is found in Supplementary Information S1). It is worth noting that using a Potts model approach to modularity maximization [14,32] (with resolution c~1) yields 32 distinct microcommunities, and the partitions generated by the two methods share much in common, suggesting the CUF results are reasonable. The hierarchical detection scheme shows that each of the microcommunities falls into 6 natural macrocommunities (see Fig. 3(a)). The two largest macrocommunities show a division between the Physical Sciences (physics, biology, chemistry, and geology) and the Mathematical Sciences (pure mathematics, economics, and computer science). Three additional macrocommunities consist of a combination of Philosophy and the History of Science, Linguistics, and Law, and a final macrocommunity having no obvious meaning on inspection (see Supplementary Information S1 for the member journals of each community). We note that this hierarchical partition is not easily detected using the Potts modularity maximization approach: even for c~0:02, there are still 23 microcommunities detected via modularity maximization. Thus, the partition into distinct scientific fields naturally arises from the coarse graining in our approach, but is difficult to detect using modularity methods alone. Further coarse graining shows that there is no additional hierarchical structure to be found in the DASH network. The average robustness of the nodes in each community of the DASH data is very heterogeneous (the multi-colored bars in Fig. 3(d)), which can be of use in determining which microcommunities are held together weakly, either because of the complex network topology involving the nodes in the community or due to an incorrect partitioning of the network. Many of the detected communities have few nodes, and are correspondingly less robust on average. Even some large communities have low average robustness, which could indicate an incorrect assignment or an unexpected network topology around a community. For example, Phys. Sci. 5 (PS5 in Fig. 3(d)) consists of 26 journals, with a very small average degree of robustness of r(1) PS5 ~2:8. The surprisingly low robustness of PS5 is not due to sparse connections between nodes within the community (the average degree of nodes in PS5, Skiin T~7:6), but is because of the fact that these journals are highly connected externally (Skout T~5:5). The robustness of a node’s assignment to its macrocommunity (the thin black bars in Fig. 3(d)) is not determined by how robustly assigned it is to its microcommunity. The average robustness r(2) c gives an indication of how strongly a microcommunity is attached to its macrocommunity, and we find that Philosophy/History 1 (PH1) is the most weakly assigned, with r(2) PH1 &0:12, despite the very robust assignment of the nodes in the microcommunity (r(1) PH1 ~9:8). Two journals in PH1 are very strongly connected to the Mathematical Sciences macrocommunity (so much weight is directed to Math. Sci. from PS1), while many journals in PH1 are more weakly connected to the journals in its own macrocommuPLoS ONE | www.plosone.org The Physical Review Citation Network Another real-world network where one may expect a hierarchical structure is that of a citation network (independent of their journal of publication), with an expectation of divisions between fields and sub-fields as was observed in the DASH network. We examine the citation network of articles published in the Physical Review journals [33,31], with articles as nodes and citations between articles as edges. Citations naturally form directed edges (a citation between i and j does not imply a citation between j and i), but to apply our methods we study the undirected (wij ~wji ) version. The degree distribution of this network has been previously shown to be log-normally distributed [31], which may indicate the underlying dynamics of the growth of the network. Network growth coupled with with preferential attachment produces a scale free degree distribution [30,7], but Redner [33] has noted that a modified, locally defined preferential attachment process explains the emergence of a log-normally distributed data. Rather than citing the most important papers, an author chooses to cite either a randomly chosen paper or one of the citations of that paper (with the latter likely to be highly cited [34]). The lognormal distribution is also observed in the highly-cited subset of the network considered (see below for further discussion), suggesting that this smaller sample is reasonably representative of the structure of the full network. Applying the CUF method to the Physical Review network detects four distinct hierarchies of community structure, ranging from the finest resolution of numerous small microcommunities to the coarsest resolution with two large macrocommunities (see Fig. 4(ac) for a schematic ranging from coarsest to finest). At the highest resolution, 266 communities are detected, and the partition has the modularity Q1 ~0:63 (at c~1). This is in reasonable agreement with a similar previously studied Phys. Rev. network [33] with 274 detected communities and a modularity of Q~0:54, suggesting that this fine resolution partition of the more current data is reasonable. High-modularity partitions are also detected using our coarse graining method, with the modularities Q2 ~0:75 for the 62 communities on the second level of the hierarchy and Q3 ~0:74 for the 11 communities at the third level (see Fig. 4(a-b)). The final level of coarse graining does not produce a very high modularity (with Q4 ~0:33) for two macrocommunities, but the meaning of the partition recognizable on inspection of the component communities for its distinction between earth-bound and cosmological research. At each level of hierarchy, the partitioning is both reasonable from a scientific perspective as well as generally producing a large modularity, suggesting that CUF approach is able to discern the natural partitions of the network without need for a resolution parameter. The distribution of the degrees of robustness found in the Physical Review network is shown in Fig. 4(d), along side the degree distribution of the nodes. As mentioned earlier, the degree distribution is well fit by a log-normal distribution [31] 2 P(ki ~k)*k{1 e{1:1½log (k){2 , with a fatter tail than exponential but vanishing faster than a power law. The distribution of node robustness D(j) i , which indicates how robustly the node i is assigned at the j th level of the hierarchy, decays much more rapidly for large D(j) i for all four of the hierarchical levels. At the finest resolution (blue squares in Fig. 4(d)), the degrees of robustness are well fit by {D=4:5 , and although the tail an exponential decay P(D(1) i ~D)*e 6 July 2012 | Volume 7 | Issue 7 | e38704 Discovering Communities through Friendship PLoS ONE | www.plosone.org 7 July 2012 | Volume 7 | Issue 7 | e38704 Discovering Communities through Friendship Figure 3. The network of journals from the DASH data. (a) Low weight edges (with 1ƒwij ƒ5) are shown in blue, while higher weight nodes (wij §6) are shown in red. Nodes are ordered in order of descending macrocommunity size, then descending microcommunity size, and finally in descending strength. The 36 microcommunities are denoted by the smaller black squares, while the 6 macrocommunities are shown in the larger thick black squares. Some microcommunities are labelled with their two most robust nodes (having largest D(1) i ). The degree distribution of the DASH data in (b) is exponential, while the distribution of node strengths in (c) appears to be log-normal. In (d), the average robustness of nodes in the (2) microcommunities (r(1) c , thick bars of varying color) and macrocommunities (rc , thin black bars) for the DASH data. In (d), the bar for Mathematical Sciences 2 (MS2) is cut off, having a very high average degree of robustness of r(1) MS2 ~39:8. doi:10.1371/journal.pone.0038704.g003 beyond D~20 (incorporating below 2.5% of the nodes) is slower than exponential, it remains faster than log-normal. The far more rapid decay of the degrees of robustness suggest that highly-cited papers have applications in a wide variety of fields (i.e. are have many out-of-community edges). The robustness of the nodes at the lower-resolution partitions are all similar to one another (triangles and stars in Fig. 4(d)), all satisfying an exponential initial decay of {D=2:8 over a somewhat shorter range. Each node P(D(j) i ~D)*e has roughly the same robustness on each level of the hierarchy, suggesting that an equal fraction of nodes are involved in forming the edges of the different levels of the hierarchies. trace a path of closest friendship that defines a high-resolution partition of the community, resulting in a method with (1) reasonable computational complexity in comparison to other methods [10], (2) easy detection of multiple levels of community structure without the need for an (unknown apriori) resolution parameter [17,13], and (3) a simple yet powerful method of measuring the robustness of the assignment of an individual node to its community. We must note that there are also limitations to our approach, including the free choice of a closeness measure, pathological network topologies (which, for example, necessitates the use of the CUF over the CF; see Supplementary Information S1), and the requirement that no community can be formed from only one node. Despite these possible limitations, the advantages of our approach in automatically detecting and evaluating hierarchical community structure are significant. Using the recently proposed Generalized Erdös Numbers [24] as a closeness measure (which performs better than other measures in benchmarks) we examined two real world systems where a hierarchical community structure is naturally expected: a Conclusions In this paper, we have described a new and intuitive method for detecting hierarchical community structure in complex networks that does not rely on free parameters or require advanced knowledge of the number or size of the communities. Given a method for measuring the ‘closeness’ between two nodes in a network, one can Figure 4. The hierarchical community structure of the Physical Review network. (a-c) shows a progressively coarsened view of the network, with the text labels of the communities composed of the most statistically significant words found in the titles of the articles in the communities. (a) shows the microcommunity structure of 148 nodes, with (b) a zoomed-out picture of the 625 nodes in one macrocommunity of the second level of the hierarchy, and (c) the full network (showing the final two levels of hierarchy). (d) shows the degree distribution as well as the distribution of node robustness at each level of the hierarchy (shown log-linear in the inset). Black circles show the degree distribution, which is log-normally distributed [31] (the best fit is the black line). The distribution of robustness on the micro-scale, D(1) i , is shown with the blue squares, while the distribution for the other hierarchical degrees of robustness D(j) i are all quite similar (shown with the up triangles, down triangles, and stars). The initial decay of the robustness is well-fit by an exponential in all cases (with the best fit for each shown as lines). doi:10.1371/journal.pone.0038704.g004 PLoS ONE | www.plosone.org 8 July 2012 | Volume 7 | Issue 7 | e38704 Discovering Communities through Friendship coauthorship network defined by the DASH data and a citation network generated from the Physical Review data. Our approach is able to detect a high-resolution partition of each dataset that is composed of well defined communities of variable size, and an inspection of the member nodes suggests that the partition is meaningful in both the DASH- and Phys. Rev. networks. Our coarse graining method of detecting hierarchy finds a reasonable macrocommunity partition for the DASH data (with each of the macrocommunities clearly linked upon inspection), with this coarse-grained partition not obviously detected using modularity maximization. By examining the degree of robustness of these communities on the micro- and macro-scale, we are able to rapidly home in on the most interdisciplinary communities (those with many significant connections to other communities). The Phys. Rev. citation network naturally partitions into four distinct hierarchies of communities (without any apriori assumption of the correct number of hierarchies), with the nodes in the communities generally related to each other upon inspection. The ability to find communities of arbitrary size, detect the structure of a natural (and system-defined) number of hierarchies, and locate particularly insular or interdisciplinary communities are all significant advantages of our method, and clearly displayed in the analysis of both the DASH and Phys. Rev. networks. Supporting Information Supplementary Information S1 (PDF) Acknowledgments We would like to thank Reinhard Engels for providing us with a easily processed copy of the DASH data, Levi Dudte for many useful conversations on the methods and paper, and the Wyss Institute for Biologically Inspired Engineering at Harvard. Author Contributions Conceived and designed the experiments: GM LM. Performed the experiments: GM. Analyzed the data: GM. Contributed reagents/ materials/analysis tools: GM LM. Wrote the paper: GM LM. References 1. Girvan M, Newman M (2002) Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America 99: 7821. 2. Bilke S, Peterson C (2001) Topological properties of citation and metabolic networks. Physical Review E 64: 36106. 3. Castellano C, Fortunato S, Loreto V (2009) Statistical physics of social dynamics. Reviews of modern physics 81: 591–646. 4. Barabási A, Jeong H, Néda Z, Ravasz E, Schubert A, et al. (2002) Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications 311: 590–614. 5. Porter M, Mucha P, Newman M, Friend A (2007) Community structure in the united states house of representatives. Physica A: Statistical Mechanics and its Applications 386: 414–438. 6. Yan K, Fang G, Bhardwaj N, Alexander R, Gerstein M (2010) Comparing genomes to computer operating systems in terms of the topology and evolution of their regulatory control networks. Science’s STKE 107: 9186. 7. Barabási A, Albert R (1999) Emergence of scaling in random networks. Science 286: 509. 8. Albert R, Jeong H, Barabasi A (2000) Error and attack tolerance of complex networks. Nature 406: 378–382. 9. Moore C, Newman M (2000) Epidemics and percolation in small-world networks. Physical Review E 61: 5678–5682. 10. Fortunato S (2010) Community detection in graphs. Physics Reports 486: 75– 174. 11. Newman M (2006) Modularity and community structure in networks. Proceedings of the National Academy of Sciences 103: 8577. 12. Newman M, Girvan M (2004) Finding and evaluating community structure in networks. Physical Review E 69: 26113. 13. Kumpula J, Saramäki J, Kaski K, Kertesz J (2007) Limited resolution in complex network community detection with potts model approach. The European Physical Journal B 56: 41–45. 14. Reichardt J, Bornholdt S (2006) Statistical mechanics of community detection. Physical Review E 74: 16110. 15. Newman M (2004) Fast algorithm for detecting community structure in networks. Physical Review E 69: 066133. 16. Clauset A (2005) Finding local community structure in networks. Phys Rev E 72: 026132. 17. Fortunato S, Barthélemy M (2007) Resolution limit in community detection. Proceedings of the National Academy of Sciences 104: 36. PLoS ONE | www.plosone.org 18. Arenas A, Fernandez A, Gomez S (2008) Analysis of the structure of complex networks at different resolution levels. New Journal of Physics 10: 053039. 19. Wu F, Huberman B (2004) Finding communities in linear time: a physics approach. The European Physical Journal B-Condensed Matter and Complex Systems 38: 331–338. 20. Lancichinetti A, Fortunato S, Kertész J (2009) Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics 11: 033015. 21. Karrer B, Newman M (2011) Stochastic blockmodels and community structure in networks. Phys Rev E 83: 016107. 22. Lancichinetti A, Radicchi F, Ramasco JJ, Fortunato S (2011) Finding statistically significant communities in networks. PLoS One 6: e18961. 23. Liben-Nowell D, Kleinberg J (2007) The link prediction problem for social networks. Journal of the American Society for Information Science and Technology 58: 1019–1031. 24. Morrison G, Mahadevan L (2011) Asymmetric network connectivity using weighted harmonic averages. Europhys Lett 93: 40002. 25. Guimera R, Sales-Pardo M, Amaral LAN (2008) Modularity from fluctuations in random graphs and complex networks. Phys Rev E 70: 025101. 26. Adamic LA, Glance N (2005) The political blogosphere and the 2004 us election. Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem. 27. Krebs V. Network data website. Available: http://www-personal.umich.edu/ ˜mejn/netdata/, maintained by M. E. J. Newman. Accessed 2012 Jun 1. 28. Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78: 46110. 29. Sales-Pardo M, Guirmera R, Moreira AA, Amaral LAN (2007) Extracting the hiearchical organization of complex systems. Proc Natl Acad Sci 104: 15224. 30. Barabasi AL, Albert R, Jeong H (1999) Mean-field theory for scale-free networks. Physica A 272: 173. 31. Redner S (2005) Citation statistics from 110 years of physical review. Physics Today 58: 49. 32. Csrdi G, Nepusz T (2006) The igraph software package for complex network research. InterJournal Complex Systems : 1695. 33. Chen P, Redner S (2010) Community structure of the physical review citation network. J Infometrics 4: 278. 34. Feld SL (1991) Why your friends have more friends than you do. Amer J Sociol 96: 1464. 9 July 2012 | Volume 7 | Issue 7 | e38704 Supplementary Information Greg Morrison and L. Mahadevan September 30, 2012 1 Closeness Measures and Community Detection A schematic diagram of a network with easily-detected community structure is shown in Fig. 1(a). In this network, a pair of communities with |c| = N/2 nodes each is connected by exactly one edge (between α and β). For any reasonable measure of closeness, a node will feel closer to other nodes within its community rather than those in a different community, with the Generalized Erdös numbers (GENs), Jacard Coefficients (JCs), and overlap explicitly demonstrated as having this property. Resistance Distance, the mean first passage time between nodes, and the Adar / Adamic coefficient[1] all behave in a similar manner (not shown). There is a clear separation between in-community and outof-community closenesses for the network in Fig. 1(a), which can be used to determine the correct community structure. Each node i is constrained to be in the same community as their closest friend, f (i). This is similar in spirit to the resistance-distance approach of Wu and Huberman[2], but does not require an arbitrary threshold for defining communities. Each measure of closeness will behave differently when fuzzy communities are detected, with some outperforming others in the ability to detect communities (as discussed in the main text). We note that no nodes in a network can be in a community by themselves using this approach, since all connected nodes necessarily have a closest friend. It may be possible to remove this restriction by introducing self-loops into the network, but we leave this to later work. In Fig. 1(a), it is important to note that it is not possible to continuously tune an arbitrary parameter to find different partitions. Using modularity maximization with resolution parameter γ as an example, at γ = 1 we expect to detect the correct partition of two communities. For γ → 0, we expect to find only a single community, including all nodes in the network. No reasonable measure of closeness will ever produce this coarsest partition of Fig. 1(a), since it would require a node in c1 to feel closer to nodes it has fewer connections to than those it has many connections to. Regardless of the closeness measure chosen (and even if a tunable free parameter included in our the measure), a single community can not be detected using the CF approach so long as nodes feel closer to their neighbors than their 1 non-neighbors. The coarser partition of a single community is, however, readily detected using the hierarchical approach described in the text. In Fig. 1(b), we show a pathological network topology for which the CF method will fail: two distinct communities with each node connected to a single, central node (δ). Most measures of closeness will find all nodes feel closest to δ (and all reasonable measures will, so long as the intra-community edges are sufficiently sparse and the network sufficiently large), so the CF approach will assign all nodes to the same community as δ. In such a case, only a single community will be (incorrectly) detected. This can be avoided by searching for the closest unpopular friend: after sorting nodes into ascending order of how close node i feels to them, the closest node with degree less than or equal to the next-closest is selected as f (i). Note that for the closest unpopular friend algorithm on a weighted graph, we still search for lower degree ki rather than strength Wi , which avoids nodes that are connected to many other nodes (ki � 1) but not nodes that are strongly connected to a few nodes (Wi � 1). The node δ is assigned to the community its closest unpopular friend is in, which will depend on the details of the network topology and the choice of closeness measure. 2 Fractured Communities While the CF and CUF algorithms provides a intuitive method for detecting communities in an arbitrary graph, it is possible for a correct community to be fractured into two or more parts due to the local variability of the density of edges. As pictured in Fig. 2), an intended community A may be split into two groups, A1 and A2 due to the asymmetric connections each of these sub-communities has to the communities B and C. As pictured, there are either more edges leading from A1 to B than there are between A1 and C or the total weight between A1 and B is larger than between A1 and C. While useful information could be found in the structure of the fractured communities, is also desirable to recover the ‘correct’ communities despite these local variations. In order to produce a more useful method for community detection, we must supplement both of these approaches with an algorithm to merge fractured communities to better recover the ‘correct’ partition. The fracture of communities can be due to two aspects of the detection: first, the decisions are purely local (even if the closeness measure incorporates the global topology). Because the decisions are not made with a global quality function, the splitting of a community into two pieces is not penalized. Second, the random nature of the networks allows for variability in the local density of edges. These fluctuations in density will affect all community detection methods[3], and in some cases may call into question the ‘correctness’ of the intended partition. In Fig. 2, the detected groups A1 and A2 (which are in the same community in the 2 ‘correct’ partition) will likely have a large number of edges between them. If we imagine a single community is mistakenly broken into two sub-communities of the same size (n nodes apiece), the number of edges between the sub-communities should scale as n2 (with a uniform density of edges in the correct community A). This allows us to build a relatively simple greedy search for communities to merge. Once a CF or CUF partition has been determined, we perform a search for the pair of communities g and h with the largest value of kg→h /max(ng , nh )2 , where kg→h is the number of edges leading from group g to group h and ng is the number of nodes in group g. Before we merge the communities g and h, we check to ensure that kg→h ≥ min(kg→g , kg→h ). If the inequality is not satisfied (i.e. there are fewer edges between g and h are in either g or h alone) the greedy search is halted, otherwise g and h are merged and the search repeats. In Fig. 3, we use the GENs to compare the CF and CUF methods for with or without community merging (averaged over 100 realizations of the network). We see that the CUF approach with merging gives the best overall results, with the largest normalized mutual information[4, 5] (as defined in eq. 2 of the main text) for all values of k out , with only a moderate improvement over the CF approach. However, the CUF approach is more prone to community fracture (where the black circles do not converge to I = 1 as k out → 0), and greedy merging is therefore essential for reliable reconstruction of the network. We note that the merging of fractured communities as implemented here could also be used with modularity maximizing methods and may improve the spurious splitting of communities in some cases. As noted in Fig. 2, variability in the local density of edges can lead to fractured partitions. The propensity of modularity maximization for finding spurious sub-communities can perhaps be most clearly seen by considering a random network without any community structure. We generate networks with the probability of an edge between any nodes is pedge , with 0.04 ≤ pedge ≤ 1. In Fig. 4, we show the number of communities detected in these networks using both greedy modularity maximization (squares) and the CUF approach (circles). The CUF performs far better than the greedy modularity maximization for pedge � 0.1, while modularity maximization consistently finds more than one community for all pedge < 1. For small pedge , we expect fluctuations in the edges will produce a locally higher density of edges randomly, which may be detected using any community detection method[6]. However, as pedge increases, these fluctuations should be less significant, and the CUF that detects only a single community may be preferable. 3 A Common Hierarchical Benchmark It is natural to define a coarse grained network formed with the communities in the higherresolution partition acting as nodes in the new, lower-resolution network in order to detect 3 a hierarchy of community structure in the network. However, it is not immediately obvious how to choose the new edges in the new network, and rather than attempt to define coarsegrained edges at this resolution, we take the closeness between the coarse-grained nodes to be the average closeness between communities in the high-resolution partition. In the particular case of the GENs, the harmonic mean is the appropriate way to average the closeness, as the closeness between communities should be dominated by the nodes in each that feel close to one another (small Eij , thus more significant in the harmonic mean), rather than the nodes that do not feel close to one another (large Eij , less significant in the harmonic mean). For other closeness measures, it a linear mean may be the more reasonable choice for averaging the closeness between communities. It is worth re-emphasizing that we do not expect to be able to continuously tune the resolution of the coarse-grained network with a free parameter. Our ability to detect hierarchical structure of course depends on the accuracy of the higher resolution partition (with an inaccurate partition unlikely to accurately detect the correct macrocommunities), and the existence of a ‘correct’ hierarchical partition. If nodes in communities g and h are very close to one another in the original network, a reasonable method of averaging should ensure they are close in the coarse grained network. While the detected partition will weakly depend on the method of coarse graining, it is not possible to tune the averaging as it is in the case of modularity maximization (or other approaches), where choosing a resolution γ � 1 will assign all coarse-grained nodes to the same community. We apply our coarse graining approach to detect the community structure of the benchmark presented by Rechardt and Bornholdt [7] depicted in Fig. 5. The network of N = 512 nodes is composed of 16 microcommunities with on average k in = 16 edges internally per node. Four of these microcommunities form a macrocommunity, with on average k out edges per node within a macrocommunity and k mix edges per node between macrocommunities. Note that this is the benchmark that is modified in order to produce the benchmark of variable robustness as described in the main text. The mutual information between the correct and detected partitions of the micro-communities (using the CUF approach with the GENs as the closeness measure) is shown in Fig. 6(a) for varying k out and k mix . The microcommunities are detected accurately for small k out , with the transition from ‘good’ to ‘bad’ detection occurring for k out + k mix ≈ 34 (the point at which I = 0.5, averaged over the four curves shown), more than twice the value of k in = 16. It is worth noting that for the larger values of k out (12 or 14), often the failure to saturate to I = 1 at k mix = 0 is due to the fact that the method will fail to detect the microcommunity structure of 16 communities, but rather the macrocommunity structure of 4 macrocommunities. For sufficiently dense connections within the macrocommunities, the CUF method does fail to detect the finest resolution of the network. However, so long as the microcommunities are accurately detected, the macrocommunity structure is also correctly determined (as shown in Fig. 6(b)). For k out = 16, we generally fail to find the macrocommunity structure because of the poor detection of the fine-resolution structure, while for k out = 12 or 14, the 4 macrocommunity structure is not reliably found as k mix → 0. Modularity-based methods or other approaches may outperform these results[7, 8] if the correct (but a priori unknown) resolution parameter is chosen. However, our approach gives a single partition for each scale (both micro- and macro-), and performs very well so long as the micro-communities are not too fuzzy (k out is sufficiently small), without using an unknown parameter. 4 Common Real-World Community Benchmarks Modularity maximization performs quite well on the artificial GN benchmark precisely because of the modular structure inherent in the test: the correct solution was also the modularity maximizing one. This may not be the case in real world networks, where the ‘correct’ partition is determined from external information and is independent of the partition’s modularity. To see the utility of the CF or CUF methods, we examine three simple real-world benchmarks with an a priori known partition in the main text. The football network[9] is comprised of nodes representing american football teams, with edges denoting games played between them in 2000. The ‘correct’ partition groups each team within their externally-defined division. The political blogs network[10] is a set of blogs in the leadup to the 2006 US midterm election, with an edge representing a link from one blog to another (we use an undirected version of this network). The political books network[11] is a set of books purchased on amazon.com around the 2004 US presidential elections, with an edge representing a co-purchase of a pair of books. In the political blogs and books networks, the ‘correct’ partition is the node’s apparent political leaning: liberal vs. conservative in the former and liberal, independent, or conservative in the latter. All of these benchmarks are unweighted networks (with wij = 0 or 1). One common benchmark with a known community structure not mentioned in the main text is Zachary’s Karate club[12]. This is a very small network of 34 nodes representing members of a karate club at an unnamed university, with edges denoting the out-of-club interactions between individuals. The club split into two parts due to a disagreement over the club’s leadership, and the ‘correct’ partition denotes which individuals fell on a particular side of the disagreement. The karate club is partitioned using a number of approaches in Fig. 7, with from left to right modularity maximization, CF/CUF using the GENs, using the JCs, and using overlap. For the Karate club benchmark, we find surprisingly that both overlap and the GENs perform extremely poorly while the Jacard coefficients (JCs) perfectly reconstruct the correct partition. However, if the closest friend (CF) approach is used (rather than the closest unpopular friend approach, which avoids high degree nodes when assigning communities and is implemented throughout the main text), the GENs perfectly reconstruct the network, followed by overlap and then by the JCs. This illustrates that pathological networks do indeed exist that have not been fully accounted for in the CUF methodology, and it is difficult to predict exactly which method 5 will be optimal a priori. We also note that a CF partition can be generated rapidly when generating a CUF partition, and by examining a global quality function (such as modularity), one can easily distinguish which partition better represents the structure of the network. Thus, despite the unexpected behavior of our approach when considering the Karate Club network, we determine that (a) the GENs remain a reasonable choice for the closeness measure and (b) that it may be necessary to compare the results of the CUF approach to a global quality function (such as modularity) to determine if the partition is reasonable. 5 Simulated Annealing of the Benchmark In order to generate the network used in benchmarking the community detection and robustness measure in the main text, we used simulated annealing to produce a network with the desired properties. The desired in-, out-, and mixing-degree of each node were computed: kiin,0 , the desired number of edges from node i to nodes in its microcommunity, kiout,0 , the desired number of edges leading from i to any node in its macrocommunity (but not in the microcommunity) and kimix,0 , the number of edges leading from i out of its macro� community. From these the total number of edges M = 12 i (kiin,0 + kiout,0 + kimix,0 ) was determined, and a network of N = 512 nodes was generated having precisely M randomly distributed edges. The network was then randomly rewired, with a new trial configuration generated by removing one edge connecting the randomly chosen i and j, and a new edge being drawn between i and k. This trial configuration was accepted using a metropolis criterion: pacc = min(1, e−β(Eold −Etrial ) ), with the energy of a configuration � � 1� in,0 2 out,0 2 mix,0 2 in out mix E= (ki − ki ) + (ki − ki ) + (ki − ki ) (1) 2 i where the first term of E is minimized if the in-, out-, and mix-degrees of each node satisfy our desired conditions.The temperature parameter β is set to β = 1 initially, and incrementally increased by 2×10−5 at each attempted rewiring. A total of 500,000 rewiring attempts were made, with each edge on average experiencing ≈ 975 attempted rewirings. 6 How Ties are Handled Unlike many real world networks, the network in Fig. 1(a) is highly symmetric and the closeness between nodes in groups A or B is likewise symmetric so that there is not a unique closest friend. In this case, we must develop a rule for handling ties in the closeness. In the case of a tie, we randomly but consistently select the ‘closest’ neighbor of i, f (i). This 6 is accomplished by initially randomizing the node index, and choosing the node with the lowest (random) index as closest. In practice, the importance of ties in the artificial or real world networks networks depends on the choice of closeness measure. The Jacard coefficient JCij = |Ci ∩ Cj |/|Ci ∪ Cj | can easily produce ties[13] for complex networks, whereas the GENs require highly symmetric networks to see a tie. The lack of ties is an additional advantage of measures that incorporate the global topology of the network, rather than purely local information. 7 Details of the DASH robustness The DASH database, downloaded in June 2010 contained N0 = 918 journals and 2404 articles published by 3385 unique author names, not all of which work at Harvard. Because of the interdisciplinary and highly connected nature of the journals Science, Nature, and Proc. Natl. Acad. Sci, these three journals are removed from the network. This alteration does not alter the shape of either the degree or weight distributions (although the removal of edges does affect their particular fitting parameters). While briefly discussed in the text, it is worthwhile to examine the structure of the DASH network in detail, to determine the power of the degree of robustness in finding complex topologies or incorrectly assigned nodes. When we examine the degrees of robustness observed in the network, nodes with few edges connecting them to their community have a correspondingly low degree of robustness, reflecting the fact that they are only weakly (1) connected to their assigned community. Low values for the degree of robustness Di for these weakly connected nodes is unsurprising. We can use the degree of robustness to find nodes that are on the boundary between communities (i.e. that are strongly connected both to their assigned community as well as to a different community to which they are (1) not assigned). We find 142 nodes with Di ≤ 2, 53% of which have kiin ≤ 2 (indicating that they are simply of low degree, rather than on the boundary of a community). However, there are a few nodes that have D(1) ≤ 2 but are strongly connected to their respective communities (having high degree and weight directed into ci ). Due to their large values of kiin , these nodes are most likely on the boundary of their respective communities. The (i) (1) five journals with smallest D1 /ki with Di = 1 or 2 are shown in Table 1. Some of these in journals have a ki � ki (so many edges lead from i to different communities), while others have kiin ≈ ki (so most of the edges from i are within its assigned community). Examining the topology of the DASH network connected to these nodes that are boundarylike shows two distinct causes of high in-degree and low degree of robustness. Cognition, the second journal in Table 1 has more than twice as many out-edges as in-edges, but these out-edges are distributed amongst a wide range of communities. In Table 1, Cognition has the most weight (Wiin = 14) directed towards its community (Phys. Sci. 4, primarily 7 focused on Oceanography and Atmospheric Science), but has a large weight of 12 directed towards the Phys. Sci. 3 community (focused primarily on Psychology and Neuroscience, a more natural choice of community assignment for Cognition). It is likely that this node was incorrectly assigned, but the fact that the highest weight points towards Phys. Sci. 4 makes the misassignment understandable. The degree of robustness has allowed us to locate this possible error with ease, while the in-degree (kiin = 8), total degree, the ratio of in- to total degrees (kiin /ki = 0.32, and is the 17th worst of all journals), or the ratio of in- to total strengths (Wiin /Wi = 0.34, the 10th worst of all journals) would not highlight Cognition as a particularly troublesome node. The other journals in Table 1 all have a low degree of robustness for a different reason. For these, the largest number of edges point towards their assigned communities, and in all but one case (the Journal of Economic History) the largest weights are also pointed towards their respective communities. However, in each case the journal is connected to the ‘core’ of a different community: nodes in a different community with both high inweights or in-degrees and high robustness. While the assignment of each node in Table 1 to its respective community is often reasonable (since the majority of edges are within its assigned community), each of these nodes is also connected to one or more nodes that effectively define a neighboring community. These journals act as a bridge between the (generally less robust) communities to which they are assigned and the core of a robust, strongly connected community. It is also of interest to determine the quality of the assignment of each microcommunity to its macrocommunity. The thin black lines in Fig. 2 of the main text denote the macrocom(2) (2) munity robustness rc = �Di �i∈c of each assignment. We note that a robust microcom(1) munity (with high rc = �Di �i∈c ) does not necessarily imply a robust assignment to its macrocommunity, and that many well formed microcommunities have a very low value of (2) (2) rc . Table 2 shows that the lowest values of rc typically occur for communities that have relatively few out-edges (and thus their assignment to their macrocommunity is expected to be fragile). However, the assignment of the Philosophy and History 1 (PH1) microcommunity to its macrocommunity is surprising, as it has a very low ratio of in- to out-degree and in- to out-strength. While the placement of PH1 to the Philosophy and History macrocommunity may appear to be an error, the surprising assignment is due to the fact that 75% of the out-of-macrocommunity edges and 84% of the out-of-macrocommunity weight are due to only two journals: the strong connections that Social Studies of Science and Annual Review of Sociology have towards Mathematical Sciences 3 (also focused on the Social Sciences). There are three journals in PH1 that are connected to the Philosophy and History macrocommunity, Isis, Persepectives on Science, and Journal of the History of Ideas. Two of these journals are in the ‘core’ of PH 1 (with D(1) = 17), while only one of the journals strongly connected to Math. Sci. is in the core (with D(1) = 16). Thus, the assignment of PH1 to the Philosophy and History macrocommunity is due to 8 the fact that, while more weight is directed out of the assigned macrocommunity, the core journals of PH1 are more strongly connected to Philosophy and History journals. PH 1 (2) is clearly boundary-like, and our robustness measure of rc accurately detects this fragile assignment. 8 Additional details of the Phys. Rev. Network The Physical Review network included over 462,000 articles published in any Physical Review journal up to July 2010. Due to the size of the network , we consider only the subset of articles that have garnered at least 100 citations, with the largest connected component including 3651 articles and over 16,000 edges. While the network is unweighted (one citation is neither stronger nor weaker than another, thus wij = 0 or 1) and directed (article i cites article j, but not vice-versa), we consider the non-directed version (with wij = wji =0 or 1). The community structure at one resolution of the Phys. Rev. network up to 2007 has previously been determined[14]. The detected communities are similar in many respects to the community structure we have detected, although these other papers did not report an examine of any additional hierarchical structure, as we discuss in the main text. References [1] Adamic L, Adar E (2001) Friends and neighbors on the web. Social Net 25: 211-230. [2] Wu F, Huberman B (2004) Finding communities in linear time: a physics approach. The European Physical Journal B-Condensed Matter and Complex Systems 38: 331– 338. [3] Sun Y, Danila B, Josic K, Bassler KE (2009) Improved community structure detection using a modified fine-tuning strategy. Europhys Lett 86: 28004. [4] Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78: 46110. [5] Guimerà R, Sales-Pardo M, Amaral L (2007) Module identification in bipartite and directed networks. Physical Review E 76: 36102. [6] Guimera R, Sales-Pardo M, Amaral LAN (2008) Modularity from fluctuations in random graphs and complex networks. Phys Rev E 70: 025101. [7] Reichardt J, Bornholdt S (2006) Statistical mechanics of community detection. Physical Review E 74: 16110. 9 [8] Lancichinetti A, Fortunato S, Kertész J (2009) Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics 11: 033015. [9] Girvan M, Newman M (2002) Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America 99: 7821. [10] Adamic LA, Glance N (2005) The political blogosphere and the 2004 us election. Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem . [11] Krebs V. http://www-personal.umich.edu/ mejn/netdata/, maintained by M. E. J. Newman. [12] Zachary W (1977) An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33: 452–473. [13] Ahn YY, Bagrow JP, Lehmann S (2010) Link communities reveal multiscale complexity in networks. Nature 466: 761–764. [14] Chen P, Redner S (2010) Community structure of the physical review citation network. J Infometrics 4: 278. 10 Tables Name J. Econ. Hist. Cognition Ecol. Appl. Oikos∗ Brit. Med. J. ∗ Community Math. Sci. 3 Phys. Sci. 4 Phys. Sci. 5 Phys. Sci. 5 Math. Sci. 6 (1) Di 2 1 1 1 2 kiin 16 8 8 8 14 ki 24 25 10 10 15 Wiin 71 14 18 18 29 Wi 157 41 22 22 32 Oikos is an ecology journal published by the Nordic Ecol. Soc., and has exactly the same connections as Ecol. Appl. (1) Table 1: The five most boundary-like nodes (with the lowest non-zero values of Di /kiin ). The first, J. Econ. Hist., has a high degree and strength and large k in and k out . Similarly, Cognition has the smallest ratio of in-edges to total node degree, and is connected to a large number of other communities. The last three elements in the table are surprising in that they have a few connections outside of their communities (kiout = 1 or 2 compared to k in = 8 or 10) but still have low degrees of robustness. This is because while they have many in-community connections, their few out-of-community connections lead to strong, central nodes in other communities. These boundary-like nodes would not be easily detected by simply looking at the in-degrees or in-out ratio. 11 Community Phil. & Hist. 1 Phys. Sci. 13 Math. Sci. 10 Math. Sci. 11 Phys. Sci. 9 Focus History Info. Theory Drug Addiction Crystallography High En. Physics (2) rc 0.12 0.2 0.25 0.33 0.38 kcin Wcin kcout Wcout 3 1 1 1 3 7 1 1 1 6 28 0 0 0 0 45 0 0 0 0 Table 2: The five least robust macrocommunity assignments. kcin and Wcin denote the total number of edges and total weight from the microcommunity to other microcommunities in its macrocommunity respectively, while kcout and Wcout denote the total number and weight of edges into any other macrocommunity. Philosophy and History 1 (PH 1) is the worst, and lies on the boundary of the Philosophy and History macrocommunity and the Mathematical Sciences macrocommunity. The other macrocommunity assignments are very fragile do to the very small number of connections, and are peripheral microcommunities. Figure Captions 12 (a) in E ~ |c| α β 1/2 E in J ~ 1 in c1 c2 O ~ |c| J out out out O ~ |c| 5/4 (b) Central Node δ ~ 0 ~ 0 c1 Connected to all other nodes c2 Stronger Strongest A (fractured) Figure 1: Detecting communities with the CF and CUF methods. (a) An example of two clearly defined communities (c1 and c2 ), each of size N/2 with exactly one edge connecting them. Any plausible measure of closeness based on the network topology will clearly distinguish between intra- and inter-community connections. The closeness between nodes within the community as measured by the GENs (E in ), JCs (J in ), and overlap (Oin ), with |c| the number of nodes in each community. Likewise, the closeness between nodes in different communities is shown with the superscript ‘out’. (b) A schematic network of a single highly connected node (δ) to which all nodes in the network will feel closest. Assigning each node to the same community as their closest friend (the CF approach) will assign all nodes to the same community as δ, thus detecting only one community. By avoiding high-degree nodes (the CUF approach), the two communities are correctly detected, with δ assigned to one or the other. A1 B ak we we ak A2 Stronger C Figure 2: Merging of fractured communities. Community A is fractured into two communities, A1 and A2 due to the fact that A1 is more strongly connected to B (connections labelled ‘stronger’) than to C (connections labelled ‘weak’), while community A2 is more strongly connected to C than B. In this coarse-grained schematic, ‘stronger’ may represent either high weight or many edges between them. Because A1 and A2 are truly subsets of the same community in the ‘correct’ partition, we expect a large number of edges between them. 13 Mutual Information 1 0.75 0.5 CUF, merged CF, merged 0.25 CUF, unmerged CF, unmerged 0 2 4 6 k 8 10 out Figure 3: Improvements in the methods using fracture merging. A comparison of the approaches for community detection using the GENs, using the Newman-Girvan benchmark. Red squares denote the CUF after community merging, which gives the best overall results. Black circles denotes the result of the CUF without merging, and has a low mutual information to the expected partition due to fracture (even for clear communities, with low k mix ). The blue up and purple down triangles are the results for the CF algorithm with and without fracture correction, respectively. log2( <n c>) 23 22 Greedy-Q 21 20 0 CUF 0.25 0.5 0.75 1 pedge Figure 4: Community detection in unstructured networks. The number of communities nc detected using greedy modularity maximization (up and down triangles) or the CUF method (squares and circles) for a randomly linked network (with no intended community structure) as a function of the probability of an edge between two nodes, pedge . Greedy modularity maximization is shown in the purple down triangles for N = 100 nodes and black up triangles for N = 200 nodes, while the blue circles shows CUF detection for N = 100 and red squares for CUF with N = 200. When there is no intended structure in the network, modularity maximization tends to find a relatively large number of communities, while the CUF method typically finds only one community (for sufficiently large pedge � 0.1. 14 in k out = 14 k = 16 k mix = 16 Figure 5: The adjacency matrix of the Reichardt-Bornholdt hierarchical benchmark. Each node is a member of a micro-community of 32 nodes, with k in = 16 connections to the other nodes in its micro-community on average. Each micro-community is a member of one of four macro-communities, and each node in a macro-community has k out edges internally on average. Each node has on average k mix edges to nodes outside of their macro-community. Micro-scale detection (a) 1.0 Mutual Information 1.0 Mutual Information Macro-scale detection (b) 0.8 in k =16 for all 0.6 out k =8 out k = 10 out k = 12 out k = 14 0.4 0.2 0.0 0 5 0.8 0.6 0.4 0.2 0.0 10 k 15 mix 20 25 30 0 5 10 15 mix 20 25 30 k Figure 6: Accuracy of the hierarchical benchmark. The detection of (a) micro- and (b) macro-communities averaged over 100 realizations of the network. For all samples, k in = 16 is held fixed. k out is varied as k out = 8 (blue circles), 10 (red squares), 12 (black up triangles) and 14 (purple down triangles). The mixing between macrocommunities is varied with 2 ≤ k mix ≤ 30. The CUF approach accurately detects the microcommunities over a wide range of values of k out and k mix , and is clearly able to accurately detect the microcommunity structure for sufficiently clear communities. So long as the microcommunity structure is accurately detected, the macrocommunity structure seems reliably determined as well. 15 (a) Modularity GENS Jacard Overlap Jacard Overlap 1 CF Mutual Information 0.75 Gr ee Po dy tts CUF 0.5 0.25 0 Modularity (b) 0.4 0.3 0.2 0.1 0 Modularity GENS Figure 7: The karate club network. Mutual information (a) and modularity (b) of the partitions of the Karate club[12] detected by a variety of approaches, with the a priori correct partition known. The leftmost results show the results of partitions using the greedy (striped red) and Potts model (striped blue) modularity maximizing partitions. For the remainder, red denotes the closest friend (CF) while blue denotes the closest unpopular friend (CUF) approach, with the GENs, JCs, and overlap shown. Surprisingly, the GENs implemented using the CUF method performs the worst in all respect (in contrast to most other benchmarks where it performs the best). For the Karate club network, the GENs do reconstruct the exact ‘correct’ partition if the CF method is used. 16 Community Membership of the DASH Network Greg Morrison and L. Mahadevan May 29, 2012 Natural Sciences, Community 1 Accounts of Chemical Research ACS Nano Acta Materialia Advanced Materials Animal Behavior Annals of Applied Statistics Annals of Statistics Applied and Environmental Microbiol... Applied Physics A Applied Physics Letters Behavioral Ecology and Sociobiology BMC Genomics Chemistry and Biology Defect and Diffusion Forum Environmental Science and Technolog... IEEE Photonics Technology Letters IEEE Journal of Quantum Electronics IEEE Journal of Selected Topics in ... International Journal of Primatolog... Journal of Applied Physics Journal of Archeological Science Journal of Bacteriology Journal of Chemical Physics Journal of Computer Aided Materials... Journal of Crystal Growth Journal of Dramatic Theory and Crit... Journal of Materials Science Journal of Microelectromechanical S... Journal of Physical Chemistry C Journal of Physics: Condensed Matte... Journal of the American Chemical So... Journal of The Electrochemical Soci... Journal of Vacuum Science and Techn... Journal of Vacuum Science & Techno... Lab on a Chip Materials Research Society Symposia... Materials Science and Engineering Materials Science and Engineering A Materials Science in Semiconductor ... Metallurgical and Materials Transac... Microbiology Modern Drama Molecular and Cellular Biology Molecular Biology and Evolution MRS Bulletin Nano Letters Nanoscale Research Letters Nanotechnology Nature Biotechnology Nature Methods Nature Physics New Journal of Physics Nuclear Instruments and Methods in ... Nuclear Instruments and Methods in ... Nucleic Acids Research Optics and Photonics News Optics Express Optics Letters Physica C Physical Review A Physical Review B Physical Review D Physical Review Letters Plant Cell and Environment Proceedings of SPIE Progress in Biophysics and Molecula... Social Text The Open Inorganic Chemistry Journa... Trends in Ecology and Evolution 1 Natural Sciences, Community 2 ACM SIGCSE Bulletin American Journal of Botany American Journal of Science Annual Review of Earth and Planetar... Annual Review of Ecology, Evolution... Annual Review of Microbiology Astrobiology Canadian Journal of Earth Sciences Canadian Journal of Forest Research... Chemical Geology Earth and Planetary Science Letters Elements Geobiology Geochemistry Geophysics and Geosyst... Geochimica et Cosmochimica Acta Geological Magazine Geological Society of America Bulle... Geology International Journal of Plant Scie... Harvard Papers in Botany Icarus International Workshop on Wearable ... Journal of Geophysical Research - P... Journal of Mathematical Biology Journal of Paleontology Journal of Petrology Journal of Plant Growth Regulation Lethaia Nature Reviews Neuroscience New Phytologist Oceanography Organic Geochemistry Origins of Life Origins of Life and Evolution of th... Palaios Paleobiology Philosophical Transactions of the R... Physical Review Physics Today Phytochemistry Plant Physiology Plos One Precambrian Research Proceedings of the American Philoso... Review of Scientific Instruments Sedimentary Geology Taxon The Sciences Natural Sciences, Community 3 American Journal of Psychiatry Archives of Neurology Biological Psychology California Law Review Cognitive Brain Research Cognitive Neuropsychology Current Directions in Psychological... Developmental Biology European Review of Social Psycholog... IEEE Transactions on Information Te... Journal of Adult Development Journal of Cognitive Neuroscience Journal of Consulting and Clinical ... Journal of Experimental Psychology:... Journal of Experimental Psychology ... Journal of General Internal Medicin... Journal of Mathematical Psychology Journal of Neurophysiology Journal of Neuroscience Journal of Personality and Social P... Journal of Physiology - Paris Journal of the American Academy of ... Journals of Gerontology Series B Language and Cognitive Processes Memory & Cognition Mind, Brain, and Education Molecular Psychiatry New Ideas in Psychology Personality and Individual Differen... Personality and Social Psychology B... Perspectives on Psychological Scien... Psychiatry Research Psychological Science Psychonomic Bulletin & Review Psychology and Aging Psychoneuroendocrinology Public Opinion Quarterly Review of General Psychology Self and Identity Small Group Research Social Cognitive and Affective Neur... Trends in Cognitive Sciences Visual Cognition 2 Natural Sciences, Community 4 Aerosol Science and Technology Agricultural and Forest Meteorology Atmospheric Chemistry and Physics Atmospheric Environment Child Development Cladistics Climate of the Past Cognition Cognitive Psychology Computers and Geosciences Cortex Current Biology Deep Sea Research Part A. Oceanogra... Dynamics of Atmospheres and Oceans Earth Science Reviews ECS Transactions Europhysics Letters General Psychologist Geophysical Research Letters Global Biogeochemical Cycles Global Biogeochemical Sciences Global Change Biology Intercultural Pragmatics Journal de Physique IV Journal of Climate Journal of Fluid Mechanics Journal of Geophysical Research Journal of Geophysical Research -Al... Journal of Marine Research Journal of Physical Oceanography Journal of the Atmospheric Sciences Molecular Phyogenetics and Evolutio... Monthly Weather Review- Usa Philosophy and Literature Nature Geoscience Paleoceanography Quarterly Journal of the Royal Mete... Quaternary Science Reviews Statistics in Medicine Systematic Biology The Mental Lexicon Theoretical and Applied Climatology Natural Sciences, Community 5 Bioinformatics Biology Letters BMC Biochemistry BMC Biology BMC Ecology Breviora Canadian Journal of Zoology Ecological Applications Ecology Ecology Letters Evolution Frontiers in Ecology and the Enviro... Global Ecology and Biogeography Herpetologica Journal of Evolutionary Biology Journal of Experimental Botany Journal of Theoretical Biology Journal of Vertebrate Paleontology Malaria Journal Methods in Ecology and Evolution Molecular Ecology Oikos Physical Review Series e Plos Computational Biology Proceedings of the Royal Society B Theoretical Population Biology Natural Sciences, Community 6 American Journal of Human Biology American Journal of Physical Anthro... Cancer Causes and Control Cancer Epidemiology Biomarkers and ... Cancer Research Early Human Development European Journal of Cancer Preventi... European Journal of Cancer Suppleme... Evolution and Development Evolutionary Biology Fertility and Sterility Hormones and Behavior Human Reproduction Integrative and Comparative Biology International Journal of Andrology Journal of Anatomy Journal of Experimental Biology Journal of Human Evoluion Journal of Morphology Medicine and Science in Sports and ... PaleoAnthropology Schizophrenia Research Sports Medicine The Anatomical Record 3 Natural Sciences, Community 7 Applied and Preventive Psychology Behaviour Research and Therapy Biological Psychiatry Consciousness and Cognition Emotion Journal of Anxiety Disorders Journal of Consumer Research Journal of Experimental Social Psyc... Journal of Family Psychology Journal of Personality Disorders Journal of Psychiatric Research Memory Neuroimage Neuropsychologia Psychological Medicine Psychopharmacology Psychophysiology Suicide and Life-Threatening Behavi... The American Journal of Psychiatry Natural Sciences, Community 8 American Naturalist Auk Bioscience BMC Evolutionary Biology Genetical Research Genetics Genome Biology International Journal of Plant Scie... Journal of Experimental Zoology Par... Philosophy of Science -East Lansing... Plant Cell Plant Methods Quarterly Review of Biology Trends in Genetics Yeast Natural Sciences, Community 9 Advances in Theoretical and Mathema... Annals of Physics Annual Review of Nuclear and Partic... Classical and Quantum Gravity Fortschritte der Physik General Relativity and Gravitation Journal of High Energy Physics Nuclear Physics B Natural Sciences, Community 10 ACM Transactions on Sensor Networks Annual Review of Neuroscience Cerebral Cortex -New York- Oxford U... Experimental Brain Research Neuron PLoS Biology The Journal of Neuroscience Natural Sciences, Community 11 ACM SIGPLAN Notices Annual Symposium on Principles of P... International Conference on Functio... Journal of Functional Programming Proceedings of the 25th ACM SIGPLAN... Proceedings of the 26th ACM SIGPLAN... Proceedings of the ACM SIGPLAN 1996... Natural Sciences, Community 12 Antiquity Archaeological Papers of the Americ... Asian Perspectives Backdirt: Annual Review of the Cots... Current Anthropology Symbols Natural Sciences, Community 13 IEEE Journal of Selected Areas in C... IEEE Signal Processing Magazine IEEE Transactions on Signal Process... IEEE Transactions on Information Th... IEEE Transactions on Wireless Commu... Natural Sciences, Community 14 Cell European Journal of Biochemistry Nature Structural and Molecular Bio... Molecular Biology of the Cell Structure 4 Mathematical Sciences, Community 1 4th Multidisciplinary Workshop on A... AAAI Fall Symposium on Negotiation ... AAAI Spring Symposium on Empirical ... American Economic Journal: Microeco... American Economic Review American Journal of Computational L... American Sociological Review Annual Review of Economics Applied Economics Research Bulletin Brookings Papers on Economic Activi... Artificial Intelligence Autonomous Agents and Multi-Agent S... Canadian Journal of Economics Carnegie-Rochester Conference Serie... China Economic Review Cognitive Systems Research Computational Management Science Computers & Operations Research Contributions in Macroeconomics Decision Support Systems Econometrica Economia mexicana Economic Journal Economic Policy Economics of Transition European Economic Review European Finance Review Explorations in Economic History Foreign Affairs Games and Economic Behavior Handbook of Macroeconomics Health Services Research IEEE Infocom IEEE Intelligent Systems Information Fusion International Journal of Game Theor... Journal of Artificial Intelligence ... Journal of Business and Economic St... Journal of Business -Chicago- Journal of Economic Dynamics and Co... Journal of Economic Literature Journal of Economic Perspectives Journal of Economic Theory Journal of Empirical Finance Journal of Epidemiology and Communi... Journal of Finance Journal of Financial Economics Journal of Health Economics Journal of International Economics Journal of Law and Economics -Chica... Journal of Mathematical Economics Journal of Monetary Economics Journal of Money, Credit and Bankin... Journal of Policy Modeling Journal of Political Economy Journal of Public Economics Journal of Public Policy and Market... Journal of the European Economic As... Journal of Urban Economics Management Science National tax journal National Tax Journal NBER Macroeconomics Annual Nber Working Paper Series NBER Working Paper Series New York Review of Books Operating Systems Review Proceedings 35th International Conf... Proceedings of Autonomous Agents an... Proceedings of the 2006 AAAI Spring... Proceedings of the 21st Annual Mee... Proceedings of the Conference on Ap... Proceedings of the DARPA Workshop o... Proceedings of the Eighteenth Inter... Proceedings of the Eighth National ... Proceedings of the First Internatio... Proceedings of the First Joint Conf... Proceedings of the Multi-Agent Sequ... Proceedings of the Nineteenth Annua... Proceedings of the Sixth Internatio... Proceedings of the Workshop on Theo... Public Choice Quarterly Journal of Economics Rand Journal of Economics Rationality and Society Review of Economic Dynamics Review of Economics and Statistics Tax Policy and the Economy Review of Economic Studies Review of Financial Studies The B.E. Journal of Macroeconomics The Economic Journal The Lancet Theoretical Economics University of Chicago Law Review WIDER Research Paper 5 Mathematical Sciences, Community 2 ACM Transactions on Graphics Cartographica Communications of the Association f... Computational Intelligence Computational Linguistics Computing Surveys Electronic Commerce Research and Ap... Formal Grammar Conferences Future Generation Computer Systems IEEE Computer Graphics and Applicat... IEEE Transactions on Visualization ... Information Technology Journal of Biomedical Informatics Journal of Heuristics Journal of Linguistics Journal of Logic Programming Journal of Optimization Theory and ... Journal of Psycholinguistic Researc... Library Quarterly Linguistics and Philosophy Natural Language Engineering Nos Proceedings of ACL-08: HLT Proceedings of Algorithms and Exper... Proceedings of Graph Drawing Proceedings of SIGGRAPH Proceedings of the 10th Annual Symp... Proceedings of the 10th Internation... Proceedings of the 11th Conference ... Proceedings of the 19th Internation... Proceedings of the 2003 Internation... Proceedings of the 2005 ACL Worksho... Proceedings of the 7th Conference o... Proceedings of the Eighteenth Inter... Proceedings of the Eighth Internati... Proceedings of the Eighth Internati... Proceedings of the Fifth SIGdial Wo... Proceedings of the First Workshop o... Proceedings of the Human Language T... Proceedings of the ICAPS-05 Worksho... Proceedings of the Intelligent User... Proceedings of the Ninth Internatio... Proceedings of the Second TAG Works... Proceedings of the Seventh Internat... Proceedings of the Seventh Internat... Proceedings of the Sixth Internatio... Proceedings of the Thirteenth Annua... Proceedings of the Twenty-First Nat... Proceedings of the Workshop on Synt... Proceedings of UIST Transactions on Graphics Transactions on Systems, Man and Cy... Mathematical Sciences, Community 3 Agricultural History American Journal of Political Scien... American Political Science Review Annals of the American Academy of P... Annual Review of Psychology BioSocieties British Journal of Political Scienc... Dissent D-Lib Magazine Economic Development and Cultural C... Economics & Politics (Oxford, Engl... Educational Policy Genewatch IMF Staff Papers Indiana Journal of Global Legal Stu... International Organization International Studies Perspectives Journal of Conflict Resolution Journal of Economic History Journal of Labor Economics Journal of Legal Studies Journal of Policy History Journal of Politics Journal of Social Issues Journal of Statistical Software Medical Anthropology Negotiation Journal Perspectives on Politics PLoS Medicine Political Analysis Population Health Metrics PS: Political Science and Politics Research in Higher Education Social Justice Research Social Research Social Science and Medicine Social Science History Social Science Research Sociological Methods and Research Statistical Science Studies in American Political Devel... The American Sociologist The Annals of the American Academy ... The Good Society World Politics The Political Quarterly Yale Journal of International Law 6 Mathematical Sciences, Community 4 Acta Mathematica -Stockholm- Advances in Mathematics American Journal of Mathematics Annales Academiae Scientiarum Fenni... Annales Scientifiques- Ecole Normal... Annals of Mathematics Applied and Computational Harmonic ... Biological Cybernetics Bulletin- American Mathematical Soc... Commentarii Mathematici Helvetici Communications on Pure and Applied ... Discrete and Computational Geometry Documenta Mathematica Duke Mathematical Journal Electronic Journal of Combinatorics Experimental mathematics Foundations and Trends in Computer ... Geometric and Functional Analysis Harvard College Mathematics Review IEEE Transactions on Biomedical Eng... IEEE Transactions on Pattern Analys... International Journal of Computer V... Inventiones mathematicae Journal- American Mathematical Soci... Journal fur die Reine und Angewandt... Journal of Algebra Journal of Number Theory Journal of the European Mathematica... Journal of the Optical Society of A... Journal of Topology Manuscripta Mathematica Mathematical Research Letters Mathematische Annalen Nagoya Mathematical Journal New York Journal of Mathematics Pacific Journal of Mathematics Periodica Mathematica Hungarica Proceedings- American Mathematical ... Proceedings of the Thirty-Fifth Ann... Publications Mathematiques de l’Ins... The Harvard College Mathematics Rev... Topology Transactions- American Mathematical... Vision Research -Oxford- Mathematical Sciences, Community 5 ACM International Conference Procee... Annual Symposium on Foundations of ... Computational Complexity Computer Aided Geometric Design Computer Graphics and Applications Computer Graphics Forum Computers and Graphics Eurographics/SIGGRAPH symposium on ... Information and Computation International Journal of Image and ... International Mathematics Research ... Journal of Computer & System Scien... Lecture Notes in Computer Science MM: Proceedings of the seventh ACM ... Proceedings of the 15th Internation... Proceedings of the Annual ACM Sympo... Proceedings of the Canadian Confere... SIAM Journal on Computing The Visual Computer: International ... Mathematical Sciences, Community 6 Argumentation ARL: A Bimonthly Report BMJ: British Medical Journal College & Research Libraries News Earlhamite Infection Control and Hospital Epid... Journal of Biology Journal of Law and Education Journal of Speculative Philosophy Legal Writing: The Journal of the L... New England Journal of Medicine Newsletter on Teaching Philosophy Philosophy and Rhetoric SPARC Open Access Newsletter St. John’s Review Mathematical Sciences, Community 7 Administrative Science Quarterly American Journal of Sociology American Psychologist Crime and Justice Du Bois Review Journal of Organizational Behavior Research Evaluation Science, Technology and Human Value... Social Service Review Mathematical Sciences, Community 8 American Statistician Biometrika Journal – American Statistical Ass... Journal of Econometrics Journal of Educational Psychology NBER Technical Working Paper Psychological Methods Working paper series (National Bure... 7 Mathematical Sciences, Community 9 Computer Architecture News Proceedings of the 2005 INFOCOM 24t... Systems Administration Conference www.eecs.harvard.edu/ margo/papers/... Mathematical Sciences, Community 10 Addictive Behaviors American Journal of Drug and Alcoho... Behavior Research Methods Learning and Motivation Mathematical Sciences, Community 11 Acta Crystallographica Section C: C... Chemistry & Biology Tetrahedron Letters History / Philosophy, Community 1 American Historical Review American Scientist Annals of Science Annual Review of Sociology Architectural History British Journal for the History of ... Bulletin of the History of Medicine Central European History Common Knowledge Configurations Contemporary European History Historical Journal History of Science International Migration Review Isis Journal of British Studies Journal of the History of Biology Journal of the History of Ideas Journal of Visual Culture Modern Language Notes Oxford Review of Education Past and Present Perspectives on Science Public Understanding of Science Science in Context Science Studies Shakespeare Survey Social Forces Social Studies of Science The British Journal for the History... The British Medical Journal The International Migration Review Transactions of the Institute of Br... Transactions of the Royal Historica... History / Philosophy, Community 2 Australasian Journal of Philosophy Biological Theory Bulletin of Symbolic Logic European Journal of Philosophy Journal of Aesthetics and Art Criti... Journal of Symbolic Logic Philosophical Quarterly Nous Philosophers’ Imprint Philosophical Review Philosophical Studies Philosophical Topics Philosophy and Phenomenological Res... Proceedings of the Aristotelian Soc... Southern Journal of Philosophy The Cambridge Companion to the Phil... Theoria : Revista de Teoria, Histor... History / Philosophy, Community 3 Contemporary Readings in Law and So... Economics and Philosophy Ethics Harvard Divinity Bulletin Journal of Philosophy Pacific Philosophical Quarterly Philosophical Perspectives Philosophy and Public Affairs Proceedings and Addresses of the Am... Ratio Social Philosophy and Policy The Constitution of Agency The Quality of Life The Tanner Lectures on Human Values The Monist Women, Culture, and Development: A ... 8 History / Philosophy, Community 4 American Literary History American Literary Scholarship American literature Critical Inquiry Prospects Linguistics, Community 1 riu Acta Poetica Amsterdam Studies in the Theory and... Annual of Armenian Linguistics Baltistica Brain Research Die Sprache Euskalingua Harvard Ukrainian Studies Harvard Working Papers in Linguisti... Heritage Language Journal Historische Sprachforschung Indo-European Studies Innsbrucker Beitrge zur Spr... Journal of Cuneiform Studies Journal of Indo-European Studies Journal of the American Oriental So... Journal of the Cork Historical and ... Language Language and Linguistics Compass Language Research Lingua Linguistic Variation Yearbook Mnchener Studien zur Sprach... MIT Working Papers in Linguistics Natural Language and Linguistic The... Oceanic Linguistics Proceedings of the North East Lingu... Syntax The Crane Bag Tocharian and Indo-European Studies Transactions of the Philological So... Linguistics, Community 2 Brain and Language Journal of East Asian Linguistics Journal of Memory and Language Linguistic Inquiry Synthese Miscellaneous, Community 1 Book History Contributions to the History of Con... European Review French Historical Studies Journal of Modern History Modern Intellectual History Pmla Princeton University Library Chroni... Proceedings of the British Academy Representations Miscellaneous, Community 2 American Journal of Public Health Annals of Internal Medicine Daedalus International Anesthesiology Clinic... Journal of Social Work and Human Se... Journal of the American Medical Ass... Journal of the American Medical Wom... Milbank Quarterly The Hastings Center Report Miscellaneous, Community 3 American Music Black Music Research Journal Early Music History Journal of Musicology Journal of the Society for American... Musical Quarterly Tempo Law, Community 1 Emory Law Journal Georgetown Law Journal Harvard Civil Rights-Civil Libertie... Journal of Legal Analysis Loyola of Los Angeles Law Review Maryland Law Review Michigan Law Review New York University Law Review Roger Williams University Law Revie... Southern California Law Review 9 Law, Community 2 Arizona Law Review Global Policy Harvard International Law Journal Harvard Journal on Legislation Harvard BlackLetter Journal Harvard Law Review Harvard Women’s Law Journal Lewis and Clark Law Review Minnesota Law Review 10