Sampling Online Social Networks

People

Funding

This work is currently supported by the following grants:

Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsors.

Datasets

The conditions of use for the released datasets are:

  • the corresponding paper is cited
  • no further re-distribution without our permission. 

Facebook Geosocialmap

In our Coarse-Grained Topology Estimation paper, we developed estimators that take as input a probability sample of nodes from an original graph and produce a category-to-category graph. In the original graph, each node/user has declared a category; e.g., a user can declare that she belongs to the “UC Irvine” Facebook network. In the category-to-category graph, each node corresponds to a category; e.g., two nodes can be “UC Irvine” and “UC San Diego”. Each edge in the category-to-category graph reflects the strength of the connection between the members of two categories in the original graph; e.g., the weight of the edge between “UC Irvine” and “UC San Diego” is interpreted as the probability that a random user from “UC Irvine” is a friend of a random user from “UC San Diego”.
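
To make the edge-weight interpretation concrete, here is a minimal Python sketch that computes category-to-category weights on a small, fully observed toy graph. It only illustrates the definition (number of friendships between two categories divided by the number of possible cross-category pairs); it is not the sample-based estimator from the paper, which has to correct for the sampling design. The function name, user names, and friendships are hypothetical.

```python
from collections import Counter
from itertools import combinations


def category_graph(edges, category_of):
    """Return {(cat_a, cat_b): weight} for a fully observed undirected graph.

    weight = (# friendships between the two categories) / (|A| * |B|),
    i.e. the probability that a random member of A and a random member
    of B are friends. Assumes each friendship appears once in `edges`.
    """
    sizes = Counter(category_of.values())
    between = Counter()
    for u, v in edges:
        a, b = category_of[u], category_of[v]
        if a != b:  # only inter-category friendships contribute
            between[tuple(sorted((a, b)))] += 1
    return {
        (a, b): between[(a, b)] / (sizes[a] * sizes[b])
        for a, b in combinations(sorted(sizes), 2)
    }


# Toy data (hypothetical users and friendships).
category_of = {
    "u1": "UC Irvine",
    "u2": "UC Irvine",
    "u3": "UC San Diego",
    "u4": "UC San Diego",
}
edges = [("u1", "u3"), ("u1", "u2"), ("u2", "u4")]
weights = category_graph(edges, category_of)
```

In this toy graph, two of the four possible “UC Irvine” / “UC San Diego” pairs are friends, so the weight of that category edge is 0.5.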

As a practical illustration of our approach, we applied our methodology to our previously collected datasets, Facebook Social Graph and Facebook Weighted Random Walks. We estimated several Facebook category graphs and visualized them at Geosocialmap. Here we make available (i) the mapping of anonymized networkIDs to Facebook network names in our released Facebook Social Graph dataset and (ii) all the estimated category-to-category graphs.

University Category Graph

This category graph has been estimated from the Facebook Weighted Random Walks dataset. Its categories are the top 133 US national universities according to the “US News & World Report ’09” ranking.

Nodes
Category nodes are contained in the file “univ_nodes_2010.csv”. Each row in this file describes a category node and related category features. The structure of each row is as follows:

<Node ID> | <Node Name> | <Longitude> | <Latitude> | <FB Network Name> | <US State> | <Tier> | <Rank> | <Score> | <Type> | <Year Founded> | <Religion> | <Calendar> | <# Students> | <Setting> | <Acceptance Rate> | <Tuition Cost>

An example of a category node is the following:
16777277|University of California–Irvine|-117.8426417|33.64535|UC Irvine|CA|1|44|58|Public|1965|None|quarter|21696|suburban|0.556|7556
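
A minimal Python sketch for reading such rows follows. It assumes the file is pipe-delimited exactly as in the example above and has no header row; the field names in `FIELDS` and the function name are hypothetical labels chosen for illustration, not part of the release. Only the coordinates are converted to numbers, since other columns may be empty for some universities.

```python
import csv
import io

# Hypothetical field labels, in the column order documented above.
FIELDS = [
    "node_id", "name", "longitude", "latitude", "fb_network", "state",
    "tier", "rank", "score", "type", "year_founded", "religion",
    "calendar", "num_students", "setting", "acceptance_rate", "tuition",
]


def parse_univ_nodes(text):
    """Parse pipe-delimited node rows into a list of dictionaries."""
    rows = []
    for values in csv.reader(io.StringIO(text), delimiter="|"):
        if not values:  # skip blank lines
            continue
        row = dict(zip(FIELDS, (v.strip() for v in values)))
        row["longitude"] = float(row["longitude"])
        row["latitude"] = float(row["latitude"])
        rows.append(row)
    return rows


sample = (
    "16777277|University of California–Irvine|-117.8426417|33.64535|"
    "UC Irvine|CA|1|44|58|Public|1965|None|quarter|21696|suburban|0.556|7556\n"
)
nodes = parse_univ_nodes(sample)
```

To read the released file, the same function could be applied to the contents of “univ_nodes_2010.csv”, assuming it follows the format shown.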

Edges
Category edges are contained in the file “univ_edges_2010.csv”. Each row in this file contains a category edge and has the following structure:

<Edge ID> | <Node ID> | <Node ID> | <Edge Weight>

Country Category Graph

This category graph has been estimated from the Facebook Social Graph dataset. Its categories are world countries that Facebook users could join as a regional network in 2009.

Nodes
Category nodes are contained in the file “country_nodes_2009.csv”. Each row in this file describes a category node and has the following structure:

<Node ID> | <Node Name> | <Longitude> | <Latitude>

Edges
Category edges are contained in the file “country_edges_2009.csv”. Each row in this file contains a category edge and has the following structure:

<Edge ID> | <Node ID> | <Node ID> | <Edge Weight>

North America Counties Category Graph

This category graph has been estimated from the Facebook Social Graph dataset. Its categories are regions of the United States and Canada that Facebook users could join in 2009.

Nodes
Category nodes are contained in the file “northamerica_nodes_2009.csv”. Each row in this file describes a category node and has the following structure:

<Node ID> | <Node Name> | <Longitude> | <Latitude> | <Country>

Edges
Category edges are contained in the file “northamerica_edges_2009.csv”. Each row in this file contains a category edge and has the following structure:

<Edge ID> | <Node ID> | <Node ID> | <Edge Weight>

UK Cities Category Graph

This category graph has been estimated from the Facebook Social Graph dataset. Its categories are regions of the United Kingdom that Facebook users could join in 2009.

Nodes
Category nodes are contained in the file “ukcities_nodes_2009.csv”. Each row in this file describes a category node and has the following structure:

<Node ID> | <Node Name>

Edges
Category edges are contained in the file “ukcities_edges_2009.csv”. Each row in this file contains a category edge and has the following structure:

<Edge ID> | <Node ID> | <Node ID> | <Edge Weight>

Mapping of NetworkIDs to Network names

The file “net2net_mapping_2009” contains all regional/school/workplace Facebook networks discovered during the collection of the Facebook Social Graph dataset. The structure is:

<Network ID> # <Network Name>
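
A minimal Python sketch for loading this mapping, assuming one entry per line and that the first “#” on a line separates the ID from the name (so a “#” inside a network name would be preserved). The function name and sample entries are hypothetical.

```python
def parse_network_mapping(lines):
    """Build a {network_id: network_name} dictionary from mapping lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        # Split on the first '#' only; the rest of the line is the name.
        net_id, _, name = line.partition("#")
        mapping[net_id.strip()] = name.strip()
    return mapping


# Hypothetical entries in the documented "<Network ID> # <Network Name>" form.
sample_lines = ["1001 # Example University", "1002 # Example Region #2", ""]
network_names = parse_network_mapping(sample_lines)
```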

One can use the mapping of networkIDs in combination with the Facebook Social Graph dataset to estimate the category-to-category graphs. One can then use the estimated category-to-category graphs to create models and test hypotheses on how category features (e.g., rank and type of a university, language and religion of a country, geographical distance) affect the inter-category interaction rates.

You can request the Facebook Geosocialmap dataset here.

As a condition of usage, please cite the Facebook Geosocialmap dataset with the following BibTeX entry:

   @inproceedings{kurant11_coarsetopology,
   title= {{Coarse-Grained Topology Estimation via Graph Sampling}},
   author= {Maciej Kurant and Minas Gjoka and Yan Wang and Zack W. Almquist and Carter T. Butts and Athina Markopoulou},
   booktitle = {Proceedings of ACM SIGCOMM Workshop on Online Social Networks (WOSN) '12},
   address = {Helsinki, Finland},
   month = {August},
   year = {2012}
   }

Facebook Social Graph - MHRW & UNI

We release the following datasets, collected in April of 2009 through data scraping from Facebook:

  1. MHRW - A sample of 957K unique users obtained Facebook-wide by 28 independent Metropolis-Hastings random walks.
  2. UNI - A sample of 984K unique users that represents the “ground truth”, i.e., a truly uniform sample of Facebook userIDs, selected by a rejection sampling procedure from the system's 32-bit ID space.

For each dataset, we release two files. The first file contains, for each sampled userID, the number of times the user was sampled and the userIDs of his/her friends.
<uid> <#times sampled> <friend_uid_1> <friend_uid_2> .. <friend_uid_j>
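
A minimal Python sketch for reading one line of this first file, assuming the fields are separated by whitespace as the format line suggests. The function name and the sample IDs are hypothetical.

```python
def parse_sample_line(line):
    """Split one line into (uid, times_sampled, list_of_friend_uids)."""
    fields = line.split()
    uid = fields[0]
    times_sampled = int(fields[1])
    friends = fields[2:]  # may be empty if no friends are listed
    return uid, times_sampled, friends


# Hypothetical line: user 42 was sampled 3 times and has two listed friends.
uid, times_sampled, friends = parse_sample_line("42 3 7 19")
```

Keeping the sample count next to each user makes it possible to treat a user who was sampled several times as several draws when computing statistics over the sample.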

The second file contains additional node properties for each sampled user. For each sampled userID we have the number of times sampled, the total number of friends, the privacy settings and network membership.
<uid> <#times sampled> <#totalfriends> <privacy settings> <networkID(s)>

The privacy settings consist of four basic binary privacy attributes: 1) Add as friend, 2) Photo Thumbnail, 3) View Friends, 4) Send message. We refer to the resulting 4-bit number as the “privacy settings of a user” and encode it in the released dataset as a decimal number; e.g., 1111 is “15”, 1000 is “8”, and 0001 is “1”. The list of networkIDs contains the regional/school/workplace FB networks that the user is a member of. For more information about the privacy settings and network membership, see the journal paper.
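
The decimal encoding can be unpacked with a few lines of Python. This is a sketch under one important assumption that the text above does not state: that the attributes are listed from the most significant bit to the least significant bit (so “8”, i.e. 1000, would mean only “Add as friend” is set). The journal paper should be checked for the actual bit order. The attribute labels and function name are hypothetical.

```python
# ASSUMPTION: listed order = most significant bit first. Not confirmed
# by the dataset description; verify against the journal paper.
PRIVACY_ATTRIBUTES = (
    "add_as_friend",
    "photo_thumbnail",
    "view_friends",
    "send_message",
)


def decode_privacy(value):
    """Turn a decimal privacy value (0..15) into a dict of four booleans."""
    if not 0 <= value <= 15:
        raise ValueError("privacy settings must fit in 4 bits")
    bits = format(value, "04b")  # e.g. 8 -> "1000"
    return {name: bit == "1" for name, bit in zip(PRIVACY_ATTRIBUTES, bits)}


settings = decode_privacy(8)
```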

UserIDs are consistent across files. UserIDs and networkIDs are anonymized. The mapping of anonymized networkIDs to Facebook network names is available as part of our Facebook Geosocialmap dataset release.

For the assumptions, goals, and applications of the collected datasets, see Sections III-A and III-B in the journal version of this work.

MHRW
UNI

As a condition of usage, please cite the Facebook Social Graph datasets using the following BibTeX entry:

   @inproceedings{gjoka10_walkingfb, 
   author= {Minas Gjoka and Maciej Kurant and Carter T. Butts and Athina Markopoulou}, 
   title= { {Walking in Facebook: A Case Study of Unbiased Sampling of OSNs} }, 
   booktitle = {Proceedings of IEEE INFOCOM '10},
   address = {San Diego, CA}, 
   month = {March}, 
   year = {2010} 
   }

Facebook Social Graph - Breadth First Search

We extend the Facebook Social Graph dataset by releasing two more social graph samples collected in April of 2009 through data scraping from Facebook:

  1. BFS-28 - A sample of 2,198K unique users collected by 28 independent Breadth-First-Search Traversals of length 81K.
  2. BFS-1 - A sample of 1,189K unique users collected by 1 Breadth-First-Search traversal.

The BFS-28 and BFS-1 social graph samples are released in an adjacency list format. Each line contains a sampled userID and the userIDs of his/her friends.
<uid> <friend_uid_1> <friend_uid_2> .. <friend_uid_j>

UserIDs are anonymized and are not consistent between the BFS-28 and BFS-1 social graph samples.

BFS-28
BFS-1

As a condition of usage, please cite the “Facebook Social Graph - Breadth First Search” dataset using any of the following BibTeX entries:

   @inproceedings{gjoka10_walkingfb, 
   author= {Minas Gjoka and Maciej Kurant and Carter T. Butts and Athina Markopoulou}, 
   title= { {Walking in Facebook: A Case Study of Unbiased Sampling of OSNs} }, 
   booktitle = {Proceedings of IEEE INFOCOM '10},
   address = {San Diego, CA}, 
   month = {March}, 
   year = {2010} 
   }
  @article{mgjoka_recommendations_jsac,
  title= {{Practical Recommendations on Crawling Online Social Networks}},
  author= {Minas Gjoka and Maciej Kurant and Carter T. Butts and Athina Markopoulou},
  journal = {IEEE J. Sel. Areas Commun. on Measurement of Internet Topologies},
  year = {2011}
  }

Facebook Applications

Dataset I

We release dataset I, which contains the daily number of active users and total application installations for every Facebook application between 08/29/2007 and 02/14/2008. The data was retrieved from the Adonomics website, which had been collecting aggregate application statistics, Daily Active Users (DAU) and Application Installs, by scraping the Facebook application directory.

Dataset I comprises 16,812 files (one file for each application present in the Facebook application directory until 02/14/2008). The name of each file contains the respective <app_id>. The structure of each file is:
"", <app_name>, ""
"Time","Installs","DAU"
<day_1>, <#installs_1>, <#DAU_1>
<day_2>, <#installs_2>, <#DAU_2>
<day_3>, <#installs_3>, <#DAU_3>
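
A minimal Python sketch for reading one such file, assuming it is comma-separated with a title row, a header row, and then one row per day, as outlined above. The date format inside the files is not specified here, so the day is kept as a string; the sample content, the application name, and the function name are hypothetical. An application name that itself contains an unquoted comma would need extra handling.

```python
import csv
import io


def parse_app_file(text):
    """Return (app_name, [(day, installs, dau), ...]) for one application file."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    title_row = next(reader)  # "", <app_name>, ""
    app_name = title_row[1].strip()
    next(reader)  # skip the "Time","Installs","DAU" header
    series = []
    for row in reader:
        if len(row) < 3:  # skip blank or incomplete lines
            continue
        day, installs, dau = (cell.strip() for cell in row[:3])
        series.append((day, int(installs), int(dau)))
    return app_name, series


# Hypothetical file content; the real date format may differ.
sample = (
    '"", Example App, ""\n'
    '"Time","Installs","DAU"\n'
    "2007-08-29, 1000, 120\n"
    "2007-08-30, 1100, 130\n"
)
app_name, series = parse_app_file(sample)
```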

Dataset I - Facebook Applications Statistics in Aggregate

Dataset II

We release dataset II, collected in February 2008, which contains a list of installed applications for each of 297K Facebook users:
<uid> <app_id_1> <app_id_2> .. <app_id_j>
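
As an illustration of how this per-user format can be used, the following Python sketch counts how many users in the sample have installed each application. It assumes whitespace-separated fields; the function name and the sample IDs are hypothetical.

```python
from collections import Counter


def app_popularity(lines):
    """Count, for each app_id, the number of users who list it."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:  # blank line or user with no applications
            continue
        # Use a set so a repeated app_id on one line counts once per user.
        counts.update(set(fields[1:]))
    return counts


# Hypothetical lines in the "<uid> <app_id_1> .. <app_id_j>" form.
sample_lines = ["u1 a1 a2", "u2 a2", "u3 a2 a3", "u4"]
popularity = app_popularity(sample_lines)
```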

UserIDs are anonymized. More information about the collection process and the representativeness of this dataset is contained in the paper.

Dataset II - Facebook Application Installations per User

As a condition of usage, please cite the Facebook Applications datasets using the following BibTeX entry:

   @inproceedings{mgjoka_wosn08,
   author= {Minas Gjoka and Michael Sirivianos and Athina Markopoulou and Xiaowei Yang},
   title= { {Poking Facebook: Characterization of OSN Applications} },
   booktitle = {Proceedings of ACM SIGCOMM Workshop on Online Social Networks (WOSN) '08},
   address = {Seattle, WA},
   month = {August},
   year = {2008}
   }

Facebook Weighted Random Walks

We release the following datasets, collected in October of 2010 through data scraping from Facebook:

  1. RW - A sample of 1M unique users obtained Facebook-wide by 25 independent simple Random Walks.
  2. Hybrid - A sample of 1M unique users obtained Facebook-wide by 25 independent Stratified Weighted Random Walks (S-WRW) with hybrid conflict resolution. The measurement objective in the Hybrid sample is Facebook users with college network membership.

For each dataset, we release two files. The first file contains, for each sampled userID, (i) the weight of the sampled user, (ii) the number of vfriends, i.e., the friends that were visitable during the social graph exploration (friends for which “View Friends”=1), (iii) the total number of friends, and (iv) the list of networkIDs of the networks the user is a member of. The symbol “#” is used as a separator between networkIDs.
<uid> <weight> <#vfriends> <#totalfriends> <networkID_1#networkID_2#>
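
A minimal Python sketch for reading one line of this file, assuming whitespace between the fields and “#” after each networkID, as in the format line. Whether a user without any network has an empty or missing last field is not stated, so the sketch accepts both; the function name and sample values are hypothetical.

```python
def parse_wrw_line(line):
    """Split one line into (uid, weight, n_vfriends, n_totalfriends, network_ids)."""
    fields = line.split()
    uid = fields[0]
    weight = float(fields[1])
    n_vfriends = int(fields[2])
    n_totalfriends = int(fields[3])
    # The last field looks like "id1#id2#"; drop the empty piece after the final '#'.
    raw_networks = fields[4] if len(fields) > 4 else ""
    network_ids = [n for n in raw_networks.split("#") if n]
    return uid, weight, n_vfriends, n_totalfriends, network_ids


# Hypothetical line: weight 2.5, 10 visitable friends out of 12, two networks.
record = parse_wrw_line("42 2.5 10 12 1001#1002#")
```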

The second file contains mappings from networkIDs to network names and network types (college, work, school).
<networkID_1> <network_name_1> <network_type_1>

UserIDs are anonymized and are consistent across files.

RW
Hybrid

As a condition of usage, please cite the Facebook Weighted Random Walks datasets using the following BibTeX entry:

   @inproceedings{kurant11_magnifying,
   title= {{Walking on a Graph with a Magnifying Glass: Stratified Sampling via Weighted Random Walks}},
   author= {Maciej Kurant and Minas Gjoka and Carter T. Butts and Athina Markopoulou},
   booktitle = {Proceedings of ACM SIGMETRICS '11},
   address = {San Jose, CA},
   month = {June},
   year = {2011}
   }

Last.fm Multigraph

We plan to release single and multigraph crawls of Last.fm, collected during July 2010 and presented in our preprint. Please contact us to get more information and get notified about their release.
