
Methodology

Overview

FollowGraph is derived from the following data of public social media accounts. Every variable (except the "Interests" variables) is modeled individually, with the goal of predicting the likelihood that any individual follows a given account or group of accounts (or, in aggregate, the share of individuals in an area who do).

The predicted following percentage (i.e., the likelihood) is then indexed to the national average. See the Index Scores section for more information about index scores.
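The indexing step can be sketched as below. This is a minimal illustration, not FollowGraph's implementation: it assumes the common convention that an index score is the ratio of an area's predicted following percentage to the national percentage, times 100 (so 100 means "at the national average"), and it assumes an area-level percentage is the mean of individual follow likelihoods. Both function names are hypothetical.

```python
from statistics import mean


def area_following_pct(individual_likelihoods: list[float]) -> float:
    """Hypothetical aggregation: an area's predicted following percentage
    as the mean of its individuals' follow likelihoods (each 0 to 1)."""
    return 100.0 * mean(individual_likelihoods)


def index_score(area_pct: float, national_pct: float) -> float:
    """Index an area's predicted following percentage to the national
    average. Assumes 100 means 'equal to the national average'."""
    if national_pct <= 0:
        raise ValueError("national_pct must be positive")
    return 100.0 * area_pct / national_pct


# An area at 20% against a national 8% is 2.5x the national rate.
print(index_score(20.0, 8.0))  # 250.0
```

Under this convention, scores above 100 mean an area over-indexes relative to the country and scores below 100 mean it under-indexes.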

Sources

The following datasets were used in the creation of FollowGraph variables:

  1. Social Media Following - millions of lists of the accounts people follow
  2. Mobile Visitation - cell phone movement/visitation aggregated at the census block group level
  3. Demographics
    1. Census demographics
    2. Individual and household level variables

Coverage

FollowGraph is available for the entire USA (all 50 states), with over 99% coverage of US census block groups. The dataset is also backed by a database of approximately 281 million records used to match individuals across the USA.

Regionality and Local-ness

Some variables are significantly affected by geography and regionality. Examples of these types of variables include sports teams (Cincinnati Bengals, Dallas Mavericks, etc.), local politicians, and more.

For every variable in our dataset, we calculate a score of how "local" or regional that variable is. For variables whose "regionality" (or "locality") score exceeds a certain threshold, the models include special variables that account for the variable's relationship to geography and location.
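The methodology does not specify how the locality score is computed. As a purely hypothetical illustration, one could use the coefficient of variation of an account's predicted following percentage across areas: an account followed evenly nationwide scores near 0, while a regional sports team scores high. The metric, the function names, and the 0.5 threshold below are all assumptions made for this sketch.

```python
from statistics import mean, pstdev

# Hypothetical cutoff; FollowGraph's actual threshold is not published here.
LOCALITY_THRESHOLD = 0.5


def locality_score(area_pcts: dict[str, float]) -> float:
    """Hypothetical locality score: coefficient of variation (population
    std. dev. / mean) of an account's following percentage across areas.
    0 means the account is followed evenly everywhere."""
    values = list(area_pcts.values())
    avg = mean(values)
    if avg == 0:
        return 0.0
    return pstdev(values) / avg


def needs_geo_variables(area_pcts: dict[str, float]) -> bool:
    """Flag accounts regional enough to warrant extra geographic inputs."""
    return locality_score(area_pcts) > LOCALITY_THRESHOLD


national_brand = {"OH": 2.0, "TX": 2.0, "CA": 2.0}
regional_team = {"OH": 9.0, "TX": 0.5, "CA": 0.5}
print(needs_geo_variables(national_brand))  # False
print(needs_geo_variables(regional_team))   # True
```

A real locality measure would likely use many more areas (for example, all block groups) and might weight them by population; this sketch only shows the shape of the decision.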

Interest Variables

All variables have a top-level categorization of Brands, Channels, Interests, or Top Social Media Accounts (see the Categorization section for more information). Variables categorized as "Interests" are generated slightly differently.

Interest variables are calculated by taking the average index score across the accounts that make up the index. Here are a few examples:

  • UFC & MMA Enthusiasts - the average of scores for UFC, Dana White, Ronda Rousey, and Conor McGregor
  • Adult Cartoon Enthusiasts - the average of scores for Adult Swim, Family Guy, Rick & Morty, and South Park
  • Liberal Podcast Listeners - the average of scores for Pod Save America and Ezra Klein
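The Interest calculation above can be sketched as a plain, unweighted mean of the component accounts' index scores for a single area. The function name and the index scores in the example are made up, and the source does not say whether any weighting is applied.

```python
from statistics import mean


def interest_score(index_scores: dict[str, float], accounts: list[str]) -> float:
    """Average the index scores of the accounts that make up an Interest
    variable (assumes an unweighted mean)."""
    missing = [a for a in accounts if a not in index_scores]
    if missing:
        raise KeyError(f"no index score for: {missing}")
    return mean(index_scores[a] for a in accounts)


# Illustrative index scores for one area (not real FollowGraph data).
area_scores = {"Pod Save America": 140.0, "Ezra Klein": 120.0}
print(interest_score(area_scores, ["Pod Save America", "Ezra Klein"]))  # 130.0
```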

The key file that ships with FollowGraph specifies the accounts that make up each Interest variable.

Some Interest variables are made up of only one account. These variables have exactly the same values as the individual variables they correspond to. Many of them are related to specific brands (especially car brands) and are generated for use on external advertising platforms.
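The key file's format is not described here, so the sketch below assumes a hypothetical CSV with one `interest,account` row per pair. It builds every Interest variable for each area from that mapping and checks that a one-account Interest reproduces its account's values exactly. "Car Brand A", the area names, and all scores are placeholders.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical key-file layout (the real file's format may differ):
# one row per (interest, account) pair.
KEY_CSV = """interest,account
UFC & MMA Enthusiasts,UFC
UFC & MMA Enthusiasts,Dana White
Car Brand A Enthusiasts,Car Brand A
"""


def load_key(text: str) -> dict[str, list[str]]:
    """Map each Interest variable to the accounts that make it up."""
    key: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        key[row["interest"]].append(row["account"])
    return dict(key)


def interest_columns(key, scores_by_area):
    """For every area, average the index scores of each interest's accounts."""
    return {
        area: {interest: mean(scores[a] for a in accounts)
               for interest, accounts in key.items()}
        for area, scores in scores_by_area.items()
    }


# Made-up index scores for two hypothetical areas.
scores_by_area = {
    "area_1": {"UFC": 110.0, "Dana White": 90.0, "Car Brand A": 175.5},
    "area_2": {"UFC": 80.0, "Dana White": 60.0, "Car Brand A": 42.0},
}

interests = interest_columns(load_key(KEY_CSV), scores_by_area)

# A one-account Interest reproduces that account's values in every area.
for area, scores in scores_by_area.items():
    assert interests[area]["Car Brand A Enthusiasts"] == scores["Car Brand A"]
```

The identity holds because the mean of a single value is that value, which is why single-account Interest variables duplicate their individual counterparts.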

