Delivery Format
Now that we've explained the dataset structure, we can go over the format in which this data is delivered.
File Format
Because of the high number of variables, this is a large dataset. Applications like Excel can open only subsets of variables or geographies, and users working in more advanced statistical or programming environments may still need to manage memory.
That said, this data is typically delivered in .csv or .parquet format with each row representing a single geography and each column representing a FollowGraph variable. Our standard geographic file is at the census block group level.
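As a minimal sketch of the memory point above, the following reads only a chosen subset of variables from a .csv delivery using Python's standard library. The sample text, the `load_columns` helper, and the assumption that scores are whole numbers are illustrative only and are not part of the delivery itself.

```python
import csv
import io

# Hypothetical sample mirroring the documented layout; a real file has
# thousands of variable columns and would be opened with open(path, newline="").
SAMPLE_CSV = """BLOCKGROUP,Boy Band Fans,Budget Fashion Enthusiasts,Business Readers
010010201001,119,141,60
010010201002,106,104,68
"""


def load_columns(handle, wanted, id_column="BLOCKGROUP"):
    """Stream a CSV row by row and keep only the requested variable columns.

    The block group ID is kept as a string so its leading zero survives.
    Scores are parsed as integers, which assumes whole-number index scores.
    """
    rows = {}
    for record in csv.DictReader(handle):
        rows[record[id_column]] = {name: int(record[name]) for name in wanted}
    return rows


scores = load_columns(io.StringIO(SAMPLE_CSV), ["Boy Band Fans", "Business Readers"])
print(scores["010010201001"])
```

Keeping `BLOCKGROUP` as text matters because tools that parse it as a number drop the leading zero. If you use pandas instead, the `usecols` and `dtype` parameters of `read_csv` serve the same two purposes.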
The table below is an example with 10 randomly selected variables at the census block group level (you can scroll the table right to see the whole thing).
| BLOCKGROUP | Boy Band Fans | Budget Fashion Enthusiasts | Business Readers | Dalai Lama | Daniel Dale | Olive Garden | Productivity Apps Enthusiasts | Rachael Ray Show | Soccer Enthusiasts | Cardi B |
|---|---|---|---|---|---|---|---|---|---|---|
| 010010201001 | 119 | 141 | 60 | 74 | 60 | 146 | 58 | 126 | 85 | 92 |
| 010010201002 | 106 | 104 | 68 | 81 | 78 | 116 | 78 | 109 | 107 | 81 |
| 010010202001 | 123 | 165 | 76 | 83 | 76 | 151 | 71 | 125 | 92 | 149 |
| 010010202002 | 111 | 139 | 73 | 80 | 70 | 140 | 72 | 122 | 88 | 123 |
| 010010203001 | 105 | 125 | 72 | 78 | 75 | 131 | 75 | 120 | 93 | 104 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
All scores are index scores, where 100 represents the national average. The first value in the first row is 119, indicating that people in block group 010010201001 are 1.19 times as likely as the national average to follow boy bands on social media. This can also be stated as "19% more likely". This is explained further in the Dataset Structure section.
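The arithmetic behind an index score can be sketched as two small conversions. The function names here are hypothetical helpers, not part of any delivered tooling.

```python
def likelihood_ratio(index_score):
    """Convert an index score to a multiple of the national average (100 -> 1.0)."""
    return index_score / 100


def percent_difference(index_score):
    """Percentage points above (positive) or below (negative) the national average."""
    return index_score - 100


# Values taken from the first row of the example table.
print(likelihood_ratio(119))    # Boy Band Fans: 1.19 times as likely
print(percent_difference(119))  # Boy Band Fans: 19% more likely
print(percent_difference(60))   # Business Readers: -40, i.e. 40% less likely
```

Scores below 100 read the same way in the other direction: the Business Readers score of 60 in the same row means that block group is 40% less likely than the national average.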
Your data file may use column IDs instead of names as the column headers. The "Key" file contains the full variable listing, and you can use the IDs to match the data and key files. The key file lists every variable with its name, description, categorization, and the social media accounts on which the variable is based.
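Matching IDs to names might look like the sketch below. The key file's column names (`ID`, `Name`) and the ID values (`V0001`, `V0002`) are assumptions made for illustration; check your own key file for the actual headers and identifiers.

```python
import csv
import io

# Hypothetical key and data files; real column names and IDs may differ.
KEY_CSV = """ID,Name
V0001,Boy Band Fans
V0002,Olive Garden
"""
DATA_CSV = """BLOCKGROUP,V0001,V0002
010010201001,119,146
"""


def read_key(handle, id_field="ID", name_field="Name"):
    """Build an ID -> variable name lookup from the key file."""
    return {row[id_field]: row[name_field] for row in csv.DictReader(handle)}


def rename_headers(header, id_to_name):
    """Swap IDs for names, leaving unknown columns (e.g. BLOCKGROUP) unchanged."""
    return [id_to_name.get(column, column) for column in header]


id_to_name = read_key(io.StringIO(KEY_CSV))
reader = csv.reader(io.StringIO(DATA_CSV))
header = rename_headers(next(reader), id_to_name)
print(header)
```

Columns that do not appear in the key, such as the geography identifier, pass through untouched, so the same helper works whether or not a file already uses names.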
Multiple Files
This dataset has thousands of variables. Depending on the use case, we may deliver the data in multiple files. Typically, we split along variable sections (for example, one file for "Interests" and one for "Celebrities & Influencers").
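If you receive several section files, one way to recombine them is to join on the geography identifier. The sketch below assumes that every file carries a `BLOCKGROUP` column and that variable names do not repeat across files; the assignment of these two variables to sections is also assumed for illustration.

```python
import csv
import io

# Hypothetical section files; values are taken from the example table.
INTERESTS_CSV = """BLOCKGROUP,Soccer Enthusiasts
010010201001,85
010010201002,107
"""
CELEBRITIES_CSV = """BLOCKGROUP,Cardi B
010010201001,92
010010201002,81
"""


def merge_on_blockgroup(handles, id_column="BLOCKGROUP"):
    """Combine several section files into one dict keyed by block group ID."""
    merged = {}
    for handle in handles:
        for record in csv.DictReader(handle):
            key = record.pop(id_column)
            # Add this file's variables to whatever earlier files contributed.
            merged.setdefault(key, {}).update(record)
    return merged


merged = merge_on_blockgroup(
    [io.StringIO(INTERESTS_CSV), io.StringIO(CELEBRITIES_CSV)]
)
print(merged["010010201001"])
```

Values are left as strings here so the merge step stays independent of how you later parse the scores.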