Dataset Structure¶

The Proximity dataset scores every area with enough data on each of 72 Geosocial segments. A segment scores higher in an area where more social media of its type is present. For example, the Natural Beauty segment scores high when posts mention things like camera brands, waterfalls, parks, #OutdoorPhotography, and #NoFilter.

[Image: Natural Beauty imagery]

Percentile Scores¶

Spatial.ai GeoSocial segments are scored from 0 to 100 as a percentile relative to a reference set of geographies. For the USA, the standard reference set is all USA census block groups. So, a score of 80 for the “Bookish” segment means that the block group scores higher on this segment than 80% of all USA block groups.

Unless otherwise specified, geographies are given a percentile score relative to similar geographies in their own country. USA block groups are scored against all USA block groups, Canadian dissemination areas are scored against all Canadian dissemination areas, etc.

When no scores are provided for a given geography, this means that there was not enough social media in the area to provide an accurate score.
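The percentile idea can be illustrated with a minimal sketch. The exact ranking method used to produce the published scores is not documented here, so this assumes a simple "share of reference geographies with a strictly lower value" rule. The raw segment strengths and the GEOIDs are hypothetical.

```python
from bisect import bisect_left


def percentile_score(value, reference_values):
    """Percent of reference values strictly below `value`, on a 0-100 scale.

    Assumption: a plain "higher than X% of the reference set" rule; the
    production scoring method may differ.
    """
    ordered = sorted(reference_values)
    below = bisect_left(ordered, value)  # count of values strictly below
    return round(100.0 * below / len(ordered), 2)


# Hypothetical raw "Bookish" strengths for five block groups.
raw_bookish = {"bg_a": 0.10, "bg_b": 0.40, "bg_c": 0.25, "bg_d": 0.90, "bg_e": 0.55}

scores = {
    geoid: percentile_score(raw, raw_bookish.values())
    for geoid, raw in raw_bookish.items()
}
```

In this toy reference set, `bg_d` has a higher raw value than four of the five block groups, so it receives a score of 80.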

Data Relative to DMA, County, etc.¶

Sometimes data is provided that is based on a local reference set. For example, a dataset may specify that it is “county-level” or “relative to county”. In this case, the reference set has been limited to that geography. So a Bookish score of 80 would mean that this block group scored higher than 80% of block groups in its county. Local reference sets used include, but are not limited to, the following:

Designated Marketing Area (DMA) - USA
County - USA
Economic Region (ER) - Canada
Consolidated Census Subdivision (CCS) - Canada

This type of relative dataset is helpful in many situations. For example, a brand with locations in both the Midwest and NYC might find that the Party Life segment is predictive of store performance. Its NYC location has a Party Life score of 82, meaning it scores higher than 82% of block groups nationwide. Typically, this is a good score for the brand. However, by using a percentile score relative to DMA or county, they realize that while this is a high score relative to the nation, it is not a high score within NYC, which may explain why that location lags in performance.
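The difference between a national and a local reference set can be sketched as follows. This is an illustration only: the raw Party Life strengths, GEOIDs, and county names are hypothetical, and the percentile rule (share of reference values strictly below) is an assumption rather than the documented scoring method.

```python
from bisect import bisect_left
from collections import defaultdict


def percentile_score(value, reference_values):
    """Percent of reference values strictly below `value` (assumed rule)."""
    ordered = sorted(reference_values)
    return round(100.0 * bisect_left(ordered, value) / len(ordered), 2)


# Hypothetical rows: (GEOID, county, raw Party Life strength).
rows = [
    ("nyc_1", "New York", 0.95),
    ("nyc_2", "New York", 0.90),
    ("nyc_3", "New York", 0.80),
    ("nyc_4", "New York", 0.85),
    ("mw_1", "Midwest County", 0.30),
    ("mw_2", "Midwest County", 0.20),
    ("mw_3", "Midwest County", 0.10),
    ("mw_4", "Midwest County", 0.40),
]

# National reference set: every block group in the table.
national = [raw for _, _, raw in rows]

# Local reference sets: block groups grouped by their county.
by_county = defaultdict(list)
for _, county, raw in rows:
    by_county[county].append(raw)

national_scores = {g: percentile_score(raw, national) for g, _, raw in rows}
county_scores = {g: percentile_score(raw, by_county[c]) for g, c, raw in rows}
```

Here `nyc_3` sits in the middle of the national distribution but at the bottom of its own county, while `mw_4` is below the national midpoint yet leads its county. The same grouping approach would apply to a DMA, ER, or CCS reference set.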

Segment ID Codes¶

Each segment has an associated name and code. The name gives a clear idea of the behavior a segment measures, while the code provides a simple identifier that is easy to work with for development and analytics purposes.

For example, the “Bookish” segment is EA01. The letter “E” reflects that this segment is part of our E generation of Proximity segments. The “A” identifies which family the segment belongs to (the “Hobbies & Interests” family). The “01” is the unique identifier within that family. All segment codes, names, and families can be found in the tables section below.
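For development work, it can be convenient to split a code into its parts. The sketch below assumes every code follows the shape of the EA01 example (one generation letter, one family letter, two digits) and that column names look like "EA01 - Bookish", as in the data dictionary. The helper names are hypothetical and not part of any Spatial.ai tooling.

```python
import re
from typing import NamedTuple


class SegmentCode(NamedTuple):
    generation: str  # e.g. "E" for the E generation of Proximity segments
    family: str      # e.g. "A" for the Hobbies & Interests family
    number: int      # identifier within the family


# Assumed pattern, inferred from the EA01 example only.
_CODE_PATTERN = re.compile(r"^([A-Z])([A-Z])(\d{2})$")


def parse_segment_code(code):
    """Split a code such as 'EA01' into generation, family, and number."""
    match = _CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognized segment code: {code!r}")
    generation, family, number = match.groups()
    return SegmentCode(generation, family, int(number))


def split_column_name(column):
    """Split a column such as 'EA01 - Bookish' into (SegmentCode, name)."""
    code, _, name = column.partition(" - ")
    return parse_segment_code(code), name


parsed, name = split_column_name("EA01 - Bookish")
```

Grouping columns by `parsed.family` is one way to analyze a whole family of segments at once.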

Index Variables¶

Along with the Proximity segment data, this dataset may come with index variables. These variables will always start with the prefix “Index -”. They are averages of Proximity segments that have mathematical relationships with the index’s theme. For example, “Index - High-End Affinity” is the average of segments LGBTQ Culture, Wine Lovers, Yoga Advocates, Mindfulness & Spirituality, and Dog Lovers. Each of these segments has a strong association with higher-income individuals and more expensive tastes.
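As a rough illustration of how such an index is built, the sketch below takes the unweighted mean of the five component segments named above. The segment scores are made up, the function name is hypothetical, and whether the published index applies any further transformation after averaging is not stated here.

```python
from statistics import mean

# Component segments of "Index - High-End Affinity", as listed in the text.
HIGH_END_COMPONENTS = [
    "LGBTQ Culture",
    "Wine Lovers",
    "Yoga Advocates",
    "Mindfulness & Spirituality",
    "Dog Lovers",
]


def index_score(segment_scores, components):
    """Unweighted mean of the component segment scores (assumed method)."""
    return round(mean(segment_scores[name] for name in components), 1)


# Hypothetical segment scores for a single block group.
area_scores = {
    "LGBTQ Culture": 70.0,
    "Wine Lovers": 80.0,
    "Yoga Advocates": 60.0,
    "Mindfulness & Spirituality": 50.0,
    "Dog Lovers": 65.0,
}

high_end_affinity = index_score(area_scores, HIGH_END_COMPONENTS)
```

For these inputs the five scores sum to 325, giving an index of 65.0.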

If you are a Proximity dataset customer, you can find the full list of indexes and their component segments in your cloud account.

Volume Variables¶

The other variables included in the Proximity dataset measure social media volume. Volume variables can be extremely useful in their own right, and we highly recommend including them in modeling applications.

While our standard volume variable is the “Volume Index”, we may provide up to four different measures of volume. See the data schema for their definitions.

Data Dictionary¶

Variable Name	Description	Data Type	Example
GEOID	Unique geographic identifier: block group ID, county ID, ZIP Code, dissemination area, etc.	String	390610007003
EA01 - Bookish	One of our core 72 Geosocial segments, measured at the national level. You can learn more about this segment at taxonomy.spatial.ai.	Float	81.2
EA02 - Engine Enthusiasts	One of our core 72 Geosocial segments, measured at the national level. You can learn more about this segment at taxonomy.spatial.ai.	Float	81.2
...	...	...	...
Index - High-End Affinity	This index indicates an area’s affinity for high-end behavior. It is calculated by taking the average of LGBTQ Culture, Wine Lovers, Yoga Advocates, Mindfulness & Spirituality, and Dog Lovers.	Float	65.3
Index - Mid-Range Consumers	This index indicates an area’s affinity for mid-range consumers. It is calculated by taking the average of Green Thumb, Animal Advocates, Handcrafted, Fitness Fashion, and Organized Sports.	Float	65.3
...	...	...	...
Volume Index	This variable is a measure of social media volume. It expresses the amount of social media in a geography as a multiple of the median amount for that type of geography. A value of 2 indicates that this block group has twice the median amount. When aggregating data, this variable should be used as the weight in a weighted average. Other volume variables may also be used as the weight.	Float	2.5
Volume Percentile	This variable is a measure of social media volume, transformed to be a percentile. So, for example, a block group that has an 80 for its volume percentile means that it has a higher amount of social media than 80% of block groups. The volume percentile may be calculated by ranking the volume index from 0 to 100 (we use 2-decimal precision).	Float	70.3
Density Index	This variable is a measure of social media density (amount of social media divided by land area). It expresses a geography's density as a multiple of the median social media density. A value of 2 means that the block group has twice the median density.	Float	2.5
Density Percentile	This variable is a measure of social media density. It is a percentile ranking of social media density. So, for example, a block group that has an 80 for its density percentile means that it has a higher density (amount of social media divided by land area) than 80% of block groups.	Float	70.3
Buffer Meters	This variable indicates whether a buffer was used in scoring a particular geography and, if so, how large it was. A value of 400 indicates that a 400-meter buffer was used in assigning social media to a given geography. Buffers are only used when the geography itself does not contain enough social media. If this variable is not included in the dataset, no buffer was used for any of the data.	Integer	400
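The Volume Index description recommends using it as the weight when aggregating. A minimal sketch of that weighted average is shown below, for example when rolling several block groups up into a trade area. The rows and GEOIDs are hypothetical, the function name is our own, and skipping geographies with no score is an assumption about how missing data should be handled.

```python
def weighted_average(rows, value_key, weight_key="Volume Index"):
    """Volume-weighted average of one column across several geographies.

    Rows with a missing score or a missing/zero weight are skipped
    (an assumption; adjust to your own missing-data policy).
    """
    usable = [
        row for row in rows
        if row.get(value_key) is not None and row.get(weight_key)
    ]
    total_weight = sum(row[weight_key] for row in usable)
    if total_weight == 0:
        return None  # nothing to aggregate
    weighted_sum = sum(row[value_key] * row[weight_key] for row in usable)
    return weighted_sum / total_weight


# Hypothetical block groups making up one trade area.
block_groups = [
    {"GEOID": "bg_1", "EA01 - Bookish": 80.0, "Volume Index": 2.0},
    {"GEOID": "bg_2", "EA01 - Bookish": 50.0, "Volume Index": 1.0},
    {"GEOID": "bg_3", "EA01 - Bookish": None, "Volume Index": 0.5},
]

trade_area_bookish = weighted_average(block_groups, "EA01 - Bookish")
```

Here `bg_1` counts twice as much as `bg_2` because it has twice the social media volume, so the result is (80 × 2 + 50 × 1) / 3 = 70 rather than the unweighted mean of 65. Passing a different `weight_key` allows one of the other volume variables to be used as the weight.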