Methodology
Segmentation systems structured like PersonaLive have existed for some time. What is new in our dataset is the addition of new data sources and the data science used to combine them with the traditional ones. These sources let us differentiate people on more than demographic variables alone and create rich metadata without the need for survey-based data.
Sources
PersonaLive is created with the following datasets:
- Geolocated social media posts
  - Primarily Twitter and Instagram, but also Foursquare and Facebook
- Non-geolocated social media (connected to individuals)
  - Social media posts
  - Following data
- Mobile Visitation
- Credit Card Spend Data
- Demographics
  - Census variables
  - Individual household variables
Clustering
Variables were selected from the above datasets for their coverage, their effectiveness, and their low inter-correlation with the other selected variables. We experimented with various combinations of variables and weights to find the clustering approach that performed best across a variety of predictive tests.
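One common way to limit inter-correlation among candidate variables is greedy correlation pruning, sketched below. This is only an illustration of the idea, not the PersonaLive selection procedure: the function names, the 0.9 threshold, the variable names, and the values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(var_x * var_y)


def prune_correlated(columns, threshold=0.9):
    """Greedily keep variables whose |r| with every kept variable is below threshold.

    `columns` maps a variable name to its list of values. Iteration order
    acts as priority, so list the preferred variables first.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical block-group variables (made-up values, for illustration only).
columns = {
    "median_income": [40, 55, 70, 85, 100],
    "mean_income": [42, 56, 73, 88, 104],  # nearly duplicates median_income
    "coffee_shop_visits": [9, 2, 7, 1, 5],
}
selected = prune_correlated(columns)
print(selected)
```

Here `mean_income` is dropped because it is almost perfectly correlated with `median_income`, which was listed first. In practice the ordering would reflect the coverage and effectiveness criteria described above.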
Clustering was performed at the block group level, with individual-level data used to enhance and re-allocate an individual's classification.
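The two-level idea can be sketched as follows: an individual inherits the segment of their block group by default, and is re-allocated to the nearest segment centroid when individual-level features are available. The source does not describe the actual re-allocation rule, so the weighted nearest-centroid rule, the function names, the segment labels, and the weights below are assumptions made purely for illustration.

```python
def weighted_sq_distance(a, b, weights):
    """Weighted squared Euclidean distance between two feature vectors."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def nearest_segment(features, centroids, weights):
    """Return the label of the centroid closest to `features`."""
    return min(
        centroids,
        key=lambda seg: weighted_sq_distance(features, centroids[seg], weights),
    )


def classify_individual(block_group_segment, individual_features, centroids, weights):
    """Default to the block group's segment; re-allocate if individual data exists.

    Assumption: re-allocation uses nearest weighted centroid. The real
    PersonaLive rule is not specified in the source.
    """
    if individual_features is None:
        return block_group_segment
    return nearest_segment(individual_features, centroids, weights)


# Hypothetical segment centroids and variable weights.
centroids = {"A": [0.0, 0.0], "B": [1.0, 1.0]}
weights = [1.0, 2.0]

no_data = classify_individual("A", None, centroids, weights)
with_data = classify_individual("A", [0.9, 0.8], centroids, weights)
print(no_data, with_data)
```

The first individual has no individual-level data and keeps the block group segment `A`. The second lives in the same block group but has features much closer to centroid `B`, so the sketch re-allocates them.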
Evaluation
To evaluate the strength of PersonaLive, we constructed a suite of tests using real-world outcome data spanning a breadth of industries and use cases. Some of these datasets are publicly available, while others are private. They include:
- Sales
- Subscription Membership
- E-Commerce
- Pandemic-related social distancing behavior
- Insurance
- Retail Sales
- Many more...
In these tests we evaluated various iterations of the PersonaLive dataset against each other and against competing segmentation datasets. Additionally, statistical measures such as inertia were used to evaluate a given cluster model's ability to differentiate records.
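Inertia is the sum of squared distances from each record to the centroid of its assigned cluster, so tighter clusters give a lower value. The minimal sketch below computes it for two candidate assignments of the same records. The points, labels, and helper names are hypothetical and are not drawn from the PersonaLive data.

```python
def centroids_of(points, labels):
    """Compute the mean vector of each cluster."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: [sum(col) / len(col) for col in zip(*members)]
        for label, members in groups.items()
    }


def inertia(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(point, centroids[label]))
        for point, label in zip(points, labels)
    )


# Four made-up records forming two obvious groups along the first axis.
points = [[0, 0], [0, 2], [10, 0], [10, 2]]

labels_good = [0, 0, 1, 1]  # groups the nearby records together
labels_bad = [0, 1, 0, 1]   # mixes distant records in each cluster

inertia_good = inertia(points, labels_good, centroids_of(points, labels_good))
inertia_bad = inertia(points, labels_bad, centroids_of(points, labels_bad))
print(inertia_good, inertia_bad)
```

The assignment that groups nearby records together yields the lower inertia. Because inertia always falls as the number of clusters grows, it is most meaningful when comparing models with the same cluster count, and it complements rather than replaces the outcome-based tests described above.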