Segmentation systems structured similarly to PersonaLive have existed for years. The true innovation in our dataset is the addition of new data sources, and the data science used to combine them with traditional ones. These new sources allow us to differentiate people on more than demographic variables alone and to create rich metadata without the need for survey-based data.
The following datasets were used or experimented with in the creation of PersonaLive:
- Geolocated social media posts
  - Primarily Twitter and Instagram, but also Foursquare and Facebook
- Non-geolocated social media (connected to individuals)
  - Social media posts
  - Following data
- Mobile visitation (provided by Uber Media, which was purchased by Near)
- Web visitation (provided by Dstillery)
- Census variables
- Individual household variables (provided by Infutor)
Variables were selected from the above datasets based on their coverage and effectiveness, and so as to minimize inter-correlation. We experimented with various combinations of variables and weights to find the clustering approach that performed best across a variety of predictive tests.
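One common way to keep inter-correlation low is a greedy filter: rank candidate variables best-first, then keep each one only if it is not too strongly correlated with anything already kept. The sketch below assumes Pearson correlation, a hypothetical priority ranking (standing in for the coverage and effectiveness criteria), and an illustrative threshold of 0.8; it is not the exact PersonaLive selection procedure, and the toy column names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / (sx * sy)


def select_variables(columns, priority, max_abs_corr=0.8):
    """Greedily keep variables in priority order, skipping any whose absolute
    correlation with an already-kept variable exceeds max_abs_corr.

    columns:  dict of variable name -> values (one per block group)
    priority: variable names ordered best-first (hypothetical ranking)
    """
    kept = []
    for name in priority:
        if all(abs(pearson(columns[name], columns[k])) <= max_abs_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: home_value is nearly collinear with median_income.
columns = {
    "median_income": [40, 55, 70, 85],
    "home_value": [200, 260, 330, 400],
    "coffee_visits": [5, 1, 4, 2],
}
priority = ["median_income", "home_value", "coffee_visits"]
kept = select_variables(columns, priority)
print(kept)  # ['median_income', 'coffee_visits']
```

Because the filter is order-dependent, the ranking matters: when two variables are redundant, the one ranked higher survives.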
Clustering was performed at the census block group level, with individual-level data used to enhance, and where needed reallocate, each individual's classification.
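The two-level idea can be sketched as: every individual starts with their block group's segment, and when individual-level features are available they pull the assignment toward a possibly different segment. In this sketch the blending rule, the nearest-centroid reassignment, and the weight `alpha` are all hypothetical assumptions chosen for illustration; the document does not specify how the reallocation is actually computed.

```python
from math import dist


def nearest_centroid(x, centroids):
    """Index of the centroid closest to x (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(x, centroids[i]))


def assign_individual(bg_features, centroids, indiv_features=None, alpha=0.5):
    """Assign an individual to a segment.

    Without individual-level data, use the block group's own nearest centroid.
    Otherwise blend the block group profile with the individual's features
    (alpha is a hypothetical weight on the individual signal) and reassign
    to the nearest centroid of the blended vector.
    """
    if indiv_features is None:
        return nearest_centroid(bg_features, centroids)
    blended = [(1 - alpha) * b + alpha * i for b, i in zip(bg_features, indiv_features)]
    return nearest_centroid(blended, centroids)


centroids = [[0.0, 0.0], [10.0, 10.0]]
print(assign_individual([2.0, 2.0], centroids))                # 0: block group default
print(assign_individual([2.0, 2.0], centroids, [18.0, 18.0]))  # 1: reallocated
print(assign_individual([2.0, 2.0], centroids, [4.0, 4.0]))    # 0: stays put
```

Setting `alpha` to 0 reproduces pure block-group assignment, while larger values let individual-level evidence override the neighborhood profile more readily.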
To evaluate the strength of PersonaLive, we constructed a suite of tests using real-world outcome data spanning a breadth of industries and use cases. Some of these datasets are publicly available while others are private. They include:
- Subscription Membership
- Pandemic-related social distancing behavior
- Retail Sales
- Many others
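A simple way to score a segmentation against outcome data such as the datasets listed above is to ask how much of the outcome's variance is explained by segment membership (eta-squared: between-segment sum of squares divided by total sum of squares). This is an illustrative metric we are assuming for the sketch, not necessarily the one used in the PersonaLive test suite; the toy outcome and segment labels are hypothetical.

```python
from collections import defaultdict


def eta_squared(segments, outcome):
    """Share of outcome variance explained by segment membership (0 to 1).

    segments: segment label per record
    outcome:  numeric outcome per record (e.g. 1/0 for subscribed or not)
    """
    n = len(outcome)
    grand_mean = sum(outcome) / n
    total_ss = sum((y - grand_mean) ** 2 for y in outcome)
    if total_ss == 0:
        return 0.0  # outcome is constant, nothing to explain
    groups = defaultdict(list)
    for s, y in zip(segments, outcome):
        groups[s].append(y)
    between_ss = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    return between_ss / total_ss


subscribed = [1, 1, 0, 0]
print(eta_squared(["A", "A", "B", "B"], subscribed))  # 1.0: segments separate perfectly
print(eta_squared(["A", "B", "A", "B"], subscribed))  # 0.0: segments carry no signal
```

Splitting a segment can never lower this score, so comparisons between competing segmentations are fairest when they have a similar number of segments.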
In these tests, we evaluated various iterations of PersonaLive against each other and against competing segmentation datasets. We also used statistical measures such as inertia (the within-cluster sum of squared distances to each cluster's centroid) to evaluate a given cluster model's ability to differentiate records.
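Inertia is the quantity k-means minimizes: the sum, over all records, of the squared Euclidean distance from each record to the centroid of its assigned cluster. Lower inertia means tighter, better-separated clusters. The minimal sketch below computes centroids as cluster means and then inertia, on hypothetical toy points.

```python
from math import dist


def centroids_from_labels(points, labels):
    """Mean vector of the points in each cluster."""
    sums, counts = {}, {}
    for p, k in zip(points, labels):
        if k not in sums:
            sums[k], counts[k] = [0.0] * len(p), 0
        sums[k] = [s + x for s, x in zip(sums[k], p)]
        counts[k] += 1
    return {k: [s / counts[k] for s in sums[k]] for k in sums}


def inertia(points, labels):
    """Within-cluster sum of squared distances to assigned centroids."""
    centroids = centroids_from_labels(points, labels)
    return sum(dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))


points = [[0, 0], [0, 2], [10, 0], [10, 2]]
print(inertia(points, [0, 0, 1, 1]))  # 4.0: records grouped with their neighbors
print(inertia(points, [0, 1, 0, 1]))  # 100.0: a much poorer partition
```

Inertia always falls as the number of clusters grows (it reaches zero when every record is its own cluster), so it is only meaningful for comparing models with the same number of clusters, or for locating an "elbow" as k varies.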