
CDP Architecture: How CDPs Drive First-party Data Digital Marketing

Tiago Boldt Sousa
Updated on
March 15, 2023
Engineering

Customer Data Platforms (CDPs) enable brands to collect first-party data about their users and leverage it to reach the right audience without sharing private user data with third parties. Advertisers can pay to reach specific audience segments, simultaneously boosting publisher revenue and advertising performance.

This article documents how the CDP architecture works by explaining its main components (event tracking, ID matching, user profile storage, segmentation, and activation) as well as how they relate to each other.

How we got here: digital marketing with third-party data

Marketing platforms like Google and Facebook have a large pool of first- and third-party data about their users and the opportunity to expose them to a large volume of ads. Third-party data is generated outside these platforms, for example through user interactions on a brand's website. This data is shared with the marketing platform to optimize the brand's campaigns on that platform.

As the digital marketing ecosystem grew, so did the desire for efficiency. Soon, institutions began sharing or selling user data for targeting in third-party marketing platforms. More user data meant better audiences and more profitability. 

Nevertheless, some of these deals were ethically questionable, and the growing concerns about data privacy motivated a response at several levels. Concerned users began using:

  • Adblockers
  • Third-party cookie blockers
  • Privacy-oriented browsers 

As governments recognized the exploitation of user data, they introduced legislation to ensure its legitimate usage, such as the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in California.

More recently, browsers have stepped forward with attempts to protect their users, introducing several strategies to mitigate data sharing with third parties. Apple led the most recent example of this with Safari now blocking third-party cookies from being stored, rendering most third-party tracking software useless. 

How CDPs entered the picture: digital marketing with first-party data

Eventually, the digital marketing ecosystem recognized the issue of illegitimate data usage. Brands started prioritizing their first-party marketing strategy, leveraging user data to optimize marketing strategies without sharing it with third parties.

Large marketing platforms advocate this strategy as well. Google asserts that by doing so, businesses can improve revenue by 2.9X and reduce costs by 1.5X.

User audiences still need to be made available to marketing platforms for running campaigns. Still, for brands that have the in-house capability to create their audiences, only anonymized identifiers need to be shared with third parties. These include hashed email addresses, hashed phone numbers, or shared platform-specific IDs.

Customer Data Platforms (CDPs) give brands a deeper understanding of their users, along with the ability to create their own audiences. CDPs help brands collect, analyze, segment, and anonymously activate their data on third-party platforms. This keeps marketing initiatives viable without the need to share personal user data. According to G2, a CDP is a customer database that automatically updates as new data becomes available from a multitude of sources, predominantly first-party data and sometimes third-party data.

CDPs can then structure this collected data into centralized customer profiles, enabling organizations to identify and easily engage with their customers and, in turn, increase customer lifetime value.

Many CDPs are already available. Most centralize user data in the provider’s cloud infrastructure (Twilio Segment, Zeotap, Lytics). Others are fully private, deployed and operated within the brand’s cloud infrastructure (Kevel Audience, Apache Unomi, custom solutions).

CDP architecture

This section presents a brief overview of the architecture of a CDP, introducing five of its most relevant components and how they cooperate. They are:

  1. Event Tracking: Capture interactions between the user and the brand.
  2. ID Matching: Create or adopt a platform-specific identifier (ID) for the user upon first interaction on each platform, and use it on all upcoming user events for identification. 
  3. User Profile Storage: Provide a database for user data that stores all user attributes, identifying users by user IDs.
  4. Segmentation: Provide a way for brands to express rules over user attributes. Segmentation rules must be boolean expressions that identify if a user belongs, or not, to an audience.
  5. Activation: Activate audiences to third-party marketing platforms. A dedicated integration with the destination platform is needed so that audiences can be made available to it, often using common or pre-shared user identifiers.

Figure: CDP architecture

The figure above shows how the components interact. As a concrete example, Event Tracking captures the events of a user who visits an e-commerce store. When the same user interacts with the brand from another device and authenticates on both, ID Matching takes place, unifying the user profile.

Event Tracking

User audiences let brands personalize their marketing messaging to fit the right users at the right time. Messaging can be customized to reach users according to several strategies, like: 

  • User recurrent behavior
  • Real-time activity

E-commerce events can be any user interaction, such as web page views and clicks, product views and clicks, adding and removing products from the cart, or purchases. Event data can be made available from:

  • Online stores
  • Mobile applications
  • Email newsletters
  • Brick-and-mortar stores’ loyalty systems
  • Any other point of contact between the user and the brand.

An Event Tracking service can expose a set of endpoints to collect user events.
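As a rough illustration, the handler behind such an endpoint might validate each incoming payload before accepting it. The sketch below is not taken from any particular CDP: the `Event` fields, the `KNOWN_EVENT_TYPES` vocabulary, and the `collect_event` function are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

# Hypothetical event vocabulary; a real CDP would define its own.
KNOWN_EVENT_TYPES = {
    "page_view", "product_view", "add_to_cart", "remove_from_cart", "purchase",
}


@dataclass(frozen=True)
class Event:
    user_id: str  # platform-specific ID, e.g. a cookie or advertising ID
    event_type: str
    properties: dict[str, Any] = field(default_factory=dict)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def collect_event(payload: dict[str, Any]) -> Event:
    """Validate a raw payload, as an endpoint handler might, and build an Event."""
    user_id = payload.get("user_id")
    event_type = payload.get("event_type")
    if not isinstance(user_id, str) or not user_id:
        raise ValueError("user_id is required")
    if event_type not in KNOWN_EVENT_TYPES:
        raise ValueError(f"unknown event_type: {event_type!r}")
    return Event(user_id, event_type, dict(payload.get("properties", {})))


event = collect_event({
    "user_id": "cookie-123",
    "event_type": "add_to_cart",
    "properties": {"sku": "shirt-42", "price": 19.9},
})
```

Rejecting malformed events at the point of collection keeps downstream components, which assume a consistent event shape, simpler.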

Events collected through Event Tracking can then be processed to update user attributes in User Profile Storage, which can then be used for Segmentation.

Most CDPs follow a similar strategy for this component, providing an API to collect events, as is the case with mParticle, Kevel Audience, and Segment.

ID Matching

Users tend to use multiple devices during their buying journey, for example adding products to their cart in a mobile application and later completing the purchase in a browser. Interacting with a brand on multiple devices can mistakenly lead to two independent user profiles instead of one.

Adopting a platform-specific identifier (ID) for users upon first interaction with each platform and using it on all upcoming user events helps create unified user journeys. When a user logs in, match their ID with the user’s email address or username. Repeat across platforms and allow all user IDs to be available to User Profile Storage. Platform-specific IDs can be browser cookies, mobile phone advertising IDs, or any other identifier that can be reused on that platform.
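One possible way to keep track of which identifiers belong to the same user is a union-find (disjoint-set) structure, where each login links a platform-specific ID to a stable identifier. This is a minimal sketch under that assumption; the `IdentityGraph` class and the `cookie:`, `idfa:`, and `email:` prefixes are hypothetical and not part of any specific CDP.

```python
class IdentityGraph:
    """Groups identifiers known to belong to the same user (union-find)."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, identifier: str) -> str:
        # Register unseen identifiers as their own group.
        self._parent.setdefault(identifier, identifier)
        root = identifier
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node directly at the root.
        while self._parent[identifier] != root:
            next_id = self._parent[identifier]
            self._parent[identifier] = root
            identifier = next_id
        return root

    def link(self, a: str, b: str) -> None:
        """Record that identifiers a and b belong to the same user."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_user(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)


graph = IdentityGraph()
graph.link("cookie:abc", "email:ana@example.com")  # login in the browser
graph.link("idfa:xyz", "email:ana@example.com")    # login in the mobile app
```

After both logins, the browser cookie and the mobile advertising ID resolve to the same group, even though they were never observed together.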

With this strategy, however, profiles can remain independent if the user does not log in or fails to provide a stable identifier. During this time, two or more independent user profiles will coexist.

When there is an ID Matching opportunity, two or more profiles must be merged. This requires the User Profile Storage to have a defined strategy for merging the attributes in each profile, and different attributes call for different strategies. For example, an attribute holding the count of orders must be summed across profiles, while one holding the average order value must be recalculated.
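The order-count and average-order-value example can be sketched as follows. The `Profile` class and `merge_profiles` function are hypothetical, and the sketch assumes the average is recalculated as a weighted mean using each profile's order count.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    order_count: int
    average_order_value: float


def merge_profiles(a: Profile, b: Profile) -> Profile:
    """Merge two profiles, applying a different strategy to each attribute."""
    # Counts are additive across profiles.
    total_orders = a.order_count + b.order_count
    if total_orders == 0:
        return Profile(0, 0.0)
    # Averages cannot be summed: recover each profile's revenue, then re-average.
    total_revenue = (
        a.order_count * a.average_order_value
        + b.order_count * b.average_order_value
    )
    return Profile(total_orders, total_revenue / total_orders)


merged = merge_profiles(Profile(3, 20.0), Profile(1, 40.0))
# 4 orders in total, with revenue 3 * 20 + 1 * 40 = 100, so the average is 25.0
```

Simply averaging the two averages would give 30.0 here, which overweights the profile with a single order.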

ID Matching gives businesses a more accurate view of user activity and allows them to interact with users more consistently across channels and platforms.

User Profile Storage

Event Tracking and ID Matching allow user activity to flow into the CDP. By itself, this information is not very useful. Given the end goal of creating user audiences, how can we organize incoming user data to be easily explored and segmented?

A CDP also provides a database for user data that stores all user attributes, identifying the user by any of their user IDs. Each user profile has a set of user attributes that can be queried and updated. User attributes can be distilled from incoming event data by defining strategies for filtering events, extracting data from them, and aggregating it into user attributes. An example would be the number of purchases a user has made as a customer.
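The filter, extract, and aggregate steps can be illustrated with a small sketch. The event shape, the `total` property, and the `purchase_count` and `total_spent` attribute names are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical raw events, as they might arrive from Event Tracking.
events = [
    {"user_id": "u1", "event_type": "page_view", "properties": {}},
    {"user_id": "u1", "event_type": "purchase", "properties": {"total": 30.0}},
    {"user_id": "u2", "event_type": "purchase", "properties": {"total": 12.5}},
    {"user_id": "u1", "event_type": "purchase", "properties": {"total": 20.0}},
]


def distill_purchases(events):
    """Turn raw events into per-user purchase attributes."""
    attributes = defaultdict(lambda: {"purchase_count": 0, "total_spent": 0.0})
    for event in events:
        # Filter: only purchase events contribute to these attributes.
        if event["event_type"] != "purchase":
            continue
        # Extract: pull the order total out of the event payload.
        total = event["properties"].get("total", 0.0)
        # Aggregate: fold the value into the user's attributes.
        profile = attributes[event["user_id"]]
        profile["purchase_count"] += 1
        profile["total_spent"] += total
    return dict(attributes)


attributes = distill_purchases(events)
```

In this sample, user `u1` ends up with two purchases and user `u2` with one, while the page view is ignored.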

Alternatively, user attributes can be imported from offline sources, such as the brand’s Customer Relationship Management (CRM) software. Distilling the incoming raw data into user attributes reduces the information required to create user audiences and understand the user’s behavior. Segmentation can use those attributes for defining rules to create user audiences. Some CDPs also have machine learning models generating user attributes, for example, for generating predictions about their future behavior, enabling highly optimized Segmentation.

Known user profiles are often accessible programmatically from the CDP’s User Profile Storage. 

Segmentation

Brands want to reach different users with different marketing approaches. To do so, they need the ability to create user audiences based on their characteristics: 

  • Demographics
  • Propensity to buy
  • Users who purchased during a specific season
  • Other strategies that adjust to the strategy of a given marketing campaign. 

CDPs let user attributes be used to create marketing audiences. Segmentation rules must be boolean expressions that identify whether a user belongs to an audience. Segmentation can be provided through a visual interface for defining audiences and their rules. Once defined, audiences become available for Activation. Rules can leverage any attribute available in User Profile Storage, including those generated by machine learning models.
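One way to model such rules is as composable predicates over a profile's attributes. The sketch below is illustrative only: the helper names (`attribute_equals`, `attribute_at_least`, `all_of`) and the `country` and `purchase_count` attributes are hypothetical, and a visual rule builder would typically generate an equivalent expression behind the scenes.

```python
from typing import Any, Callable

Profile = dict[str, Any]
Rule = Callable[[Profile], bool]


def attribute_equals(name: str, value: Any) -> Rule:
    """Rule that holds when the attribute has exactly the given value."""
    return lambda profile: profile.get(name) == value


def attribute_at_least(name: str, threshold: float) -> Rule:
    """Rule that holds when a numeric attribute reaches the threshold."""
    return lambda profile: profile.get(name, 0) >= threshold


def all_of(*rules: Rule) -> Rule:
    """Rule that holds only when every given rule holds (logical AND)."""
    return lambda profile: all(rule(profile) for rule in rules)


# Audience: users in Portugal with at least two purchases.
repeat_buyers_pt = all_of(
    attribute_equals("country", "PT"),
    attribute_at_least("purchase_count", 2),
)

profiles = {
    "u1": {"country": "PT", "purchase_count": 3},
    "u2": {"country": "PT", "purchase_count": 1},
    "u3": {"country": "US", "purchase_count": 5},
}
audience = sorted(uid for uid, p in profiles.items() if repeat_buyers_pt(p))
```

Only `u1` satisfies both conditions, so it is the only member of the audience.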

The figure below demonstrates the creation of an audience of male users likely to buy shirts in the following 14 days. It leverages Kevel Audience’s machine learning model to predict the likelihood of a user buying a product in a category.

Figure: Segmentation

Activation

CDPs aim to improve marketing initiatives, leveraging all data available to brands. Ultimately, these initiatives will be delivered on a marketing platform that shows the ads to users, such as Google or Facebook. Once user audiences are created using Segmentation, and a marketing strategy is defined to go with it, brands need to make the audiences available to the marketing platform. 

CDP architecture allows audience activation on third-party marketing platforms. A dedicated integration with the destination platform makes audiences available to it, often using common or pre-shared user identifiers. Common identifiers are often the user’s email address, phone number, or respective hashed versions. Platform-specific IDs can also be shared using a strategy similar to ID Matching, and used during Activation to identify the users in an audience.
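As an illustration of sharing hashed identifiers, the sketch below normalizes email addresses and hashes them with SHA-256 before building an audience payload. It assumes the destination accepts SHA-256 hashes of trimmed, lowercased emails, which is a common convention, but each platform documents its own normalization rules. The function names are hypothetical.

```python
import hashlib


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so the same address always hashes alike."""
    return email.strip().lower()


def hash_identifier(value: str) -> str:
    """Return the hex-encoded SHA-256 digest of an identifier."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_audience_payload(emails: list[str]) -> list[str]:
    """Hash each normalized email, dropping duplicates."""
    return sorted({hash_identifier(normalize_email(e)) for e in emails})


payload = build_audience_payload(
    [" Ana@Example.com ", "ana@example.com", "bob@example.com"]
)
```

The first two inputs differ only in case and whitespace, so they collapse into a single hash and the payload contains two entries.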

A CDP that activates audiences to multiple destinations simplifies how marketing teams operate across marketing platforms: audiences stay consistent without sharing additional user data or manually configuring each platform.

Audience synchronization can be synchronous, updating users as their attributes change on the CDP, or asynchronous, with the Activation executing on a given schedule. Deciding when to use each strategy is tightly coupled with the destination platform and its capacity to handle a varying volume of requests. Some marketing platforms also load audiences from files uploaded with user identifiers.
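For scheduled Activation, one plausible approach is to compare the audience's current membership with what was last sent and upload only the difference, in batches sized to what the destination can handle. This is a sketch under those assumptions; `audience_delta`, `chunked`, and the batch size are hypothetical.

```python
from collections.abc import Iterator


def audience_delta(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_add, to_remove) relative to the last synchronized state."""
    return current - previous, previous - current


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


last_synced = {"id-a", "id-b"}
current_members = {"id-b", "id-c", "id-d", "id-e"}

to_add, to_remove = audience_delta(last_synced, current_members)
# Sort for deterministic batches; 2 is an arbitrary batch size for illustration.
add_batches = list(chunked(sorted(to_add), 2))
```

Sending only the delta keeps request volume proportional to how much the audience changed rather than to its total size.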

Conclusion

Digital Marketing is undergoing extensive innovation, continuously targeting improved performance while researching new strategies to respect user privacy. Widely recognized by the industry, first-party data can play a relevant role in positively influencing the performance of marketing campaigns.

To leverage such performance, brands adopt CDPs to track user interactions, match their IDs across channels, aggregate their profiles, segment them into audiences, and activate only the required data to marketing platforms to enable highly-performant campaigns. To learn more about how to launch your CDP today, contact us.

An academic version of this article was published in the ACM Digital Library under the title "Customer Data Platforms: A Pattern Language for Digital Marketing Optimization with First-Party Data".
