The Look-alike notebook in AI Workbench helps you find new audiences that resemble your best customers. Given a target segment of profiles you already value, such as high-value customers or known converters, the notebook trains a machine-learning model that learns the patterns distinguishing your target audience from the general population. The model then assigns each profile a look-alike score between 0 and 100, where higher values indicate greater similarity to the target audience.

The Look-alike notebook uses the BlueConic Continuous models feature: once the model is trained, it scores profiles in real time as visitors interact with your website or mobile app, or whenever a relevant profile property changes. You can use the resulting score to build segments, personalize experiences, and activate look-alike audiences across your channels.

Before you begin

Identify a source segment that represents the broader population you want to find look-alikes in, such as all recent visitors.
Identify a target segment that contains the profiles you want to find more of, such as known converters or high-value customers.
Decide which profile properties you hypothesize distinguish the target audience from the general population, such as interests, engagement metrics, or demographics.
Optionally, create a Continuous look-alike model in BlueConic to store the trained result. You can also create one directly from the notebook parameters UI.
Create or identify a profile property to store the look-alike score (0–100) for each profile.

Add a Look-alike notebook

Navigate to More > AI Workbench > Add notebook.
Choose Look-alike notebook from the pop-up window.
Give your notebook a name.
Click Save.

Set the Look-alike notebook parameters

On the Parameters tab, configure the required and optional settings that control how the model is trained and which profiles are scored.

Required parameters

Select the Parameters tab.
Under Source, select the source segment that represents the broader population to find look-alikes in.
Under Target, select the target segment that contains the profiles you want to find look-alikes for.
Under Profile properties, select the properties to use as features for the model.
Under Model, select the look-alike model to store the trained result in.
Under Look-alike score property, select the profile property that receives the model's output score.
Click Save.

Note: If you haven't created a look-alike model yet, you can create one from the Models section in BlueConic before configuring the notebook, or directly from the notebook parameters UI.

Optional parameters

Scoring segment: Select a segment to restrict which profiles are scored by the model at prediction time.
Feature deny list: Enter feature names to exclude from the model. Use this to remove features that would give misleading results or that aren't relevant. Numerical properties use the property name as the feature name, such as page_views. Text properties use the format property=value, such as interest=sports.
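The naming convention for the Feature deny list can be illustrated with a small Python sketch. It assumes, for illustration only, that numerical properties are used as-is and that each text property value becomes its own indicator feature named property=value; the notebook's actual feature encoding may differ, and the helpers to_features and apply_deny_list are hypothetical.

```python
from typing import Dict, List, Union

# Hypothetical profile shape for illustration: numerical properties hold a
# number, text properties hold a list of values.
Profile = Dict[str, Union[float, List[str]]]


def to_features(profile: Profile) -> Dict[str, float]:
    """Flatten a profile into named features.

    Numerical properties keep their property name (e.g. page_views);
    each text property value becomes an indicator feature named
    property=value (e.g. interest=sports). The one-hot encoding is an
    assumption made for this sketch.
    """
    features: Dict[str, float] = {}
    for prop, value in profile.items():
        if isinstance(value, (int, float)):
            features[prop] = float(value)
        else:
            for item in value:
                features[f"{prop}={item}"] = 1.0
    return features


def apply_deny_list(features: Dict[str, float], deny_list: List[str]) -> Dict[str, float]:
    """Drop every feature whose name appears on the deny list."""
    denied = set(deny_list)
    return {name: val for name, val in features.items() if name not in denied}


profile = {"page_views": 12, "interest": ["sports", "travel"]}
print(apply_deny_list(to_features(profile), ["interest=sports"]))
# {'page_views': 12.0, 'interest=travel': 1.0}
```

Denying interest=sports removes only that value; the page_views feature and the other interest value stay in the model.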

Run the Look-alike notebook

Select the Schedule and run history tab.
Click Run now to train the model manually.
To retrain the model automatically on a recurring schedule, activate Enable scheduling.
Click the Settings icon to choose how often the notebook runs.
Set a time and click OK.
Click Save.

Note: The notebook only retrains the model. Profiles are automatically scored each time someone visits the website or opens the mobile app, or when one of the relevant profile properties changes.

View your results

After running the notebook, you can view its output by clicking Preview in the Run history table. The notebook produces several charts that help you evaluate model quality and choose a score threshold for your look-alike audience.

Feature importances

Shows the relative importance of each profile property, or specific property value for text properties, in the trained model, ranked from most to least important. Use this chart to understand which features drive the distinction between the target audience and the general population.

Feature contributions

Shows the most important profile properties and whether a higher value for each makes a profile more or less likely to be part of the target audience. Blue bars pointing right mean a higher value pushes profiles toward the target audience; pink bars pointing left mean the opposite. The longer the bar, the more influential the feature.

Lift chart

Shows how much more likely profiles are to be part of the target audience compared to random selection, as a function of the top percentage of the population ranked by score. For example, a lift of 5× at the top 10% means those profiles are five times more likely to be target audience members than a random sample.
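To make the lift definition concrete, here is a minimal Python sketch that computes lift for the top share of profiles ranked by score. It assumes you have a look-alike score and a target-segment flag for each profile (for example, from a holdout set); the data and the function name lift_at are made up for illustration and are not part of the notebook.

```python
from typing import List


def lift_at(scores: List[float], is_target: List[bool], top_fraction: float) -> float:
    """Lift of the top `top_fraction` of profiles ranked by score.

    Lift = (share of target members in the top slice) /
           (share of target members in the whole population).
    """
    ranked = sorted(zip(scores, is_target), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # size of the top slice
    top_rate = sum(target for _, target in ranked[:k]) / k
    base_rate = sum(is_target) / len(is_target)
    return top_rate / base_rate


# Made-up example: 20 profiles, 2 of them in the target segment.
scores = [float(s) for s in range(20)]
is_target = [s in (19, 5) for s in range(20)]
print(round(lift_at(scores, is_target, 0.10), 2))  # top 2 profiles hold 1 target -> 5.0
```

Here the top 10% (2 profiles) contains 1 target member, a rate of 50% against a base rate of 10%, which gives the 5× lift from the example above.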

Cumulative gain chart

Shows the percentage of target audience members captured as you include more of the population ranked by score. For example, if the curve reaches 80% at 20% of the population, the top 20% of scored profiles contain 80% of all target audience members.
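The same ranking idea gives one point on the cumulative gain curve. The Python sketch below is illustrative only; it assumes per-profile scores and target flags are available, and cumulative_gain is a hypothetical helper with invented data.

```python
from typing import List


def cumulative_gain(scores: List[float], is_target: List[bool], top_fraction: float) -> float:
    """Share of all target members found in the top `top_fraction` of
    profiles ranked by score (the Y value of the cumulative gain curve
    at X = top_fraction)."""
    ranked = sorted(zip(scores, is_target), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # size of the top slice
    captured = sum(target for _, target in ranked[:k])
    return captured / sum(is_target)


# Made-up example: 20 profiles, 5 of them in the target segment.
scores = [float(s) for s in range(20)]
is_target = [s in (19, 18, 17, 16, 3) for s in range(20)]
print(cumulative_gain(scores, is_target, 0.20))  # 0.8: the top 20% holds 80% of targets
```

With a random ranking, the expected result is roughly top_fraction itself, which is what the dashed line on the chart represents.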

Score threshold chart

Shows how audience quality (lift) and audience size change as the score threshold varies from 0 to 100. The left axis shows lift and the right axis shows audience size as a percentage of the total population. Use this chart to choose a score cutoff that balances reach and quality.
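The two lines of the Score threshold chart can be sketched in Python as follows, assuming that the look-alike audience at a given threshold is every profile whose score is greater than or equal to the cutoff. The scores are invented and threshold_stats is a hypothetical helper, not part of the notebook.

```python
from typing import List, Optional, Tuple


def threshold_stats(
    scores: List[float], is_target: List[bool], threshold: float
) -> Tuple[Optional[float], float]:
    """Return (lift, audience size in %) for the audience with score >= threshold.

    Lift is None when no profile reaches the threshold.
    """
    audience = [target for score, target in zip(scores, is_target) if score >= threshold]
    size_pct = 100 * len(audience) / len(scores)
    if not audience:
        return None, size_pct
    base_rate = sum(is_target) / len(is_target)
    lift = (sum(audience) / len(audience)) / base_rate
    return lift, size_pct


# Made-up look-alike scores (0-100) for 10 profiles, 2 of them targets.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95]
is_target = [False] * 7 + [True, False, True]
for threshold in (0, 50, 80, 100):
    lift, size = threshold_stats(scores, is_target, threshold)
    lift_text = "n/a" if lift is None else f"{lift:.2f}"
    print(f"threshold={threshold}: lift={lift_text}, audience size={size:.0f}%")
```

Raising the threshold shrinks the audience and, for a useful model, raises the lift, which is the tradeoff the chart visualizes.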

[Screenshot placeholder: Look-alike notebook insights]

FAQ

How do I decide which profile properties to use?

Start by defining hypotheses about what would make a customer move from the source segment to the target segment. Based on these hypotheses, figure out what data points are associated with each hypothesis, then match these data points to existing profile properties. Set up additional listeners, timeline rollups, or connections for data points that don't yet have an associated profile property.

How do I decide what threshold to use for the look-alike score?

The threshold depends on your use case and is generally a tradeoff between audience size and audience quality.

The Cumulative gain chart helps you decide how large the audience you target should be. The blue line shows what percentage of your target audience (Y-axis) you capture when you target a given percentage of the total population (X-axis), with profiles ranked by look-alike score. The dashed line represents random targeting. The bigger the gap between the blue line and the dashed line, the better the model is at concentrating your target profiles at the top of the score range.

For example, if the blue line reaches 80% on the Y-axis at 20% on the X-axis, targeting the top 20% of profiles by score captures 80% of all your target audience members, far more than the 20% you'd capture by targeting randomly.

The Score threshold chart helps you choose a score cutoff to define your look-alike audience. The blue line shows audience quality (lift) and the pink line shows audience size as a percentage of the total population. As you raise the score threshold, the audience gets smaller but higher quality. Look for a sweet spot where the blue line is well above 1 (the lift of random targeting) while the pink line still gives you enough reach for your campaign.

When are profiles scored?

The notebook itself only trains the look-alike model. The model is automatically executed each time someone visits the website or opens the mobile app, or when one of the relevant profile properties changes.

How often should I retrain the model?

Since the notebook only retrains the model, it doesn't need to run often. How often it should run depends on how quickly customer behavior changes, for example because of seasonality. In many cases retraining the model once a month is sufficient, and retraining more often than once a week is usually unnecessary. You can schedule the notebook on the Schedule and run history tab.

How can I see which profile properties influence the model's decisions?

The Feature importances chart shows which profile properties, or specific property values for text properties, have the most influence on the model, ranked from most to least important. The longer the bar, the more that feature helps the model distinguish your target audience from the general population. Numerical properties appear by name, such as page_views, while text properties are broken out per value, such as interest=sports. If a property you expected to be important has a short bar or doesn't appear, it may not be a useful distinguishing factor for this audience.

The Feature contributions chart shows the most important profile properties and whether a higher value for each property makes a profile more or less likely to be part of the target audience. Bar length reflects how influential the feature is, and bar direction shows whether higher values push profiles toward or away from the target audience. Blue bars pointing right mean a higher value makes a profile more likely to be part of the target audience; pink bars pointing left mean the opposite.

I'm seeing something strange in the feature importances or feature contributions chart. What do I do?

The model can pick up a spurious correlation between one of the input features and the target segment. First, check if this is caused by an unintended difference between the source segment and the target segment. For example, if the lastvisitdate feature is very important and the source segment has a restriction on last visited date while the target segment doesn't, the model may have learned your segment configuration instead of an actual difference between the two groups. In this case, either remove the profile property or add the same restriction to the target segment.

You can also exclude a specific value from a text profile property. To do so, concatenate the profile property ID and the value using the format property=value, such as interest=sports, and add it to the Feature deny list. After making changes, run the notebook again to retrain the model.

How can I see if the model is of sufficient quality?

The Lift chart shows how much better the model is at finding your target profiles compared to random selection. The Y-axis shows a multiplier. For example, a lift of 5× at 10% means that the top 10% of profiles ranked by score contain five times more target audience members than a random 10% sample. The dashed line at 1× represents random targeting. Higher lift at lower percentages of the population means the model does a good job of ranking your target profiles at the top. The chart is zoomed in on the top 20% of the population, since that's where the model's advantage is most visible.

How can I track model quality over time?

Each time the model is trained, it's validated against a holdout set to calculate an AUPRC (Area Under the Precision Recall Curve) score. To visualize this score over time, add a Notebook Performance Insight to a dashboard, select the notebook, and configure the score property.
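As a rough illustration of what AUPRC measures, the Python sketch below computes average precision, a common step-wise estimate of the area under the precision-recall curve. The notebook may use a different estimator; the function name and holdout data are hypothetical, and ties in score are broken arbitrarily by the sort here.

```python
from typing import List


def average_precision(scores: List[float], is_target: List[bool]) -> float:
    """Average precision: the mean of precision@k over every rank k at which
    a target profile appears, a step-wise estimate of AUPRC."""
    ranked = sorted(zip(scores, is_target), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for k, (_, target) in enumerate(ranked, start=1):
        if target:
            hits += 1
            precision_sum += hits / k  # precision of the top k profiles
    return precision_sum / sum(is_target)


# Made-up holdout set: scores for 4 profiles, 2 of them targets.
print(average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False]))
```

A random ranking scores roughly the share of target profiles in the holdout set, so it is more informative to compare the AUPRC against that baseline than against 0.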

How can I improve the quality of the model?

Make sure the right profile properties are selected. Use hypotheses to figure out which profile properties are relevant.
Make sure profile properties are processed correctly. For example, turn comma-separated values into separate profile property values.
Dive into the code: you can override all XGBoost parameters in the BaggingXGBoostLearner.
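For example, a text property that receives one string such as "sports, travel" would, under the property=value naming used by the Feature deny list, presumably appear as a single feature instead of interest=sports and interest=travel. The Python sketch below shows the kind of cleanup meant here. It assumes you can transform the raw value before it is written to the profile (for example, in your data pipeline), and split_multi_value is a hypothetical helper, not a BlueConic function.

```python
from typing import List


def split_multi_value(raw: str) -> List[str]:
    """Split a comma-separated string into clean, de-duplicated values,
    e.g. 'sports, travel,,sports' -> ['sports', 'travel']."""
    values = [part.strip() for part in raw.split(",")]
    # dict.fromkeys drops duplicates while keeping the original order
    return list(dict.fromkeys(value for value in values if value))


print(split_multi_value("sports, travel,,sports"))  # ['sports', 'travel']
```

Each resulting value can then be stored as a separate value of the profile property.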
