
Dialogue models

Dialogue models let you use AI to decide in real time which variant of a BlueConic Dialogue to show to each visitor. Instead of relying on manual A/B tests or static rotation rules, a Dialogue model adapts dynamically based on profile data and conversion outcomes, with the aim of increasing engagement, retention, and revenue.

Models can be trained directly inside AI Workbench or externally in your own data science environment, then uploaded into BlueConic. Once stored in BlueConic, a model can be associated with one or more Dialogues to power variant optimization.


Before you begin

Before creating or uploading a Dialogue model, confirm the following:

  • You have access to AI Workbench in BlueConic.

  • Your model is in ONNX format, or you have a source model ready to convert.

  • Any profile properties used by the model already exist in BlueConic.

  • You understand which inputs your model requires.

  • For an introduction to Dialogue models, see Dialogue Models Overview.


Dialogue model format

A Dialogue model is an ONNX model that returns the ID of one or more variants to show to a visitor.

Inputs

Input | Type | Description
variant_ids | string[] | The variant IDs the model can choose from. Required.
profile | float[] | The profile feature vector, derived from the profilePropertyIds and featureNames of the model. When the model declares a profile input, the length of this array must equal the number of featureNames.
views | int64[] | The number of times each variant has been viewed.
clicks | int64[] | The number of times each variant has been clicked.
conversions | int64[] | The number of times each variant has converted.
conversion_value | float[] | The total value of the conversions.
indirect_conversions | int64[] | The number of indirect conversions, where the dialogue played a role in the user journey but was not the final interaction before conversion.
unique_views | int64[] | The number of unique views.
unique_clicks | int64[] | The number of unique clicks.
unique_conversions | int64[] | The number of unique conversions.
unique_indirect_conversions | int64[] | The number of unique indirect conversions.

Note: All inputs except variant_ids are optional. For a more detailed description of these metrics, see Tracking Views, Clicks, and Conversions.

Outputs

Output | Type | Description
variant_id | string[] | One or more variants to show.

Any additional outputs are stored in the modelOutput property of the view timeline event. A typical use is to log the propensity of the chosen variant.

Metadata

Metadata field | Python API name | Description
profilePropertyIds | property_ids | The profile property IDs used as input features for the model.
featureNames | feature_names | The names of the features the model uses. The profile input is filled based on these feature names; missing values are filled with zero.


How Dialogue models run in production

A Dialogue model is executed in real time, server-side, every time the BlueConic decisioning engine serves a dialogue with an associated model. This happens when someone views a page on your website, navigates to a different section in your app, or opens an email. The output of the model is never cached: every time the dialogue is shown, the model runs again. If the same profile views the same dialogue twice, the model executes twice independently.

Profile input

When the decisioning engine executes a model, it first converts the profile into a dictionary where keys are strings and values are numbers. The dictionary contains only the profile properties listed in the profilePropertyIds metadata attribute. Text properties are one-hot encoded by joining the property name and value with an equals sign and assigning the value 1.

For example, a profile where the city property is Nijmegen and the clickcount property is 9 becomes {"city=nijmegen": 1, "clickcount": 9} (keys are lowercased). If the featureNames metadata attribute is ["clickcount", "city=amsterdam", "city=nijmegen"], the profile input is [9.0, 0.0, 1.0]. Any feature name not present in the dictionary defaults to 0.

This dictionary is also stored on the view timeline event as the modelInput property.
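The conversion described above can be sketched in a few lines of plain Python. This is only an illustration of the documented behavior, not BlueConic's implementation: the profile is assumed to be a plain dictionary of value lists, and the helper names profile_to_features and features_to_vector are hypothetical.

```python
def profile_to_features(profile, property_ids):
    """Flatten a profile into a {feature name: number} dictionary."""
    features = {}
    for property_id in property_ids:
        for value in profile.get(property_id, []):
            if isinstance(value, (int, float)):
                # numeric properties keep their value
                features[property_id.lower()] = float(value)
            else:
                # text values are one-hot encoded as "property=value"
                features[f"{property_id}={value}".lower()] = 1.0
    return features


def features_to_vector(features, feature_names):
    """Order the features by feature_names; absent features default to 0."""
    return [features.get(name, 0.0) for name in feature_names]


# hypothetical profile representation: property ID -> list of values
profile = {"city": ["Nijmegen"], "clickcount": [9]}
features = profile_to_features(profile, ["city", "clickcount"])
vector = features_to_vector(
    features, ["clickcount", "city=amsterdam", "city=nijmegen"]
)
print(features)  # {'city=nijmegen': 1.0, 'clickcount': 9.0}
print(vector)    # [9.0, 0.0, 1.0]
```

Keeping the feature order in one place (the feature names list) is what makes the vector stable when a profile lacks a value.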

Timeline events

Every time a dialogue is shown, a view event is added to the visitor's timeline. When a model is associated with the dialogue, the event also gets two extra properties:

modelInput: A JSON-encoded dictionary representing the profile at the moment the decision was made. All keys are lowercased strings; all values are numbers.

modelOutput: A JSON-encoded dictionary of all model outputs other than variant_id. If your model produces a propensity output, it appears here.

A click event (with a variantId) is added to the timeline whenever someone clicks on a dialogue. A conversion event (with a variantId and optional value) is added whenever someone converts.

The variantId on a click or conversion event links it back to the corresponding view event for the same variant, and therefore to the modelInput and modelOutput that produced that view.

Attribution

BlueConic uses last-touch attribution: the dialogue that was viewed (and, depending on your configuration, clicked) most recently gets credited with the conversion. There is no attribution window — a conversion is always attributed to the most recent applicable view or click, regardless of how long ago that happened. When the most recent view did not lead directly to the conversion, the otherCandidates property on the conversion event records the other variants that could have been credited.
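As a rough illustration of last-touch attribution without a window, the sketch below credits the most recent view that happened at or before the conversion. The event dictionaries and their time field are hypothetical stand-ins for timeline events, and clicks are left out for brevity.

```python
def last_touch(views, conversion_time):
    """Return the variant of the most recent view at or before the conversion."""
    earlier = [view for view in views if view["time"] <= conversion_time]
    if not earlier:
        return None
    # no attribution window: the age of the view does not matter
    return max(earlier, key=lambda view: view["time"])["variantId"]


# hypothetical view events, with time as a simple increasing number
views = [
    {"variantId": "a", "time": 10},
    {"variantId": "b", "time": 25},
    {"variantId": "a", "time": 40},
]
credited = last_touch(views, conversion_time=30)
print(credited)  # b
```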

Variants being added or removed

A dialogue's variants can change while the model is live. When this happens, BlueConic adjusts the contents of the variant_ids input: added variants appear there, removed variants no longer do. Your model needs to be robust against this. Write unit tests that exercise the cases described in the Testing your model section: unknown IDs, missing IDs, and reordered IDs.
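The three cases named above can be exercised against a plain-Python reference of your decision logic before you test the ONNX graph itself. The sketch below assumes a hypothetical model that scores known variants and gives unknown ones a neutral default; the names KNOWN_SCORES, DEFAULT_SCORE, and choose are illustrative only.

```python
# hypothetical per-variant scores learned at training time
KNOWN_SCORES = {"variant-a": 0.2, "variant-b": 0.5, "variant-c": 0.3}
DEFAULT_SCORE = 0.0  # assumption: unknown variants get a neutral score


def choose(variant_ids):
    """Pick the best-scoring variant among the IDs currently offered."""
    return max(variant_ids, key=lambda v: KNOWN_SCORES.get(v, DEFAULT_SCORE))


# unknown ID: a variant added after training must not break the lookup
assert choose(["variant-a", "variant-new"]) == "variant-a"
# missing ID: the best known variant was removed from the dialogue
assert choose(["variant-a", "variant-c"]) == "variant-c"
# reordered IDs: the choice must not depend on the input order
assert choose(["variant-c", "variant-b", "variant-a"]) == choose(
    ["variant-a", "variant-b", "variant-c"]
)
```

The same three inputs can then be fed to the exported ONNX model to confirm that it agrees with the reference.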

Profile properties being deleted

Profile properties listed in a model's profilePropertyIds cannot be deleted while the model is using them. BlueConic enforces this through a delete-blocking relationship. To delete such a property, first update the model so it no longer references it.


Upload a Dialogue model

You can upload Dialogue models through the BlueConic UI, AI Workbench notebooks, or the REST API.

Using the UI

  1. Go to More > AI Workbench.

  2. Open the Models tab.

  3. Click Add Model.

  4. Set the model type to Dialogue.

  5. Upload your ONNX model file.

  6. (Optional) Select one or more profile properties to use as inputs.

  7. (Optional) Copy the feature names.

  8. Click Save.

Using the AI Workbench (Python notebook)

To run the AI Workbench notebook on a schedule and regularly retrain your model, use the update_model method together with a model parameter.

First, define your parameters:

import blueconic

bc = blueconic.Client()

# the model to update
MODEL_ID = bc.get_blueconic_parameter_value("Model", "model")
if not MODEL_ID:
    raise ValueError("Please configure a model")

Then, once you have trained an ONNX model (called onnx_model in this example):

model = bc.get_model(MODEL_ID)
model.type = blueconic.domain.ModelType.DIALOGUE
model.model = onnx_model.SerializeToString()
bc.update_model(model)

For more information, see the BlueConic Python API documentation.

Using the REST API

Dialogue models can also be pushed to the customer data platform (CDP) through the REST API, for example as part of an external model pipeline. For more information, see the BlueConic REST API documentation.


Model limits and validation

Limits

The following limits apply by default:

  • The model file is at most 20 MB.

  • A tenant can have at most 10 Dialogue models.

  • Model execution must complete within 50 milliseconds.

Note: If any of these limits are insufficient for your use case, contact [email protected].

Validation

The following checks run automatically when you upload a model:

  • The model must be a valid ONNX model. BlueConic supports the latest standard, but very new operators may not be supported yet. The ai.onnx.ml operators are supported; custom operators are not.

  • The model must declare a variant_ids input.

  • The model must declare a variant_id output.

  • The model must not declare any input other than the ones described above.
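To catch problems before uploading, the name and size rules above can be approximated locally. The sketch below is not BlueConic's validator: it only mirrors the listed rules, takes the input and output names as plain lists (with the onnx package these could be read from model.graph.input and model.graph.output), and assumes that 20 MB means 20 * 1024 * 1024 bytes.

```python
ALLOWED_INPUTS = {
    "variant_ids", "profile", "views", "clicks", "conversions",
    "conversion_value", "indirect_conversions", "unique_views",
    "unique_clicks", "unique_conversions", "unique_indirect_conversions",
}
MAX_MODEL_BYTES = 20 * 1024 * 1024  # assumption: 20 MB counted as 20 MiB


def check_model(input_names, output_names, size_bytes):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if "variant_ids" not in input_names:
        problems.append("missing required input: variant_ids")
    if "variant_id" not in output_names:
        problems.append("missing required output: variant_id")
    for name in sorted(set(input_names) - ALLOWED_INPUTS):
        problems.append(f"unsupported input: {name}")
    if size_bytes > MAX_MODEL_BYTES:
        problems.append("model file exceeds 20 MB")
    return problems


ok = check_model(["variant_ids", "views"], ["variant_id", "propensity"], 1_000)
bad = check_model(["profile", "age"], ["variant_id"], 1_000)
print(ok)   # []
print(bad)  # ['missing required input: variant_ids', 'unsupported input: age']
```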


Create a Dialogue model

Show a random variant

Start with a simple model that picks a random variant from the available variants:

from onnx import helper, TensorProto

RANDOM_SEED = 0.0

# a list containing the IDs of the available variants
variant_ids_input = helper.make_tensor_value_info(
    name="variant_ids", elem_type=TensorProto.STRING, shape=[None]
)

nodes = [
    # generate a random value for each variant
    helper.make_node(
        op_type="RandomUniformLike",
        inputs=["variant_ids"],
        outputs=["random_values"],
        low=0.0,
        high=1.0,
        seed=RANDOM_SEED,
        dtype=TensorProto.FLOAT,
    ),
    # select the index of the highest random value
    helper.make_node(op_type="ArgMax", inputs=["random_values"], outputs=["index"]),
    # gather the variant ID matching the index of the highest random value
    helper.make_node(
        op_type="Gather",
        inputs=["variant_ids", "index"],
        outputs=["variant_id"],
        axis=0,
    ),
]

# output: the randomly chosen variant ID
variant_id_output = helper.make_tensor_value_info(
    name="variant_id", elem_type=TensorProto.STRING, shape=[None]
)

graph = helper.make_graph(
    nodes=nodes,
    name="Random variant",
    inputs=[variant_ids_input],
    outputs=[variant_id_output],
)

onnx_model = helper.make_model(
    ir_version=10, graph=graph, opset_imports=[helper.make_opsetid("", 21)]
)

Although this model is not useful in production on its own, it demonstrates the fundamentals of how to build a Dialogue model:

  • The variant_ids input determines which variants are available.

  • The variant_id output contains the variant ID the model has decided to show.

Log the propensity

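It is often useful to also log the propensity of the chosen variant, that is, the probability with which the model chose it. The propensity enables Inverse Propensity Weighting (IPW) later on, which corrects training data and reports for the fact that some variants were shown more often than others.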

from onnx import helper, TensorProto

RANDOM_SEED = 0.0

# a list containing the IDs of the available variants
variant_ids_input = helper.make_tensor_value_info(
    name="variant_ids", elem_type=TensorProto.STRING, shape=[None]
)

nodes = [
    # a constant containing 1.0 which we will use to calculate the propensity
    helper.make_node(
        op_type="Constant",
        inputs=[],
        outputs=["one"],
        value=helper.make_tensor(name="one", data_type=TensorProto.FLOAT, vals=[1.0], dims=[1]),
    ),
    # count the number of variant IDs
    helper.make_node(
        op_type="Shape",
        inputs=["variant_ids"],
        outputs=["number_of_variants_int"],
    ),
    # convert the number of variant IDs to float
    # since we will be using it in a division
    helper.make_node(
        op_type="Cast",
        inputs=["number_of_variants_int"],
        outputs=["number_of_variants"],
        to=TensorProto.FLOAT,
    ),
    # calculate the propensity of a given variant being chosen
    helper.make_node(op_type="Div", inputs=["one", "number_of_variants"], outputs=["propensity"]),
    # generate a random value for each variant
    helper.make_node(
        op_type="RandomUniformLike",
        inputs=["variant_ids"],
        outputs=["random_values"],
        low=0.0,
        high=1.0,
        seed=RANDOM_SEED,
        dtype=TensorProto.FLOAT,
    ),
    # select the index of the highest random value
    helper.make_node(op_type="ArgMax", inputs=["random_values"], outputs=["index"]),
    # gather the variant ID matching the index of the highest random value
    helper.make_node(
        op_type="Gather",
        inputs=["variant_ids", "index"],
        outputs=["variant_id"],
        axis=0,
    ),
]

# output: the randomly chosen variant ID
variant_id_output = helper.make_tensor_value_info(
    name="variant_id", elem_type=TensorProto.STRING, shape=[None]
)

# output: the propensity of the chosen variant
propensity_output = helper.make_tensor_value_info(
    name="propensity", elem_type=TensorProto.FLOAT, shape=[None]
)

graph = helper.make_graph(
    nodes=nodes,
    name="Random variant",
    inputs=[variant_ids_input],
    outputs=[variant_id_output, propensity_output],
)

onnx_model = helper.make_model(
    ir_version=10, graph=graph, opset_imports=[helper.make_opsetid("", 21)]
)

The propensity output is stored in the modelOutput property of the view timeline event.
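To show why the logged propensity matters, here is a minimal sketch of an IPW estimate of each variant's conversion rate. It assumes you have already extracted (variant ID, propensity, converted) tuples from the view and conversion events; the log below is made up, and the function name ipw_conversion_rates is hypothetical.

```python
from collections import defaultdict


def ipw_conversion_rates(log):
    """Estimate each variant's conversion rate as if it were shown on every view."""
    totals = defaultdict(float)
    for variant_id, propensity, converted in log:
        # a view served with propensity p stands in for 1/p views
        totals[variant_id] += converted / propensity
    return {variant_id: total / len(log) for variant_id, total in totals.items()}


# made-up log: variant "a" was served 80% of the time, "b" 20% of the time
log = [
    ("a", 0.8, 1),
    ("a", 0.8, 0),
    ("a", 0.8, 1),
    ("a", 0.8, 0),
    ("b", 0.2, 1),
]
rates = ipw_conversion_rates(log)
print(rates)  # {'a': 0.5, 'b': 1.0}
```

With a single view for variant "b" the estimate is very noisy; in practice you would look at the number of views behind each estimate before acting on it.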

Ensure each variant is exclusive

Sometimes a visitor who has seen a specific variant should always continue seeing that same variant. For example, once a visitor has been shown a particular discount code, you do not want to show them a different one.

Implement this by incorporating the interactions_viewed profile property into your model. This property contains the IDs of all Dialogue variants the visitor has ever viewed.

First, retrieve the variant IDs:

import blueconic

bc = blueconic.Client()

MODEL_ID = bc.get_blueconic_parameter_value("Model", "model")
if not MODEL_ID:
    raise ValueError("Please configure a model")

DIALOGUE_ID = bc.get_blueconic_parameter_value("Dialogue", "dialogue")
if not DIALOGUE_ID:
    raise ValueError("Please configure a Dialogue")

VARIANT_IDS = [
    variant.id
    for variant in bc.get_dialogue(DIALOGUE_ID).variants
    if variant.id != "original"
]

Then build a model based on the retrieved VARIANT_IDS and store it with the interactions_viewed profile property:

from onnx import helper, TensorProto

RANDOM_SEED = 0.0

variant_ids_input = helper.make_tensor_value_info(
    name="variant_ids", elem_type=TensorProto.STRING, shape=[None]
)

profile_input = helper.make_tensor_value_info(
    name="profile", elem_type=TensorProto.FLOAT, shape=[len(VARIANT_IDS)]
)

nodes = [
    helper.make_node(
        op_type="Constant",
        inputs=[],
        outputs=["one"],
        value=helper.make_tensor(name="one", data_type=TensorProto.FLOAT, vals=[1.0], dims=[1]),
    ),
    helper.make_node(
        op_type="Constant",
        inputs=[],
        outputs=["zero"],
        value=helper.make_tensor(name="zero", data_type=TensorProto.FLOAT, vals=[0.0], dims=[1]),
    ),
    # pad the profile with a trailing 0.0
    # so that unknown variants (which we'll map to index len(VARIANT_IDS))
    # safely gather a "not viewed" flag instead of wrapping around to profile[-1]
    helper.make_node(
        op_type="Concat",
        inputs=["profile", "zero"],
        outputs=["padded_profile"],
        axis=0,
    ),
    # map each input variant_id to its index in VARIANT_IDS
    # unknown variants get len(VARIANT_IDS), which points at the padded zero
    helper.make_node(
        op_type="LabelEncoder",
        inputs=["variant_ids"],
        outputs=["variant_indices"],
        domain="ai.onnx.ml",
        keys_strings=list(VARIANT_IDS),
        values_int64s=list(range(len(VARIANT_IDS))),
        default_int64=len(VARIANT_IDS),
    ),
    # look up the viewed flag (0.0 or 1.0) for each input variant
    helper.make_node(
        op_type="Gather",
        inputs=["padded_profile", "variant_indices"],
        outputs=["viewed_flags"],
        axis=0,
    ),
    # count the number of variants and compute the random propensity (1/N)
    helper.make_node(
        op_type="Shape",
        inputs=["variant_ids"],
        outputs=["number_of_variants_int"],
    ),
    helper.make_node(
        op_type="Cast",
        inputs=["number_of_variants_int"],
        outputs=["number_of_variants"],
        to=TensorProto.FLOAT,
    ),
    helper.make_node(
        op_type="Div",
        inputs=["one", "number_of_variants"],
        outputs=["random_propensity"],
    ),
    # pick a random index
    helper.make_node(
        op_type="RandomUniformLike",
        inputs=["variant_ids"],
        outputs=["random_values"],
        low=0.0,
        high=1.0,
        seed=RANDOM_SEED,
        dtype=TensorProto.FLOAT,
    ),
    helper.make_node(
        op_type="ArgMax",
        inputs=["random_values"],
        outputs=["random_index"],
    ),
    # pick the first viewed variant (deterministic, but only meaningful
    # when at least one viewed_flag is 1.0)
    helper.make_node(
        op_type="ArgMax",
        inputs=["viewed_flags"],
        outputs=["viewed_index"],
    ),
    # decide which branch to use: was any *applicable* variant viewed?
    # we sum viewed_flags (not profile) so this is true only when
    # one of the variants currently in variant_ids has been viewed
    helper.make_node(
        op_type="ReduceSum",
        inputs=["viewed_flags"],
        outputs=["total_viewed"],
        keepdims=0,
    ),
    helper.make_node(
        op_type="Greater",
        inputs=["total_viewed", "zero"],
        outputs=["any_viewed"],
    ),
    # select the index based on whether anything applicable was viewed
    helper.make_node(
        op_type="Where",
        inputs=["any_viewed", "viewed_index", "random_index"],
        outputs=["index"],
    ),
    # gather the chosen variant ID
    helper.make_node(
        op_type="Gather",
        inputs=["variant_ids", "index"],
        outputs=["variant_id"],
        axis=0,
    ),
    # propensity:
    #   1.0 if we deterministically served a previously-viewed variant
    #   1/N otherwise (uniform random over the applicable variants)
    helper.make_node(
        op_type="Where",
        inputs=["any_viewed", "one", "random_propensity"],
        outputs=["propensity"],
    ),
]

variant_id_output = helper.make_tensor_value_info(
    name="variant_id", elem_type=TensorProto.STRING, shape=[None]
)

propensity_output = helper.make_tensor_value_info(
    name="propensity", elem_type=TensorProto.FLOAT, shape=[None]
)

graph = helper.make_graph(
    nodes=nodes,
    name="Random variant with exclusive selection",
    inputs=[variant_ids_input, profile_input],
    outputs=[variant_id_output, propensity_output],
)

onnx_model = helper.make_model(
    ir_version=10,
    graph=graph,
    opset_imports=[
        helper.make_opsetid("", 21),
        helper.make_opsetid("ai.onnx.ml", 4),
    ],
)

To store the model, add the interactions_viewed profile property to the model and generate feature names from the available variant IDs:

model = bc.get_model(MODEL_ID)
model.type = blueconic.domain.ModelType.DIALOGUE
model.model = onnx_model.SerializeToString()
model.property_ids = ["interactions_viewed"]
model.feature_names = [
    f"interactions_viewed={variant_id}" for variant_id in VARIANT_IDS
]
bc.update_model(model)
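When unit testing the exclusive-selection graph, it can help to have a plain-Python reference of the intended behavior to compare against. The sketch below is such a reference under stated assumptions: choose_exclusive is a hypothetical name, and the set of viewed variant IDs stands in for the interactions_viewed profile property.

```python
import random


def choose_exclusive(variant_ids, viewed, rng=random):
    """Return (variant_id, propensity), mirroring the two branches of the graph."""
    flags = [1.0 if variant_id in viewed else 0.0 for variant_id in variant_ids]
    if sum(flags) > 0:
        # like ArgMax, take the first offered variant that was viewed before
        return variant_ids[flags.index(1.0)], 1.0
    # nothing applicable was viewed: pick uniformly at random
    return rng.choice(variant_ids), 1.0 / len(variant_ids)


sticky = choose_exclusive(["a", "b", "c"], viewed={"b"})
print(sticky)  # ('b', 1.0)

# a previously viewed variant that is no longer offered does not count
variant, propensity = choose_exclusive(["a", "b"], viewed={"removed"})
print(variant in ("a", "b"), propensity)  # True 0.5
```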

Run an automated A/B test

Use this model to determine which variant has the highest average conversion value per view. Once enough data has been gathered, the model automatically switches to the best-performing variant.

We assume that you have used a statistical method — for example, the solve_power method on statsmodels.stats.power.TTestIndPower — to calculate the minimum number of views per variant based on the number of variants and past conversion data. Once that number of views is reached, the variant with the highest conversion value per view is chosen.
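If statsmodels is not at hand, a rough sample size can also be obtained from the standard normal approximation for a two-sided, two-sample comparison. The sketch below uses that approximation rather than the t-test-based solver mentioned above, so the result can differ slightly. The effect size is assumed to be a standardized difference (Cohen's d), no correction for comparing more than two variants is applied, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(effect_size, alpha=0.05, power=0.8):
    """Approximate views needed per variant to detect the given effect size."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# a small standardized effect needs several thousand views per variant
n = sample_size_per_variant(effect_size=0.05)
print(n)
```

Halving the effect size roughly quadruples the required number of views, which is why small expected differences lead to long exploration phases.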

from onnx import helper, TensorProto

# instead of defining a constant here, perform a power analysis
# based on your existing conversion data and the number of variants
SAMPLE_SIZE_PER_VARIANT = 10_000

# you can set this random seed e.g. based on the current timestamp
RANDOM_SEED = 0.0

# a list of variant IDs for the model to choose from
variant_ids_input = helper.make_tensor_value_info(
    name="variant_ids", elem_type=TensorProto.STRING, shape=[None]
)

# the number of times the variant has been viewed
views_input = helper.make_tensor_value_info(name="views", elem_type=TensorProto.INT64, shape=[None])

# the total value of the conversions
conversion_value_input = helper.make_tensor_value_info(
    name="conversion_value", elem_type=TensorProto.FLOAT, shape=[None]
)

nodes = [
    # the sample size per variant
    helper.make_node(
        "Constant",
        inputs=[],
        outputs=["sample_size"],
        value=helper.make_tensor(
            "sample_size_val", TensorProto.INT64, [], [SAMPLE_SIZE_PER_VARIANT]
        ),
    ),
    # a small epsilon to avoid division-by-zero errors
    helper.make_node(
        "Constant",
        inputs=[],
        outputs=["epsilon"],
        value=helper.make_tensor("epsilon_val", TensorProto.FLOAT, [], [1e-9]),
    ),
    # check if all variants hit the required sample size
    helper.make_node("ReduceMin", inputs=["views"], outputs=["min_views"], keepdims=0),
    helper.make_node(
        "GreaterOrEqual", inputs=["min_views", "sample_size"], outputs=["is_exploitation"]
    ),
    # select a random variant
    helper.make_node(
        op_type="RandomUniformLike",
        inputs=["variant_ids"],
        outputs=["random_values"],
        low=0.0,
        high=1.0,
        seed=RANDOM_SEED,
        dtype=TensorProto.FLOAT,
    ),
    helper.make_node(
        op_type="ArgMax", inputs=["random_values"], outputs=["random_index"], axis=0, keepdims=1
    ),
    helper.make_node("Cast", inputs=["views"], outputs=["views_float"], to=TensorProto.FLOAT),
    # add epsilon to safely avoid 0 views division-by-zero errors
    helper.make_node("Add", inputs=["views_float", "epsilon"], outputs=["safe_views"]),
    # calculate average conversion value per view
    helper.make_node("Div", inputs=["conversion_value", "safe_views"], outputs=["performance"]),
    # select the best performing variant
    helper.make_node(
        op_type="ArgMax", inputs=["performance"], outputs=["best_index"], axis=0, keepdims=1
    ),
    # select the exploration or exploitation index based on the condition
    helper.make_node(
        "Where",
        inputs=["is_exploitation", "best_index", "random_index"],
        outputs=["selected_index"],
    ),
    # gather the variant ID matching the finalized index
    helper.make_node(
        op_type="Gather",
        inputs=["variant_ids", "selected_index"],
        outputs=["variant_id"],
        axis=0,
    ),
]

# output: the finalized variant ID
variant_id_output = helper.make_tensor_value_info(
    name="variant_id", elem_type=TensorProto.STRING, shape=[None]
)

graph = helper.make_graph(
    nodes=nodes,
    name="Automated A/B test",
    inputs=[variant_ids_input, views_input, conversion_value_input],
    outputs=[variant_id_output],
)

onnx_model = helper.make_model(
    ir_version=10, graph=graph, opset_imports=[helper.make_opsetid("", 21)]
)

Note: This model can start serving a different variant if the initially chosen variant suddenly starts performing worse. This is an explore-then-exploit model: it first explores (serving random variants) and then exploits (serving the best variant).
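The explore-then-exploit rule is easier to reason about, and to unit test, in plain Python. The following sketch mirrors the decision the graph makes; choose_ab is a hypothetical name, and the zero-view guard uses max(views, 1) where the graph adds a small epsilon instead.

```python
import random


def choose_ab(variant_ids, views, conversion_value, sample_size, rng=random):
    """Explore at random until every variant has enough views, then exploit."""
    if min(views) < sample_size:
        # exploration: at least one variant still lacks data
        return rng.choice(variant_ids)
    # exploitation: highest average conversion value per view wins
    performance = [
        value / max(count, 1) for value, count in zip(conversion_value, views)
    ]
    return variant_ids[performance.index(max(performance))]


winner = choose_ab(["a", "b"], [10_000, 12_000], [500.0, 900.0], 10_000)
print(winner)  # b

still_exploring = choose_ab(["a", "b"], [9_999, 12_000], [500.0, 900.0], 10_000)
print(still_exploring in ("a", "b"))  # True
```

Because the rule re-evaluates performance on every call, the exploited variant can change after the exploration phase if the counts shift.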

Optimize for conversion value automatically

The previous approach has two significant limitations:

  • the minimum number of views per variant has to be known in advance; if that number is too low, the decision can be wrong

  • during exploration, the suboptimal variant is served a large fraction of the time, which has a direct cost in conversion value

This is known as the Explore-Exploit tradeoff: you need to explore enough (serve random variants) to identify the best variant, but you also want to exploit (serve the best variant) as much as possible to maximize value.

A common solution to this problem is a multi-armed bandit: an agent that decides, each time a dialogue is served, whether to explore or exploit.

A simple and effective example is the Epsilon-Greedy algorithm. The agent explores with probability epsilon (for example, 0.1, which means 10% of the time) and exploits the rest of the time.

from onnx import helper, TensorProto

# you can set this random seed e.g. based on the current timestamp
RANDOM_SEED = 0.0

# the epsilon parameter of Epsilon-Greedy
# with 0.1 we are exploring 10% of the time
EPSILON = 0.1

# a list of variant IDs for the model to choose from
variant_ids_input = helper.make_tensor_value_info(
    name="variant_ids", elem_type=TensorProto.STRING, shape=[None]
)

# the number of times the variant has been viewed
views_input = helper.make_tensor_value_info(name="views", elem_type=TensorProto.INT64, shape=[None])

# the total value of the conversions
conversion_value_input = helper.make_tensor_value_info(
    name="conversion_value", elem_type=TensorProto.FLOAT, shape=[None]
)

nodes = [
    # the epsilon parameter of epsilon-greedy
    helper.make_node(
        "Constant",
        [],
        ["const_epsilon"],
        value=helper.make_tensor("const_epsilon", TensorProto.FLOAT, [], [EPSILON]),
    ),
    helper.make_node(
        "Constant",
        [],
        ["const_one_f"],
        value=helper.make_tensor("const_one_f", TensorProto.FLOAT, [], [1.0]),
    ),
    # compute expected value per variant
    helper.make_node("Cast", ["views"], ["views_float"], to=TensorProto.FLOAT),
    helper.make_node("Max", ["views_float", "const_one_f"], ["views_safe"]),
    helper.make_node("Div", ["conversion_value", "views_safe"], ["expected_value"]),
    # select the best arm (greedy)
    helper.make_node(
        "ArgMax",
        ["expected_value"],
        ["greedy_index"],
        axis=0,
        keepdims=1,
    ),
    # select a random arm
    helper.make_node(
        "RandomUniformLike",
        ["variant_ids"],
        ["uniforms_k"],
        dtype=TensorProto.FLOAT,
        low=0.0,
        high=1.0,
        seed=RANDOM_SEED,
    ),
    helper.make_node(
        "ArgMax",
        ["uniforms_k"],
        ["random_index"],
        axis=0,
        keepdims=1,
    ),
    # decide between explore and exploit by tossing a coin
    helper.make_node(
        "RandomUniform",
        [],
        ["coin"],
        dtype=TensorProto.FLOAT,
        low=0.0,
        high=1.0,
        shape=[1],
    ),
    # explore = (coin < epsilon)
    helper.make_node("Less", ["coin", "const_epsilon"], ["explore"]),  # bool, shape [1]
    # chosen_index = explore ? random_index : greedy_index
    helper.make_node(
        "Where",
        ["explore", "random_index", "greedy_index"],
        ["chosen_index"],
    ),
    # look up the chosen variant ID based on the chosen index
    helper.make_node(
        "Gather",
        ["variant_ids", "chosen_index"],
        ["variant_id"],
        axis=0,
    ),
    # propensity computation:
    #   epsilon / k for a non-greedy variant
    #   (1 - epsilon) + epsilon / k for the greedy variant
    helper.make_node("Shape", ["variant_ids"], ["k_shape"]),
    helper.make_node("Cast", ["k_shape"], ["k_float"], to=TensorProto.FLOAT),
    helper.make_node("Div", ["const_epsilon", "k_float"], ["eps_over_k"]),
    helper.make_node("Sub", ["const_one_f", "const_epsilon"], ["one_minus_eps"]),
    helper.make_node("Add", ["one_minus_eps", "eps_over_k"], ["greedy_prop"]),
    helper.make_node("Equal", ["chosen_index", "greedy_index"], ["chose_greedy"]),
    helper.make_node(
        "Where",
        ["chose_greedy", "greedy_prop", "eps_over_k"],
        ["propensity"],
    ),
]

# the ID of the chosen variant
variant_id_output = helper.make_tensor_value_info(
    name="variant_id", elem_type=TensorProto.STRING, shape=[None]
)

# the propensity for the chosen variant
propensity_output = helper.make_tensor_value_info(
    name="propensity", elem_type=TensorProto.FLOAT, shape=[1]
)

graph = helper.make_graph(
    nodes=nodes,
    name="Epsilon-Greedy variant selection",
    inputs=[
        variant_ids_input,
        views_input,
        conversion_value_input,
    ],
    outputs=[variant_id_output, propensity_output],
)

onnx_model = helper.make_model(
    ir_version=10, graph=graph, opset_imports=[helper.make_opsetid("", 21)]
)

This model works correctly when conversion values per variant are roughly fixed or normally distributed. When conversion values have outliers, consider the following improvements:

  • Store the median conversion value per variant in the model instead of using the conversion_value input directly.

  • Store the number of views and conversions per variant at a fixed past date so they can be subtracted from current counts, allowing the agent to adapt more quickly to changing behavior.

  • When behavior is expected to remain stable, decay epsilon over time so the model explores less and serves the best variant more often.
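The propensity that an Epsilon-Greedy model logs follows directly from the algorithm: every variant receives an equal share of the exploration probability, and the greedy variant additionally receives the exploitation probability. A small sketch, with a hypothetical function name, makes it easy to check that the propensities sum to 1:

```python
def epsilon_greedy_propensity(chosen_index, greedy_index, epsilon, k):
    """Probability that Epsilon-Greedy picks chosen_index among k variants."""
    explore_share = epsilon / k  # each variant's share of the exploration
    if chosen_index == greedy_index:
        return (1.0 - epsilon) + explore_share
    return explore_share


k, epsilon, greedy = 4, 0.1, 2
propensities = [
    epsilon_greedy_propensity(index, greedy, epsilon, k) for index in range(k)
]
print(propensities)
print(abs(sum(propensities) - 1.0) < 1e-9)  # True
```

If you decay epsilon over time, the logged propensity changes with it, so log the value used at decision time rather than recomputing it later.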

Train a model based on an existing A/B test

The previous models pick one overall best variant. In practice, the best variant often depends on the individual visitor.

If you have previously run an A/B test without a model, you already have the data needed to train one. The interactions_viewed profile property records which variants each profile viewed, and the interactions_converted property records which they converted on. The profile_to_feature_dict method converts a profile into a feature dictionary.

There are two common ways to structure training:

  • Single learner: Add the viewed variant ID to the input features and train one model that predicts conversion probability across all actions. The model can learn who tends not to convert across all actions, reducing the number of conversions needed.

  • Multiple learners: Train a separate model per variant. Each model uses all available data for its action to predict that action's conversion rate.

Retrieve the data

Use profile_to_feature_dict to convert a profile into a feature dictionary:

from blueconic.utils import profile_to_feature_dict

def profile_to_rows(profile, property_ids, variant_ids):
    """
    Convert a profile to one row per viewed variant
    """
    context = profile_to_feature_dict(profile, property_ids)
    viewed = set(profile.get_values("interactions_viewed"))
    converted = set(profile.get_values("interactions_converted"))

    # yield a row for each variant viewed by this profile
    for variant_id in variant_ids:
        if variant_id not in viewed:
            continue

        # yield a tuple with
        # * the ID of the viewed variant
        # * a dictionary containing the profile features
        # * whether a conversion was observed for this variant
        yield (variant_id, context, variant_id in converted)

You can then retrieve all profiles that viewed one of the configured variants. You may also want to add an attribution window to exclude profiles that may still convert in the near future.

When retrieving the data, also request the interactions_viewed and interactions_converted profile properties — we use these to determine which variants were viewed and converted.

PROFILE_PROPERTY_IDS = bc.get_blueconic_parameter_values("Profile properties", "profile_property")
if not PROFILE_PROPERTY_IDS:
    raise ValueError("Please configure profile properties")

actions = []
contexts = []
rewards = []

# retrieve the profile data
for profile in bc.get_profiles(
    # only retrieve profiles that viewed at least one of the variants
    filters=blueconic.get_filter("interactions_viewed").contains_any(
        *VARIANT_IDS
    ),
    properties=PROFILE_PROPERTY_IDS + [
        "interactions_viewed", "interactions_converted"
    ],
    count=0
):
    for action, context, reward in profile_to_rows(
        profile, PROFILE_PROPERTY_IDS, VARIANT_IDS
    ):
        actions.append(action)
        contexts.append(context)
        rewards.append(reward)

This produces three lists:

  • actions contains the variant ID for each view

  • contexts contains a profile feature dictionary for each view

  • rewards contains a boolean for each view, indicating whether that view led to a conversion

Tip: For larger datasets, write the rows to disk (for example with sqlite3) instead of keeping everything in memory.
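One way to follow this tip with only the standard library is sketched below. The table name training_rows and the helper names are hypothetical, and the context dictionary is stored as JSON text; an in-memory database is used here only so the example runs anywhere.

```python
import json
import sqlite3


def store_rows(connection, rows):
    """Append (action, context, reward) rows to a table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS training_rows "
        "(action TEXT, context TEXT, reward INTEGER)"
    )
    connection.executemany(
        "INSERT INTO training_rows VALUES (?, ?, ?)",
        (
            (action, json.dumps(context), int(reward))
            for action, context, reward in rows
        ),
    )
    connection.commit()


def load_rows(connection):
    """Yield the stored rows back one at a time."""
    cursor = connection.execute("SELECT action, context, reward FROM training_rows")
    for action, context, reward in cursor:
        yield action, json.loads(context), bool(reward)


connection = sqlite3.connect(":memory:")  # use a file path for real data
store_rows(connection, [("variant-a", {"clickcount": 9.0}, True)])
rows = list(load_rows(connection))
print(rows)  # [('variant-a', {'clickcount': 9.0}, True)]
```

Since store_rows accepts any iterable, the generator returned by a row-producing function can be passed in directly, so the full dataset never has to sit in a Python list.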

Before feeding the contexts to a machine-learning algorithm, convert them to feature vectors. A DictVectorizer is sufficient:

import numpy as np
from sklearn.feature_extraction import DictVectorizer

# convert the data to feature vectors using a DictVectorizer
vectorizer = DictVectorizer(dtype=np.float32)
context_vectors = vectorizer.fit_transform(contexts)

# delete the contexts list to free up some memory
del contexts

Train a single model

A single learner for all actions has two advantages: actions can share each other's data during training, and inference time does not grow meaningfully with the number of variants.

When using a single learner, feature interactions are important. There are two common ways to ensure interactions occur:

  • compute interaction features manually (between the context and the actions)

  • use a model that creates interaction features automatically, such as a decision-tree ensemble

This example uses a GradientBoostingClassifier, which generates feature interactions automatically — eliminating the need to compute them manually. The actions are one-hot encoded and concatenated to the context vectors before training.

import numpy as np
import scipy.sparse as sp
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import GradientBoostingClassifier

# create a One-Hot Encoder for the actions
encoder = OneHotEncoder(dtype=np.float32)

# create a mapping between variant index and the action feature vector
# we will need this mapping to create our ONNX model later
action_mapping = encoder.fit_transform(
    np.reshape(VARIANT_IDS, (len(VARIANT_IDS), 1))
).toarray()

# create the action vectors to train the model
action_vectors = encoder.transform(np.reshape(actions, (len(actions), 1)))

# train a classifier on the concatenated context vectors
# and one-hot encoded actions
clf = GradientBoostingClassifier()
clf.fit(sp.hstack([context_vectors, action_vectors]), rewards)

Next, build the ONNX model. Start with the inputs and outputs:

variant_ids_input = helper.make_tensor_value_info(
    name="variant_ids", elem_type=TensorProto.STRING, shape=[None]
)

profile_input = helper.make_tensor_value_info(
    name="profile", elem_type=TensorProto.FLOAT, shape=[None]
)

variant_id_output = helper.make_tensor_value_info(
    name="variant_id", elem_type=TensorProto.STRING, shape=[None]
)

Then, in the graph:

  • tile the profile tensor so each variant in variant_ids has its own row

  • convert the variant IDs to a one-hot encoded tensor

  • concatenate the tiled profile tensor and the variants tensor
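To make the three steps concrete, here is a NumPy sketch of what the graph is meant to compute. It is a reference for reasoning and testing, not the ONNX graph itself; the toy `VARIANT_IDS`, the identity-matrix `action_mapping`, and the function name `context_action_pairs` are assumptions for illustration.

```python
import numpy as np

VARIANT_IDS = ["a", "b", "c"]   # hypothetical training-time variant IDs
action_mapping = np.eye(len(VARIANT_IDS), dtype=np.float32)
# extra all-zero row for IDs that were not seen during training
action_table = np.vstack(
    [action_mapping, np.zeros((1, len(VARIANT_IDS)), dtype=np.float32)]
)

def context_action_pairs(variant_ids, profile):
    index = {variant_id: i for i, variant_id in enumerate(VARIANT_IDS)}
    # map IDs to indices; unknown IDs get the index of the all-zero row
    variant_indices = np.array([index.get(v, len(VARIANT_IDS)) for v in variant_ids])
    # one one-hot row per requested variant
    action_tensor = action_table[variant_indices]
    # repeat the profile once per requested variant
    profile_tensor = np.tile(profile[np.newaxis, :], (len(variant_ids), 1))
    # classifier input: one row per variant
    return np.hstack([profile_tensor, action_tensor])

pairs = context_action_pairs(["b", "x"], np.array([0.2, 0.7], dtype=np.float32))
print(pairs.shape)  # (2, 5)
```

The ONNX nodes that follow build the same tensors with LabelEncoder, Gather, Unsqueeze, Tile, and Concat.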

from onnx import helper, numpy_helper, TensorProto

context_action_features_nodes = [

# create a constant containing an integer 0

# we need this to unsqueeze the profile input

helper.make_node(

op_type="Constant",

inputs=[],

outputs=["int_0"],

value=helper.make_tensor(name="int_0", data_type=TensorProto.INT64, vals=[0], dims=[1]),

),

# create a constant containing an integer 1

# we need this to construct the axes for tiling the profile input

# and to gather the conversion probabilities of the classifier

helper.make_node(

op_type="Constant",

inputs=[],

outputs=["int_1"],

value=helper.make_tensor(name="int_1", data_type=TensorProto.INT64, vals=[1], dims=[1]),

),

# create a constant containing the one-hot encoded variant vector

# for each variant

helper.make_node(

op_type="Constant",

# add a row of zeroes for unknown variants

value=numpy_helper.from_array(

np.vstack([action_mapping, np.zeros(action_mapping.shape[1])], dtype=np.float32)

),

inputs=[],

outputs=["action_mapping"],

),

# count the number of variant IDs

helper.make_node(

op_type="Shape",

inputs=["variant_ids"],

outputs=["number_of_variants"],

),

helper.make_node(

op_type="Cast",

inputs=["number_of_variants"],

outputs=["number_of_variants_float"],

to=TensorProto.FLOAT,

),

# map the variant IDs to variant indices

helper.make_node(

op_type="LabelEncoder",

domain="ai.onnx.ml",

default_int64=len(VARIANT_IDS),

keys_strings=VARIANT_IDS,

values_int64s=np.arange(len(VARIANT_IDS)),

inputs=["variant_ids"],

outputs=["variant_indices"],

),

# retrieve the action vectors for the variants

helper.make_node(

op_type="Gather",

inputs=["action_mapping", "variant_indices"],

outputs=["action_tensor"],

),

# tile the profile tensor

# so that there is a separate row for each variant in variant_ids

helper.make_node(

op_type="Unsqueeze",

inputs=["profile", "int_0"],

outputs=["profile_2d"],

),

helper.make_node(

op_type="Concat",

inputs=["number_of_variants", "int_1"],

outputs=["tile_axes"],

axis=0,

),

helper.make_node(

op_type="Tile",

inputs=["profile_2d", "tile_axes"],

outputs=["profile_tensor"],

),

# concatenate the profile tensor and the action tensor

# this gives us the input for the classifier

helper.make_node(

op_type="Concat",

inputs=["profile_tensor", "action_tensor"],

outputs=["context_action_pairs"],

axis=1,

)

]

Then convert the trained classifier to ONNX:

from skl2onnx import to_onnx

from skl2onnx.common.data_types import FloatTensorType

# convert the classifier to ONNX

classifier_nodes = to_onnx(

model=clf,

initial_types=[

# the classifier receives one row per variant, so the input is 2D

("context_action_pairs", FloatTensorType([None, clf.n_features_in_]))

],

# disable returning a dictionary,

# we want the propensity output to be a number

options={"zipmap": False}

).graph.node

The remaining steps are:

  • add the converted classifier nodes to the graph

  • select the variant with the highest conversion probability

variant_selection_nodes = [

# gather the second column of the probabilities output

# since it contains the probability of a conversion

helper.make_node(

op_type="Gather",

inputs=[classifier_nodes[-1].output[0], "int_1"],

outputs=["conversion_probabilities"],

axis=1

),

# get the variant index with the highest conversion probability

helper.make_node(

op_type="ArgMax",

inputs=["conversion_probabilities"],

outputs=["variant_index"],

axis=0,

keepdims=0

),

# turn the variant index into a variant ID

helper.make_node(

op_type="Gather",

inputs=["variant_ids", "variant_index"],

outputs=["variant_id"],

axis=0,

)

]

Finally, assemble the full ONNX graph and model:

graph = helper.make_graph(

name = "Single classifier",

nodes = [

*context_action_features_nodes,

*classifier_nodes,

*variant_selection_nodes

],

inputs = [variant_ids_input, profile_input],

outputs = [variant_id_output]

)

onnx_model = helper.make_model(

ir_version=10,

graph=graph,

opset_imports=[

helper.make_opsetid("", 21),

helper.make_opsetid("ai.onnx.ml", 4),

],

)

Train multiple models

To train a separate model for each variant:

import numpy as np

import scipy.sparse as sp

from sklearn.linear_model import LogisticRegression

from sklearn.preprocessing import StandardScaler

models = []

actions_np = np.asarray(actions)

rewards_np = np.asarray(rewards)

# scale the vectors so that they can be used with logistic regression

scaler = StandardScaler(with_mean=False)

scaled_contexts = scaler.fit_transform(context_vectors)

# train a logistic regression model for each action

for variant_id in VARIANT_IDS:
    # this example assumes that each variant_id exists in actions_np
    mask = actions_np == variant_id
    clf = LogisticRegression()
    clf.fit(scaled_contexts[mask], rewards_np[mask])
    models.append(clf)

You can try other classifiers instead of LogisticRegression, but keep inference performance in mind, especially when there are many variants.

In the ONNX model we need to:

  1. Feed the profile input to a Scaler that applies the same scaling as the StandardScaler

  2. Feed the scaled tensor to each trained model

  3. Concatenate the results from all the models

  4. Pick the variant with the highest predicted value
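The four steps can be sketched in NumPy before writing any ONNX nodes. This is a reference implementation under stated assumptions: `select_variant` and the toy parameters are hypothetical stand-ins for `scaler.scale_`, each `clf.coef_[0]`, and each `clf.intercept_[0]`, and unknown variant IDs are given a probability of 0.

```python
import numpy as np

def select_variant(variant_ids, profile, known_ids, scale, coefs, intercepts):
    # step 1: same scaling as StandardScaler(with_mean=False)
    scaled = profile / scale
    # steps 2 and 3: one logistic regression per known variant,
    # plus a trailing 0.0 used for unknown variant IDs
    logits = coefs @ scaled + intercepts
    probabilities = np.append(1.0 / (1.0 + np.exp(-logits)), 0.0)
    # look up the probability of each requested variant
    index = {variant_id: i for i, variant_id in enumerate(known_ids)}
    requested = np.array([index.get(v, len(known_ids)) for v in variant_ids])
    # step 4: pick the requested variant with the highest probability
    return variant_ids[int(np.argmax(probabilities[requested]))]

known_ids = ["a", "b"]                       # hypothetical variant IDs
scale = np.array([2.0, 4.0])                 # stands in for scaler.scale_
coefs = np.array([[1.0, 0.0], [0.0, 1.0]])   # one row per trained model
intercepts = np.array([0.0, 0.0])

choice = select_variant(["x", "b", "a"], np.array([2.0, 8.0]), known_ids, scale, coefs, intercepts)
print(choice)  # b
```

A reference like this is also handy in unit tests, where its output can be compared with the output of the ONNX model.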

skl2onnx can convert the StandardScaler and LogisticRegression for you, but since both map almost directly to existing ONNX operators we do the conversion manually here.

from onnx import helper, numpy_helper, TensorProto

variant_ids_input = helper.make_tensor_value_info(

name="variant_ids", elem_type=TensorProto.STRING, shape=[None]

)

profile_input = helper.make_tensor_value_info(

name="profile", elem_type=TensorProto.FLOAT, shape=[None]

)

variant_id_output = helper.make_tensor_value_info(

name="variant_id", elem_type=TensorProto.STRING, shape=[None]

)

# map the variant IDs to variant indices

label_encoder_node = helper.make_node(

op_type="LabelEncoder",

domain="ai.onnx.ml",

default_int64=len(VARIANT_IDS), # we'll add a default model for unknown variants

keys_strings=VARIANT_IDS,

values_int64s=np.arange(len(VARIANT_IDS)),

inputs=["variant_ids"],

outputs=["variant_indices"],

)

# convert the StandardScaler to ONNX

scaler_node = helper.make_node(

op_type="Scaler",

domain="ai.onnx.ml",

inputs=["profile"],

outputs=["profile_scaled"],

offset=scaler.mean_.tolist() if scaler.with_mean else [0.0] * len(scaler.scale_),

scale=(1.0 / scaler.scale_).tolist(),

)

unsqueeze_axis_node = helper.make_node(

op_type="Constant",

inputs=[],

outputs=["unsqueeze_axis"],

value=helper.make_tensor(

name="unsqueeze_axis", data_type=TensorProto.INT64, dims=[1], vals=[0]

),

)

# unsqueeze the scaled profile tensor to a 2D tensor

unsqueeze_node = helper.make_node(

op_type="Unsqueeze", inputs=["profile_scaled", "unsqueeze_axis"], outputs=["profile_scaled_2d"]

)

model_nodes = []

for i, clf in enumerate(models):
    logistic_regression_nodes = [
        # create the coefficients tensor
        helper.make_node(
            op_type="Constant",
            inputs=[],
            outputs=[f"{i}_coef"],
            value=helper.make_tensor(
                name=f"{i}_coef",
                data_type=TensorProto.FLOAT,
                dims=[1, len(clf.coef_[0])],
                vals=clf.coef_[0].tolist(),
            ),
        ),
        # create the bias tensor
        helper.make_node(
            op_type="Constant",
            inputs=[],
            outputs=[f"{i}_intercept"],
            value=helper.make_tensor(
                name=f"{i}_intercept",
                data_type=TensorProto.FLOAT,
                dims=[1],
                vals=clf.intercept_.tolist(),
            ),
        ),
        # compute the linear term of the logistic regression (the logit)
        helper.make_node(
            op_type="Gemm",
            inputs=["profile_scaled_2d", f"{i}_coef", f"{i}_intercept"],
            outputs=[f"{i}_logit"],
            transB=1,
        ),
        # logistic link to turn the logits into probabilities
        helper.make_node(op_type="Sigmoid", inputs=[f"{i}_logit"], outputs=[f"{i}_proba"]),
    ]
    model_nodes.extend(logistic_regression_nodes)

model_nodes.append(
    # return a probability of 0 for unknown variants
    helper.make_node(
        op_type="Constant",
        inputs=[],
        outputs=[f"{len(VARIANT_IDS)}_proba"],
        value=helper.make_tensor(
            # vals is a flat list; dims gives the tensor its [1, 1] shape
            name=f"{len(VARIANT_IDS)}_proba", data_type=TensorProto.FLOAT, vals=[0.0], dims=[1, 1]
        ),
    ),
)

# concatenate the output of the different models

concat_node = helper.make_node(

op_type="Concat",

inputs=[f"{i}_proba" for i in range(len(VARIANT_IDS) + 1)],

outputs=["probabilities"],

axis=0,

)

# squeeze the probabilities to make them the same rank as the variant indices

squeeze_node = helper.make_node(

op_type="Squeeze", inputs=["probabilities"], outputs=["conversion_probabilities"]

)

# look up the conversion probability of each requested variant by its index;
# unknown variants map to the extra 0.0 entry at the end
gather_probabilities_node = helper.make_node(
    op_type="Gather",
    inputs=["conversion_probabilities", "variant_indices"],
    outputs=["conversion_probabilities_for_variants"],
    axis=0,
)

# get the variant index with the highest conversion probability
argmax_node = helper.make_node(
    op_type="ArgMax",
    inputs=["conversion_probabilities_for_variants"],
    outputs=["variant_index"],
    axis=0,
    keepdims=0,
)

# turn the variant index into a variant ID
gather_variant_id_node = helper.make_node(
    op_type="Gather",
    inputs=["variant_ids", "variant_index"],
    outputs=["variant_id"],
    axis=0,
)

graph = helper.make_graph(
    name="Multiple classifiers",
    nodes=[
        label_encoder_node,
        scaler_node,
        unsqueeze_axis_node,
        unsqueeze_node,
        *model_nodes,
        concat_node,
        squeeze_node,
        gather_probabilities_node,
        argmax_node,
        gather_variant_id_node,
    ],
    inputs=[variant_ids_input, profile_input],
    outputs=[variant_id_output],
)

onnx_model = helper.make_model(

ir_version=10,

graph=graph,

opset_imports=[

helper.make_opsetid("", 21),

helper.make_opsetid("ai.onnx.ml", 4),

],

)


Train an agentic AI model

The classifier approach has three limitations: it assumes a prior A/B test, it cannot adapt to changing behavior, and it does not cleanly separate cause and effect. Address these by building a contextual multi-armed bandit that:

  • Combines a trained classifier (the oracle) with Epsilon-Greedy into a reinforcement learning loop.

  • Emits a propensity on every decision.

  • Trains on the modelInput and modelOutput recorded in the view timeline event.

  • Weights samples by their inverse propensity.

Retrieve the data

Every view timeline event for a dialogue with an associated model has modelInput and modelOutput properties. Use these to build a training set that pairs each historical decision with the reward it produced:

import json

def profile_to_rows(profile, variant_ids):
    """
    Convert a profile to one row per viewed variant
    """
    rows = []

    # profile.timeline_events is newest-first, so we reverse it to walk
    # the events oldest-to-newest. That way every view is appended to
    # `rows` before any conversion that might be attributed to it.
    for timeline_event in reversed(profile.timeline_events):
        variant_id = timeline_event.get_value("variantId")

        # skip events that do not belong to the variants we are interested in
        if variant_id not in variant_ids:
            continue

        if timeline_event.event_type_id == "conversion":
            # find the most recent view of the same variant and mark it converted
            for i in reversed(range(len(rows))):
                row = rows[i]
                if row[0] == variant_id:
                    # set the reward to True
                    row[3] = True
                    break
        elif timeline_event.event_type_id == "view":
            if timeline_event.get_value("modelInput") is None:
                continue
            # each row is a list (not a tuple) so the reward can be updated later
            rows.append([
                variant_id,
                # the context
                json.loads(timeline_event.get_value("modelInput")),
                # the propensity
                json.loads(timeline_event.get_value("modelOutput"))[
                    "propensity"
                ][0],
                # the reward
                False,
            ])

    return rows

A conversion is attributed to the most recent view of the same variant, matching BlueConic's last-touch attribution model.

actions = []

contexts = []

propensities = []

rewards = []

for profile in bc.get_profiles(
    filters=blueconic.get_filter("interactions_viewed").contains_any(*VARIANT_IDS),
    timeline_events_filter=blueconic.TimelineEventsFilter(
        event_type_ids=["view", "conversion"],
        event_properties=["variantId", "view.modelInput", "view.modelOutput"],
        count=1000
    ),
    properties=["visits"],
    count=0,
):
    for action, context, propensity, reward in profile_to_rows(
        profile,
        VARIANT_IDS
    ):
        actions.append(action)
        contexts.append(context)
        propensities.append(propensity)
        rewards.append(reward)

The resulting data has the same shape as in the previous A/B-test example, with the addition of propensities, which we will use to compute sample weights.

Train the oracle model

Train a classifier that predicts the probability of a conversion for each variant. Then combine the classifier with Epsilon-Greedy: with probability 1 - epsilon, serve the variant with the highest predicted conversion probability, and with probability epsilon, serve a uniformly random applicable variant.
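The decision rule and the propensity it should log can be sketched in plain Python. This is a minimal illustration, not the ONNX graph; the function name `epsilon_greedy` and the toy probabilities are assumptions. The logged propensity is the probability that the policy picks the chosen variant, which for K applicable variants is epsilon / K for an explored variant and 1 - epsilon + epsilon / K for the oracle's top pick.

```python
import random

def epsilon_greedy(variant_ids, probabilities, epsilon=0.1, rng=random):
    """Return (chosen variant ID, propensity of that choice)."""
    n = len(variant_ids)
    best = max(range(n), key=lambda i: probabilities[i])
    if rng.random() < epsilon:
        chosen = rng.randrange(n)   # explore: uniform over applicable variants
    else:
        chosen = best               # exploit: the oracle's top pick
    # exploration share, plus the exploitation share when the top pick was chosen
    propensity = epsilon / n + (1.0 - epsilon if chosen == best else 0.0)
    return variant_ids[chosen], propensity

rng = random.Random(42)
variant_id, propensity = epsilon_greedy(
    ["a", "b", "c"], [0.02, 0.05, 0.01], epsilon=0.1, rng=rng
)
print(variant_id, round(propensity, 4))
```

Note that the top pick can also be selected during exploration, which is why its propensity includes the epsilon / K term.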

Weight samples by propensity

Some variants were shown more often than others in historical data. Inverse Propensity Weighting (IPW) corrects for this by weighting each sample by 1 / propensity, so views that were unlikely under the logging policy count more:

import numpy as np

propensities_np = np.asarray(propensities)

# clip the weights to a maximum value to avoid a small number of

# very-low-propensity samples dominating the training. The right value

# is a tradeoff: lower clips reduce variance but introduce more bias.

# 10.0 is a common default; tune it based on your propensity distribution.

weights = np.clip(1.0 / propensities_np, a_min=None, a_max=10.0)

clf.fit(scaled_contexts, rewards_np, sample_weight=weights)

A common refinement is Self-Normalized IPW (SNIPW), which rescales weights so they sum to the number of samples (weights *= len(weights) / weights.sum()). This is a biased but lower-variance estimator, and often produces more stable training when some propensities are very small.
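A small sketch of the self-normalization step, using toy propensities that are assumptions for illustration:

```python
import numpy as np

propensities_np = np.array([0.9, 0.9, 0.05, 0.5])   # toy logged propensities

# clipped inverse-propensity weights, as in the IPW example
weights = np.clip(1.0 / propensities_np, a_min=None, a_max=10.0)

# self-normalize: rescale so the weights sum to the number of samples
weights *= len(weights) / weights.sum()

print(weights.round(3), weights.sum())
```

The relative weighting between samples is unchanged; only the overall scale is fixed, which keeps the effective learning signal comparable across retraining runs.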

Bootstrap a new model

On day one there are no logged decisions to train on. Use the random-with-propensity model as your starting point: it serves uniformly random variants and logs a propensity of 1/N per decision, which provides exactly the data needed to train an oracle.

Keep this random model live until each variant has accumulated enough views. A few hundred views per variant is a reasonable starting point, though the right number depends on conversion rate and the complexity of context features. Monitor min(views_per_variant) over time and switch once the threshold is crossed. After that, retrain the oracle on a schedule (for example, daily) so it keeps adapting as more data comes in.
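One possible way to express the switch-over check, assuming the logged actions are available as a list of variant IDs. The helper name `ready_to_switch` and the default threshold of 300 are assumptions; tune the threshold to your own data.

```python
from collections import Counter

def ready_to_switch(actions, variant_ids, min_views=300):
    """True once every variant has at least `min_views` logged views."""
    views = Counter(actions)
    # variants that were never shown count as zero views
    return min(views.get(v, 0) for v in variant_ids) >= min_views

actions = ["a"] * 350 + ["b"] * 310 + ["c"] * 120   # toy view log
print(ready_to_switch(actions, ["a", "b", "c"]))    # False: "c" has only 120 views
```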

Combine the building blocks above into a complete contextual multi-armed bandit: train an oracle on the IPW-weighted historical data, then wrap it in the Epsilon-Greedy ONNX graph so that it explores a fraction epsilon of the time and emits a propensity output on every decision.


Test your model

Before putting a model into production, verify that it makes correct decisions and handles unexpected inputs without failing. Unit testing and off-policy evaluation are the main tools.

Unit testing

Use the onnxruntime package to execute ONNX models. In a unit test, you can add temporary additional outputs to inspect intermediate results in the computational graph.

Set up your testing environment:

import onnxruntime as ort

import ipytest

ipytest.autoconfig(raise_on_error=True)

Then add tests that validate the model behaves correctly:

%%ipytest

import copy

def test_calculated_probabilities():
    # deep-copy the model so the extra output is not added to the original
    onnx_model_copy = copy.deepcopy(onnx_model)

    # expose the intermediate probabilities as an additional output
    onnx_model_copy.graph.output.append(
        helper.make_tensor_value_info(
            name="conversion_probabilities",
            elem_type=TensorProto.FLOAT,
            shape=[None, None]
        )
    )

    session = ort.InferenceSession(onnx_model_copy.SerializeToString())

    for context_vector in context_vectors:
        actual_probabilities = session.run(
            output_names=["conversion_probabilities"],
            input_feed={
                # pass the variant IDs as a NumPy array
                "variant_ids": np.array(VARIANT_IDS),
                "profile": context_vector.toarray().squeeze()
            }
        )[0].T[0]

        desired_probabilities = clf.predict_proba(
            np.hstack([
                np.tile(context_vector.toarray(), (len(VARIANT_IDS), 1)),
                action_mapping
            ])
        )[:, 1]

        np.testing.assert_almost_equal(
            actual_probabilities,
            desired_probabilities,
            decimal=6
        )

Validate these corner cases that can occur in production:

  • What happens when the order of variant_ids changes?

  • What happens when a variant is missing from variant_ids?

  • What happens when an unknown ID is added to variant_ids?

  • What happens when all values in the profile input are zero?
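The corner cases above can be captured in a small reusable check. The sketch below assumes a hypothetical wrapper `select_variant(variant_ids, profile)` that returns one variant ID; in practice it would call `session.run` on the ONNX model. The expected behavior encoded here (an unknown ID never wins, reordering does not change the decision) is an assumption that should match your own requirements. A toy selector stands in for the model so the example runs on its own.

```python
def check_corner_cases(select_variant, variant_ids, profile):
    baseline = select_variant(variant_ids, profile)
    # reordering the candidates should not change the decision
    assert select_variant(list(reversed(variant_ids)), profile) == baseline
    # removing a candidate must never return the removed ID
    remaining = [v for v in variant_ids if v != baseline]
    assert select_variant(remaining, profile) in remaining
    # an unknown ID should be tolerated and should not win
    assert select_variant(variant_ids + ["unknown-id"], profile) == baseline
    # an all-zero profile should still yield a valid variant
    assert select_variant(variant_ids, [0.0] * len(profile)) in variant_ids

# stand-in selector for demonstration only: a fixed score per known variant
scores = {"a": 0.1, "b": 0.3, "c": 0.2}

def toy_selector(variant_ids, profile):
    return max(variant_ids, key=lambda v: scores.get(v, 0.0))

check_corner_cases(toy_selector, ["a", "b", "c"], [1.0, 2.0])
print("all corner cases passed")
```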

Off-policy evaluation

Off-policy evaluation uses logged data to compare the performance of different models and agents. Algorithms such as Self-Normalized Inverse Propensity Scoring (SNIPS) and Doubly Robust (DR) are common choices. These methods are useful for evaluating bandit algorithms before deployment.
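As an illustration, here is a minimal SNIPS estimator for a deterministic candidate policy. The function name `snips` and the toy log are assumptions; `target_actions[i]` is the variant the candidate policy would have chosen for the i-th logged context.

```python
def snips(logged_actions, propensities, rewards, target_actions):
    """Self-normalized IPS estimate of the average reward of a target policy."""
    numerator = 0.0
    denominator = 0.0
    for logged, propensity, reward, target in zip(
        logged_actions, propensities, rewards, target_actions
    ):
        # importance weight: 1 / propensity when the policies agree, else 0
        weight = 1.0 / propensity if logged == target else 0.0
        numerator += weight * reward
        denominator += weight
    # without any overlap between the policies the estimate is undefined
    return numerator / denominator if denominator > 0 else float("nan")

logged_actions = ["a", "b", "a", "b"]
propensities = [0.5, 0.5, 0.5, 0.5]
rewards = [1, 0, 0, 1]
target_actions = ["a", "a", "a", "b"]

estimate = snips(logged_actions, propensities, rewards, target_actions)
print(round(estimate, 4))  # 0.6667
```

Only the logged decisions that agree with the candidate policy contribute, so the estimate becomes noisy when the two policies rarely overlap.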

When not all variants are always applicable (for example, when a model is shared across multiple dialogues with different "Who" settings), the applicable variant IDs also need to be logged. Add a variant_ids output to the model; it automatically contains the IDs of all applicable variants and is stored in the modelOutput property.


Monitor model performance

Monitor the model's performance, especially during the first few weeks. BlueConic provides standard insights to track Dialogue and variant performance. For additional metrics in AI Workbench, two options are available:

  • Any number passed to the properties of the update_status call can be plotted over time using the Notebook performance insight.

  • Any plot added to the output notebook can be displayed in a dashboard using the Notebook insights.


Next steps

  • Periodically retrain your models in AI Workbench or upload updated ONNX models from your data science environment to keep up with shifting visitor behavior.

  • Consider alternative bandit algorithms such as Thompson Sampling, which does not explore a fixed fraction of the time and can outperform Epsilon-Greedy when epsilon is difficult to tune.

  • Apply a lookback window when retrieving training data (for example, only the last seven days) so training reflects recent behavior.

  • When actions have very different conversion values, incorporate conversion value into the model using a hurdle model that multiplies predicted conversion probability by predicted conversion value.
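To give a feel for Thompson Sampling, the sketch below shows a context-free Beta-Bernoulli version in plain Python. It is a simplified illustration rather than a contextual bandit or an ONNX graph; the function name `thompson_sample`, the uniform prior, and the toy counts are assumptions.

```python
import random

def thompson_sample(stats, rng=random):
    """stats maps variant ID -> (conversions, views); returns the variant to show."""
    best_id, best_draw = None, -1.0
    for variant_id, (conversions, views) in stats.items():
        # Beta(1 + successes, 1 + failures) posterior under a uniform prior
        draw = rng.betavariate(1 + conversions, 1 + views - conversions)
        if draw > best_draw:
            best_id, best_draw = variant_id, draw
    return best_id

rng = random.Random(0)
stats = {"a": (5, 1000), "b": (60, 1000), "c": (0, 0)}   # toy counts
picks = [thompson_sample(stats, rng) for _ in range(1000)]
print({v: picks.count(v) for v in stats})
```

Variants with little data, such as "c" here, are explored often because their posterior is wide, while well-measured poor performers such as "a" are rarely shown. No epsilon needs to be tuned.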
