For customers who export large volumes of profile data on a recurring basis, BlueConic offers advanced functionality that lets you export that data quickly and efficiently via the REST API v2, the Python API, or BlueConic Connections. This functionality is enabled automatically whenever the number of profiles requested for an export equals or exceeds 95 percent of your total tenant population.
For instance, if your tenant contains 300 million profiles, this functionality will be triggered if you export a segment that has at least 285 million of those profiles.
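The threshold arithmetic above can be sketched in a few lines. This is only an illustration of the 95-percent rule as described here; BlueConic makes the actual decision on its side, and the function name and parameters are hypothetical.

```python
def meets_fast_export_threshold(
    segment_profiles: int,
    tenant_profiles: int,
    threshold_percent: int = 95,
) -> bool:
    """Return True if the segment covers at least threshold_percent of the tenant.

    Hypothetical helper for illustration only. Integer arithmetic is used so
    the comparison is exact even for hundreds of millions of profiles.
    """
    return segment_profiles * 100 >= threshold_percent * tenant_profiles


# A tenant with 300 million profiles needs a segment of at least 285 million.
print(meets_fast_export_threshold(285_000_000, 300_000_000))
print(meets_fast_export_threshold(284_999_999, 300_000_000))
```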
Note: Beyond general speed and efficiency, this functionality lets you load profile data into existing data lakes quickly, which supports large-scale machine-learning training.
The following sections break down exactly how this faster export functionality works, your role in ensuring timely and error-free exports, and next steps to optimize your exports.
How it works
BlueConic uses Apache Cassandra queries for all customer export requests. If the number of profiles being exported falls below 95 percent of your total tenant population, then BlueConic queries Apache Solr first, which then retrieves the data from Cassandra. If the number of profiles meets or exceeds 95 percent, though, BlueConic retrieves that profile data directly from Cassandra using table scanning, which results in a faster export.
Here is a breakdown of exactly how BlueConic processes a request to export profiles:
Notes:
- Even when BlueConic bypasses Solr and goes directly to Cassandra, all profile properties (including any new ones added) are accounted for in the data retrieval.
- If a query fails, the system implements an auto-retry mechanism to reduce export errors and ensure resilience in data retrieval.
Observations
This faster export functionality was first introduced as part of Release 92 (April 2024). During testing, before and after release, we have observed that:
- For large profile exports that cover 95 percent or more of all profiles, export times improve significantly with Cassandra table scanning compared to the Solr-based export.
- When you export less than 95 percent of profiles, performance drops with table scanning because of the number of profiles that must be discarded after the initial full scan.
- Calling the REST API v2 directly is faster than exporting via any BlueConic Connection. This is because there is additional post-processing in a typical connection setup; in contrast, when running the export directly on your client, the processing takes place on your side.
Your role in influencing performance
BlueConic advises running these large exports only when necessary to ensure optimal performance.
If you do need to export a large number of profiles, the export will be quicker if your segment contains at least 95 percent of all profiles. Review these specifics about each export method to help you meet that 95-percent threshold:
When using the REST API v2:
- Use GET /rest/v2/segments/allvisitors/profiles for the “All visitors” segment or GET /rest/v2/segments/{yoursegment}/profiles for a similar segment that will reach the 95-percent threshold.
- For best results, pass a count of 100,000 or 1 million.
- If your request fails, use the cursor (available with every request) to start again from that page. Improved cursor capabilities allow for efficient batch requests.
For more information, review the BlueConic REST API v2 documentation site.
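The cursor-based restart described above might be sketched as follows. This is a minimal sketch under stated assumptions, not BlueConic's client code: the page shape (a "profiles" list plus a "nextCursor" value) is hypothetical, and the HTTP call is injected as a function so the loop stays independent of the real response format.

```python
from typing import Callable, Iterator, Optional

# Assumption: each page is a dict like {"profiles": [...], "nextCursor": "..."}.
# These field names are hypothetical; check the REST API v2 documentation.
FetchPage = Callable[[Optional[str]], dict]


def export_profiles(fetch_page: FetchPage, max_retries: int = 3) -> Iterator[dict]:
    """Yield every profile, retrying a failed page from the last good cursor."""
    cursor: Optional[str] = None  # None requests the first page
    while True:
        for attempt in range(max_retries + 1):
            try:
                page = fetch_page(cursor)
                break
            except OSError:  # covers urllib's URLError and connection errors
                if attempt == max_retries:
                    raise
        yield from page["profiles"]
        cursor = page.get("nextCursor")
        if not cursor:
            return  # no further pages
```

In practice, fetch_page would wrap the GET /rest/v2/segments/allvisitors/profiles request with your chosen count and pass the cursor along. Because only the failed page is requested again, earlier pages are never downloaded twice.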
When using Python API via AI Workbench:
- Use get_profiles('allvisitors') or another segment that will reach the 95-percent threshold.
- Pass a count of 100,000 or 1 million for best results.
For more information, review the BlueConic Python API documentation site.
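As a rough sketch of the two points above, the helper below requests the "All visitors" segment with a large count. It assumes that the client object exposes get_profiles, that count is accepted as a keyword argument, and that the result is iterable; the helper name and these signature details are assumptions, so confirm them against the Python API documentation.

```python
from typing import Any, Iterator


def iter_segment_profiles(
    client: Any,
    segment_id: str = "allvisitors",
    count: int = 100_000,
) -> Iterator[Any]:
    """Yield profiles from a segment that should reach the 95-percent threshold.

    Hypothetical wrapper: assumes client.get_profiles(segment_id, count=...)
    returns an iterable of profiles.
    """
    for profile in client.get_profiles(segment_id, count=count):
        yield profile
```

Passing the client in as an argument keeps the sketch independent of how the client is created in your AI Workbench notebook.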
When using BlueConic Connections:
- On the connection’s export goal page, select the “All visitors” segment or a similar segment that will reach that 95-percent threshold.
- Do not use filters (e.g., date filters) that will result in the number of profiles falling below that threshold.
For more information, review the article BlueConic Connections Overview.
Other considerations
To better manage your exports (and your expectations of how fast those exports will run), always consider the factors you control:
- Profile size: How many profile properties, and how much data, need to be captured.
- Inclusion of Timeline data: Whether Timeline events are included and which profile properties are retrieved.
- Network: Whether your request is made from the same or a nearby zone (e.g., us-east-1).
- Post-processing in Connections: Whether post-processing is involved, which could add to export times.
Tip: When performing profile exports in BlueConic, you can filter properties up front to improve export efficiency. This is particularly beneficial when exporting a small number of properties. By passing the desired properties using the &properties=prop1,prop2 parameter, you can significantly reduce the export time.
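For illustration, a request URL that filters properties up front could be assembled like this. The host name is a placeholder, the helper name is hypothetical, and the property names are examples only; the path, count, and properties parameter come from this article.

```python
from typing import Sequence
from urllib.parse import urlencode


def build_export_url(
    base_url: str,
    segment_id: str,
    properties: Sequence[str],
    count: int = 100_000,
) -> str:
    """Build a profile export URL that requests only the listed properties."""
    # safe="," keeps the comma-separated property list readable in the URL.
    query = urlencode(
        {"count": count, "properties": ",".join(properties)},
        safe=",",
    )
    return f"{base_url}/rest/v2/segments/{segment_id}/profiles?{query}"


# Placeholder host; replace with your own tenant's host name.
print(build_export_url("https://yourtenant.example.com", "allvisitors", ["email", "name"]))
```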
Optimizing your exports
To better support your data export needs, you can reach out to BlueConic Support or your designated CSM to request lowering the threshold for the faster profile export functionality and also to review other potential optimizations based on your tenant setup.
When initiating any of these requests, outline your specific use case and any relevant details about your current export processes. BlueConic will work with you to implement the necessary changes and optimizations.