Help Center

Use case: Exporting data to Google BigQuery via Google Pub/Sub

With a BlueConic firehose connection, you can stream BlueConic events and profile data to Google Pub/Sub. From here, you can use this data in various ways, including sending the data from Google Pub/Sub to the analytics data warehouse tool (DWH) Google BigQuery.

Example: Streaming data from BlueConic to Google BigQuery

The example below illustrates one approach to streaming data from BlueConic through Google Pub/Sub to Google BigQuery. Here's an outline of the process:

  1. Set up a connection from BlueConic to Google Pub/Sub. 
  2. Prepare a Google BigQuery table to receive this information.
  3. Use Google Dataflow to connect a Google Pub/Sub topic to the BigQuery table.

The technology stack resembles this illustration:

How do I use BlueConic firehose connections to stream event data to Google Cloud Pub/Sub?

Note: For importing or exporting data in batch mode (but not BlueConic events) directly from BlueConic to Google BigQuery, you can also use the BlueConic Google BigQuery connection.

Before you get started, a few things to keep in mind:

  • Sending the data from Google Pub/Sub to BigQuery requires a virtual machine and temporary storage. Google will bill you for this.
  • Be prepared to ingest a lot of data, and to solve problems along the way. BlueConic cannot give in-depth support for issues with systems outside of BlueConic.

Prerequisites

To send profile and event data from BlueConic to Google Big Query, you'll need the following:

  • BlueConic installed and set up, with a Google Cloud Pub/Sub firehose connection configured.
  • A Google Cloud account with billing set up.

Step 1: Sending data from BlueConic to a Google Cloud Pub/Sub topic

Start by setting up a BlueConic Google Cloud Pub/Sub firehose connection. This will send data to a specific Pub/Sub topic. See Google Cloud Pub/Sub connection for details. When you set up the connection, record the name of the topic you’re sending data to. The example outlined below uses a topic named "bctobq" in the project "pubsub-example-221115".

Step 2: Setting up a Google BigQuery table

  1. Open the Google BigQuery window in the Google console.
  2. You need a BigQuery project, so you can either select an existing project or create a new one.
    How do I use Google BigQuery with BlueConic?
  3. Next, you need a dataset within this project; either select an existing dataset or create a new one.
    How do I update Google BigQuery datasets with profile data from BlueConic?
  4. Within the dataset, you need to create a new table.
    How to create a new table in Google BigQuery using BlueConic data streaming?
  5. You will need to create a table schema predefined for the data you want to retrieve. See Create and Use Tables in the Google BigQuery documentation for details on the standard format. But you will need to create a specific schema for the settings you’ve set up in the Google Cloud Pub/Sub connection in BlueConic, to account for the data that will stream through the firehose. One approach is to use a JSON example file with your BlueConic data, with schema autodetection in BigQuery.
    How to use Google BigQuery autodetection with BlueConic firehose event streaming for first-party customer profile data

Make note of your topic, project, dataset, and table names. This example uses these sample names:

Topic bctobq
Project pubsub-example-221115
Dataset bc
Table example
Bucket bc-pubsub

See Sample firehose messages for details on the JSON format the Google Cloud Pub/Sub connection will use to stream through the firehose.

Be aware that changes to BlueConic profile properties that are being exported are not automatically carried over to the BigQuery schema. Any added profile properties will also require changes to the BigQuery table.

Step 3: Connecting Google Cloud Pub/Sub to BigQuery through Google Dataflow

Next you configure storage for the data in BigQuery. The BigQuery table serves as the end point for firehose messages.

Set up temporary storage

  1. Open the Google storage console: https://console.cloud.google.com/storage/browser
  2. If there is already a bucket, you can choose to reuse this. Otherwise click Create Bucket to create a new bucket. The default options are sufficient for this purpose.
    How to create a new bucket in Google BigQuery via BlueConic firehose connections
  3. Write down the bucket name, for example "bc-pubsub".
    How to use BlueConic firehose event streaming with Google BigQuery

Enable the Google Dataflow API

The Google Dataflow API is not enabled by default. You can do this by going to: https://console.developers.google.com/apis/library/dataflow.googleapis.com or from the Google Cloud APIs and Services menu.

Set up the Dataflow

  1. Open the Google Dataflow console: https://console.cloud.google.com/Dataflow
    How to use the Google Dataflow console with BlueConic firehose connections to stream event data from BlueConic
  2. Select Create a job from a template.
  3. Enter a job name and select the Google Pub/Sub to BigQuery template.
    This should give you a list of fields.
  4. Enter the field names using the notes you’ve taken in the steps above. In our example it would be:
    Google Cloud Pub/Sub input topic projects/pubsub-example-221115/topics/bctobq
    BigQuery output table pubsub-example-221115:bc.example
    Temporary location gs://bc-pubsub/tmp

    Note that the input topic uses forward slashes (/) but the output table uses a colon (:) and a period (.).
    bigquery_dataflow.png
  5. Click the Run job button. This will create the job and start streaming the dataflow.

After you stream data to Google BigQuery, if you change something in BlueConic (add a profile property, for example), you will need to update your Google BigQuery table. In this case, we suggest you create a new BigQuery table.

Alternative method

Another way to move data through the Google Cloud Pub/Sub firehose stream is to send the data to GCS Text as a file and then send it to Google BigQuery. This would save you the steps of customizing the BigQuery table to accommodate changes in the BlueConic profile properties sent through the connection.

Learning more about BlueConic firehose connections

For details about BlueConic firehose connections and example firehose messages, see:

 

Was this article helpful?
0 out of 0 found this helpful