With a BlueConic firehose connection, you can stream BlueConic events and profile data to Google Pub/Sub. From there, you can use the data in various ways, including sending it on from Google Pub/Sub to Google BigQuery, Google's analytics data warehouse (DWH) tool.
Example: Streaming data from BlueConic to Google BigQuery
The example below illustrates one approach to streaming data from BlueConic through Google Pub/Sub to Google BigQuery. Here's an outline of the process:
- Set up a connection from BlueConic to Google Pub/Sub.
- Prepare a Google BigQuery table to receive this information.
- Use Google Dataflow to connect a Google Pub/Sub topic to the BigQuery table.
The technology stack looks like this: BlueConic firehose → Google Cloud Pub/Sub → Google Dataflow → Google BigQuery.
Note: To import or export data in batch mode (but not BlueConic events) directly between BlueConic and Google BigQuery, you can also use the BlueConic Google BigQuery connection.
Before you get started, a few things to keep in mind:
- Sending the data from Google Pub/Sub to BigQuery requires a virtual machine and temporary storage. Google will bill you for this.
- Be prepared to ingest a lot of data and to troubleshoot issues along the way. BlueConic cannot provide in-depth support for issues with systems outside of BlueConic.
Prerequisites
To send profile and event data from BlueConic to Google BigQuery, you'll need the following:
- BlueConic installed and set up, with a Google Cloud Pub/Sub firehose connection configured.
- A Google Cloud account with billing set up.
Step 1: Sending data from BlueConic to a Google Cloud Pub/Sub topic
Start by setting up a BlueConic Google Cloud Pub/Sub firehose connection. This will send data to a specific Pub/Sub topic. See Google Cloud Pub/Sub connection for details. When you set up the connection, record the name of the topic you’re sending data to. The example outlined below uses a topic named "bctobq" in the project "pubsub-example-221115".
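Once the connection is running, you can verify that firehose messages are arriving on the topic. Below is a minimal sketch using the google-cloud-pubsub Python client and the example names above; the subscription name "bc-firehose-check" is a hypothetical placeholder, not something the connection creates for you.

```python
# Sanity check: pull a few firehose messages from the Pub/Sub topic.
from google.cloud import pubsub_v1

project_id = "pubsub-example-221115"
topic_id = "bctobq"
subscription_id = "bc-firehose-check"  # hypothetical name for a test subscription

subscriber = pubsub_v1.SubscriberClient()
topic_path = subscriber.topic_path(project_id, topic_id)
subscription_path = subscriber.subscription_path(project_id, subscription_id)

with subscriber:
    # Create a test subscription on the firehose topic (only needed once).
    subscriber.create_subscription(
        request={"name": subscription_path, "topic": topic_path}
    )

    # Pull up to five messages and print their JSON payloads.
    response = subscriber.pull(
        request={"subscription": subscription_path, "max_messages": 5}
    )
    for received in response.received_messages:
        print(received.message.data.decode("utf-8"))

    # Acknowledge the pulled messages so they are not redelivered.
    ack_ids = [m.ack_id for m in response.received_messages]
    if ack_ids:
        subscriber.acknowledge(
            request={"subscription": subscription_path, "ack_ids": ack_ids}
        )
```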
Step 2: Setting up a Google BigQuery table
- Open the Google BigQuery window in the Google console.
- You need a BigQuery project; either select an existing project or create a new one.
- Next, you need a dataset within this project; either select an existing dataset or create a new one.
- Within the dataset, create a new table.
- Create a table schema that matches the data you want to receive. See Create and Use Tables in the Google BigQuery documentation for details on the standard format. The schema must reflect the settings you configured in the Google Cloud Pub/Sub connection in BlueConic, so it can hold the data that streams through the firehose. One approach is to load a JSON example file containing your BlueConic data and let BigQuery's schema autodetection infer the schema, as sketched below.
Make note of your topic, project, dataset, and table names. This example uses these sample names:
| Item | Example name |
| --- | --- |
| Topic | bctobq |
| Project | pubsub-example-221115 |
| Dataset | bc |
| Table | example |
| Bucket | bc-pubsub |
See Sample firehose messages for details on the JSON format the Google Cloud Pub/Sub connection uses for the data it streams through the firehose.
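The following is a minimal sketch of the schema-autodetection approach mentioned above, assuming the google-cloud-bigquery Python client and the example names in this article; "sample_firehose.json" is a hypothetical newline-delimited JSON file containing one or more sample firehose messages.

```python
# Let BigQuery infer the table schema from sample BlueConic firehose messages.
from google.cloud import bigquery

client = bigquery.Client(project="pubsub-example-221115")

job_config = bigquery.LoadJobConfig(
    autodetect=True,  # infer the schema from the sample data
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
)

# "sample_firehose.json" is a hypothetical file of sample firehose messages.
with open("sample_firehose.json", "rb") as source_file:
    load_job = client.load_table_from_file(
        source_file,
        "pubsub-example-221115.bc.example",  # project.dataset.table
        job_config=job_config,
    )

load_job.result()  # wait for the load job to finish

table = client.get_table("pubsub-example-221115.bc.example")
print([field.name for field in table.schema])
```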
Be aware that changes to the BlueConic profile properties being exported are not automatically carried over to the BigQuery schema. Any added profile properties also require changes to the BigQuery table.
Step 3: Connecting Google Cloud Pub/Sub to BigQuery through Google Dataflow
Next, you set up temporary storage and use Google Dataflow to move the data from the Pub/Sub topic into the BigQuery table, which serves as the endpoint for firehose messages.
Set up temporary storage
- Open the Google storage console: https://console.cloud.google.com/storage/browser
- If a suitable bucket already exists, you can reuse it. Otherwise, click Create Bucket to create a new one; the default options are sufficient for this purpose. (You can also script this step, as sketched after this list.)
- Write down the bucket name, for example "bc-pubsub".
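If you prefer to script this step, here is a minimal sketch using the google-cloud-storage Python client and the example project and bucket names in this article:

```python
# Create a bucket for the Dataflow job's temporary storage.
from google.cloud import storage

client = storage.Client(project="pubsub-example-221115")
bucket = client.create_bucket("bc-pubsub")  # default options are sufficient
print(f"Created bucket {bucket.name}")
```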
Enable the Google Dataflow API
The Google Dataflow API is not enabled by default. Enable it at https://console.developers.google.com/apis/library/dataflow.googleapis.com or from the APIs and Services menu in the Google Cloud console.
Set up the Dataflow
- Open the Google Dataflow console: https://console.cloud.google.com/Dataflow
- Select Create a job from a template.
- Enter a job name and select the Google Pub/Sub to BigQuery template.
This should give you a list of fields.
- Enter the field values using the notes you've taken in the steps above. In our example:

| Field | Example value |
| --- | --- |
| Google Cloud Pub/Sub input topic | projects/pubsub-example-221115/topics/bctobq |
| BigQuery output table | pubsub-example-221115:bc.example |
| Temporary location | gs://bc-pubsub/tmp |
Note that the input topic uses forward slashes (/), but the output table uses a colon (:) and a period (.).
- Click the Run job button. This will create the job and start streaming the dataflow.
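If you would rather launch the job programmatically than through the console, the sketch below uses the google-api-python-client library to launch Google's Pub/Sub-to-BigQuery Dataflow template with the example values above. The job name, region, and template path are assumptions; verify them against the current Dataflow documentation.

```python
# Launch the Pub/Sub-to-BigQuery Dataflow template programmatically.
# The job name, region, and template path below are assumptions to verify.
from googleapiclient.discovery import build

project_id = "pubsub-example-221115"

dataflow = build("dataflow", "v1b3")
request = dataflow.projects().locations().templates().launch(
    projectId=project_id,
    location="us-central1",  # assumed region
    gcsPath="gs://dataflow-templates/latest/PubSub_to_BigQuery",  # assumed template path
    body={
        "jobName": "bc-pubsub-to-bigquery",  # hypothetical job name
        "parameters": {
            "inputTopic": "projects/pubsub-example-221115/topics/bctobq",
            "outputTableSpec": "pubsub-example-221115:bc.example",
        },
        "environment": {"tempLocation": "gs://bc-pubsub/tmp"},
    },
)
response = request.execute()
print(response["job"]["id"])
```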
After you start streaming data to Google BigQuery, if you change something in BlueConic (add a profile property, for example), you will also need to update your Google BigQuery table. In this case, we suggest creating a new BigQuery table.
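If the change is small (a single added profile property, for example) and you would rather extend the existing table than create a new one, a minimal sketch with the google-cloud-bigquery client might look like this; "new_property" is a hypothetical placeholder for the added property's column name.

```python
# Add a column to the existing BigQuery table for a newly exported profile property.
from google.cloud import bigquery

client = bigquery.Client(project="pubsub-example-221115")
table = client.get_table("pubsub-example-221115.bc.example")

new_schema = list(table.schema)
new_schema.append(
    bigquery.SchemaField("new_property", "STRING", mode="NULLABLE")  # hypothetical column
)
table.schema = new_schema

client.update_table(table, ["schema"])  # schema updates can only add columns
```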
Alternative method
Another way to move data through the Google Cloud Pub/Sub firehose stream is to send the data to GCS Text as a file and then load that file into Google BigQuery. This saves you the step of customizing the BigQuery table to accommodate changes in the BlueConic profile properties sent through the connection.
Learning more about BlueConic firehose connections
For details about BlueConic firehose connections and example firehose messages, see:
- About BlueConic firehose connections
- Sample firehose messages
- BlueConic to Google Cloud Pub/Sub connection
- Monitoring your firehose traffic data