Using pandas_gbq to import dataframe to BigQuery

Given data in a text file formatted as a Python dictionary, we can first turn it into a pandas DataFrame and then import it into BigQuery using the pandas_gbq package. Here is how we can do it.

First of all, we have a text file with the following structure: it is in Python dictionary format, with the column names, the index, and the data stored in arrays.

{"columns":[ ... ], "index": [ 1,2,3, ... ], "data" : [[ ... ],[ ... ], ... ]}

We are going to use Google Colab as our Python notebook platform for the sake of simplicity. You can find the link to the notebook here.

First we need to import the pandas and pandas_gbq packages. Then we read the text file using the utf-8-sig encoding, storing its contents in a variable named “data”.
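A minimal sketch of this step, assuming a file named data.txt (hypothetical; use your own path). The sample content is written first so the snippet is self-contained; utf-8-sig strips a leading byte-order mark (BOM), which exported text files often carry:

```python
# Write a small sample file with a BOM, as exported files often have.
# (Assumption: the real file is larger and already exists as data.txt.)
sample = '{"columns": ["name"], "index": [1], "data": [["alice"]]}'
with open("data.txt", "w", encoding="utf-8-sig") as f:
    f.write(sample)

# Read it back with utf-8-sig so the BOM is stripped from the string.
with open("data.txt", encoding="utf-8-sig") as f:
    data = f.read()
```

If the file were read with plain utf-8 instead, the string would start with the invisible BOM character and json parsing would fail.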

As our data is a dictionary with columns, index and data keys, we need to set the orient parameter to "split" when parsing it with the read_json function. Store the resulting DataFrame as df.
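A short sketch of the parsing step, using an inline sample string in place of the file contents (column names here are made up for illustration). Wrapping the string in io.StringIO keeps read_json happy across pandas versions, which now expect a file-like object:

```python
import io
import pandas as pd

# Stand-in for the string read from the text file.
data = '{"columns": ["first name", "score"], "index": [1, 2], "data": [["alice", 90], ["bob", 85]]}'

# orient="split" tells read_json to expect the columns/index/data layout.
df = pd.read_json(io.StringIO(data), orient="split")
```

With any other orient value, read_json would misinterpret the top-level keys as column names rather than structural metadata.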

Before we pass the data to BigQuery, we need to authenticate our access to Google Cloud Platform.
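In Colab this is an interactive login; the sketch below guards the call so the same code also runs outside Colab, where pandas_gbq falls back to Application Default Credentials (for example from `gcloud auth application-default login`):

```python
# google.colab is only importable inside a Colab runtime, so a failed
# import tells us we are running elsewhere.
try:
    from google.colab import auth
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    # Opens an interactive Google login flow in the notebook.
    auth.authenticate_user()
```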

We need to be aware that field names with spaces are not allowed in BigQuery, so we replace the spaces in our column names with underscores. Finally, we can import the data into BigQuery using the to_gbq method, specifying our target dataset plus table name and also the GCP project_id.
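The two steps above can be sketched as follows. The dataset, table, and project id are hypothetical placeholders, and the upload itself is guarded behind a flag because to_gbq needs authenticated GCP access:

```python
import pandas as pd

# Stand-in DataFrame with spaces in its column names.
df = pd.DataFrame({"first name": ["Ada"], "last name": ["Lovelace"]})

# BigQuery field names may not contain spaces, so swap them for underscores.
df.columns = [c.replace(" ", "_") for c in df.columns]

# Upload to BigQuery (assumption: my_dataset.my_table and my-project are
# placeholders for your own dataset.table and GCP project id).
UPLOAD = False  # flip to True once authenticated
if UPLOAD:
    import pandas_gbq
    pandas_gbq.to_gbq(df, "my_dataset.my_table", project_id="my-project")
```

By default to_gbq refuses to overwrite an existing table; its if_exists parameter ("fail", "replace", or "append") controls that behaviour.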

After running the code, we can check in BigQuery and see that the data has been successfully imported.
