Convert a shapefile to Polygon data in BigQuery

Top SQL concepts to know about

Creating a Cloud Dataflow pipeline

Dark Light

Geo-spatial operation on GIS shapefile to make geodata available on BigQuery table

1. Download the target shapefile from the authority that creates the shapefile map

LSOA of London shapefile: https://data.london.gov.uk/dataset/statistical-gis-boundary-files-london

Important note: identify the coordinating system your shapfile is using, the above one is using British National Grid (EPSG: 27700) which need to be converted to (ESPG:4326)

2. Start a Google Compute Engine and a Cloud Storage bucket

Change VM to have access to all Cloud APIs when creating the GCE, ssh to your VM, install gdal and unzip to your VM

sudo apt-get update
sudo apt-get install gdal-bin unzip

Start the bucket with default setting and upload the zip file which containing your shapefile

Copy the zip file from the bucket to your VM, unzip it

sudo gsutil cp gs://gary-yiu-bucket-1/LSOA_2011_London.zip .
unzip LSOA_2011_London.zip

Convert to ESPG:4326 if you shapefile is not on this standard

ogr2ogr -t_srs EPSG:4326 LSOA_4326.shp LSOA_2011_London_gen_MHW.shp

3. Convert to GeoJSON-encoded CSV file

ogr2ogr -f csv -dialect sqlite -sql "select AsGeoJSON(geometry) AS geom, * from LSOA_4326" lsoa.csv LSOA_4326.shp

4. Upload the file to the BUCKET

gsutil -m cp *.csv gs://gary-yiu-bucket-1/

5. Load data to BigQuery

bq mk demo
bq load --autodetect --replace demos.london_lsoa gs://gary-yiu-bucket-1/lsoa.csv

6. Query the table and transform the GeoJSON to Polygon

   SELECT ST_GeogFromGeoJson(geom, make_valid => TRUE) AS polygon 
   FROM demos.london_lsoa

I prepared these polygon data so I can perform some analysis on crime rate in London which can be found here.

Related Posts