Friday, September 21, 2012

How to apply JOINS on Google BigTables with Google BigQuery?

This article is in continuation towards exploring Google BigQuery and Google BigTables. In last few post we learnt what Google BigQuery is and how to create Dataset and BigTables with Google BigQuery. We also learnt how to read data from BigTables. In this post we will learn how we can join two BigTables and read data from both tables.

Google BigQuery service allows to apply Join on BigTables. There are two type of JOIN that Google BigQuery services supports on BigTables.

  • Inner Join
  • Left Outer Join

Google BigQuery also expects that when we are joining two BigTables, one of the BigTable is relatively small in size. A Big Table is considered as small if the data size in the table is less than 7MB. If we try to join two BigTables with data size of more than 7 MB, the Join will fail.

We can apply row filter by using WHERE clause while joining two tables but it only supports AND condition. What it mean is we can apply multiple row filter criteria using AND condition but if we try to apply JOIN and put OR condition in the Where clause the Join will fail.

Let us create an example by joining two BigTables. We have following two CSV files with data of Employees and Departments.

I created two BigTables Employee and Dept by navigating to Google BigQuery webpage https://bigquery.cloud.google.com

The goal is to join these two BigTables and read Employee code, Employee name and Department name. To join these two tables, I have written following query.

SELECT [BigCompany.Employee.EmpCode], [BigCompany.Employee.EmpName], [BigCompany.Dept.DeptName] FROM [BigCompany.Employee] JOIN [BigCompany.Dept] ON [BigCompany.Employee.DeptCode] = [BigCompany.Dept.DeptCode]

I clicked on Run Query and results were expected. The two BigTables were successfully Join and resultset had data from both tables

So this is how we can apply Join on Big Tables.

Popular Posts

Real Time Web Analytics