- Using
TABLE_QUERY
with Legacy SQL - Using
TABLE_DATE_RANGE
with Legacy SQL - Using
_TABLE_SUFFIX
with Standard SQL
BigQuery offers users a number of powerful methods to allow searching and filtering based on the names of tables within a particular dataset using wildcard functions or the asterisk *
character.
Below we’ll explore methods of table wildcard filtering for both Legacy SQL and Standard SQL solutions.
Using TABLE_QUERY
with Legacy SQL
The TABLE_QUERY
function is a powerful method that effectively allows you to generate a secondary sub-query based on the name of the table (table_id
) to further hone your results.
For example, we’re using the [gdelt-bq:hathitrustbooks]
dataset, which contains a table for each year of publications ranging from 1800 to 2012. Therefore, if we want to query just the table range of the 1920s (1920 - 1929), we can use the TABLE_QUERY
function and within that expression use the REGEXP_MATCH
function to ensure we only query tables from the 1920s:
SELECT
BookMeta_Identifier,
BookMeta_Date
FROM
TABLE_QUERY([gdelt-bq:hathitrustbooks],
'REGEXP_MATCH(table_id , r"^192[\d]")')
ORDER BY
BookMeta_Identifier DESC
LIMIT
1000
And the results are as expected – a range of publications from the 1920s only.
[
{
"BookMeta_Identifier": "yul.11995746_000_00",
"BookMeta_Date": "1922"
},
{
"BookMeta_Identifier": "yul.11729715_000_00",
"BookMeta_Date": "1922"
},
{
"BookMeta_Identifier": "yul.11368340_000_00",
"BookMeta_Date": "1920"
},
{
"BookMeta_Identifier": "yul.11165435_000_00",
"BookMeta_Date": "1922"
},
{
"BookMeta_Identifier": "yale.39002088672077",
"BookMeta_Date": "1925"
},
{
"BookMeta_Identifier": "yale.39002084611509",
"BookMeta_Date": "1925"
},
{
"BookMeta_Identifier": "yale.39002056071179",
"BookMeta_Date": "1929"
},
{
"BookMeta_Identifier": "yale.39002044555168",
"BookMeta_Date": "1921"
},
{
"BookMeta_Identifier": "yale.39002040679681",
"BookMeta_Date": "1925"
},
{
"BookMeta_Identifier": "yale.39002040679657",
"BookMeta_Date": "1925"
},
{
"BookMeta_Identifier": "yale.39002040679632",
"BookMeta_Date": "1925"
},
{
"BookMeta_Identifier": "yale.39002040330061",
"BookMeta_Date": "1925"
},
...
]
Using TABLE_DATE_RANGE
with Legacy SQL
In cases where you have a series of daily tables (perhaps from partitioning them within BigQuery) which have a suffix of the date (in the required YYYMMDD
format), you can utilize the TABLE_DATE_RANGE
function to query a range of those daily tables only.
For example, if our bookstore
dataset has a series of daily tables with names in the format: bookstore.booksYYYYMMDD
, we can query specific daily tables from January 1st, 1920
to December 31st, 1929
with this query:
SELECT
title,
author
FROM
TABLE_DATE_RANGE(bookstore.books,
TIMESTAMP('1920-01-01'),
TIMESTAMP('1929-12-31'))
LIMIT
1000
BigQuery will automatically infer and generate the dated table names based on the prefix we provided as well as the TIMESTAMP
range, then it will query the data accordingly.
Using _TABLE_SUFFIX
with Standard SQL
If you’re explicitly using Standard SQL with BigQuery, you’ll need an alternative to functions like TABLE_QUERY
and TABLE_DATE_RANGE
. Thankfully, you can take advantage of the _TABLE_SUFFIX
psuedo column, which automatically contains matched values when you use the *
wildcard symbol within your table name in the query.
For example, if we wanted to replicate the same query as above using the Legacy SQL TABLE_QUERY
function – where we retrieve results only from tables between 1920
and 1929
– we’d use the following query in Standard SQL:
SELECT
BookMeta_Identifier,
BookMeta_Date
FROM
`gdelt-bq.hathitrustbooks.*`
WHERE
_TABLE_SUFFIX BETWEEN '1920' AND '1929'
ORDER BY
BookMeta_Identifier DESC
LIMIT
1000
The key is the *
wildcard character we placed within our table name in the FROM
clause. While _TABLE_SUFFIX
is (by definition) intended to represent the suffix (or final portion) of the full table name (as inmy_table*
), we can use it as in the above example to represent the entire table name and filter those names in the WHERE
clause.
Sure enough, the returned results are identical to the Legacy SQL example above:
[
{
"BookMeta_Identifier": "yul.11995746_000_00",
"BookMeta_Date": "1922"
},
{
"BookMeta_Identifier": "yul.11729715_000_00",
"BookMeta_Date": "1922"
},
{
"BookMeta_Identifier": "yul.11368340_000_00",
"BookMeta_Date": "1920"
},
{
"BookMeta_Identifier": "yul.11165435_000_00",
"BookMeta_Date": "1922"
},
{
"BookMeta_Identifier": "yale.39002088672077",
"BookMeta_Date": "1925"
},
{
"BookMeta_Identifier": "yale.39002084611509",
"BookMeta_Date": "1925"
},
{
"BookMeta_Identifier": "yale.39002056071179",
"BookMeta_Date": "1929"
},
{
"BookMeta_Identifier": "yale.39002044555168",
"BookMeta_Date": "1921"
},
{
"BookMeta_Identifier": "yale.39002040679681",
"BookMeta_Date": "1925"
}
]