CSV file in S3 from a Trino Query

Victor Coustenoble
2 min read · Apr 26, 2022

You want to generate a single CSV file in an S3 bucket from a Trino SQL query result?

Good idea! Here we show two options for doing this, with two simple examples of Python AWS Lambda functions.

Tests and examples are done with Starburst Galaxy, the SaaS version of Trino, but the approach is the same for a self-hosted Trino or Starburst Enterprise cluster.

We use two different options to generate a CSV file:

  • With a CTAS (Create table) query.
  • With a Python Pandas DataFrame object.
Architecture of the interaction between Starburst Galaxy and the AWS environment (Glue, S3, Lambda).

Code and examples are hosted in a GitHub repository.

Function with a CTAS query

The lambda_CTAS.py function uses a CTAS query to generate a single CSV file in an S3 bucket.

The CTAS query looks like:

CREATE TABLE s3_catalog.tmp.your_file WITH (csv_separator = ',', external_location = 's3://your_bucket/tmp/your_file', format = 'csv') AS SELECT ...

Here tmp is an existing schema in your Trino or Galaxy S3 catalog (Glue or Hive), named s3_catalog in this example.
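The CTAS statement can be assembled from the catalog, schema, table, and bucket names. A minimal sketch (the function name and parameters are illustrative, not part of the repository's code):

```python
def build_ctas(catalog, schema, table, bucket, select_sql):
    """Assemble a CTAS statement that writes a CSV-format external
    table under s3://<bucket>/<schema>/<table>."""
    return (
        f"CREATE TABLE {catalog}.{schema}.{table} "
        "WITH (csv_separator = ',', "
        f"external_location = 's3://{bucket}/{schema}/{table}', "
        "format = 'csv') "
        f"AS {select_sql}"
    )

print(build_ctas("s3_catalog", "tmp", "your_file", "your_bucket", "SELECT 1"))
```

Building the statement this way keeps the Lambda reusable across buckets and table names.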

After the CTAS query runs, the function performs a few extra steps:

  • Add a .csv suffix to the file name
  • Add the column names as a header row (the column names are passed as function parameters)
  • Copy the file to your specific bucket folder
  • Remove the temporary table created by the CTAS
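The post-CTAS steps above can be sketched with boto3. This is a simplified illustration, not the repository's exact code: bucket, prefix, and key names are placeholders, and dropping the temporary table itself would be a separate DROP TABLE statement sent back to Trino.

```python
def csv_header(columns):
    """Build the CSV header line from the column names passed as parameters."""
    return ",".join(columns) + "\n"

def finalize_csv(bucket, tmp_prefix, dest_key, columns):
    """Post-CTAS cleanup: prepend a header, write the result to the final
    .csv key, and delete the temporary CTAS output objects."""
    import boto3  # lazy import so the pure helper above runs standalone
    s3 = boto3.client("s3")
    # The CTAS writes its data file under the temporary prefix
    objects = s3.list_objects_v2(Bucket=bucket, Prefix=tmp_prefix)["Contents"]
    body = s3.get_object(Bucket=bucket, Key=objects[0]["Key"])["Body"].read()
    # dest_key carries the .csv suffix, e.g. "exports/your_file.csv"
    s3.put_object(Bucket=bucket, Key=dest_key,
                  Body=csv_header(columns).encode() + body)
    # Remove the temporary CTAS output; the table is dropped via Trino SQL
    for obj in objects:
        s3.delete_object(Bucket=bucket, Key=obj["Key"])
```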

Important: you also need to set some session properties in order to generate a single, uncompressed CSV file.

session_properties:

  • scale_writers : true
  • writer_min_size : 1TB
  • task_writer_count : 1
  • s3_catalog.compression_codec : NONE
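With the Trino Python client, these session properties are passed at connection time. A sketch assuming the `trino` package, where host, user, and the s3_catalog/tmp names are placeholders:

```python
# Session properties forcing a single, uncompressed output file:
# one writer task, a writer-size floor large enough to avoid splitting,
# and no compression codec on the (hypothetical) s3_catalog catalog.
SESSION_PROPERTIES = {
    "scale_writers": "true",
    "writer_min_size": "1TB",
    "task_writer_count": 1,
    "s3_catalog.compression_codec": "NONE",
}

def connect(host, user):
    """Open a Trino connection with the session properties applied."""
    import trino  # lazy import: only needed when actually connecting
    return trino.dbapi.connect(
        host=host, port=443, user=user,
        http_scheme="https",
        catalog="s3_catalog", schema="tmp",
        session_properties=SESSION_PROPERTIES,
    )
```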

Function with a Pandas DataFrame

For the lambda_pandas.py function, we simply load the SQL query output into a Pandas DataFrame, then generate a CSV file in a bucket folder from the DataFrame.

No specific session properties need to be set, but the query result must fit in memory.
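The DataFrame path can be sketched as below. The helper is deliberately pure so it runs without a cluster: `rows` stands in for what a Trino cursor's fetchall() would return, and in the Lambda the resulting CSV text would then be uploaded with boto3.

```python
import io

import pandas as pd

def query_to_csv(rows, columns):
    """Load query results (a list of row tuples) into a DataFrame and
    render them as CSV text with a header row."""
    df = pd.DataFrame(rows, columns=columns)
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)  # index=False: no extra index column
    return buffer.getvalue()

# Example with two hypothetical rows
print(query_to_csv([(1, "a"), (2, "b")], ["id", "name"]))
```

Because the whole result set is materialized in the DataFrame, this approach is the simpler of the two but only suits results that fit in the Lambda's memory.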

In the GitHub repository associated with this post, you can also find:

Have fun!
