How To Easily Connect Your Data to Snowflake with Snowpark


Snowflake recently hosted Snowpark Day, a showcase of Snowflake’s developer-friendly framework. Snowpark gives developers the tools to flip, spin, and grind data using the gear they already know. The gear in this case is Python, Java, or Scala, the languages Snowpark supports, and the tricks are executing queries, loading data, and building functions entirely on Snowflake. Data remain in Snowflake, and your favorite libraries are available to use. So what are the basics of Snowpark, and what are the possibilities?

Getting On the Lift

Imagine an organization that has recently created a Snowflake account. They’ve moved much of their data into their account, but they’re still pulling that data out for ML model inference and other analytical dataflows. They’ve learned about Snowpark and want to start refactoring their dataflows to use it so that data remain in Snowflake. This will reduce their dataflow processing times and cloud provider costs. They decide to start with a simple dataflow to test the solution.

This dataflow extracts granular shipment data and computes a report-ready dataset by joining with order data in an AWS S3 bucket; this dataset is then loaded back into Snowflake to be consumed. By refactoring their Python scripts that perform this dataflow to use Snowpark, they’ll be able to perform all data processing in their Snowflake account.

Refactoring starts by speaking with their Snowflake rep about enabling Snowpark in their account and installing the Snowpark Python package in their compute environment. Important note: at the time of writing, Python 3.8.X is the only Python version supported.

They first import the Session class into their scripts:

from snowflake.snowpark.session import Session

They establish a connection with their account:

# Replace each placeholder with your own value; the last four keys are optional.
snowflake_param_dict = {
    'account': '<account name>',
    'user': '<username>',
    'password': '<password>',
    'role': '<role>',
    'warehouse': '<warehouse>',
    'database': '<database>',
    'schema': '<schema>',
}
session = Session.builder.configs(snowflake_param_dict).create()
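
Hard-coding a password in a script is rarely a good idea, so one option is to read the connection parameters from environment variables. The following is a minimal sketch under that assumption: the helper name `connection_params_from_env` and the `SNOWFLAKE_*` variable names are hypothetical conventions chosen for illustration, not something Snowpark reads on its own.

```python
import os

# Required and optional keys, mirroring the connection dictionary above.
REQUIRED_KEYS = ("account", "user", "password")
OPTIONAL_KEYS = ("role", "warehouse", "database", "schema")


def connection_params_from_env(env=None, prefix="SNOWFLAKE_"):
    """Build a connection dictionary from environment variables.

    The SNOWFLAKE_* naming scheme is a hypothetical convention for this
    sketch; adapt it to however your team manages secrets.
    """
    env = os.environ if env is None else env

    # Fail early with a clear message if a required setting is absent.
    missing = [k for k in REQUIRED_KEYS if not env.get(prefix + k.upper())]
    if missing:
        raise KeyError(f"Missing required settings: {', '.join(missing)}")

    params = {k: env[prefix + k.upper()] for k in REQUIRED_KEYS}

    # Leave unset optional keys out of the dictionary entirely.
    for k in OPTIONAL_KEYS:
        value = env.get(prefix + k.upper())
        if value:
            params[k] = value
    return params
```

The resulting dictionary can then be passed to `Session.builder.configs(...)` in place of a hand-written one.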


And now they can execute Snowflake queries from their compute environment:

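# wh_name, db_name, schema_name, and table_name are strings defined earlier in the script.
session.sql(f"USE WAREHOUSE {wh_name}").collect()
session.sql(f"USE SCHEMA {db_name}.{schema_name}").collect()
# CREATE TABLE needs a column list; the column below is a hypothetical placeholder.
session.sql(f"CREATE TABLE IF NOT EXISTS {table_name} (shipment_id INTEGER)").collect()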
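
Because these statements interpolate names straight into SQL with f-strings, it can be worth checking the names first. Here is a small sketch, assuming the names are plain unquoted Snowflake identifiers (a letter or underscore followed by letters, digits, underscores, or dollar signs). The helpers `safe_identifier` and `qualified_name` are hypothetical and not part of Snowpark.

```python
import re

# Pattern for an unquoted Snowflake identifier.
_UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def safe_identifier(name):
    """Return name unchanged if it is a plain unquoted identifier, else raise."""
    if not _UNQUOTED_IDENTIFIER.fullmatch(name):
        raise ValueError(f"Not a plain SQL identifier: {name!r}")
    return name


def qualified_name(*parts):
    """Join validated parts with dots, e.g. database.schema.table."""
    return ".".join(safe_identifier(part) for part in parts)


# Possible usage with the session from earlier:
# session.sql(f"USE SCHEMA {qualified_name(db_name, schema_name)}").collect()
```

This only covers unquoted identifiers; names that need double quotes would require a different check.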

This organization has made it to the top of the lift and is ready to shred the slopes of Snowpark with some style. 

Hitting the Halfpipe

Our imaginary company is going to use Snowpark DataFrames, whose API will feel familiar to pandas users, in their Python scripts to transform data in Snowflake. Instead of pulling data down, Snowpark translates the DataFrame operations into SQL, pushes that SQL to Snowflake, and executes it using Snowflake’s compute layer.
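
As a rough sketch of what the refactored shipment dataflow might look like with the Snowpark DataFrame API: every table and column name below is a hypothetical placeholder, and the sketch assumes the order data from S3 has already been made available in Snowflake as a table named `ORDERS`. The function expects a Snowpark `Session` like the one created earlier.

```python
def build_shipment_report(session, target_table="SHIPMENT_REPORT"):
    """Join shipments to orders inside Snowflake and save the result.

    `session` is expected to be a snowflake.snowpark Session. All table and
    column names here are hypothetical placeholders.
    """
    # session.table() returns a lazy DataFrame; no rows are fetched here.
    shipments = session.table("SHIPMENTS").filter("STATUS = 'DELIVERED'")
    orders = session.table("ORDERS")

    # The join and column selection are only recorded at this point.
    report = shipments.join(orders, on="ORDER_ID").select(
        "ORDER_ID", "CUSTOMER_ID", "SHIPPED_AT", "ORDER_TOTAL"
    )

    # Saving is an action: the SQL is generated and run in Snowflake.
    report.write.mode("overwrite").save_as_table(target_table)
    return report
```

Nothing in this function brings rows back to the compute environment; a call such as `report.to_pandas()` would be needed for that, which is exactly the data movement this refactor aims to avoid.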

If you’re ready to hit the slopes with Snowpark and need help designing and building your organization’s next-level dataflows, we can help.