6  Databricks Free Edition

Databricks Free Edition provides free access to a Databricks serverless environment, so you can experience Databricks' take on Jupyter-style notebooks and work with a Spark environment. Users do not incur any costs while using Free Edition.

Read more about the limitations of Free Edition in this FAQ.

6.1 Free Edition Setup

  1. Create an account at Databricks Free Edition
  2. Please use your university email (BYU-I emails can be used with Google or Microsoft).
  3. Create a notebook and start exploring (a minimal first cell is sketched below).
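
If you want to confirm the notebook is working, the cell below is a minimal sketch; it relies only on the spark session object that Databricks notebooks predefine.

# Minimal first cell: spark is predefined in Databricks notebooks
df = spark.range(5)   # tiny DataFrame with one id column, values 0-4
df.show()             # print the rows to the notebook output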

6.1.1 Next Stage Setup

6.1.1.1 Serverless Cluster Setup

You can configure your serverless compute using a YAML environment specification. Below is an example environment YAML that installs the Polars and Lets-Plot libraries.

User Icon (Top Right) -> Settings -> Workspace admin -> Compute -> Workspace base environments for serverless compute -> Manage -> Create

environment_version: '4'
dependencies:
  - polars==1.37.1
  - lets-plot==4.8.0
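
Once the base environment is attached to a serverless notebook, a quick sanity check (a minimal sketch, assuming the environment above has been applied) is to print the installed versions of the two libraries.

# Confirm the base environment dependencies are installed (run in a notebook cell)
from importlib.metadata import version

print("polars:", version("polars"))        # expect 1.37.1 with the environment above
print("lets-plot:", version("lets-plot"))  # expect 4.8.0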

6.1.1.2 Managing Users in your Free Edition Workspace

I recommend creating both a team workspace and a personal workspace. One team member will need to create a Free Edition account and then set up the team workspace. Once the team workspace is created, that member can invite the other team members' .byu.edu emails to the workspace.

User Icon (Top Right) -> Settings -> Workspace admin -> Identity and access -> Users -> Manage -> Add user -> Invite user by email

This should allow your team to work together in the same Free Edition workspace.

6.1.1.3 Spatial Setup

Databricks Free Edition does not support libraries that require native code compilation, so spatial libraries such as GeoPandas and Shapely will not work. However, Databricks does provide spatial SQL functions that can be imported for use in PySpark or called directly from Spark SQL.

# Example of using spatial functions in Databricks Free Edition

# Import the Databricks spatial (ST) functions for PySpark
from pyspark.databricks.sql import functions as dbf
# e.g., dbf.st_distance(), dbf.st_geomfromtext()

# Example of using spatial functions in Spark SQL
examp = spark.sql(
    "SELECT st_distance(st_geomfromtext('POINT Z (0 0 310)'), "
    "st_geomfromtext('LINESTRING(-10 20, 20 10)')) AS dist"
)
examp.show()
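
The same spatial functions can also be applied to DataFrame columns through expr(). The sketch below uses a small, hypothetical DataFrame of WKT strings and only the functions shown above.

from pyspark.sql import functions as F

# Hypothetical sample data: two WKT geometry columns
pts = spark.createDataFrame(
    [("POINT (0 0)", "POINT (3 4)")],
    ["wkt_a", "wkt_b"],
)

# Apply the spatial SQL functions to the columns via expr()
dists = pts.select(
    F.expr("st_distance(st_geomfromtext(wkt_a), st_geomfromtext(wkt_b)) AS dist")
)
dists.show()   # expected distance: 5.0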

6.2 Using Databricks notebooks