Let’s use the same basic setup as in test python code, then use our knowledge from create python packages to convert our code into a package. Finally, we will install the package on our Databricks cluster.
Python packages are easy to test in isolation. But if packaging your code is not an option and you still want to automatically verify that your code actually works, you can run your Databricks notebook from Azure DevOps directly using the databricks-cli.
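The databricks-cli is a thin wrapper around the Databricks REST API, so triggering a one-off notebook run from an Azure DevOps pipeline boils down to a call against the Jobs API. The sketch below makes that call directly from Python; the workspace URL, token, cluster id and notebook path are placeholders you would inject as pipeline variables.

```python
import time
import requests

HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "<personal-access-token>"                       # e.g. an Azure DevOps secret variable
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Submit a one-time run of the notebook on an existing cluster.
run = requests.post(
    f"{HOST}/api/2.0/jobs/runs/submit",
    headers=HEADERS,
    json={
        "run_name": "ci-notebook-check",
        "existing_cluster_id": "<cluster-id>",
        "notebook_task": {"notebook_path": "/Shared/my_notebook"},
    },
).json()

# Poll until the run finishes, then fail the pipeline on a bad result state.
while True:
    state = requests.get(
        f"{HOST}/api/2.0/jobs/runs/get",
        headers=HEADERS,
        params={"run_id": run["run_id"]},
    ).json()["state"]
    if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
        break
    time.sleep(30)

assert state.get("result_state") == "SUCCESS", state
```

A databricks-cli step in the pipeline does roughly the same thing: it submits an equivalent JSON payload and polls the run until it terminates.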
A Databricks notebook that has datetime.now() in one of its cells will most likely behave differently when it’s run again at a later point in time. For example, if you read in data from today’s partition (June 1st) using that datetime, but the notebook fails halfway through, you cannot simply restart the same job on June 2nd and assume that it will read from the same partition.
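To make that concrete, here is a minimal sketch of the difference between deriving the partition from datetime.now() and passing the run date in explicitly. The path layout is made up, and spark is the SparkSession that a Databricks notebook provides by default.

```python
from datetime import date, datetime

def load_daily_partition(spark, run_date: date):
    # Hypothetical partition layout, one folder per day.
    return spark.read.parquet(f"/mnt/raw/events/date={run_date.isoformat()}")

# Brittle: rerunning on a later day silently reads a different partition.
df = load_daily_partition(spark, datetime.now().date())

# Deterministic: the intended date is passed in (e.g. as a job parameter),
# so restarting the failed job on June 2nd still reads the June 1st data.
df = load_daily_partition(spark, date(2019, 6, 1))
```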
With databricks-connect you can connect your favorite IDE to your Databricks cluster. This means that you can lint, test, and package the code that you want to run on Databricks much more easily. To set it up: install Java on your local machine, uninstall any pyspark versions, and install databricks-connect using the regular pip commands so that no changes are recorded by your virtual environment tooling (this prevents mutations to Pipfile and Pipfile.lock).
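After running databricks-connect configure with your workspace details, building a SparkSession locally targets the remote cluster. A minimal sketch of a pytest-style test you could run from your IDE under that assumption:

```python
from pyspark.sql import SparkSession

# With databricks-connect installed and configured, getOrCreate() returns a
# session that executes on the remote Databricks cluster, not a local Spark.
spark = SparkSession.builder.getOrCreate()

def test_uppercase_labels():
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    result = [row.label.upper() for row in df.collect()]
    assert result == ["A", "B"]
```

Because the DataFrame operations execute on the cluster, this catches issues a purely local Spark would miss, while the test file itself stays an ordinary, lintable Python module.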
Companies hire developers to write Spark applications, using expensive Databricks clusters, to transform and deliver business-critical data for the end user. It is advisable to properly test your software: enhance your databricks workflow. But if there is no time to set up proper package testing, there is always the hacker way of running tests right inside of Databricks notebooks, as sketched below.
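As an illustration of that hacker way, a single notebook cell can define and run unittest cases in place; clean_column_name below is a toy stand-in for whatever logic your notebook actually contains.

```python
import unittest

def clean_column_name(name: str) -> str:
    # Toy stand-in for logic you would normally import from a package.
    return name.strip().lower().replace(" ", "_")

class CleaningTests(unittest.TestCase):
    def test_clean_column_name(self):
        self.assertEqual(clean_column_name(" First Name "), "first_name")

# A notebook cell has no meaningful sys.argv and must not call sys.exit(),
# so pass argv explicitly and keep the interpreter alive after the run.
unittest.main(argv=["notebook"], exit=False, verbosity=2)
```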