Deploying to Production

Once you have designed a flow and automated its updates, you can deploy it to a production environment.

Development and Production environments

A development (or sandbox) environment is an environment where you test new analyses in your project. Failures in this environment are an expected part of its experimental nature.

A production environment is where serious operational jobs are run. This environment should be available whenever necessary and may serve external consumers for their day-to-day decisions, whether those consumers are humans or software. Failure is not an option in production, and the ability to roll back to a previous version is critical.

Dataiku provides two dedicated nodes to handle development and production:

  • Dataiku Design Node is used for the development of data projects.
    • It provides capabilities for creating data pipelines and models, plus defining how they are meant to be rebuilt. Projects developed in the Design node are packaged and handed off to the Automation node.
  • Dataiku Automation Node is used to import packaged projects defined in the Design Node and run them in the production environment.
    • When you make updates to the project in the Design node, you can create an updated version of the project package, import the new package into the Automation node, and control which version of the project runs in production.

Development work flows from the Design node to the Automation node. While it is technically possible to make changes to a project in the Automation node, those changes don’t flow back to the Design node, so it’s best practice to do all development in the Design node.

Let’s Get Started!

In this tutorial, you will learn how the Design and Automation Nodes work together:

  • Packaging flows for deployment
  • Versioning flows
  • Deploying packages in a production environment

We will work with the fictional retailer Haiku T-Shirt’s data.


Create Your Project

From the homepage of the Dataiku Design node, click +New Project > DSS Tutorials > Automation > Deployment (Tutorial).

For the purposes of this tutorial, the flow and automation scenarios are complete, and we simply need to package the flow and deploy it to the Automation node.

Packaging a Flow into a Bundle


  • To package the flow into a bundle, go to the Settings menu in the top navigation bar and choose Bundles.
  • Click Create your first bundle and name it automation_v1.


Key concept: Bundle

A bundle is a snapshot of a complete DSS project.

The bundle includes the project configuration so that it can be deployed to a Dataiku DSS Automation node. In addition, some data may need to travel with it: for example, the contents of enrichment datasets, or models that are retrained in the development environment rather than in production.

A bundle can contain data for an arbitrary number of datasets, managed folders and saved models.

A bundle thus acts as a consistent packaging of a complete flow. On the Automation node, you then activate a bundle to switch the project to a new version. Bundles are versioned, and you can revert to a previous bundle in case of a production issue with the new bundle.

You can set up multiple Automation nodes to create continuous delivery pipelines (for example with a pre-production automation node, a performance test one, and the production one).

For this tutorial, we will include the data for the Orders and Customers datasets. In a real-life setting, however, these primary data sources would differ between the development and production environments.

  • Click Create.
  • Download the bundle to your local system by selecting the bundle and clicking the Download button in the right-hand panel.
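The same packaging steps can be scripted with Dataiku’s public Python client, dataikuapi. The sketch below is illustrative only: the host URL, API key, and project key are placeholders you would replace with your own, and it assumes the dataikuapi package is installed and a Design node is reachable.

```python
# Sketch: create a bundle on the Design node and download its archive.
# Assumptions: dataikuapi is installed, and host/api_key/project_key are
# placeholders for a reachable Design node.

def bundle_archive_name(project_key, bundle_id):
    """Local file name for a downloaded bundle archive (naming is our choice)."""
    return "%s-%s.zip" % (project_key, bundle_id)

def export_and_download_bundle(host, api_key, project_key, bundle_id):
    """Snapshot the project into a bundle, then download the archive."""
    import dataikuapi  # imported here so the pure helper above works offline
    client = dataikuapi.DSSClient(host, api_key)
    project = client.get_project(project_key)
    project.export_bundle(bundle_id)  # e.g. "automation_v1"
    path = bundle_archive_name(project_key, bundle_id)
    project.download_exported_bundle_archive_to_file(bundle_id, path)
    return path
```

You would call this as, for example, `export_and_download_bundle("https://design.example.com:11200", "YOUR_API_KEY", "HAIKU_DEPLOY", "automation_v1")`, where every argument is a placeholder for your own setup.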


Deploying a Bundle


  • Log in to your Dataiku Automation node (be sure it’s an Automation node and not another Design node) and create a New project.
  • Import the bundle you just downloaded.


Connections mapping

Note that you may need to re-map connections: data sources that exist on the Design node must be mapped to the corresponding connections on the Automation node. Dataiku DSS will prompt you if this is necessary.

In the Automation node, you need to have connections of the proper type, but their definition can change.

A simple example of this is a SQL database: you’ll have a production database separate from the development database, so when deploying the bundle, you’ll need to reattach the SQL datasets to the production database.

  • Choose to activate a bundle from the list, select automation_v1, and click Activate.

When creating a new project and activating its first bundle, the Dataiku Automation node deactivates all of the project’s scenarios, so that no data is rebuilt unless explicitly requested.

  • Navigate back to the main project page, and click on the Automation link.
  • Activate the scenario by turning the Rebuild data and retrain model scenario auto-trigger to on.
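On the Automation node side, the import-activate-enable sequence above can also be scripted. Again a hedged sketch: the host, API key, project key, and scenario id are placeholders, and the scenario settings calls assume a recent version of the dataikuapi client.

```python
# Sketch: import a bundle archive on the Automation node, activate it, and
# re-enable the scenario's auto-trigger (scenarios are deactivated on the
# first bundle activation). Connection details are placeholders.

def is_first_deployment(existing_project_keys, project_key):
    """True when the project does not yet exist on the Automation node."""
    return project_key not in existing_project_keys

def deploy_bundle(host, api_key, project_key, archive_path, bundle_id, scenario_id):
    """Import the bundle, activate it, and turn the scenario trigger back on."""
    import dataikuapi  # requires a reachable Automation node
    client = dataikuapi.DSSClient(host, api_key)
    if is_first_deployment(client.list_project_keys(), project_key):
        # First deployment: create a new project from the bundle archive.
        client.create_project_from_bundle_local_archive(archive_path)
    else:
        # Later versions: import into the existing project.
        client.get_project(project_key).import_bundle_from_archive(archive_path)
    project = client.get_project(project_key)
    project.activate_bundle(bundle_id)
    # Re-enable the auto-trigger (deactivated on first activation).
    settings = project.get_scenario(scenario_id).get_settings()
    settings.active = True
    settings.save()
```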


That’s it! The flow you set up in the Design node is now running in production.

Versioning a Flow


Now, let’s say that we want to change the scenario in the Flow so that it runs on a monthly basis, rather than when the underlying data sources are updated. To do this, we will:

  • update the project on the Design node
  • repackage the project into a new bundle version
  • deploy the new bundle to the Automation node

Here are the detailed steps:

  • Open the original project on the Design node. Navigate to the Scenarios tab and open the Rebuild data and retrain model scenario.
  • Turn off the existing trigger rather than deleting it, in case you want to switch back to it later. Then click Add trigger, and select Time-based trigger.
  • Name the trigger Monthly rebuild and retrain. Set the frequency to Monthly, triggering on the 1st of each month. Save your scenario.
  • Navigate to the Bundles area. Click Create bundle and name it automation_v2.
  • Leave a descriptive release note for your colleague in charge of the production environment, like Changed the scenario trigger to be monthly. These changes are now visible in the bundle’s commit log and diff tabs (accessible in the upper right), both here on the Design node and when the bundle is redeployed on the Automation node.
  • Click Create, then download this bundle to your local system.


Activate the New Bundle in the Automation Node


  • In the Automation node, navigate to the Bundles area and click Import bundle.
  • Find the bundle you just created and import it.
  • Select the automation_v2 bundle and activate it.

The project in production has been updated to execute on a monthly basis. You can check this by navigating to the Scenarios tab and seeing that the changes you made on the Design node are visible in the Automation node. If you need to roll back to a previous version, simply select that version in the Bundles area and activate it.
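Since bundles are versioned, rolling back is just a matter of activating the previous bundle. A minimal sketch with the dataikuapi client, under the same assumptions as before (placeholder connection details, a reachable Automation node):

```python
# Sketch: roll a project back by re-activating the previous bundle.
# bundle_ids is assumed to be in creation order, e.g.
# ["automation_v1", "automation_v2"].

def previous_bundle(bundle_ids, current_id):
    """Pick the bundle created just before current_id, or None if it is first."""
    i = bundle_ids.index(current_id)
    return bundle_ids[i - 1] if i > 0 else None

def roll_back(host, api_key, project_key, bundle_ids, current_id):
    """Re-activate the previous bundle on the Automation node."""
    target = previous_bundle(bundle_ids, current_id)
    if target is None:
        raise ValueError("no earlier bundle to roll back to")
    import dataikuapi  # requires a live Automation node; details are placeholders
    client = dataikuapi.DSSClient(host, api_key)
    client.get_project(project_key).activate_bundle(target)
    return target
```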


Next Steps

Congratulations! Deploying a flow to production and managing its versions are easy to do in Dataiku DSS.

See the next tutorial, on deploying to a scoring API, to learn how to deploy your models for real-time scoring.