Cloning a Library from a Remote Git Repository

A key goal of writing code is reuse, whether within a Dataiku project, across projects on a Dataiku instance, or in projects outside Dataiku. To this end, you can define code libraries in Dataiku DSS that hold reusable code, and you can connect these libraries to remote Git repositories.

Prerequisites

  • Familiarity with code in Dataiku.
  • Familiarity with the basics of Git.

Technical Requirements

Connect to a Remote Git Repository

Within any Dataiku project, navigate to Code > Libraries to open the Library Editor.

../../_images/library-editor.png

  • Click Git > Import from Git.
  • Enter https://github.com/dataiku/dss-plugin-sample-correlations as the Repository.
  • Leave master as the branch to check out.
  • Enter python-lib as the Path in repository. The repository contains a whole plugin, but for this project library we only want the library that is part of it. To retrieve the entire plugin instead, clone it from the remote Git repository into the Plugin editor.
  • Enter python/compute-corr as the Target path. This determines where in the project library the remote code will be stored.
  • Click Save and Retrieve.

You should now see the contents of the remote library in the Library Editor.

../../_images/library-cloned.png
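
The relationship between the Path in repository and Target path settings can be illustrated with a short sketch. This is not Dataiku's implementation; it only models the mapping as the steps above describe it, assuming that files under the repository path are copied beneath the target path with their relative layout preserved. The function name map_to_library and the file names in the example are hypothetical.

```python
from pathlib import PurePosixPath


def map_to_library(repo_files, path_in_repo, target_path):
    """Map files under `path_in_repo` to locations under `target_path`.

    Assumption: the relative layout below `path_in_repo` is preserved.
    Files outside `path_in_repo` are not retrieved at all.
    """
    source_root = PurePosixPath(path_in_repo)
    target_root = PurePosixPath(target_path)
    mapped = {}
    for name in repo_files:
        try:
            relative = PurePosixPath(name).relative_to(source_root)
        except ValueError:
            # The file lives outside the requested subdirectory: skip it.
            continue
        mapped[name] = str(target_root / relative)
    return mapped


# Hypothetical repository listing, for illustration only.
files = ["python-lib/example_module.py", "plugin.json"]
for source, destination in map_to_library(
    files, "python-lib", "python/compute-corr"
).items():
    print(source, "->", destination)
```

In this model, plugin.json is left out because it sits outside python-lib, which matches the intent of retrieving only the library part of the plugin.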

You can now use the library functions in code within the Dataiku project by adding an import statement such as:

from compute_corr import *

Pulling Updates from the Remote Repository

When code on the remote repository is updated, you can pull those updates to your local project library. From within the Library Editor:

  • Click Git > Manage references.
  • Click Update on each individual remote Git repository that you want to pull updates from.
  • Alternatively, click Update All References to pull updates from every remote Git repository.

../../_images/library-update.png

Note

Changes made to your local Dataiku project library cannot be pushed back to the remote Git repository.