Using Jupyter Notebooks in DSS
Jupyter notebooks are a favorite tool of many data scientists. They provide users with an ideal environment for interactively analyzing datasets directly from a web browser, combining code, graphical output, and rich content in a single place.
Given their usefulness for data science, Jupyter notebooks are natively embedded in Dataiku DSS and tightly integrated with its other components.
Creating a Jupyter Notebook
Depending on your objectives, you can create a Jupyter notebook in DSS in a number of different ways:
- To create a blank notebook, navigate to the Notebook section from the Code menu (shortcut G+N). Click + New Notebook. You will then have the choice of creating a code notebook for a variety of languages.
At this point, you can start a Jupyter notebook with a Python, R, or Scala kernel in the code environment of your choice. You will also be asked to choose a starter template. For example, will you be reading a dataset into memory or using Spark?
- A second option simplifies reading in the dataset of interest using the Dataiku API. From the Flow, select the dataset and enter the Lab. Create a new code notebook.
The starter code of a notebook created in this manner will have already read the chosen dataset into a df variable, whether that is a Pandas, R, or Scala dataframe.
- One last option is similar to the Lab route. From the Flow, select a dataset and create a Python, R, or Scala code recipe. You can then select the Edit in Notebook option. This will take you into a Jupyter notebook where you can interactively workshop the recipe before saving it back into the Flow.
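To give a feel for what the Lab and recipe routes above produce, here is a minimal sketch of reading a dataset into a df variable with the Dataiku Python API. It is not the exact starter code that DSS generates. The helper name load_dataframe, its sample_rows parameter, and the dataset name "my_dataset" are hypothetical and exist only for illustration. The fallback branch is also an assumption, added so the cell can be tried outside DSS, where the dataiku package is not importable.

```python
def load_dataframe(dataset_name, sample_rows=None):
    """Return a DSS dataset as a pandas DataFrame.

    Outside DSS, where the `dataiku` package cannot be imported, return
    `sample_rows` unchanged so the rest of the notebook can still be tried.
    (This fallback is an illustrative assumption, not DSS behavior.)
    """
    try:
        import dataiku  # available inside DSS code environments
    except ImportError:
        return sample_rows
    # Dataset(...).get_dataframe() loads the dataset into memory
    # as a pandas DataFrame.
    return dataiku.Dataset(dataset_name).get_dataframe()


# "my_dataset" is a placeholder: use the name of a dataset in your Flow.
df = load_dataframe("my_dataset", sample_rows=[{"id": 1, "amount": 10.0}])
print(type(df))
```

When a notebook is opened from a Python recipe, the result is typically written back to the recipe's output dataset with a call such as dataiku.Dataset("my_output").write_with_schema(df), where "my_output" is again a placeholder name.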
Pre-defined Notebook Templates
DSS also provides pre-defined code notebooks to kickstart common kinds of statistical analyses, such as dimensionality reduction, time series, or topic modeling. You can run these notebooks as given, or modify them to go deeper into an analysis.
To create one, enter the Lab and choose a pre-defined notebook instead of a new blank one.
You can also create your own notebook templates through the plugin system.
For more information about pre-defined notebooks, please see the reference documentation.
Generating a Notebook from a Model
Finally, you can create a Jupyter notebook directly from a trained machine learning model.
For explanatory purposes, models trained using the in-memory Python engine can be exported to a Jupyter notebook, which contains a similar version of the model. You can access this feature from the caret menu next to the Deploy button.
For more information, please consult the reference documentation.
Jupyter notebooks are first-class citizens in DSS. They are in the toolbox of most data scientists, and they provide a great environment for interactively analyzing your datasets using Python, R, or Scala.
To learn more about notebooks in DSS, including SQL notebooks, please see the reference documentation on code notebooks.