Data Exploration (video)

Summary

The goal of the Data Exploration phase is to investigate data with iterative visualizations and statistical summaries.

DSS allows users to explore data quickly, either through a visual UI or with code. Exploration can happen within the project's Flow or in the Lab, a separate workspace for experimentation.

For any dataset, the Explore tab shows a tabular view of a sample of the data. Within the Explore tab, the Analyze tool produces a quick summary of a column's distribution, statistics, and outliers.
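To make the idea concrete, here is a rough Python sketch of the kind of per-column summary the Analyze tool reports. It is not DSS's implementation: the function name `summarize_column`, the use of the standard-library `statistics` module, and the 1.5 × IQR (Tukey fence) outlier rule are assumptions chosen for illustration. It needs Python 3.8+ and at least two non-missing values.

```python
import statistics


def summarize_column(values):
    """Hypothetical helper: summarize a numeric column, treating None as missing.

    Outliers are flagged with the common 1.5 * IQR (Tukey fence) rule,
    an assumption here rather than the rule DSS necessarily uses.
    """
    present = [v for v in values if v is not None]
    # Three cut points split the sorted data into quartiles.
    q1, median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),
        "min": min(present),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(present),
        "outliers": [v for v in present if v < low_fence or v > high_fence],
    }


summary = summarize_column([12, 15, 14, 13, None, 16, 15, 14, 95])
print(summary["outliers"])
```

Note that the mean is pulled well above the median by the single extreme value, which is exactly the kind of pattern a quick column summary helps spot.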

The Charts tab provides a drag-and-drop interface for producing quick visualizations of the data in the sample.
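Behind a histogram chart, a numeric column is reduced to counts per bin. A minimal sketch, assuming equal-width bins; the helper `equal_width_bins` is hypothetical and not part of DSS, which applies its own binning options:

```python
def equal_width_bins(values, num_bins):
    """Hypothetical helper: count values into num_bins equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value would fall at index num_bins, so clamp it into the last bin.
        # If every value is identical (width == 0), put everything in the first bin.
        idx = min(int((v - lo) / width), num_bins - 1) if width else 0
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


edges, counts = equal_width_bins([1, 2, 2, 3, 3, 3, 4, 4, 5], 4)
# Render a crude text histogram, one row per bin.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"[{left:g}, {right:g}] {'#' * count}")
```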

The Lab helps avoid overcrowding the Flow with exploratory items that will not be used in production. The Lab offers Visual Analyses and Code Notebooks (Python, R, SQL, Scala, Hive, or Impala). In either case, work in the Lab can be deployed to the Flow when ready.

DSS offers integrations with several popular IDEs (integrated development environments), including PyCharm, Sublime Text, RStudio, and VS Code.