Creating Charts in DSS
It is easy to create a variety of useful charts in DSS, as you have already seen in Tutorial: Basics.
This brief tutorial reviews the basics of creating visualizations in Dataiku DSS.
The fictional Haiku T-Shirts company wants to understand more about their typical order size. They know from experience that most customers order a single shirt, but they do occasionally get larger orders. What they don’t know is whether these larger orders constitute a significant portion of their business, and whether certain categories of shirts are more likely to be ordered in larger quantities.
How to create charts
To create a chart from the haiku_shirt_sales data, open a visual analysis on it and go to the Charts tab:
- Drag Count of records to the Y axis and nb_tshirts to the X axis, then drag category to the color droplet.
The resulting chart shows that grouping nb_tshirts into 10 equal-width bins loses a lot of information, because all orders of 1-5 shirts are clumped together in a single bin.
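To see why equal-width bins hide the small orders, here is a minimal Python sketch of equal-width binning. It assumes the bins span the minimum to the maximum of the column, which may not match the exact bin edges DSS computes, and the order sizes are hypothetical values, not the real haiku_shirt_sales data.

```python
from collections import Counter

def equal_width_bins(values, n_bins=10):
    """Count values into n_bins equal-width bins spanning [min, max].

    Bins are half-open [start, end), except the last one, which also
    includes the maximum value.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal values
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]

# Hypothetical order sizes, NOT taken from haiku_shirt_sales.
nb_tshirts = [1, 1, 1, 1, 2, 1, 3, 1, 5, 2, 1, 4, 1, 42]

binned = equal_width_bins(nb_tshirts)
raw = sorted(Counter(nb_tshirts).items())  # one entry per distinct order size

for start, end, count in binned:
    print(f"[{start:4.1f}, {end:4.1f}): {count}")
print(raw)
```

With a maximum near 40, each bin is roughly 4 shirts wide, so every order of 1 to 5 shirts falls into the first bin, while counting raw values keeps each order size separate.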
So let's display nb_tshirts as raw values instead:
- Click on the nb_tshirts label and select None, use raw values.
- Create a filter on category that excludes the value "Hoodies" from the chart.
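The two steps above amount to counting records per raw nb_tshirts value and category after dropping one category. The following Python sketch shows that aggregation on hypothetical rows; only the "Hoodies" value comes from this tutorial, and the other category names are invented for illustration.

```python
from collections import Counter

# Hypothetical (nb_tshirts, category) rows. Only "Hoodies" comes from the
# tutorial; the other category names are invented for illustration.
orders = [
    (1, "Black T-Shirt F"), (1, "White T-Shirt M"), (1, "Hoodies"),
    (2, "Black T-Shirt F"), (1, "Black T-Shirt F"), (3, "White T-Shirt M"),
    (2, "Hoodies"), (1, "White T-Shirt M"),
]

def count_raw_by_category(rows, excluded=("Hoodies",)):
    """Count of records per (raw nb_tshirts value, category), after filtering."""
    return Counter((n, cat) for n, cat in rows if cat not in excluded)

counts = count_raw_by_category(orders)
for (n, cat), c in sorted(counts.items()):
    print(n, cat, c)
```

Each (order size, category) pair corresponds to one colored segment of a bar, and the filtered-out category simply never reaches the counter.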
The vast majority of orders were for a single shirt, so by number of orders, larger orders are not a significant portion of Haiku T-Shirts' business.
From the scale of the X axis, we can see that at least one customer ordered close to 40 T-shirts, but there are too few such orders for their bars to be visible next to the count of single-shirt orders.
To compare the mix of categories across order sizes:
- Click on the chart type selector and choose Stacked 100%.
- Drag tshirt_price and total to the Tooltip area. On the total dropdown, select Sum. This adds summary statistics to your tooltips.
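A Stacked 100% chart rescales each bar so that its segments sum to 100%: each category's count is divided by the total count for that order size. The Python sketch below computes those shares, plus a Sum of total per segment like the one the tooltip shows. It assumes the tooltip aggregates over the same order-size and category cell as the bar segment, and the rows (including the order totals) are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (nb_tshirts, category, total order value).
orders = [
    (1, "Black T-Shirt F", 20.0), (1, "White T-Shirt M", 18.0),
    (1, "Black T-Shirt F", 20.0), (2, "Black T-Shirt F", 38.0),
    (2, "White T-Shirt M", 36.0), (2, "White T-Shirt M", 34.0),
]

def stacked_100(rows):
    """Per order size: each category's share of records and its Sum of total."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(lambda: defaultdict(float))
    for n, cat, total in rows:
        counts[n][cat] += 1
        totals[n][cat] += total
    bars = {}
    for n, by_cat in counts.items():
        bar_size = sum(by_cat.values())  # number of records behind this bar
        bars[n] = {
            cat: {"share": c / bar_size, "sum_total": totals[n][cat]}
            for cat, c in by_cat.items()
        }
    return bars

bars = stacked_100(orders)
for n in sorted(bars):
    for cat, seg in sorted(bars[n].items()):
        print(n, cat, f"{seg['share']:.0%}", seg["sum_total"])
```

Because every bar is normalized separately, a category can take a larger share of a bar even when its Sum of total is smaller, which is the kind of contrast the tooltips reveal.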
Now we can easily see that the mix of categories appears to differ by order size. By hovering over the bars, we can see, for example, that women's black T-shirts account for a growing share of orders as the order size increases from 1 to 5 shirts, while the total value of those orders decreases.
Whether these visual differences reflect a statistically significant pattern that Haiku T-Shirts can exploit is a question we'll leave for further analysis, because there is always a next step in data science!
Which data is used by charts, and where computations take place
There are two places where you can create charts in DSS:
- in a Visual Analysis (using the Lab)
- on a Dataset
In both places, you control whether the chart is built from a sample or from the complete data.
Unless your dataset is relatively small, we strongly recommend building interactive charts in visual analyses on a sample. A visual analysis is intended for exploration and quick visual feedback, so it always uses the in-memory DSS engine, and charting the complete data in memory can be slow for large datasets.
When building charts on a dataset, however, you can also use an in-database or in-cluster engine, depending on where the original data is stored. See the DSS reference documentation on sampling and engines for charts for more information.