Visualizing Time Series Data
Visualization is typically one of the first steps when working with a time series dataset, because it helps you explore and understand the data. DSS lets you create and interact with time series charts. See Time series visualization for more details.
Let’s Get Started!
In this tutorial, you will learn to visualize time series data before analyzing it.
Begin by downloading the orders_by_date zip file and extracting the orders_by_date CSV file. This dataset should look familiar, as it is derived from the fictional Haiku T-Shirt order logs that you used in the Basics tutorial.
This tutorial assumes that you have obtained the orders_by_date.csv file containing the time series dataset.
Create Your Project
Begin by creating a new +Blank Project in Dataiku DSS. Name the project Time Series Basics, then upload the orders_by_date.csv time series dataset.
The dataset consists of four columns:
- order_date, the order date, which has already been parsed as a date
- tshirt_category, an identifier that labels each row as belonging to one of six tshirt categories, each category corresponding to a time series
- tshirt_quantity, the daily number of items sold in a category
- amount_spent, the daily amount spent on a tshirt category
The dataset contains six different time series, one for each value of the tshirt_category column. Each time series has two variables (or dimensions): tshirt_quantity and amount_spent. Note that the data is stored in long format, meaning that each row holds the values of one time series for one date.
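To make the long format concrete, here is a minimal sketch in plain Python that splits long-format rows into one series per category and pivots them into a wide layout. The sample rows, the category labels "A" and "B", and the helper names split_series and to_wide are all hypothetical and used only for illustration; they are not taken from orders_by_date.csv or from DSS.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows in long format: one row per (date, category).
# The values are made up for illustration and are not from orders_by_date.csv.
long_rows = [
    {"order_date": date(2017, 1, 1), "tshirt_category": "A", "tshirt_quantity": 3, "amount_spent": 60.0},
    {"order_date": date(2017, 1, 1), "tshirt_category": "B", "tshirt_quantity": 1, "amount_spent": 25.0},
    {"order_date": date(2017, 1, 2), "tshirt_category": "A", "tshirt_quantity": 2, "amount_spent": 40.0},
    {"order_date": date(2017, 1, 2), "tshirt_category": "B", "tshirt_quantity": 4, "amount_spent": 100.0},
]


def split_series(rows, key="tshirt_category"):
    """Group long-format rows into one list of rows per category."""
    series = defaultdict(list)
    for row in rows:
        series[row[key]].append(row)
    # Sort each series chronologically so it can be read as a time series.
    for values in series.values():
        values.sort(key=lambda r: r["order_date"])
    return dict(series)


def to_wide(rows, value="amount_spent", key="tshirt_category"):
    """Pivot to wide format: one entry per date, one column per category."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row["order_date"]][row[key]] = row[value]
    return dict(sorted(wide.items()))


series = split_series(long_rows)
wide = to_wide(long_rows)
```

In the long layout, adding a category only adds rows. In the wide layout, it adds a column.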
Visualize the Time Series Dataset
To visualize time series data, you can choose between two kinds of aggregation for the X axis:
- Timeline, where you see your data over a span of time. You can choose between a “Dynamic timeline” (automatic) and a “Fixed timeline” (based on year, quarter, month, …).
- Regroup, where you see your data aggregated by date elements, such as quarter of year, month of year, week of year, …
The automatic aggregation mode (the “Dynamic timeline”) allows you to display arbitrarily large time series, with the aggregation pushed down to the database. This mode works with a parsed date column.
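The difference between the two kinds of aggregation comes down to the key used to bucket each date. The following is a minimal sketch in plain Python, not a description of how DSS implements it: the function names timeline_key, regroup_key, and aggregate are hypothetical, the sample values are made up, and summing is just one possible aggregation.

```python
from collections import defaultdict
from datetime import date


def quarter(d):
    """Return the quarter (1-4) that a date falls in."""
    return (d.month - 1) // 3 + 1


def timeline_key(d):
    """Timeline-style key: each quarter of each year is its own bucket."""
    return (d.year, quarter(d))


def regroup_key(d):
    """Regroup-style key: the same quarter of every year shares one bucket."""
    return quarter(d)


def aggregate(points, key_func):
    """Sum values per bucket; points is an iterable of (date, value)."""
    totals = defaultdict(float)
    for d, value in points:
        totals[key_func(d)] += value
    return dict(totals)


# Hypothetical values for illustration only.
points = [
    (date(2016, 2, 10), 10.0),
    (date(2016, 5, 3), 20.0),
    (date(2017, 2, 14), 30.0),
]

by_timeline = aggregate(points, timeline_key)
by_regroup = aggregate(points, regroup_key)
```

With a timeline key, the first quarters of 2016 and 2017 stay in separate buckets. With a regroup key, they are merged into a single "quarter 1" bucket.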
Create Line Plots
Create a line chart of the daily amount_spent for each time series. To do this:
- Open the orders_by_date dataset and go to the Charts tab.
- Select the Lines chart.
- Drag and drop “amount_spent” as the Y variable, and “order_date” as the X variable. Notice that the “Display timeline” option appears and is enabled.
- Drag and drop “tshirt_category” as the categories to use for grouping.
Below the main chart in the display area is a timeline, which is shown when the Display timeline option is selected. This option is available for line charts that use a date on the X axis. The timeline provides an overview of the whole dataset, shows the current zoom level, and acts as an observation window into the data.
The line plot is quite noisy. Zoom into the main chart to adjust your view and to see more details about the data.
Notice that the vertical bars in the timeline adjust to show a smaller window that highlights the interval currently selected in the main chart. You can also pan the chart by dragging the selected interval left or right. Double-clicking the selected interval expands it to cover the whole data interval.
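Conceptually, zooming and panning amount to choosing a date window and showing only the points inside it. The sketch below illustrates that idea in plain Python under stated assumptions: the helper names zoom, pan, and full_extent are hypothetical, the data is made up, and none of this reflects how DSS handles the interaction internally.

```python
from datetime import date, timedelta


def zoom(points, start, end):
    """Keep only the points whose date falls inside [start, end]."""
    return [(d, v) for d, v in points if start <= d <= end]


def pan(start, end, days):
    """Shift the window by a number of days, keeping its width unchanged."""
    shift = timedelta(days=days)
    return start + shift, end + shift


def full_extent(points):
    """Window covering the whole series, like double-clicking the selection."""
    dates = [d for d, _ in points]
    return min(dates), max(dates)


# Hypothetical daily series: ten consecutive days starting 2017-01-01.
points = [(date(2017, 1, 1) + timedelta(days=i), float(i)) for i in range(10)]

window = (date(2017, 1, 3), date(2017, 1, 5))
visible = zoom(points, *window)        # points inside the selected interval
panned = pan(*window, days=2)          # same width, moved two days later
everything = full_extent(points)       # reset to the whole data interval
```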
To change the aggregation key for the X axis, click the drop-down arrow next to “order_date (Automatic)” and select a value other than Automatic for “Date ranges”.
For example, let’s create a chart that shows the total spent per quarter of year.
- Click +Chart to add a new chart. Keep the default histogram plot.
- Drag and drop “amount_spent” as the Y variable, and “order_date” as the X variable. Notice that the “Display timeline” option is not available for the histogram plot.
- Drag and drop “tshirt_category” to use for grouping.
- Click the drop-down arrow next to “order_date (Automatic)” and select Quarter of year as the value for “Date ranges”.
The plot shows the data aggregated by quarter of year. Across the years covered by the data, sales are typically lowest in the second quarter.
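If you want to check the numbers behind such a chart outside DSS, the same aggregation can be sketched with the Python standard library. This is only an illustration: the CSV excerpt, its values, the "A" and "B" category labels, and the YYYY-MM-DD date format are assumptions, and total_by_quarter_of_year is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical CSV excerpt using the tutorial's column names; the values and
# the YYYY-MM-DD date format are assumptions made for this sketch.
SAMPLE_CSV = """order_date,tshirt_category,tshirt_quantity,amount_spent
2016-01-15,A,2,40.0
2016-04-20,A,1,20.0
2017-02-01,A,3,60.0
2017-02-01,B,1,25.0
"""


def total_by_quarter_of_year(csv_text):
    """Sum amount_spent per (quarter of year, category)."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        d = date.fromisoformat(row["order_date"])
        q = (d.month - 1) // 3 + 1  # quarter of year, ignoring the year itself
        totals[(q, row["tshirt_category"])] += float(row["amount_spent"])
    return dict(totals)


totals = total_by_quarter_of_year(SAMPLE_CSV)
```

Because the key ignores the year, the January 2016 and February 2017 rows for category A land in the same first-quarter bucket.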
You can explore the charts tool further to build other chart types and gain more insight into your data.