Removing Duplicates

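Excel's Remove Duplicates feature deletes duplicate rows, using either all columns or a selected subset of columns to decide whether a row is unique. The duplicates are simply deleted from the sheet, with no record of which rows were removed and no way to recover them later.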

[Image: Remove Duplicates in Excel]

Dataiku’s Distinct recipe identifies and removes duplicate rows within a dataset. It can also record which rows had duplicates in the original dataset, and how many times each one occurred. See the video below for an introduction to handling duplicates in Dataiku.
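To make the idea concrete, here is a minimal plain-Python sketch of what a distinct operation with duplicate counts does conceptually. It is not Dataiku's implementation or API: the function name `distinct_with_counts`, the `count` output column, and the sample rows are all assumptions made for illustration.

```python
from collections import Counter


def distinct_with_counts(rows, key_columns=None):
    """Return one row per distinct key, plus how often that key occurred.

    rows: list of dicts that all share the same keys.
    key_columns: columns that define uniqueness; None means all columns.
    Note: "count" is a hypothetical output column name chosen for this sketch.
    """
    if not rows:
        return []
    columns = list(key_columns) if key_columns is not None else list(rows[0].keys())
    # Count how many input rows share each combination of key values.
    # Counter preserves first-seen order, so output order follows the input.
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [{**dict(zip(columns, key)), "count": n} for key, n in counts.items()]


# Hypothetical sample data, for illustration only.
orders = [
    {"customer": "Ana", "city": "Lyon"},
    {"customer": "Ben", "city": "Oslo"},
    {"customer": "Ana", "city": "Lyon"},
    {"customer": "Ana", "city": "Nice"},
]

# Uniqueness decided on all columns.
all_columns = distinct_with_counts(orders)

# Uniqueness decided on a subset of columns.
by_customer = distinct_with_counts(orders, key_columns=["customer"])
```

In this sketch the input list is left untouched and the duplicate information survives as a count, which is the key difference from deleting duplicates in place.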


For more information on the Distinct recipe, please see the reference documentation.