Altair SmartWorks Analytics

 

Working with Duplicates

The Duplicates node allows you to keep, remove, or flag duplicate values in a column. It is applicable to fields of all data types.

Prerequisite

  • An Execution Profile with an active session linked to the workflow

Steps

  1. Create a Data Frame node by importing a CSV file or a database table.

  2. From the Data Preparation group of the Nodes tabbed page, drag the Duplicates node from the Palette and drop it on the workflow canvas. The Duplicates node has one input socket and one output socket. Connect the output socket of the Data Frame node to the input socket of the Duplicates node.

  3. Open the Node Viewer by double-clicking the node or by selecting the Open option in the node menu.

  4. The Configuration tab is displayed by default. Specify the following properties to configure the Duplicates node.

  5.  

    Property                 Description

    Input properties         Displays the name of the input table for the Duplicates node. This field is not editable.

    Output properties        Specify the output table for the Duplicates node.

    Duplicate settings       Select one of the following settings:
                             • Remove duplicates removes all of the duplicate records.
                             • Keep duplicates keeps only the duplicate records.
                             • Flag all duplicates flags each record as a duplicate (True, 1) or not a duplicate (False, 0).

    Case sensitive           Select the check box to treat values that differ only in letter case as distinct when identifying duplicates.

    Columns to check         Select one or more group columns from the input data source. Use the Select all and Unselect all icons to select or clear every column at once. The preview table (data grid) is displayed based on the input properties.

    Columns to include       Select the columns from the input data to include when identifying duplicate records.

    Included column names    Select a prefix or suffix for the names of the columns included in the duplicate records.

  6. To review the code that will be executed for your Duplicates configuration, save the current configuration and then click the Code tab of the Duplicates Node Viewer. You can use the Code Editor to refine the code further.

  7. Click Save to save your changes, and then click Run to execute the Duplicates node.

  8. Alternatively, click Discard to discard your changes.
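
To make the three Duplicate settings and the Case sensitive option more concrete, the following is a minimal pandas sketch of the behavior described above. It is not the code that the Duplicates node generates; the Code tab remains the authoritative source. The function name apply_duplicates, its parameters, the is_duplicate flag column, and the sample data are all hypothetical. The sketch also assumes that every occurrence of a repeated key counts as a duplicate, including the first one; the node itself may treat the first occurrence differently.

```python
import pandas as pd


def apply_duplicates(df, columns, setting="remove", case_sensitive=True,
                     flag_column="is_duplicate"):
    """Hypothetical stand-in for the Remove / Keep / Flag settings."""
    # Work on a copy of the key columns (the "Columns to check").
    keys = df[columns].copy()
    if not case_sensitive:
        # Lower-case text values so that "Ann" and "ann" compare as equal.
        for col in columns:
            keys[col] = keys[col].map(
                lambda v: v.lower() if isinstance(v, str) else v
            )

    # Assumption: a row is a duplicate if its key occurs more than once,
    # so keep=False marks every occurrence, including the first.
    is_dup = keys.duplicated(keep=False)

    if setting == "remove":
        return df[~is_dup]
    if setting == "keep":
        return df[is_dup]
    if setting == "flag":
        # 1 marks a duplicate record, 0 marks a unique record.
        return df.assign(**{flag_column: is_dup.astype(int)})
    raise ValueError(f"unknown setting: {setting!r}")


# Made-up sample data for illustration only.
df = pd.DataFrame({
    "name": ["Ann", "ann", "Bob", "Ann"],
    "city": ["Oslo", "Oslo", "Rome", "Oslo"],
})

flagged = apply_duplicates(df, ["name", "city"], setting="flag")
print(flagged["is_duplicate"].tolist())  # [1, 0, 0, 1]
```

With case_sensitive=False, the second row ("ann", "Oslo") would also be treated as a duplicate of the first and fourth rows, so the remove setting would leave only the "Bob" row.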