Altair® Monarch®

 

Show/Remove Duplicate Rows

Show Duplicates and Remove Duplicates allow you to easily show or remove duplicate rows in your table. Select Show Duplicates to identify the duplicate rows. Select Remove Duplicates to eliminate duplicate rows that you do not need.

For instance, suppose you start with a table like this, where records 1 and 2 and records 7 and 8 are duplicate pairs:

 

Show Duplicates will create a table like this:

 

Remove Duplicates will create a table like this:

 

You can show or remove rows that have duplicate values in all columns or only in specific columns. In this example, you can remove only rows that have duplicate Customers, or rows that have duplicate Account Number and Customers.
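To illustrate how the choice of columns changes which rows count as duplicates, here is a minimal Python sketch. It is not Monarch code: the function `duplicate_rows` and the sample values are hypothetical, and only the column names Customer and Account Number come from the example above.

```python
from collections import Counter

# Hypothetical sample rows; the values are invented for illustration.
rows = [
    {"Customer": "Betty's Music Store", "Account Number": "1001", "Amount": 50},
    {"Customer": "Betty's Music Store", "Account Number": "1001", "Amount": 50},
    {"Customer": "Betty's Music Store", "Account Number": "1002", "Amount": 75},
    {"Customer": "Big Shanty Music", "Account Number": "2001", "Amount": 20},
]


def duplicate_rows(rows, columns=None):
    """Return the rows whose values in `columns` occur more than once.

    When `columns` is None, every column takes part in the comparison.
    """
    def key(row):
        names = columns if columns is not None else sorted(row)
        return tuple(row[name] for name in names)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] > 1]


# Compare on all columns, on Customer only, and on Account Number plus Customer.
all_columns = duplicate_rows(rows)
by_customer = duplicate_rows(rows, ["Customer"])
by_account_and_customer = duplicate_rows(rows, ["Account Number", "Customer"])
print(len(all_columns), len(by_customer), len(by_account_and_customer))
```

In this sketch, comparing on Customer alone treats the third row as a duplicate as well, because the columns that differ are ignored.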

Steps:

You can show or remove duplicates from the Prepare window. To do so:

  1. Go to the Prepare window and select the table you want to transform.

  2. Select Transform on the Monarch Data Prep Studio Toolbar.

  3. The dialog box that displays allows you to select a transformation.

  4. Select Duplicates.

  5. The Duplicates dialog box displays:

     

  6. Enter the name of the resulting table.

  7. Select the Deduplicate Operation from the drop-down.

  8. To understand the different operations, let us use the following as an example:

     

    We have 2 sets of duplicates: Betty's Music Store has 2 duplicate records and Big Shanty Music has 3 duplicate records.

    • Remove Duplicates - Keep first row

      With this operation, Monarch Data Prep Studio removes all duplicate records except the first row of each duplicate set.

      With the records above, the result will be:

       

    • Show Duplicates - Show all except first row

      With this operation, Monarch Data Prep Studio displays all duplicate records except the first row of each duplicate set.

      With the records above, the result will be:

       

    • Flag Duplicates - Mark with new column

      With this operation, Monarch Data Prep Studio marks all records that have duplicates by adding a new column.

      With the records above, the result will be:

       

  9. Select or deselect the columns you want to use when evaluating duplicates.

  10. If a column is not selected, it is ignored even if it contains duplicate values.

  11. Select the Case Sensitive box if you want to differentiate values based on case.

  12. For instance, if Case Sensitive is on, "Betty's Music Store" and "betty's Music Store" are treated as distinct values and are not considered duplicates.

  13. Select the columns to be included in the resulting table.

    • Select Use All Columns if you want to display all columns.

    • Select Use Selected Columns, and then check the box beside the columns you want to include:

  14. Select OK.

  15. Monarch Data Prep Studio applies the deduplication and creates a new table.
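The three operations and the Case Sensitive option described in the steps above can be sketched in plain Python. This is an illustrative approximation, not how Monarch Data Prep Studio is implemented. The function `dedupe`, the operation names "remove", "show", and "flag", the output column "Is Duplicate", and the sample rows are all hypothetical. The sketch also assumes that rows without any duplicate are kept by the remove operation.

```python
def dedupe(rows, key_columns, operation, case_sensitive=True):
    """Apply one duplicate operation to a list of dict rows.

    operation:
      "remove" keeps the first row of each duplicate set and all unique rows.
      "show"   returns every duplicate except the first row of each set.
      "flag"   returns all rows with an added boolean "Is Duplicate" column.
    """
    if operation not in ("remove", "show", "flag"):
        raise ValueError(f"unknown operation: {operation}")

    def key(row):
        # Only the selected key columns take part in the comparison.
        values = (str(row[column]) for column in key_columns)
        return tuple(v if case_sensitive else v.casefold() for v in values)

    # Count how often each key occurs so that "flag" can mark whole sets.
    counts = {}
    for row in rows:
        counts[key(row)] = counts.get(key(row), 0) + 1

    seen = set()
    result = []
    for row in rows:
        k = key(row)
        is_first = k not in seen
        seen.add(k)
        if operation == "remove" and is_first:
            result.append(row)
        elif operation == "show" and not is_first:
            result.append(row)
        elif operation == "flag":
            result.append({**row, "Is Duplicate": counts[k] > 1})
    return result


# Hypothetical rows: one name differs only by case, one row is unique.
rows = [
    {"Customer": "Betty's Music Store", "Amount": 10},
    {"Customer": "betty's Music Store", "Amount": 10},
    {"Customer": "Big Shanty Music", "Amount": 20},
    {"Customer": "Big Shanty Music", "Amount": 20},
    {"Customer": "Big Shanty Music", "Amount": 20},
    {"Customer": "Hypothetical Unique Shop", "Amount": 30},
]

removed = dedupe(rows, ["Customer"], "remove", case_sensitive=False)
shown = dedupe(rows, ["Customer"], "show", case_sensitive=False)
flagged = dedupe(rows, ["Customer"], "flag", case_sensitive=False)
removed_cs = dedupe(rows, ["Customer"], "remove", case_sensitive=True)
print(len(removed), len(shown), len(removed_cs))
```

With case sensitivity turned on in this sketch, the two spellings of Betty's Music Store are treated as distinct values, so the remove operation keeps both rows.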