Altair SmartWorks Analytics

 

Exporting Data

Tables in a Workflow Canvas may be exported to a CSV file or JDBC database using the Export Text or Export to Database nodes. The resulting CSV file may be stored in some specific location in S3, a local folder, or the SmartWorks Analytics Library. An Export node has one input socket and no output sockets.

Prerequisites

  • An Execution Profile with an active session linked to the workflow

  • A Data Frame node

  • A valid export location added as an internal connection to the Execution Profile linked to the Workflow (if exporting data to a database)

Exporting Data Using the Export Text Node

Steps

  1. Open the Nodes tab in the Palette and then expand the Connect section.

  2. Drag and drop the Export Text node from the Palette to the Workflow Canvas.

  3. Connect the output socket of the Data Frame node to the input socket of the Export Text node.

  4. Configure the node by opening the Node Viewer. You can double-click on the node or use the Open option provided in the node menu.

  5.  

    In the Configuration tab, provide the following details:

    Property

    Description

    Input Properties

    The name of the table to be exported. This field is automatically populated from the Data Frame node and cannot be edited.

    Export Settings

    • Connection Profile – Use the dropdown list provided to select a connection profile (S3, Local, Library)

    • Folder Location - (for Apache Spark) Enter the path to the folder location in which you wish to export your CSV file. Alternatively, click the Browse button to launch the File Browser. Use this browser to select a location/folder in which to export your table as a CSV file and then click Open. Note that the contents of the File Browser will depend on your chosen Connection Profile.

    •  

      The Apache Spark engine autogenerates a name for the exported file and places it in the export folder you selected.  

    • File Location - (for Pandas) Provide a file location (name) to which your CSV file should be exported. You can use the Open browser shown above to navigate to the folder in which the exported file should be placed. The Pandas engine uses the file name you provided and exports the CSV file to the location you specified.   

    • Number of Rows – Enter the number of records you wish to export. If this field is left blank, all of the records in the file will be exported

    • Escape Character – Specify an escape character if necessary

    • Delimiter – Use the dropdown list provided to select a column delimiter; by default, the comma (,) is selected

    • Text Qualifying Character – Use the dropdown list provided to select a text qualifying character; by default, the quotation mark (“) is selected  

    • Text Encoding – Use the dropdown list provided to select a text encoding type; by default, UTF_8 is selected

    • Column Names in First Row – Tick the box provided if the first row of the file contains the columns names; this setting is checked by default

    • Single/Multi File – (for Apache Spark) Tick the box provided if you wish the export the table as a single or multiple files

    • If Folder Exists – Tick one of the boxes provided to instruct the application what to do if the file already exists in the specified folder (Skip, Overwrite, Append). By default, Skip is selected.

    • NOTES:

      • Overwrite: In Pandas

        • If file doesn't exist in the file location - The records are exported and shown in the CSV file

        • If file exists in the file location - The data export overwrites the existing data of the file

      • Overwrite: In Apache Spark

        • Creates a folder if one doesn’t exist

        • Delete any existing files in the folder

        • Creates one or multiple file parts in the folder

      • Append: In Pandas

        • Creates a file if one does not exist with the name provided in the application

        • Adds the new data to the end of the existing file.

      • Append: In Apache Spark

        • Creates a folder if one does not exist with the name provided in the application

        • Creates one or multiple file parts in the folder without affecting any file(s) already present in the folder

    Columns to Include

    Select columns to include in the output table from the input data. All columns of the text file, along with the corresponding data types, are displayed in the grid. Users can check/uncheck which columns should be brought into the Workflow Canvas. By default, all columns are selected.

     

     

    A data preview representing your specifications above displays.

     

  6. To check the code that will be executed for your specified Export configuration, save your specifications and then click on the Code tab of the Export Text Viewer. You can also use the tab that displays to refine the code further.

  7.  

  8. To complete the Export Text node configuration, click Save. To cancel your changes and return to the Workflow Canvas, click Discard or simply close the Export Text Node Viewer. To execute the Export, click the Run button.

Exporting Data Using the Export Database Node

Steps

  1. Open the Nodes tab in the Palette and then expand the Connect section.

  2. Drag and drop the Export to Database node from the Palette to the Workflow Canvas.

  3. Connect the output socket of the Data Frame node to the input socket of the Export to Database node.

  4. Open the Node Viewer. You can double-click on the node or use the Open option provided in the node menu.

  5. In the Configuration tab, specify the following details to configure the Export To Database node.

  6.  

    Property

    Description

    Input Properties

    The name of the table to be exported. This field is automatically populated from the Data Frame node and cannot be edited.

    Export Settings

    • Connection Profile - Select a connection profile from the list of existing database profiles.

    • Table Name - Click on the Browse button to launch a table browser that you can use to locate the database table to which your data should be exported.

    •  

      You can select a specific schema from the Select Schema list or search for a specific table name to narrow your search.

    • If the table exists in the database, select an option (Error, Overwrite, Append). 

    • NOTES:

      • Error

        • If the table exists, no export takes place when the operation is run.

        • If the table does not exit, the export takes place when the operation is run.

      • Overwrite

      • When the export is run, the exported table overwrites all records regardless of the table structure. For example, if a table with 4 columns + 10 rows exists, and you overwrite it with a table with 2 columns + 2 rows, the resultant table will have 2 columns + 2 rows.

      • Append

      • The exported table is added as new rows at the end of the table. Exports of fewer or equal columns are allowed for the same table schema.

    • Number of Rows - Specify the number of rows to be exported from the database. If unspecified, all rows are exported by default. 

    Columns to Include

    Select columns to be included from the input data source for exporting data. The column grid displays all columns of the input data source with data type and the preview table displays records based on the selected columns.

     

  7. Click Save to save the configuration settings.

  8. To check the code that will be executed for your specified Export to Database configuration, save your specifications and then click on the Code tab of the Node Viewer. You can also use the tab that displays to refine the code further.

  9.  

  10. Complete the Export to Database configuration by clicking Save. To cancel your changes and return to the Workflow Canvas, click Discard or simply close the Node Viewer. To execute the export operation, click the Run button.

  11.