Cloudera Impala

The Cloudera Impala connector is used to access Impala databases.

NOTE: Starting with release 16.2, this connector is deprecated. Use the Database connector or the JDBC Database connector instead. Existing workbooks continue to operate in the 16.2 release, but their connectivity must be migrated for subsequent releases.

Using Cloudera Impala

  1. Launch the Connect to Data dialog and then select Cloudera Impala.

The Impala Connection dialog displays.

  1. Enter the hostname, user ID, and password required to connect to the Impala database you wish to access. If you need a port other than the default, replace the default value with the correct one.

  2. Specify other connection options if desired, separating individual connection string attributes with semicolons.

The following table lists the connection string attributes supported by the Cloudera Impala Wire Protocol driver.

Attribute (Short Name)           Default
-------------------------------  ---------------------
ArraySize (AS)                   50000
AuthenticationMethod (AM)        0 (User ID/Password)
Database (DB)                    default
DataSourceName (DSN)             None
DefaultLongDataBuffLen (DLDBL)   1024
DefaultOrderByLimit (DOBL)       -1 (Disabled)
Description (n/a)                None
EnableDescribeParam (EDP)        0 (Disabled)
GSSClient (GSSC)                 native
HostName (HOST)                  None
KeepAlive (KA)                   0 (Disabled)
LoginTimeout (LT)                30
LogonID (UID)                    None
MaxVarcharSize (MVS)             2147483647
Password (PWD)                   None
PortNumber (PORT)                None
ProxyUser (PU)                   None
RemoveColumnQualifiers (RCQ)     0 (Disabled)
ServicePrincipalName (SPN)       None
StringDescribeType (SDT)         -9 - SQL_WVARCHAR
TransactionMode (TM)             0 (No Transactions)
UseCurrentSchema (UCS)           0 (Disabled)
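
As an illustration of the semicolon-separated options described above, the following minimal Python sketch assembles an option string from attribute names in the table. The Name=value pair syntax is assumed here (it is the usual ODBC convention, not something this page states), the helper build_connection_options is hypothetical, and the database name "sales" is a made-up value.

```python
def build_connection_options(options):
    """Join attribute/value pairs into one semicolon-separated string.

    Assumption: each attribute is written as Name=value, following the
    common ODBC convention. Verify the exact syntax against the driver
    documentation before relying on it.
    """
    parts = []
    for name, value in options.items():
        text = str(value)
        # A semicolon inside a value would be read as a separator.
        if ";" in text or "=" in name:
            raise ValueError(f"unsupported character in option {name!r}")
        parts.append(f"{name}={text}")
    return ";".join(parts)


# Attribute names come from the table; the values are illustrative only.
options = build_connection_options(
    {"Database": "sales", "LoginTimeout": 60, "KeepAlive": 1}
)
print(options)
```

The resulting string is what would be typed into the connection options field; the short names from the table (for example DB or LT) could be used as keys in the same way.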

 

  1. You can either choose from a set of predefined tables and views, which is the easiest way to select what to load, or manually construct a SQL query:

  1. To choose from predefined tables and views, ensure that the Tables & Views radio button is selected. To manually construct a SQL query to pull and load data, ensure that the Query button is selected. Once a table, view, or query has been selected, the OK button at the bottom of the dialog is enabled.

  2. Click Load Tables to load a list of predefined tables or views. This list can be filtered by entering an appropriate string in the Search Tables search box.

You can also add a duplicate column.

  1. Select a table to display its available columns in the Search Columns list. Once a table has been selected, the Query text box is updated to show the corresponding SELECT * FROM TABLE query. Any further selection updates the Query text box accordingly.

  2. Select the columns to add to your data table by checking their corresponding Output Column boxes.

  3. If you wish to parameterize a specific column, check the Parameterize checkbox and, in the dropdowns that display, select the desired value.
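
The way the Query text box follows the table and column selections can be pictured with a small Python sketch. It is only an approximation under stated assumptions: build_select is a hypothetical helper, the table and column names are invented, and the exact SQL text the dialog generates may differ.

```python
import re

# Accept only simple identifiers so the sketch cannot build malformed SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_select(table, columns=None):
    """Return a SELECT statement for a table and optional column list.

    With no columns chosen the result is SELECT * FROM <table>, which
    mirrors the initial query shown after a table is selected.
    """
    for name in [table, *(columns or [])]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a simple identifier: {name!r}")
    column_list = ", ".join(columns) if columns else "*"
    return f"SELECT {column_list} FROM {table}"


# Hypothetical table and columns, for illustration only.
print(build_select("trades"))
print(build_select("trades", ["symbol", "price"]))
```

Selecting a table alone corresponds to the first call; checking Output Column boxes corresponds to passing an explicit column list, as in the second call.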

If the data returned is to be aggregated, check the Aggregate checkbox. The following aggregation methods are possible:

By default, the time zone of input parameters and output data is left unchanged. To change it, select a time zone from the Timezone list box. The conversion assumes that the data is stored in UTC and presents outputs in the selected time zone.
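
The time zone handling described above amounts to treating each stored timestamp as UTC and shifting it to the selected zone for display. The Python sketch below shows that idea only; it is not the product's implementation. The helper to_display_zone is hypothetical, and a fixed UTC-5 offset stands in for a named time zone to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone


def to_display_zone(stored, display_zone):
    """Interpret a naive timestamp as UTC and convert it for display."""
    if stored.tzinfo is None:
        # Assumption from the text: stored values are in UTC.
        stored = stored.replace(tzinfo=timezone.utc)
    return stored.astimezone(display_zone)


# Fixed offset used as a stand-in for the zone picked in the Timezone list.
utc_minus_5 = timezone(timedelta(hours=-5))

stored = datetime(2024, 1, 15, 14, 30)  # naive value, assumed to be UTC
shown = to_display_zone(stored, utc_minus_5)
print(shown.isoformat())  # 2024-01-15T09:30:00-05:00
```

A fixed offset ignores daylight saving time; a named zone from Python's zoneinfo module could be passed as display_zone instead where the time zone database is available.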

  1. To enable on-demand queries, check the Enable on-demand queries box.

  2. Click OK to confirm the selection and retrieve the record set into Panopticon Designer (Desktop).

The flat record set corresponding to the executed SQL is returned from the source database and displayed in Data Prep, with the database name as the title and all fields listed in Data Source Preview.

  1. If you wish to make changes to your fields, make them now and click OK when you are finished. If you do not wish to make any changes, simply click OK.

The data set you specified is added as a new data table.