Altair® Panopticon

 

Introduction

Visual Data Discovery is performed through workbooks.  A workbook is a collection of dashboards and the data tables that supply them.

Dashboards themselves consist of dashboard parts, including Visualizations, Legends, Filters, Action Controls, Labels, and Images.

Data tables define the queries and source data repository definitions used to retrieve data, and output both a data schema and a data conduit.  They do not store data themselves; they are simply the conduit through which data flows.

The core of the product is the processing of data, which ranges from real-time streaming datasets, retrieved asynchronously, to static and historical datasets, retrieved synchronously on a defined periodic basis.  Data is assumed never to be at rest; as a consequence, data refresh is an automatic operation across all datasets.
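
As a rough illustration of the two refresh models, the following Python sketch contrasts the synchronous periodic poll with an asynchronous subscription; the fetch, stream, and on_data names are hypothetical placeholders, not part of the Panopticon API.

    import time

    def poll_periodically(fetch, on_data, interval_seconds=60):
        # Synchronous model: re-run the query on a fixed schedule and
        # hand each complete result set to the consumer.
        while True:
            on_data(fetch())
            time.sleep(interval_seconds)

    def subscribe(stream, on_data):
        # Asynchronous model: the source pushes records as they arrive;
        # iteration blocks until the next record is published.
        for record in stream:
            on_data(record)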

Data sources can be connected to directly, with data retrieved on the fly as it is required.

Alternatively, for slower underlying data sources, the data can be extracted locally on a scheduled or ad hoc basis.  This local extract can then be queried, minimizing query latency but increasing the risk of stale data.
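
A minimal sketch of this extract pattern, assuming pandas with a SQLite file standing in for a slow underlying source; the file, table, and column names are invented for illustration.

    import sqlite3

    import pandas as pd

    SOURCE_DB = "slow_warehouse.db"      # hypothetical slow underlying source
    EXTRACT_PATH = "local_extract.csv"   # locally extracted snapshot

    def refresh_extract():
        # Scheduled (or ad hoc) step: pull from the slow source and persist
        # locally; the snapshot is only as fresh as this run.
        with sqlite3.connect(SOURCE_DB) as conn:
            df = pd.read_sql_query("SELECT * FROM trades", conn)
        df.to_csv(EXTRACT_PATH, index=False)

    def query_extract(symbol):
        # Interactive step: query the local snapshot with minimal latency,
        # accepting that it may be stale between refreshes.
        df = pd.read_csv(EXTRACT_PATH)
        return df[df["symbol"] == symbol]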

Data can be accessed in a number of ways, depending on the need and on the capabilities of the source repository:

  • Retrieve all data into memory

    For example, retrieving an MS Excel spreadsheet.

  • Retrieve subsets into memory, which may be summarized or parameterized

    For example, retrieving a summary view, and then retrieving a detailed dataset based on the selection in the summary view. This method provides tightly controlled data retrieval times, but requires the paths through the data to be pre-specified, with pre-defined data queries (including stored procedures).

  • Retrieve only the required results into memory by querying on demand, pushing aggregation and filtering tasks down to underlying big data repositories or queryable data extracts.

    This is commonly known as a ROLAP implementation, in which the product dynamically writes data queries against the underlying data repository and retrieves aggregated, filtered result sets. Given its on-demand nature, this method is well suited to exploratory data analysis, but it requires dynamic query generation. All three methods are illustrated in the sketch after this list.
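
In the sketch below, pandas and SQLite stand in for the product's data tables and source repositories; the file, table, and column names are invented, and pd.read_excel assumes an Excel reader such as openpyxl is installed.

    import sqlite3

    import pandas as pd

    conn = sqlite3.connect("orders.db")   # hypothetical source repository

    # 1. Retrieve all data into memory, e.g. a whole Excel sheet at once.
    all_rows = pd.read_excel("orders.xlsx")

    # 2. Retrieve parameterized subsets: a summary first, then details for
    #    the user's selection; the drill path is fixed by pre-defined queries.
    summary = pd.read_sql_query(
        "SELECT region, SUM(amount) AS total FROM orders GROUP BY region", conn)
    selected = summary.loc[summary["total"].idxmax(), "region"]
    details = pd.read_sql_query(
        "SELECT * FROM orders WHERE region = ?", conn, params=(selected,))

    # 3. ROLAP-style on-demand access: the query text is generated dynamically
    #    from the current view, so aggregation and filtering run inside the
    #    repository and only the final result set enters memory.
    measure, dimension, threshold = "amount", "product", 1000  # from UI state
    sql = (f"SELECT {dimension}, AVG({measure}) AS avg_value "
           f"FROM orders WHERE {measure} > ? GROUP BY {dimension}")
    on_demand = pd.read_sql_query(sql, conn, params=(threshold,))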

Where there is too much data to retrieve into memory, data can be accessed directly against the underlying source, or through data extracts created in the Panopticon Visualization Server.  Because a data extract supports on-demand queries, summarization, and parameterization, it can be a more capable option than many underlying data sources.

Data extraction is available for non-streaming data sources and can be used across all workbooks.

Data retrieved through the Panopticon Designer (Desktop) platform can be further enhanced, and when displayed it can also be aggregated, grouped, filtered, and shown hierarchically.
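
In pandas terms, those display-time operations look roughly like the following; the sample data is invented for illustration.

    import pandas as pd

    df = pd.DataFrame({
        "sector":   ["Tech", "Tech", "Energy", "Energy", "Energy"],
        "ticker":   ["AAPL", "MSFT", "XOM", "CVX", "BP"],
        "exposure": [120.0, 95.0, 60.0, 45.0, 30.0],
    })

    filtered = df[df["exposure"] > 40]                            # filtered
    by_sector = filtered.groupby("sector")["exposure"].sum()      # grouped and aggregated
    tree = filtered.set_index(["sector", "ticker"]).sort_index()  # hierarchical view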

Finally, the capabilities of R and Python are integrated into the data pipeline, where R and Python code can be run:

a.   As a separate data source

b.   As a data transform on an existing data source

Essentially, this allows most data processing capabilities within these languages to be bound into the Panopticon Designer (Desktop) platform.  An example is shipped with the Designer demonstrating more complex statistical capabilities utilizing Python, NumPy, and SciPy.
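
In the spirit of that shipped example, the following hedged sketch shows what a Python data transform on an existing source (option b above) might look like: it receives a table of rows and returns an enriched one. The column names and the function contract are assumptions for illustration, not Panopticon's actual transform interface.

    import numpy as np
    import pandas as pd
    from scipy import stats

    def transform(data: pd.DataFrame) -> pd.DataFrame:
        # Hypothetical transform: enrich the incoming rows with a SciPy
        # z-score and a NumPy-derived outlier flag, then return the table.
        out = data.copy()
        out["zscore"] = stats.zscore(out["value"].to_numpy())
        out["outlier"] = np.abs(out["zscore"]) > 2.0
        return out

    # Example: transform(pd.DataFrame({"value": [1.0, 1.1, 0.9, 8.0]}))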