Best Practices on Working with R Transform in Panopticon

For small, simple cases, you can type or paste R code directly into the Transforms window in Panopticon. However, when you have several different transforms, or when a transform is applied to several data tables, it is highly recommended to follow the steps below, which apply functional programming and the D.R.Y. principle (Don’t Repeat Yourself) to R transforms in Panopticon. Steps 3 and 4 in particular can have a major impact on speed and memory usage.

1. Save your code in R files. This gives you the freedom to develop and test the code in RStudio.
2. Instead of using an imperative coding approach, define one or more functions in the file. Each function should take a data frame as an input argument, run your code, and return the resulting data frame. If several people in your organization will be working with R transforms in Panopticon, potentially with several code contributors and several Rserve hosts, consider creating an internal R package containing all of the functions that your organization needs.
3. Load any file with R code and any package that will be used in R transforms, or via the Rserve connector in Panopticon, when Rserve starts. To do this, specify the R files and R packages in source and/or eval statements in the Rserve configuration file (refer to the Rserve online documentation for further details).

   If several R files need to be loaded, you can enter several source statements in the configuration file. If several R packages need to be loaded, you can enter several eval statements, each being a library() call. Alternatively, you can put all of the calls to source() and library() that you need in a single R file, and then specify that file in a single source statement in the Rserve configuration file.

   The benefit of loading functions, packages, and even datasets via the Rserve configuration file is that those resources then reside in memory and are immediately available to all Rserve connections. Any function, dataset, or package not preloaded in this way has to be loaded as part of each connection, which takes time and consumes memory separately for each connection.
4. If you don’t have control over the Rserve instance and therefore cannot add source or eval statements to the Rserve configuration file, you can still rationalize your code management by defining your own functions, saving them in an R file, and loading that file with a source() call in each of your transforms. Each of your R transform scripts can then be reduced to the source() call plus the function call you need. Note, however, that with this approach the file you source is loaded each time a connection to Rserve is made.
5. To further simplify management of the path to your R file, which you may need to enter in several R transform scripts, parameterize the path by creating a global parameter in Designer under Tools > Options > Parameters. This gives you a single place to edit the path, should it need to change.
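
The single-file approach to preloading can be sketched as follows. This is only an illustration: the file name rserve_startup.R, the paths, and the use of the stats package as a stand-in for your own packages are all assumptions, and the exact configuration syntax should be checked against the Rserve documentation.

```r
# File: rserve_startup.R (hypothetical name)
# Assumed to be referenced by one source statement in the Rserve
# configuration file, for example:
#   source /path/to/rserve_startup.R

# Packages needed by the transforms. 'stats' is only a stand-in here;
# replace it with the packages your functions actually use.
library(stats)

# R files that define the transform functions (hypothetical paths).
code_files <- c("/path/to/my_transform_code.R")

# Source each file once at startup; report any file that is missing
# instead of stopping, so the remaining files still get loaded.
for (f in code_files) {
  if (file.exists(f)) {
    source(f)
  } else {
    message("R code file not found: ", f)
  }
}
```

Keeping all library() and source() calls in one file means the Rserve configuration file itself rarely needs to change when a new code file or package is added.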

NOTES:

- If you need to apply different transforms to different datasets, define several functions in your code file.
- For very similar functions, avoid repeating the same code by factoring out the common parts into a separate function that the other functions call.
- If a transform needs to produce different outputs depending on certain conditions or variables, add another input parameter to the function and have the function evaluate a condition on the argument it receives. If you want, this argument can be supplied via a Panopticon parameter and thereby put under the dashboard end user's control.
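
The notes above could look like this in a code file. It is a minimal sketch, not a prescribed implementation: the names apply_to_column, double_it, scale_column, and the mode values are all hypothetical and chosen only for illustration.

```r
# Hypothetical shared helper: applies fn to one column and returns the
# whole data frame. The public functions below reuse it.
apply_to_column <- function(df, colname, fn) {
  df[[colname]] <- fn(df[[colname]])
  df
}

# Two similar transforms that call the helper instead of repeating code.
add_one <- function(df, colname) {
  apply_to_column(df, colname, function(x) x + 1)
}
double_it <- function(df, colname) {
  apply_to_column(df, colname, function(x) x * 2)
}

# One transform whose output is selected by an extra argument.
# match.arg() rejects any value that is not one of the listed modes.
scale_column <- function(df, colname, mode = c("none", "percent", "zscore")) {
  mode <- match.arg(mode)
  switch(mode,
    none    = df,
    percent = apply_to_column(df, colname, function(x) 100 * x / sum(x)),
    zscore  = apply_to_column(df, colname, function(x) (x - mean(x)) / sd(x))
  )
}

# In a transform script, the mode could come from a Panopticon parameter
# (hypothetical name), for example:
#   scale_column(my_data_frame, "my_column_name", mode = "{scale_mode}")
```

Validating the argument with match.arg() is one way to make a mistyped parameter value fail with a clear error rather than silently returning unchanged data.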

Example Code in R

File: my_transform_code.R

# Minimal example function: adds 1 to the column named by colname
add_one <- function(df, colname) {
  df[[colname]] <- df[[colname]] + 1
  return(df)
}

Panopticon R transform window code:

source(file.path("{my_R_code_path}", "my_transform_code.R"))

# The data set is loaded in a data frame named 'my_data_frame'
add_one(df = my_data_frame, colname = "my_column_name")

# The function returns a data frame, which is picked up by Panopticon
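
Because the logic lives in a function, it can be checked in RStudio before it is used in Panopticon. The snippet below is one possible way to do that; the sample data frame is made up for the test, and add_one is repeated (in a form equivalent to the one in my_transform_code.R) only so the snippet runs on its own.

```r
# add_one, equivalent to the function in my_transform_code.R.
add_one <- function(df, colname) {
  df[[colname]] <- df[[colname]] + 1
  df
}

# Small stand-in for the data table that Panopticon would supply.
my_data_frame <- data.frame(
  id = c("a", "b", "c"),
  my_column_name = c(10, 20, 30)
)

result <- add_one(df = my_data_frame, colname = "my_column_name")

# The result must be a data frame with the column incremented by 1,
# and the input data frame must be left unchanged.
stopifnot(is.data.frame(result))
stopifnot(identical(result$my_column_name, c(11, 21, 31)))
stopifnot(identical(my_data_frame$my_column_name, c(10, 20, 30)))

print(result)
```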