Advanced Data Transformations

SmartLoader exposes a java and Groovy framework that provides for advanced data transformation capabilities during data loading. Users can leverage this capability to access java classes and methods to manipulate the data being loaded. This requires a good working knowledge of java and Groovy scripting language

It is recommended that a Groovy Console be installed and used to validate all Groovy expressions prior to including them in the configuration files

SmartLoader API Reference

SmartLoader exposes a set of methods that deliver information about the file loading environment. This information can be furthur consumed in java and Groovy expressions in configuration files to implement advanced data transformation tasks.

List of API functions:

-getHostname()                          Returns network hostname of executing node

-loadLookup(Map<String, Object> args)   Loads `filePath` as lookup source
Parameters args is a Map:
    filePath  : "Path to lookup file, String, mandatory",
    delimiter : "Column delimiter of lookup file, String, optional, default [,]"",
    quote     : "Quote character of lookup file, String, optional, default [']",
    header    : "Header line, Sting, optional, will be extracted from the first line if not set",
    skipHeaderLineCount: "Number of lines to skip from the beginnings of the file, Integer, optional, default [0]",
    encoding  : "Lookup file encoding, String, optional, default [UTF-8]",
    keyColumn : "Name of key column, String, mandatory",
    nullString: "Value to treat as null if found in lookup, String, optional, default []"
Returns:
    Lookup Object with methods:
        String[] lookup(String key)                           Returns array of
                                                              columns of lookup file by `key`;

        String lookupAndExtractColumn(String key,             Returns value of `column`
                                      String column,          from lookup by `key`, r
                                      String defaultValue)    eturns `defaultValue` if lookup fails
Sample Code:
    Code sample to use header-less file "someLookupSourceFile.csv" that contains rows like "1|First", "2|Second" as a lookup:
    columns:
        - !field
            name: LOOKUP_ID
            size: 7
            type: Integer
        - !field
            generated: true
            name: LOOKUP_VALUE
            size: 7
            type: String
            expression: "someLookup.lookupAndExtractColumn(f('LOOKUP_ID'), 'lookupColumn', 'LookupFailedValue')"
    onStartExpression: |
        someLookup = loadLookup(filePath: 'someLookupSourceFile.csv',
                                delimiter: '|',
                                header: 'keyColumn|lookupColumn',
                                keyColumn: 'keyColumn')

-fileName(String filePath)                          Extracts base name of file defined as `filePath`
    Parameters:
        String filePath -- full path to file
    Returns:
        String -- base name of filePath

-fileExtension(String filePath)                      Extracts extension of file defined as `filePath`
    Parameters:
        String filePath -- full path to file
    Returns:
        String -- extension of filePath

-sequence(String sequenceName)                       Returns next sequencial
                                                     number for sequence `sequenceName`
    Parameters:
        String sequenceName -- name of the sequence
    Returns:
        Long -- next sequencial number

-f(String name, String defaultValue)                 Returns the value of the
                                                     column `name` in the current
                                                     processing row, if it exists,
                                                     or `defaultValue` otherwise
    Parameters:
        String name -- name of the column to extract value from
        String defaultValue -- default value to return if value of column `name` is empty
    Returns:
        String -- extracted/default value

-coalesce(Object... values)                          Evaluates the arguments in
                                                     order and returns the current
                                                     value of the first expression
                                                     that initially doesn't evaluate
                                                     to NULL. For example,
                                                     coalesce(null, null, 'third_value', 'fourth_value');
                                                     returns the third value because
                                                     the third value is the first value that isn't null.
    Parameters:
        Objects of any type
    Returns:
        First not null Object

-decode(Object expression, Object... searchResultPairs)   Compares expression
                                                          to each search value one by one.
                                                          If expression is equal to a search,
                                                          the function returns the corresponding result.
                                                          If no match is found, the function returns
                                                          default. If default is omitted, the function
                                                          returns null.
    Parameters:
        Object expression -- The value to compare
        Object... searchResultPairs -- pairs of search values compared against expression
                                       and values to return if expression is equal
                                       to search. Last searchResultPairs object without pair
                                       is optional and considered as default value if no matches found.
    Returns:
        Object -- Result object of searchResultPair of first match
    Notes:
        decode is similar to the IF-THEN-ELSE and CASE expression:
        CASE expression WHEN search THEN result
          [WHEN search THEN result]
          [ELSE default];

-regex(String stringSource, String stringPattern)    The same as regex(stringSource, stringPattern, stringDefaultValue),
                                                     stringDefaultValue is set to empty string

-hash(String string)                                 Applies CRC64 hash function to the `string`

-filterCurrentRow()                                  Filters out current row

-setEncryptDecryptPassword(String password)          Sets `password` as encryption/decryption
                                                     password for encryptAES/decryptAES functions
  Note: run this function once in `onStartExpression` block

-encryptAES(String value)                            Encrypts `value` with AES
                                                     using secret key set by setEncryptDecryptPassword function
    Parameters:
        String value -- value to encrypt
    Returns:
        String -- binary result of AES encrypt operation, encoded as base64

-decryptAES(String value)                           Decrypts `value` with AES
                                                    using secret key set by
                                                    setEncryptDecryptPassword function
    Parameters:
        String value -- value to decrypt, binary encoded as base64
    Returns:
        String -- result of AES decrypt operation

There are several objects exposed by smartloader for use in groovy expressions. Objects set does vary between execution blocks. You can find what is exposed from the list below:

  • expression: String FILE_NAME, Map<String,String>ARGS

  • onStartExpression: String FILE_NAME, Map<String,String>ARGS

  • onEndExpression: String FILE_NAME, Map<String,String>ARGS,SmartLoaderAuditData AUDIT

SmartLoaderAuditData is the object exposing audit information using the following set of methods:

  • String getBuildVersion() – returns application version

  • String getTableName() – returns target table name

  • long getLoadingTime() – returns loading time, in seconds

  • String getFileName() – returns source file name

  • long getReadRows() – returns number of rows read from source

  • long getDbRejectedRows() – returns number of rows rejected by database

  • long getDbLoadedRows() – returns number of rows loaded into database

  • long getProcessRejectedRows() – returns number of parsed rejected rows

  • long getFilteredRows() – returns number of rows filtered by running filterCurrentRow()

  • long getFileSize() – returns source file size, in bytes

  • long getFileExtractedSize() – returns source file size after extraction, in bytes