Advanced Data Transformations
SmartLoader exposes a java and Groovy framework that provides for advanced data transformation capabilities during data loading. Users can leverage this capability to access java classes and methods to manipulate the data being loaded. This requires a good working knowledge of java and Groovy scripting language
It is recommended that a Groovy Console be installed and used to validate all Groovy expressions prior to including them in the configuration files
SmartLoader API Reference
SmartLoader exposes a set of methods that deliver information about the file loading environment. This information can be furthur consumed in java and Groovy expressions in configuration files to implement advanced data transformation tasks.
List of API functions:
-getHostname() Returns network hostname of executing node
-loadLookup(Map<String, Object> args) Loads `filePath` as lookup source
Parameters args is a Map:
filePath : "Path to lookup file, String, mandatory",
delimiter : "Column delimiter of lookup file, String, optional, default [,]"",
quote : "Quote character of lookup file, String, optional, default [']",
header : "Header line, Sting, optional, will be extracted from the first line if not set",
skipHeaderLineCount: "Number of lines to skip from the beginnings of the file, Integer, optional, default [0]",
encoding : "Lookup file encoding, String, optional, default [UTF-8]",
keyColumn : "Name of key column, String, mandatory",
nullString: "Value to treat as null if found in lookup, String, optional, default []"
Returns:
Lookup Object with methods:
String[] lookup(String key) Returns array of
columns of lookup file by `key`;
String lookupAndExtractColumn(String key, Returns value of `column`
String column, from lookup by `key`, r
String defaultValue) eturns `defaultValue` if lookup fails
Sample Code:
Code sample to use header-less file "someLookupSourceFile.csv" that contains rows like "1|First", "2|Second" as a lookup:
columns:
- !field
name: LOOKUP_ID
size: 7
type: Integer
- !field
generated: true
name: LOOKUP_VALUE
size: 7
type: String
expression: "someLookup.lookupAndExtractColumn(f('LOOKUP_ID'), 'lookupColumn', 'LookupFailedValue')"
onStartExpression: |
someLookup = loadLookup(filePath: 'someLookupSourceFile.csv',
delimiter: '|',
header: 'keyColumn|lookupColumn',
keyColumn: 'keyColumn')
-fileName(String filePath) Extracts base name of file defined as `filePath`
Parameters:
String filePath -- full path to file
Returns:
String -- base name of filePath
-fileExtension(String filePath) Extracts extension of file defined as `filePath`
Parameters:
String filePath -- full path to file
Returns:
String -- extension of filePath
-sequence(String sequenceName) Returns next sequencial
number for sequence `sequenceName`
Parameters:
String sequenceName -- name of the sequence
Returns:
Long -- next sequencial number
-f(String name, String defaultValue) Returns the value of the
column `name` in the current
processing row, if it exists,
or `defaultValue` otherwise
Parameters:
String name -- name of the column to extract value from
String defaultValue -- default value to return if value of column `name` is empty
Returns:
String -- extracted/default value
-coalesce(Object... values) Evaluates the arguments in
order and returns the current
value of the first expression
that initially doesn't evaluate
to NULL. For example,
coalesce(null, null, 'third_value', 'fourth_value');
returns the third value because
the third value is the first value that isn't null.
Parameters:
Objects of any type
Returns:
First not null Object
-decode(Object expression, Object... searchResultPairs) Compares expression
to each search value one by one.
If expression is equal to a search,
the function returns the corresponding result.
If no match is found, the function returns
default. If default is omitted, the function
returns null.
Parameters:
Object expression -- The value to compare
Object... searchResultPairs -- pairs of search values compared against expression
and values to return if expression is equal
to search. Last searchResultPairs object without pair
is optional and considered as default value if no matches found.
Returns:
Object -- Result object of searchResultPair of first match
Notes:
decode is similar to the IF-THEN-ELSE and CASE expression:
CASE expression WHEN search THEN result
[WHEN search THEN result]
[ELSE default];
-regex(String stringSource, String stringPattern) The same as regex(stringSource, stringPattern, stringDefaultValue),
stringDefaultValue is set to empty string
-hash(String string) Applies CRC64 hash function to the `string`
-filterCurrentRow() Filters out current row
-setEncryptDecryptPassword(String password) Sets `password` as encryption/decryption
password for encryptAES/decryptAES functions
Note: run this function once in `onStartExpression` block
-encryptAES(String value) Encrypts `value` with AES
using secret key set by setEncryptDecryptPassword function
Parameters:
String value -- value to encrypt
Returns:
String -- binary result of AES encrypt operation, encoded as base64
-decryptAES(String value) Decrypts `value` with AES
using secret key set by
setEncryptDecryptPassword function
Parameters:
String value -- value to decrypt, binary encoded as base64
Returns:
String -- result of AES decrypt operation
There are several objects exposed by smartloader for use in groovy expressions. Objects set does vary between execution blocks. You can find what is exposed from the list below:
expression: String FILE_NAME, Map<String,String>ARGS
onStartExpression: String FILE_NAME, Map<String,String>ARGS
onEndExpression: String FILE_NAME, Map<String,String>ARGS,SmartLoaderAuditData AUDIT
SmartLoaderAuditData is the object exposing audit information using the following set of methods:
String getBuildVersion() – returns application version
String getTableName() – returns target table name
long getLoadingTime() – returns loading time, in seconds
String getFileName() – returns source file name
long getReadRows() – returns number of rows read from source
long getDbRejectedRows() – returns number of rows rejected by database
long getDbLoadedRows() – returns number of rows loaded into database
long getProcessRejectedRows() – returns number of parsed rejected rows
long getFilteredRows() – returns number of rows filtered by running filterCurrentRow()
long getFileSize() – returns source file size, in bytes
long getFileExtractedSize() – returns source file size after extraction, in bytes