Getting Started

Running SmartLoader

SmartLoader is a containerized application that runs under Docker. The first time it is executed, it creates the folder structure and files it needs.

AWS Users

This documentation provides sample commands for running the container on a local machine. ECS and EKS users must instead configure task definitions to run the container, and should refer to the AWS documentation for their service to apply the command-line arguments and environment variables discussed in this documentation.

Setting the Working Directory

SmartLoader requires a working directory that resides outside the container. The container is built with an internal working directory named /data, which must be mapped to a location outside the container when the container is run.

Two options exist:

Mapping to a local directory

Create a local working directory and map the container's /data directory to it. For example, if the local working directory is named smartloader-data and is located in the current directory, launch SmartLoader using the following command:

docker run --rm -v "$(pwd)/smartloader-data":/data 709825985650.dkr.ecr.us-east-1.amazon.com/inovvo/smartloader
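As a convenience, the local mapping can be wrapped in a small shell function so that later commands do not have to repeat the -v option. This is only a sketch: the names WORK_DIR, IMAGE and smartloader are hypothetical and are not part of SmartLoader itself.

```shell
# Image name as given in this documentation.
IMAGE="709825985650.dkr.ecr.us-east-1.amazon.com/inovvo/smartloader"

# Hypothetical location of the local working directory; adjust as needed.
WORK_DIR="${WORK_DIR:-$(pwd)/smartloader-data}"

# Create the working directory if it does not exist yet.
mkdir -p "$WORK_DIR"

# Wrapper that maps the local directory to /data and forwards any
# extra arguments to SmartLoader.
smartloader() {
    docker run --rm -v "$WORK_DIR:/data" "$IMAGE" "$@"
}
```

With this in place, calling smartloader with no arguments is equivalent to the command above.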

Mapping to external storage

SmartLoader supports external storage such as an S3 bucket. The working directory can be set either with the -workDir command-line argument or with the workDir environment variable. The following example launches SmartLoader with the working directory set to the out/ path of an S3 bucket called bucketname, using the command-line argument:

docker run --rm 709825985650.dkr.ecr.us-east-1.amazon.com/inovvo/smartloader -workDir 's3://bucketname/out/'

Note: the --rm switch is optional. It tells Docker to remove the container automatically after it exits; refer to the Docker documentation for details.
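The environment-variable form is not shown above. A minimal sketch, assuming the container reads workDir from its environment as described, passes the variable with Docker's standard -e option. The function name smartloader_s3 and the variable WORK_DIR_URI are hypothetical, and how the container obtains access to the bucket is not covered here.

```shell
# Image name as given in this documentation.
IMAGE="709825985650.dkr.ecr.us-east-1.amazon.com/inovvo/smartloader"

# Example bucket path from this documentation; replace with a real bucket.
WORK_DIR_URI="s3://bucketname/out/"

# Wrapper that sets the workDir environment variable inside the container
# (assumed to be equivalent to passing -workDir on the command line).
smartloader_s3() {
    docker run --rm -e workDir="$WORK_DIR_URI" "$IMAGE" "$@"
}
```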

First Run

From here on, we assume that the working directory has been set using one of the options described above, and we use the following shortened command to refer to the execution of SmartLoader. ECS and EKS users should follow the instructions for their service to configure and run the container:

docker run --rm dockername

where dockername = 709825985650.dkr.ecr.us-east-1.amazon.com/inovvo/smartloader

Launch SmartLoader using the following command:

docker run --rm dockername

This causes SmartLoader to set up its environment by creating the following items:

  • incoming - Folder where SmartLoader will search for files for processing

  • archive - Folder where SmartLoader will move files after successful upload

  • rejected - Folder where SmartLoader will write invalid rows and files that cannot be uploaded

  • configs - Folder that contains configuration files used for uploading data for different file types

  • logs - Folder that contains all logs generated by the application

  • default.yml - Main configuration file for SmartLoader.
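After the first run, the layout listed above can be verified with a short check. The sketch below assumes that all items are created at the top level of the working directory; check_layout is a hypothetical helper, not a SmartLoader command.

```shell
# check_layout DIR: report any expected item missing from DIR.
# Returns 0 when everything is present, 1 otherwise.
check_layout() {
    dir="$1"
    missing=0
    for name in incoming archive rejected configs logs; do
        if [ ! -d "$dir/$name" ]; then
            echo "missing folder: $name"
            missing=1
        fi
    done
    if [ ! -f "$dir/default.yml" ]; then
        echo "missing file: default.yml"
        missing=1
    fi
    return "$missing"
}
```

For a local working directory, this could be called as check_layout "$(pwd)/smartloader-data".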

SmartLoader Configurations

SmartLoader reads configuration files written in YAML, a data serialization language commonly used for configuration files. YAML files are text-based and can be edited using any standard text editor. Indentation must use spaces; tab characters are not allowed. All parameters in the YAML configuration files are case sensitive. More information about YAML can be found at http://yaml.org
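Because stray tab characters are a common cause of YAML errors, it can help to scan a configuration file before running SmartLoader. The following sketch uses grep for this; has_tabs is a hypothetical helper name.

```shell
# has_tabs FILE: print the numbered lines of FILE that contain a tab.
# Exit status is 0 if at least one tab was found, non-zero otherwise.
has_tabs() {
    tab=$(printf '\t')
    grep -n "$tab" "$1"
}
```

For example, has_tabs default.yml prints nothing when the file is free of tabs.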

Configuring default.yml

The first step in running SmartLoader is to configure the database connection. The database settings are stored in the default.yml configuration file located in the SmartLoader root folder. Open the configuration file in a text editor and set the following properties:

dataDir: Folder where the incoming and archive folders are located. Default is "." (the SmartLoader execution folder)

dbPassword: Database password enclosed in double quotes

dbUrl: Database connection information, enclosed in double quotes, using the following format: "jdbc:{database}://{database_host}:{port}/{databaseName}"

Parameter          Description
{database}         Can be replaced with vertica or mysql
{database_host}    The server name where the database is hosted
{port}             The database port number
{databaseName}     The name of the database into which SmartLoader will load the data from the CSV file(s). The database must be created before running SmartLoader

linesToAnalyze: Number of lines SmartLoader will analyze from the input file to learn about its file structure. Default is 1000

instanceName: The name of the instance in the database to use for loading the data

moveToArchive: Setting that allows SmartLoader to move successfully loaded files to the archive folder. Default is true

parallelism: The number of threads SmartLoader will use while executing. Default is 2

A detailed description of all SmartLoader configuration properties can be found in the Configurations section.
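To illustrate how these properties fit together, the sketch below assembles a dbUrl from its parts and writes a sample file. All values are placeholders taken from the example configuration shown later in this section, the file name default.sample.yml is hypothetical (chosen so the real default.yml is not overwritten), and dbUser is included only because it appears in that example.

```shell
# Placeholder connection details; replace with real values.
DB_KIND="mysql"        # vertica or mysql
DB_HOST="localhost"
DB_PORT="3306"
DB_NAME="tutorial"

# Assemble the JDBC URL in the documented format.
DB_URL="jdbc:${DB_KIND}://${DB_HOST}:${DB_PORT}/${DB_NAME}"

# Write a sample file next to (not over) the real default.yml.
cat > default.sample.yml <<EOF
dataDir: .
dbUser: root
dbPassword: "myPassword"
dbUrl: "${DB_URL}"
instanceName: smartloader
linesToAnalyze: 1000
moveToArchive: true
parallelism: 2
EOF
```

The resulting file can be compared with default.yml and the relevant lines copied across by hand.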

Loading a CSV file

SmartLoader looks for CSV files in the incoming folder. As a good practice, make sure you know the content and structure of the CSV file(s) before loading them. Place a CSV file in the incoming folder and run the following command:

docker run --rm dockername -generateConfigOnly

This causes SmartLoader to examine the CSV file and autogenerate a config file with the same name as the CSV file and a .yml extension. The file is created in the configs folder.

For example, if you have a CSV file called myfile.csv with the following content:

A,B
1,2
3,4
5,6

(A comma-delimited file with a header)

SmartLoader will create a config file called myfile.yml and place it in the configs folder. The file will have the following structure:

---
columns:
    - !field
        name: A
        size: 7
        type: Integer
    - !field
        name: B
        size: 7
        type: Integer
configFileName: default.yml
createTable: true
dataDir: .
dbPassword: myPassword
dbUrl: "jdbc:mysql://localhost:3306/tutorial"
dbUser: root
delimiter: ","
header: 1
instanceName: smartloader
linesToAnalyze: 1000
loadingMode: BY_CATEGORY
moveToArchive: true
nullStringPlaceholder: ""
parallelism: 2
processId: Smart Loader
workDir: .

SmartLoader detected the two fields in the source CSV file and guessed their size and type. Users can edit the field information, for example to change field names.
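To reproduce this example with a local working directory, the sample file can be staged as follows. The sketch assumes the working directory is smartloader-data under the current directory; WORK_DIR and CONFIG_FILE are hypothetical variable names.

```shell
# Hypothetical local working directory mapped to /data.
WORK_DIR="${WORK_DIR:-$(pwd)/smartloader-data}"

# The incoming folder normally exists after the first run;
# mkdir -p is harmless if it is already there.
mkdir -p "$WORK_DIR/incoming"

# Write the sample comma-delimited file with a header row.
printf 'A,B\n1,2\n3,4\n5,6\n' > "$WORK_DIR/incoming/myfile.csv"

# Path where the generated config is expected to appear.
CONFIG_FILE="$WORK_DIR/configs/myfile.yml"
echo "After running with -generateConfigOnly, look for: $CONFIG_FILE"
```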

SmartLoader will always try to match a config file with the CSV file being loaded. If the file being loaded is called myfile.csv, SmartLoader expects to find myfile.yml in the configs folder; otherwise, it will throw an error. We will later cover how to configure SmartLoader to load multiple CSV files with the same structure but different filenames using regular expressions.
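The name-matching rule can be checked before a load with a short script. The sketch below covers only the simple one-to-one case described above, not the regular-expression matching mentioned for later; list_unmatched is a hypothetical helper.

```shell
# list_unmatched DIR: print each CSV file in DIR/incoming that has no
# config file of the same name (with a .yml extension) in DIR/configs.
list_unmatched() {
    for csv in "$1"/incoming/*.csv; do
        [ -e "$csv" ] || continue          # the glob matched nothing
        name=$(basename "$csv" .csv)
        [ -f "$1/configs/$name.yml" ] || echo "$name.csv"
    done
}
```

Any file name printed by list_unmatched would need a config generated with -generateConfigOnly before it can be loaded.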

Run the following command to load the CSV file:

docker run --rm dockername

This will cause SmartLoader to do the following:

  1. Read the CSV file located in the incoming folder and match it with a yml config file in the configs folder

  2. Create a log file in the logs folder using the csv filename with a .log extension

  3. If SmartLoader cannot import part or all of the CSV file, it will create a CSV file in the rejected folder that contains either the rejected records or, if the file could not be loaded at all, the entire CSV file. The rejected file will also contain error messages

  4. Archive the source CSV file by moving it to the archive folder

  5. Create a table called myfile where the data from the CSV file will be loaded
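The outcome of the steps above can be inspected from the working directory. The sketch below assumes the log file is named after the CSV file without its extension (for example myfile.log) and matches rejected files loosely by prefix, since their exact names are not specified here; report_run is a hypothetical helper.

```shell
# report_run DIR NAME: summarize where a load of NAME.csv left its files.
report_run() {
    dir="$1"
    name="$2"
    if [ -f "$dir/archive/$name.csv" ]; then
        echo "archived: $name.csv"
    fi
    if [ -f "$dir/incoming/$name.csv" ]; then
        echo "still in incoming: $name.csv"
    fi
    # Rejected file names are an assumption: anything starting with NAME.
    for f in "$dir"/rejected/"$name"*; do
        if [ -e "$f" ]; then
            echo "rejected: $(basename "$f")"
        fi
    done
    # Log file name is an assumption: NAME.log.
    if [ -f "$dir/logs/$name.log" ]; then
        echo "log: $name.log"
    fi
}
```

For the example in this section, report_run "$(pwd)/smartloader-data" myfile lists which of these files exist after the run.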