Upload Groovy Integration

Overview

When dealing with complex file uploads, you can use a Groovy class or code snippets. Groovy runs within the JVM and is controlled by the DIH framework, so it can access the same classes, objects, and environmental components as DIH. This allows you to manipulate the file upload data while it is being uploaded, reducing the need for temporary or staging areas.

Groovy is not configured through a parameter. Instead, code it in one of the following places:

Executable Code Block with Groovy

Groovy code in the Code Block applies to all uploads for any task that uses the Executable.

To add Groovy code to an Executable, select com.ria.converter.api.executor.file.FileUploadWorkEntryExecutor in the Executable Class field and add the Groovy code to the first Code Block.

If a task linked to an executable contains Groovy code in the Data Block, the Groovy code in the Code Block will not be executed.

Task Data Block with Groovy

Groovy code in the Data Block applies only to the task where it is coded.

The Groovy code is specified in the Data Block using the following syntax:

parameterName = value[;]
The parameterName must be one of the supported Parameters. The value for the parameter can be specified over multiple lines, but the parameter name must start in the first column. A ; at the end of the line is optional.
A parameter specified in a Data Block overrides any value for the same parameter in the Parameters definition.

groovy = … groovy code …
The value is not a parameter setting but a Groovy class or snippet. For more information, see Upload Groovy methods.

Upload Groovy class

The Groovy class must be coded according to the Groovy language specification.

The Groovy class definition requires a name, which can be any valid class name. Only the methods that are relevant to the solution are required. For example, if records do not need to be joined, the startRecordJoin(String record) and joinToLastRecord(String record) methods can be omitted.

At runtime, a Groovy class is instantiated as a Groovy object and can have instance variables accessible to all the methods for that Task’s execution.

During file uploads, Groovy methods are called at an appropriate time to validate and manipulate the data before it gets written to the target table.
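The following is a minimal sketch of a class that keeps state in an instance variable shared by its methods. The class name UploadCounter and the rule of skipping blank records are hypothetical and only illustrate the pattern.

```groovy
// Hypothetical class name; any valid class name can be used.
class UploadCounter {
    // Instance variable visible to every method during one task execution
    int selectedCount = 0

    // Select every non-blank record and count it
    def selectRecord(String record) {
        if (record.trim().isEmpty()) {
            return false
        }
        selectedCount++
        return true
    }
}
```

Because the counter lives on the object, any other method of the class can read selectedCount later in the same execution.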

If the solution doesn’t need instance variables and the methods are independent pieces of functionality, you can omit the class keyword and only define the methods. For example, the following is a valid Groovy snippet of just one method, which doesn’t have an explicit class definition:

def bindState(column, record) {
    // Default STATE_CODE to IL
    return 'IL'
}

Upload Groovy imports

As with Java, external third-party classes used in a Groovy class must be declared with import statements before the class definition. For example, to use the Strings class from the Google Guava library, add the following import statement:

import com.google.common.base.Strings
class UploadMyFile {
    ...
}

The Strings class can be used in the Groovy class without the need to qualify it. For example:

def bindMyIdColumn(String column, String record) {
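    // padStart expects a char; Groovy does not coerce a String argument to char in a method call
    return Strings.padStart(record.substring(1,5),10,'0' as char)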
}

Without the import, the Strings class would have to be fully qualified wherever it is used, which makes the same example longer:

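def bindMyIdColumn(String column, String record) {
    // padStart expects a char; Groovy does not coerce a String argument to char in a method call
    return com.google.common.base.Strings.padStart(record.substring(1,5),10,'0' as char)
}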

By default, Groovy automatically imports commonly used packages and classes, such as java.lang.*, java.util.*, java.io.*, and java.math.BigDecimal, so that types like String and Integer are available without an import statement. For more information, see Apache Groovy.
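As a small illustration of default imports, the sketch below uses BigDecimal without an import because Groovy imports java.math.BigDecimal by default, while java.math.RoundingMode is not a default import and is therefore fully qualified. The bindAmount name and the two-decimal rule are assumptions made for the example.

```groovy
// BigDecimal needs no import; RoundingMode is not imported by default, so it is qualified.
def bindAmount(column, record) {
    // Hypothetical rule: normalize the trimmed record text to two decimal places
    return new BigDecimal(record.trim())
            .setScale(2, java.math.RoundingMode.HALF_UP)
            .toPlainString()
}
```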

Some DIH classes are commonly required by an upload Groovy class. In addition to the default imports, the following are included in the DIH package, so they do not have to be imported.

com.ria.converter.api.runtime.TaskWorkExecutorParameters
com.ria.converter.api.runtime.ThreadLogger
com.ria.core.util.Utils
java.text.SimpleDateFormat
org.apache.commons.csv.CSVRecord

Upload Groovy methods

For file upload, the following optional methods can be implemented or coded:

If the filePath contains a wildcard and multiple files are selected, startFile, endFile, and all the methods listed between them are called for each file.

startUpload

The startUpload method is called when a new file upload task starts and receives the TaskWorkExecutorParameters object. It does not return a value.

def startUpload(TaskWorkExecutorParameters parameters){
    //Add logic here
}

The TaskWorkExecutorParameters include all run parameters, such as filePath, csvDelimiter, etc., and any custom parameters. If a parameter is required by another method at a later stage in the upload process, it can be saved to an instance variable in this method.

The following are the main methods of the TaskWorkExecutorParameters object:

String getRequiredStringParameterValue(String parameterId)
String getOptionalStringParameterValue(String parameterId, String defaultValue)
String getNullableStringParameterValue(String parameterId)
Integer getIntegerParameterValue(String parameterId, Integer defaultValue)
Boolean getBooleanParameterValue(String parameterId)
Boolean getBooleanParameterValue(String parameterId, Boolean defaultValue)
Boolean hasParameterValue(String parameterId)
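A minimal sketch of saving parameters for later use might look like the following. The class name UploadSettings, the ',' fallback, and the strictMode custom parameter are hypothetical. The argument is left untyped so the sketch can be tried outside DIH; within DIH it would be the TaskWorkExecutorParameters object.

```groovy
class UploadSettings {
    // Values captured at the start of the upload for use by later methods
    String delimiter
    boolean strict

    // Untyped argument: inside DIH this is a TaskWorkExecutorParameters object
    def startUpload(parameters) {
        // csvDelimiter is a run parameter; ',' is an assumed fallback value
        delimiter = parameters.getOptionalStringParameterValue('csvDelimiter', ',')
        // strictMode is a hypothetical custom parameter
        strict = parameters.getBooleanParameterValue('strictMode', false)
    }
}
```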

startFile

The startFile method is called at the start of each file identified by the filePath parameter.

def startFile(File file){
    //Add logic here
}

If the filePath contains a wildcard and multiple files are selected, this method is called for each file. It returns false to reject the file immediately; otherwise, it returns true.

The File object can be saved to an instance variable for use in the other methods.
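For instance, a startFile implementation could reject empty files and remember the current file for later methods. This is only a sketch; the class name UploadFiles and the empty-file rule are assumptions.

```groovy
class UploadFiles {
    // Saved so that other methods can refer to the file being processed
    File currentFile

    def startFile(File file) {
        // Hypothetical rule: reject empty files immediately
        if (file.length() == 0) {
            return false
        }
        currentFile = file
        return true
    }
}
```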

acceptFile

The acceptFile method checks if the record contains data indicating that the file should be uploaded. It returns a Boolean value.

def acceptFile(String record){
    //Add logic here
}

The acceptFile and the rejectFile methods are called before the file is reopened and processed for upload. If the acceptFile method is present but the rejectFile method is not, the acceptFile method is called for each record in the file until it returns true to accept the file.

If the acceptFile and rejectFile methods are both present, the rejectFile method has precedence. The acceptFile method is called for each record until the end of the file or until the rejectFile method returns true.

If both the acceptFile method and the acceptFileOnMatch parameter are specified, the regular expression configured in the parameter is evaluated before the method is called. Depending on the result, the acceptFile method may not be invoked.
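As an illustration, an acceptFile method could look for a marker record. The 'HDR|CUSTOMERS' header used here is a hypothetical file layout, not something defined by DIH.

```groovy
def acceptFile(String record) {
    // Hypothetical marker: accept the file once the expected header record is seen
    return record.startsWith('HDR|CUSTOMERS')
}
```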

rejectFile

The rejectFile method checks if the record contains data indicating that the file should not be uploaded. It returns a Boolean value.

def rejectFile(String record){
    //Add logic here
}

The rejectFile method is called for each record in the file until it returns true to reject the file.

The acceptFile and the rejectFile methods are called before the file is reopened and processed for upload.

If both the rejectFile method and the rejectFileOnMatch parameter are specified, the regular expression configured in the parameter is evaluated before the method is called. Depending on the result, the rejectFile method may not be invoked.

The rejectFile method can also be used to track or do pre-upload handling of the records by always returning false.
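A sketch of the tracking pattern, assuming a hypothetical class named PreUploadTracker, could count the records seen during the pre-upload pass without ever rejecting the file:

```groovy
class PreUploadTracker {
    // Number of records seen before the file is reopened for upload
    int recordsSeen = 0

    def rejectFile(String record) {
        recordsSeen++
        // Always false: the method only observes records and never rejects the file
        return false
    }
}
```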

selectRecord

The selectRecord method checks if the record should be selected for upload. It returns a Boolean value true to include the record and false to exclude it.

def selectRecord(String record){
    //Add logic here
}

If both the selectRecord method and the selectOnMatch parameter are specified, the regular expression configured in the parameter is evaluated before the method is called. Depending on the result, the selectRecord method may not be invoked.
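For example, assuming a hypothetical layout in which detail records start with the letter D, a selectRecord method might be sketched as:

```groovy
def selectRecord(String record) {
    // Hypothetical layout: only detail records, which start with 'D', are uploaded
    return record.startsWith('D')
}
```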

excludeRecord

The excludeRecord method checks if the record should not be selected for upload. It returns a Boolean value true to exclude the record and false to include it.

def excludeRecord(String record){
    //Add logic here
}

If both the excludeRecord method and the excludeOnMatch parameter are specified, the regular expression configured in the parameter is evaluated before the method is called. Depending on the result, the excludeRecord method may not be invoked.
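A minimal excludeRecord sketch, assuming blank lines and lines starting with # should be skipped (a hypothetical convention), could be:

```groovy
def excludeRecord(String record) {
    def text = record.trim()
    // Hypothetical rule: skip blank lines and comment lines that start with '#'
    return text.isEmpty() || text.startsWith('#')
}
```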

startRecordJoin

The startRecordJoin method checks if the record is the first in a series of records that need to be joined, for example, when the data continues over multiple records. It returns true if the record should be joined to subsequent records, which are then handled by the joinToLastRecord method.

def startRecordJoin(String record){
    //Add logic here
}

If both the startRecordJoin method and the joinMatchers parameter are specified, the regular expression configured in the parameter is evaluated before the method is called. Depending on the result, the startRecordJoin method may not be invoked.

joinToLastRecord

The joinToLastRecord method checks if this record must be joined to the previous record. It returns a Boolean value true to join the record to the previous record.

def joinToLastRecord(String record){
    //Add logic here
}

The joinToLastRecord method is called after the startRecordJoin method has returned true and will continue to be called as long as the joinToLastRecord method returns true.
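The two join methods can be sketched together as follows. The layout is hypothetical: a record starting with NOTE may continue over several lines, and each continuation line starts with a + character.

```groovy
def startRecordJoin(String record) {
    // Hypothetical layout: NOTE records may span multiple lines
    return record.startsWith('NOTE')
}

def joinToLastRecord(String record) {
    // Hypothetical layout: continuation lines start with '+'
    return record.startsWith('+')
}
```

With this layout, joining stops at the first record that does not start with +, because joinToLastRecord returns false for it.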

bindExampleColumn

The bindExampleColumn method is invoked to provide a value for a column. It returns a string value.

def bindExampleColumn(String column, String record){
    //Add logic here
}

The column argument is the name of the column, which must be declared in the columnSelectors parameter using the COLUMN{bindExampleColumn} syntax. The record argument is the record after any joins have been applied.

For example, the following populates the AMOUNT column with 101.11.

columnSelectors = AMOUNT{bindAmount}
groovy =
    def bindAmount(column, record) {
        return '101.11'
    }
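A bind method can also derive its value from the record. The sketch below assumes a hypothetical layout in which the first eight characters of the record hold a date in yyyyMMdd form, and a columnSelectors entry such as ORDER_DATE{bindOrderDate}. The import is included so the snippet runs on its own; SimpleDateFormat is one of the classes listed above as available in DIH without an import.

```groovy
import java.text.SimpleDateFormat  // only needed when running outside DIH

def bindOrderDate(column, record) {
    // Hypothetical layout: characters 0-7 hold the order date as yyyyMMdd
    def parsed = new SimpleDateFormat('yyyyMMdd').parse(record.substring(0, 8))
    // Return the date as a string in yyyy-MM-dd form
    return new SimpleDateFormat('yyyy-MM-dd').format(parsed)
}
```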

editBeforeInsert

The editBeforeInsert method processes the data immediately before it is inserted into the table, including modifying column values and adding new rows. It returns true to insert the resulting data.

def editBeforeInsert(List recordColumns, Object inputRecord){
    //Add logic here
}

The recordColumns argument is a list of hash maps. Each map contains the columns specified in the csvHeader and columnSelectors parameters, along with their string values. On entry, the list always contains exactly one record, at index 0. You can create additional records by adding them to the list.
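As a sketch of adding a row, the method below copies the map at index 0 so that the new row has the same columns, changes one value, and appends the copy to the list. The RECORD_TYPE column and the AUDIT value are hypothetical and assumed to be mapped already.

```groovy
def editBeforeInsert(List recordColumns, Object inputRecord) {
    def original = recordColumns.get(0)
    // Copy the map so the extra row has exactly the same columns
    def copy = new HashMap(original)
    // RECORD_TYPE is a hypothetical column that is already mapped
    copy.put('RECORD_TYPE', 'AUDIT')
    recordColumns.add(copy)
    // true: insert all rows now held in the list
    return true
}
```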

The inputRecord argument is the record from the file, including all field values, even those not mapped in the csvHeader or columnSelectors parameters.

Note that the usage of the inputRecord argument depends on the type of upload.

inputRecord in a CSV upload

For a CSV upload, inputRecord is an object of type CSVRecord.

The inputRecord is typically used to retrieve the value of a field by calling the get(n) method, where n represents the index of the field starting from 0. For example, inputRecord.get(0) will retrieve the value of the 1st field, inputRecord.get(1) will retrieve the value of the 2nd field, and so on.
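For a CSV upload, a sketch along these lines reads an unmapped field by index and uses it to adjust a mapped column. The field position and the prefixing rule are assumptions. When trying the snippet outside DIH, a plain list can stand in for the CSVRecord because both provide get(n).

```groovy
def editBeforeInsert(List recordColumns, Object inputRecord) {
    // Hypothetical layout: the third CSV field (index 2) holds a region code
    def region = inputRecord.get(2).trim()
    def columnValues = recordColumns.get(0)
    // Prefix the existing customer number with the region code
    columnValues.put('CUSTOMER_NUMBER', region + '-' + columnValues.get('CUSTOMER_NUMBER'))
    return true
}
```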

inputRecord in a FIXED upload

For a FIXED format upload, the inputRecord is a String object that contains the complete, unformatted input record. Field values can be extracted from it using any of the Groovy or Java string methods, such as substring. For example, inputRecord.substring(0, 10) will retrieve the first 10 characters of the input record.
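For a FIXED upload, the same idea can be sketched with substring. The layout, in which the first 10 characters hold the customer number, is hypothetical.

```groovy
def editBeforeInsert(List recordColumns, Object inputRecord) {
    // Hypothetical layout: characters 0-9 hold the customer number, padded with spaces
    def customer = inputRecord.substring(0, 10).trim()
    recordColumns.get(0).put('CUSTOMER_NUMBER', customer)
    return true
}
```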

Modify a record

To modify a record passed into the editBeforeInsert method, access the map at list index 0 and make any necessary changes. For example, recordColumns.get(0).get('CUSTOMER_NUMBER') retrieves the string value for the CUSTOMER_NUMBER column.

Column names are always specified in uppercase. For example, CUSTOMER_NUMBER.

Modify a column

Each entry in the list is a hash map containing the columns and their respective string values. With the map assigned to a variable, for example columnValues = recordColumns.get(0), you can retrieve a string value using columnValues.get('COLUMN_NAME') and update a column’s value using columnValues.put('COLUMN_NAME', 'new value'). The new value must be a string and will be converted to the column’s data type before insertion.

The column names in the map should always be in uppercase, regardless of how they were originally specified in the csvHeader or columnSelectors parameter.

Apart from adding unmapped columns on the first call, as described below, do not change the structure of the map, such as adding or removing fields, because this causes issues with the bind variables in the prepared statement.

For example, recordColumns.get(0).put('CUSTOMER_NUMBER', 'new-value') updates the value of the CUSTOMER_NUMBER column to new-value.

If there are additional columns that have not been mapped in the csvHeader or columnSelectors parameter, you can add them using the put method, as shown in the example above. Any column added in this way must be a valid column name on the target table and must be added on the very first call to this method. This allows the executable to determine the proper SQL bindings for the subsequent records.

endFile

The endFile method is called at the end of each file identified by the filePath parameter. It does not return a value.

def endFile(File file){
    //Add logic here
}

If the filePath contains a wildcard and multiple files are selected, this method is called for each file.

endUpload

The endUpload method is called at the end of the upload. It does not return a value.

def endUpload(TaskWorkExecutorParameters parameters){
    //Add logic here
}

You can specify any finalization tasks using this method.
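As one possible finalization pattern, the sketch below counts files in endFile and builds a summary in endUpload. The class name UploadSummary and the summary text are hypothetical. The endUpload argument is left untyped so the sketch runs outside DIH, and how the summary is reported (for example, through logging) is left open.

```groovy
class UploadSummary {
    int filesProcessed = 0
    List<String> fileNames = []
    String summary

    // Called once per file: record that the file was processed
    void endFile(File file) {
        filesProcessed++
        fileNames << file.name
    }

    // Untyped argument: inside DIH this is a TaskWorkExecutorParameters object
    void endUpload(parameters) {
        summary = "Processed ${filesProcessed} file(s): ${fileNames.join(', ')}"
    }
}
```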