File Upload
Overview
The FileUploadWorkEntryExecutor uploads a CSV, fixed-position, EBCDIC, or XLSX file to a database table. For CSV files, the supported predefined formats are documented under the csvFormat parameter below.
CSV files
CSV files contain fields separated by commas, with an optional header record that names the fields.
If a header record is present, it can be used for the target table column names. For example, the following is a CSV file with a valid header record:
```
CUST_NO,CUSTOMER_NAME,EFF_DT,AMOUNT
"00000001","Jane Doe",20210101,100
"00000002","John Doe",20210201,200
```
The file upload can use the first record to determine the column names for each field. However, if a header record is not present or a custom header definition is preferred, you need to define the csvHeader parameter to specify the column names. For example:
```
csvHeader = CUST_NO, CUSTOMER_NAME, EFF_DT(TO_DATE(?,'YYYYMMDD')), AMOUNT;
csvHeader = CUST_NO,, EFF_DT(TO_DATE(?,'YYYYMMDD')), AMOUNT(TO_NUMBER(?));
```
The second definition omits the CUSTOMER_NAME column because of the two consecutive commas after CUST_NO.
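For illustration, uploading the sample file above with the second definition would populate only three target columns; the loaded rows would look like this (EFF_DT shown after the TO_DATE conversion):

```
CUST_NO   EFF_DT      AMOUNT
00000001  2021-01-01  100
00000002  2021-02-01  200
```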
Fixed files
Fixed-position files contain columns of fixed length. The FileUploadWorkEntryExecutor requires start and length values, which can either be defined with the fields or derived from a header record if the positions of the header names coincide with those of their associated column values.
For example, the following header record is acceptable because each field name (CUST_NO, CUSTOMER_NAME, EFF_DT, AMOUNT) starts in the same column as its corresponding value. There is also at least one space between the column names, which allows the FileUploadWorkEntryExecutor to distinguish the end of one column from the start of the next.
```
CUST_NO   CUSTOMER_NAME    EFF_DT  AMOUNT
00000001  Jane Doe         2021010100000100
00000002  John Doe         2021020100000200
```
If a header record is not present or a custom header definition is preferred, you need to define the csvHeader parameter to specify the column names, start position, length, and optional formatting for each field. For example:
{% code overflow="wrap" %}
```
csvHeader = CUST_NO(0, 8), CUSTOMER_NAME(10, 17), EFF_DT(27, 8, TO_DATE(?,'YYYYMMDD')), AMOUNT(35,8,TO_NUMBER(?));
```
{% endcode %}
If the header record has the correct positioning but some of the fields need to be formatted, you can specify the header names without the start and length positioning, but with an optional format argument. For example, the AMOUNT is specified without position arguments below.
{% code overflow="wrap" %}
```
csvHeader = CUST_NO, CUSTOMER_NAME, EFF_DT(27,8,TO_DATE(?,'YYYYMMDD')), AMOUNT(TO_NUMBER(?));
```
{% endcode %}
For fixed files, if at least one field name has no position parameters in the csvHeader parameter, a header record must be present for the start and length to be derived.
Target tables
The target table must be created manually with appropriate column definitions before running an upload.
The fields in the input file are automatically converted to the data types of their corresponding target columns. For dates and timestamps, the database default format, such as NLS_DATE_FORMAT on Oracle, applies only if the input field conforms to it. Otherwise, the field can be formatted explicitly, for example with TO_DATE(?,'YYYYMMDD') as used below.
{% code overflow="wrap" %}
```
csvHeader = CUST_NO, CUSTOMER_NAME, EFF_DT(27,8,TO_DATE(?,'YYYYMMDD')), AMOUNT(TO_NUMBER(?));
```
{% endcode %}
To bypass the automatic conversion and treat all values as strings, set disableColumnTypeConversion to true.
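For reference, a target table matching the running example might be created as follows. This is only a sketch; the table name and column types are illustrative, and the DDL assumes an Oracle database:

```sql
-- Hypothetical target table for the CUST_NO / CUSTOMER_NAME / EFF_DT / AMOUNT examples.
-- Adjust lengths and precision to match your data.
CREATE TABLE CUSTOMER_UPLOAD (
    CUST_NO        VARCHAR2(8),
    CUSTOMER_NAME  VARCHAR2(100),
    EFF_DT         DATE,
    AMOUNT         NUMBER(12,2)
);
```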
Parameters
This executable class accepts the following parameters:
| Parameter | Description |
|---|---|
| filePath | Path of the input file that is accessible to the runtime server. The path can contain the * and ? wildcard characters. If the wildcard path matches more than one file and fileListCount is > 1, the files are uploaded in succession and appended to the same table. For more information, see File upload data blocks. Mandatory: Yes |
| tableName | Name of the target table. For more information, see Target tables. Mandatory: Yes |
| truncateTable | If set to true, deletes the contents of the table before loading. Mandatory: No. Default: false |
| fileListCount | Maximum number of files to upload if a wildcard filePath matches more than one file. Set to -1 for unlimited files. Mandatory: No. Default: 1 |
| fileListOrder | Determines the order of the files if a wildcard filePath matches more than one file. Set to d for descending order. Mandatory: No. Default: a |
| ignoreFileNotFound | If set to true, a file upload task continues to completion even if a file is not found. Mandatory: No. Default: false |
| moveToDir | Specifies a directory where successfully uploaded files are moved. Do not define this parameter if the moveToPath parameter is specified. If a move fails for any reason, it is considered a fatal error and the job stops. If the job is on a schedule, it is deactivated to prevent repeated, potentially harmful executions. |
| moveToPath | Specifies a file path where successfully uploaded files are moved. Unlike the moveToDir parameter, you should define the moveToPath parameter with a full path. You can also use this parameter to rename the successfully uploaded file. The full path can include an input file name placeholder {}, internal variables {$...}, and parameter variables {#...}. For more examples, see moveToPath examples. Do not define this parameter if the moveToDir parameter is specified. |
| fileFormat | Specifies the format of the input file. Possible values are CSV, FIXED, EBCDIC, and XLSX. Mandatory: No |
| csvFormat | Only applies to the CSV fileFormat. Specifies one of the predefined CSV formats. For more information on these CSV formats, see Enum CSVFormat.Predefined. Mandatory: No |
| csvDelimiter | Only applies to the CSV fileFormat. Specifies the delimiter for CSV files. Mandatory: No. Default: , |
| csvHeader | Applies to both CSV and FIXED file formats. Specifies the column header names if the first record in the file does not have them, or if they are not to be used. If this parameter is not defined, the first record in the file must have the column names. For formats, see csvHeader formats. |
| csvIgnoreEmptyLines | If set to true, skips empty lines in the input file, including lines containing only delimiters. Mandatory: No. Default: false |
| csvSkipHeader | If set to true, skips the header record when the csvHeader parameter is specified. If csvHeader is empty, or if a field name is specified without position arguments for FIXED file formats, the input file must have a header in the first record, and the task ignores this parameter. Mandatory: No. Default: false |
| trimNonAscii | When set to true, trims non-ASCII characters from the input. Embedded non-ASCII characters are replaced by spaces to preserve the size and positioning of the ASCII portion of the input as much as possible. Mandatory: No. Default: false |
| disableColumnTypeConversion | Disables automatic conversion of Java types to match the column types on tableName. All values are treated as strings. Mandatory: No. Default: false |
| acceptFileOnMatch | Specifies a regular expression to match against each record in the input file. The file is processed only if a match is found. Can be combined with rejectFileOnMatch. Mandatory: No |
| rejectFileOnMatch | Specifies a regular expression to match against each record in the input file. The file is not processed if a match is found. Can be combined with acceptFileOnMatch. Mandatory: No |
| selectOnMatch | Specifies a regular expression to filter records from the file before uploading. For example, the ^01. expression will only upload records that start with 01. Note: The Groovy method selectRecord(String record) can be coded instead of, or in addition to, this regular expression to programmatically check each record. |
| excludeOnMatch | Specifies a regular expression to filter records from the file before uploading. For example, the ^01. expression will exclude records that start with 01. Note: The Groovy method excludeRecord(String record) can be coded instead of, or in addition to, this regular expression to programmatically check each record. |
| joinMatchers | Specifies one or more regular expressions to filter the records that need to be joined. For syntax and examples, see joinMatchers syntax and examples. |
| joinDelimiter | Only applies if the joinMatchers parameter is specified. Specifies the delimiter inserted between joined records. Mandatory: No |
| maxErrorCount | Specifies the maximum allowable errors during file upload. Error messages are recorded in the UA_RUN_ERROR table for error and reconciliation reporting. Set to -1 to allow any number of errors. Note: There are some cases where the task fails regardless of this setting. For example, a task fails if the CSV parser is unable to parse the file. Mandatory: No. Default: 0 |
| columnSelectors | Specifies column names with regex expressions or Groovy bind methods for the values. Multiple expressions must be separated by a ,. Supplements the csvHeader parameter by extracting values from input data that is not simply comma-delimited or fixed-positional, using regular expressions or Groovy methods. For syntax, see columnSelectors syntax. |
| nullifyColumnOnNoMatch | Only applies when one or more columnSelectors are specified. Not applicable to Groovy methods. When set to true, nulls a column if its regex does not match anything and it has no default specifier; otherwise, an error message is displayed if no value can be determined. Mandatory: No. Default: false |
| uploadPoolSize | This is a performance setting and should be used carefully. Specifies the number of parallel threads used to execute upload statements. Unlike the threads of multi-threaded executables in general, these upload threads are not distributed across the Hazelcast cluster. Instead, they are all directed to a single JVM, even if there are multiple cluster members. Only the database insert statements are threaded based on this parameter. If Groovy methods are called during the upload process, they are executed in a single parent thread. This means that if SQL statements are executed from Groovy, they are not multi-threaded and can still be a bottleneck. Mandatory: No |
For more information on other variables and syntax that can be used, see Common executable features.
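Putting the main parameters together, a minimal upload definition for the CSV example above might look like the following. This is a sketch; the file path and table name are illustrative:

```
filePath = /data/in/customers*.csv
tableName = CUSTOMER_UPLOAD
truncateTable = true
csvHeader = CUST_NO,, EFF_DT(TO_DATE(?,'YYYYMMDD')), AMOUNT(TO_NUMBER(?));
```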
moveToPath examples
| Example | Description |
|---|---|
| /archive/input-file-{$currentDateTime}.csv | Inserts the current date and time in YYYYMMddHHmmss format into the file name. |
| /archive/{} | The {} is replaced by the file name specified in the filePath parameter. If a wildcard filePath matches more than one file, the {} is replaced by each matching file name, and all the matching files are moved and renamed. |
| /archive/{}-{$currentDateTime} | The {} is replaced by the file name specified in the filePath parameter. The current date and time are appended to it. |
| {}-{$jobName}-{$currentDateTime} | The {} is replaced by the file name specified in the filePath parameter. The DBMS job name and the current date and time are appended to it. |
| /archive/{#fileName} | The {#fileName} is replaced by the fileName parameter value. The fileName parameter can be specified on the Executable or the Task. |
csvHeader formats
The csvHeader parameter applies to CSV and FIXED file formats.
Column headers
For CSV file formats, header names are positional. The first name is for the first column of data, the second name is for the second column of data, and so on.
- Positional commas can be used to exclude columns from the input file. For example, `NAME,,ADDRESS` selects the 1st and 3rd columns and inserts them into the `NAME` and `ADDRESS` columns of the target table. The 2nd column is skipped.
For FIXED file formats, header names must be defined with an available header record or a start and length argument.
- A header record provides the position arguments and can be used to format some or all of the input fields with functions enclosed in `()`. For example, `NAME(INITCAP(?))` gets the start position and length of `NAME` from the input header record and then applies the `INITCAP` function when populating the `NAME` column.
- The start argument is an offset from position `0`, which means that the first character on a record starts at position `0`. For example, `NAME(0,30),ADDRESS(30,100)` selects the first 30 characters for the `NAME` column, and the next 100 characters, or until the end of the record, for the `ADDRESS` column.
- The length can be omitted to select until the end of the record. For example, `ADDRESS(30)` selects the characters from position 30 to the end of the record for the `ADDRESS` column.
Data type conversion
For CSV file formats, if the disableColumnTypeConversion parameter is set to false, the input values are automatically converted to the target column data types. However, you can explicitly format them by adding a DB formatting function with arguments in () after the field name. For example, EFFDT(TO_DATE(?,'RRMMDD')) converts the EFFDT field using the RRMMDD format. The ? in the function call is the bind placeholder for the field value.
For FIXED file formats, the default data type conversion can be overridden by defining a formatting function after the length argument, or after the start argument if length is omitted. For example, EFFDT(130,6,TO_DATE(?,'RRMMDD')) selects the date from position 130 for a length of 6 and converts it to a date for the EFFDT column. As another example, EFFDT(130,TO_DATE(?,'RRMMDD')) does not define the length value, which is only acceptable for the last field on the record. Without a specific length, it selects data until the end of the record.
Target column
For CSV and FIXED file formats, a target column can be set to a literal value by specifying the value in round brackets after the column name. For example, EFFDT(TO_DATE('1900-01-01','YYYY-MM-DD')) sets all the rows of the EFFDT column on the target table to the value 1900-01-01.
The literal value can be anything accepted as a database literal, but it cannot be a bare numeric digit because a number is interpreted as a start position. If you need to define a number, enclose it in quotes, for example, AMOUNT('100'). If a number needs additional formatting or transformation, use it with a function, for example, AMOUNT(TO_NUMBER('100')).
joinMatchers syntax and examples
The syntax for the joinMatchers parameter is /regex/+/regex/+..., /regex/+/regex/+..., and so on.
Each regular expression must be specified between //. The /regex/ expressions separated by + filter the records that need to be joined. Multiple join expressions are separated with a ,.
The joinMatchers parameter allows you to upload a record from a file, followed by one or more continuation records, into a single database row. For example, in a BAI file used for cash management balance reporting, a record type of 16 can be followed by one or more type 88 records, which are continuations of the record type 16. Use the /^16,195./+/^88./ regular expression to look for records that start with 16 and join it to any 88 record that immediately follows the 16 record.
Other records could be joined to the 16 and 88 records by appending + and additional regular expressions between //.
Other, completely different record joins can be configured by appending , and the join regular expression. For example, to also join 03 record types with any immediately following 88 records, use the /^16,195./+/^88./, /^03,195./+/^88./ regular expression.
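For illustration, here is a sketch of how such a join behaves. The record contents are made up, and a joinDelimiter of , is assumed. Given the input records:

```
16,195,10000,V,210101
88,CONTINUATION-A
88,CONTINUATION-B
```

the /^16,195./+/^88./ expression uploads the three records as a single row, as if the input had been:

```
16,195,10000,V,210101,88,CONTINUATION-A,88,CONTINUATION-B
```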
columnSelectors syntax
Regular expressions
The syntax for regular expressions is the following:
column/regex/group(SQL-func)|default, ...
| Parameter | Description |
|---|---|
| column/regex | The column name followed by the regular expression, between / characters, that extracts the value from each record. Mandatory: Yes |
| group | Regex group number to use for the value. |
| (SQL-func) | SQL function to apply to the extracted value. |
| \|default | Specifies the default value assigned to the column if a match is not found. |
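As a sketch of the regex form, the following hypothetical selector extracts the digits after AMT= from each record into the AMOUNT column, converts them with TO_NUMBER, and defaults to 0 when there is no match. The field name and pattern are illustrative:

```
columnSelectors = AMOUNT/AMT=(\d+)/1(TO_NUMBER(?))|0
```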
Groovy methods
The syntax for Groovy methods is the following:
column{bindMethod}(SQL-func), ...
| Parameter | Description |
|---|---|
| column | Specifies the name of the column before the first {. |
| {bindMethod} | Specifies the bind method name, enclosed in {}. The record passed to the method is the input record after any joins, as configured in joinMatchers, have been applied. |
| (SQL-func) | SQL function to apply to the returned value. |
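As a sketch, a bind method is coded in Groovy on the Task (see Upload Groovy integration). The method name below is hypothetical, and it assumes the executor passes the joined input record as a String:

```groovy
// Hypothetical bind method for a selector such as AMOUNT{bindAmount}(TO_NUMBER(?)).
// Extracts the digits following "AMT=" from the record; returns null when absent.
def bindAmount(String record) {
    def m = record =~ /AMT=(\d+)/
    return m.find() ? m.group(1) : null
}
```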
File upload data blocks
Data Blocks are defined within the file upload Task and can be used to specify upload parameters for multiple uploads within a single Task. With the selectOnMatch and excludeOnMatch parameters, Data Blocks allow a file to be split into multiple record types or tables.
Without a Data Block, the file upload uses the parameter definitions for a single upload of a file or group of files that match a wildcard.
To perform multiple uploads in a single Task, you can create a Data Block for each upload with its own upload parameters. Any parameters that are not specified in a Data Block will be taken from the parameters defined for the Task. The parameters in the Data Block override the defaults that are specified as Parameters.
Data Blocks are also used to code Groovy methods for specific or complex tasks. For more information, see Upload Groovy integration.
EBCDIC files
For integration interfaces, inbound data files can be sent as binary transferred mainframe files. This means:
- The files are encoded in EBCDIC and not in ASCII.
- The records are laid out in fixed-length blocks, with no end-of-record characters, such as linefeed, between logical records within a block.
- The files can include packed decimal fields with two numeric digits per byte.
Packed decimals
The fields defined as COMP-3 in the COBOL copybooks are in packed decimal format. These cannot be directly converted from EBCDIC to ASCII because they must be interpreted as numeric fields, with each byte containing two digits, except for the right-most byte, which contains a digit and a sign.
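To make the encoding concrete, here is a minimal decoding sketch, independent of the product itself; the method name is illustrative:

```groovy
// Decodes a COMP-3 (packed decimal) byte array into a long.
// Each byte holds two decimal digits, except the last byte, which holds
// one digit in its high nibble and the sign in its low nibble.
long unpackComp3(byte[] bytes) {
    long value = 0
    bytes.eachWithIndex { b, i ->
        int hi = (b >> 4) & 0x0F
        int lo = b & 0x0F
        if (i < bytes.length - 1) {
            value = value * 100 + hi * 10 + lo   // two digits per byte
        } else {
            value = value * 10 + hi              // last digit
            if (lo == 0x0D) value = -value       // 0xD sign nibble marks a negative value
        }
    }
    return value
}

// 0x12 0x34 0x5C decodes to +12345; 0x12 0x34 0x5D would decode to -12345.
assert unpackComp3([0x12, 0x34, 0x5C] as byte[]) == 12345L
```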
Changes required in File Upload Task
The following components can be used in a Data Block.
| Component | Description |
|---|---|
| File Format | For an EBCDIC file to be accepted, set fileFormat = EBCDIC. |
| Character Set | Converts the EBCDIC value to ASCII. If not specified, the default character set is ibm500. |
| Record Length | Specifies the length of a single record in the file. For example, recordLength = 378. |
| CSV Header | Specifies the column names, since a fixed record format is implied in EBCDIC. A field in packed format, such as a packed zip code, is defined with its start and length arguments, as described in csvHeader formats. |
| SelectOnMatch | Uploads specific record types when a file has multiple record types. If you want to match records with a packed decimal field, provide hexadecimal characters in the regular expression, for example by using \xhh escape sequences for the packed bytes. For non-packed fields, a plain regular expression can select records by their field values, for example by account number and deposit category. |
| ExcludeOnMatch | Use the same regular expression form as described above, but to omit the matched records. |