File Upload
Overview
The FileUploadWorkEntryExecutor uploads a CSV, fixed-position, EBCDIC, or XLSX file to a database table. For CSV files, the supported predefined formats are documented under the csvFormat parameter below.
CSV files
CSV files contain fields separated by commas, with an optional header record that names the fields.
If a header record is present, it can be used for the target table column names. For example, the following is a CSV file with a valid header record:
```
CUST_NO,CUSTOMER_NAME,EFF_DT,AMOUNT
"00000001","Jane Doe",20210101,100
"00000002","John Doe",20210201,200
```
The file upload can use the first record to determine the column names for each field. However, if a header record is not present or a custom header definition is preferred, you need to define the csvHeader parameter to specify the column names. For example:
```
csvHeader = CUST_NO, CUSTOMER_NAME, EFF_DT(TO_DATE(?,'YYYYMMDD')), AMOUNT;
csvHeader = CUST_NO,, EFF_DT(TO_DATE(?,'YYYYMMDD')), AMOUNT(TO_NUMBER(?));
```
The second definition omits the CUSTOMER_NAME column because of the two consecutive commas after CUST_NO.
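For illustration, uploading the sample file above with the second definition would populate only three target columns; the loaded rows would look like this (EFF_DT shown after the TO_DATE conversion):

```
CUST_NO   EFF_DT      AMOUNT
00000001  2021-01-01  100
00000002  2021-02-01  200
```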
Fixed files
Fixed-position files contain columns of fixed length. The FileUploadWorkEntryExecutor requires start and length values, which can either be defined with the fields or derived from a header record if the positions of the header names coincide with those of their associated column values.
For example, the following header record is acceptable because each field name (CUST_NO, CUSTOMER_NAME, EFF_DT, AMOUNT) starts in the same column as its corresponding value. There is also at least one space between the column names, which allows the FileUploadWorkEntryExecutor to distinguish the end of one column from the start of the next.
```
CUST_NO   CUSTOMER_NAME    EFF_DT  AMOUNT
00000001  Jane Doe         2021010100000100
00000002  John Doe         2021020100000200
```
If a header record is not present or a custom header definition is preferred, you need to define the csvHeader parameter to specify the column names, start position, length, and optional formatting for each field. For example:
{% code overflow="wrap" %}
```
csvHeader = CUST_NO(0, 8), CUSTOMER_NAME(10, 17), EFF_DT(27, 8, TO_DATE(?,'YYYYMMDD')), AMOUNT(35,8,TO_NUMBER(?));
```
{% endcode %}
If the header record has the correct positioning but some of the fields need to be formatted, you can specify the header names without the start and length positioning, but with an optional format argument. For example, the AMOUNT is specified without position arguments below.
{% code overflow="wrap" %}
```
csvHeader = CUST_NO, CUSTOMER_NAME, EFF_DT(27,8,TO_DATE(?,'YYYYMMDD')), AMOUNT(TO_NUMBER(?));
```
{% endcode %}
For fixed files, if at least one field name has no position parameters in the csvHeader parameter, a header record must be present for the start and length to be derived.
Target tables
The target table must be created manually with appropriate column definitions before running an upload.
The fields in the input file are automatically converted to the data types of their corresponding target columns. For dates and timestamps, the database default format, such as NLS_DATE_FORMAT on Oracle, applies only if the input field conforms to it. Otherwise, the field can be formatted explicitly, for example with TO_DATE(?,'YYYYMMDD') as used below.
{% code overflow="wrap" %}
```
csvHeader = CUST_NO, CUSTOMER_NAME, EFF_DT(27,8,TO_DATE(?,'YYYYMMDD')), AMOUNT(TO_NUMBER(?));
```
{% endcode %}
To bypass the automatic conversion and treat all values as strings, set disableColumnTypeConversion to true.
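For reference, a target table matching the running example might be created as follows. This is only a sketch; the table name and column types are illustrative, and the DDL assumes an Oracle database:

```sql
-- Hypothetical target table for the CUST_NO / CUSTOMER_NAME / EFF_DT / AMOUNT examples.
-- Adjust lengths and precision to match your data.
CREATE TABLE CUSTOMER_UPLOAD (
    CUST_NO        VARCHAR2(8),
    CUSTOMER_NAME  VARCHAR2(100),
    EFF_DT         DATE,
    AMOUNT         NUMBER(12,2)
);
```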
Parameters
This executable class accepts the following parameters:
| Parameter | Description |
|---|---|
| filePath | Path of the input file that is accessible to the runtime server. The path can contain the * and ? wildcard characters. If the wildcard path matches more than one file and fileListCount is > 1, the files are uploaded in succession and appended to the same table. For more information, see File upload data blocks. Mandatory: Yes |
| tableName | Name of the target table. For more information, see Target tables. Mandatory: Yes |
| truncateTable | If set to true, deletes the contents of the table before loading. Mandatory: No. Default: false |
| fileListCount | Maximum number of files to upload if a wildcard filePath matches more than one file. Set to -1 for unlimited files. Mandatory: No. Default: 1 |
| fileListOrder | Determines the order of the files if a wildcard filePath matches more than one file. Set to d for descending order. Mandatory: No. Default: a |
| ignoreFileNotFound | If set to true, a file upload task continues to completion even if a file is not found. Mandatory: No. Default: false |
| moveToDir | Specifies a directory where successfully uploaded files are moved. Do not define this parameter if the moveToPath parameter is specified. If a move fails for any reason, it is considered a fatal error and the job stops. If the job is on a schedule, it is deactivated to prevent repeated, potentially harmful executions. |
| moveToPath | Specifies a file path where successfully uploaded files are moved. Unlike the moveToDir parameter, you should define the moveToPath parameter with a full path. You can also use this parameter to rename the successfully uploaded file. The full path can include an input file name placeholder {}, internal variables {$...}, and parameter variables {#...}. For more examples, see moveToPath examples. Do not define this parameter if the moveToDir parameter is specified. |
| fileFormat | Specifies the format of the input file. Possible values are CSV, FIXED, EBCDIC, and XLSX. Mandatory: No |
| csvFormat | Only applies to the CSV fileFormat. Specifies one of the predefined CSV formats. For more information on these CSV formats, see Enum CSVFormat.Predefined. Mandatory: No |
| csvDelimiter | Only applies to the CSV fileFormat. Specifies the delimiter for CSV files. Mandatory: No. Default: , |
| csvHeader | Applies to both CSV and FIXED file formats. Specifies the column header names if the first record in the file does not have them, or if they are not to be used. If this parameter is not defined, the first record in the file must have the column names. For formats, see csvHeader formats. |
| csvIgnoreEmptyLines | If set to true, skips empty lines in the input file, including lines containing only delimiters. Mandatory: No. Default: false |
| csvSkipHeader | If set to true, skips the header record when the csvHeader parameter is specified. If csvHeader is empty, or if a field name is specified without position arguments for FIXED file formats, the input file must have a header in the first record, and the task ignores this parameter. Mandatory: No. Default: false |
| trimNonAscii | When set to true, trims non-ASCII characters from the input. Embedded non-ASCII characters are replaced by spaces to preserve the size and positioning of the ASCII portion of the input as much as possible. Mandatory: No. Default: false |
| disableColumnTypeConversion | Disables automatic conversion of Java types to match the column types on tableName. All values are treated as strings. Mandatory: No. Default: false |
| acceptFileOnMatch | Specifies a regular expression to match against each record in the input file. The file is processed only if a match is found. Can be combined with rejectFileOnMatch. Mandatory: No |
| rejectFileOnMatch | Specifies a regular expression to match against each record in the input file. The file is not processed if a match is found. Can be combined with acceptFileOnMatch. Mandatory: No |
| selectOnMatch | Specifies a regular expression to filter records from the file before uploading. For example, the ^01. expression will only upload records that start with 01. Note: The Groovy method selectRecord(String record) can be coded instead of, or in addition to, this regular expression to programmatically check each record. |
| excludeOnMatch | Specifies a regular expression to filter records from the file before uploading. For example, the ^01. expression will exclude records that start with 01. Note: The Groovy method excludeRecord(String record) can be coded instead of, or in addition to, this regular expression to programmatically check each record. |
| joinMatchers | Specifies one or more regular expressions to filter the records that need to be joined. For syntax and examples, see joinMatchers syntax and examples. |
| joinDelimiter | Only applies if the joinMatchers parameter is specified. Specifies the delimiter inserted between joined records. Mandatory: No |
| maxErrorCount | Specifies the maximum allowable errors during file upload. Error messages are recorded in the UA_RUN_ERROR table for error and reconciliation reporting. Set to -1 to allow any number of errors. Note: There are some cases where the task fails regardless of this setting. For example, a task fails if the CSV parser is unable to parse the file. Mandatory: No. Default: 0 |
| columnSelectors | Specifies column names with regex expressions or Groovy bind methods for the values. Multiple expressions must be separated by a ,. Supplements the csvHeader parameter by extracting values from input data that is not simply comma-delimited or fixed-positional, using regular expressions or Groovy methods. For syntax, see columnSelectors syntax. |
| nullifyColumnOnNoMatch | Only applies when one or more columnSelectors are specified. Not applicable to Groovy methods. When set to true, nulls a column if its regex does not match anything and it has no default specifier; otherwise, an error message is displayed if no value can be determined. Mandatory: No. Default: false |
| uploadPoolSize | This is a performance setting and should be used carefully. Specifies the number of parallel threads used to execute upload statements. Unlike the threads of multi-threaded executables in general, these upload threads are not distributed across the Hazelcast cluster. Instead, they are all directed to a single JVM, even if there are multiple cluster members. Only the database insert statements are threaded based on this parameter. If Groovy methods are called during the upload process, they are executed in a single parent thread. This means that if SQL statements are executed from Groovy, they are not multi-threaded and can still be a bottleneck. Mandatory: No |
For more information on other variables and syntax that can be used, see Common executable features.
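Putting the main parameters together, a minimal upload definition for the CSV example above might look like the following. This is a sketch; the file path and table name are illustrative:

```
filePath = /data/in/customers*.csv
tableName = CUSTOMER_UPLOAD
truncateTable = true
csvHeader = CUST_NO,, EFF_DT(TO_DATE(?,'YYYYMMDD')), AMOUNT(TO_NUMBER(?));
```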
moveToPath examples
| Example | Description |
|---|---|
| /archive/input-file-{$currentDateTime}.csv | Inserts the current date and time in YYYYMMddHHmmss format into the file name. |
| /archive/{} | The {} is replaced by the file name specified in the filePath parameter. If a wildcard filePath matches more than one file, the {} is replaced by each matching file name, and all the matching files are moved and renamed. |
| /archive/{}-{$currentDateTime} | The {} is replaced by the file name specified in the filePath parameter. The current date and time are appended to it. |
| {}-{$jobName}-{$currentDateTime} | The {} is replaced by the file name specified in the filePath parameter. The DBMS job name and the current date and time are appended to it. |
| /archive/{#fileName} | The {#fileName} is replaced by the fileName parameter value. The fileName parameter can be specified on the Executable or the Task. |
csvHeader formats
The csvHeader parameter applies to CSV and FIXED file formats.
Column headers
For CSV file formats, header names are positional. The first name is for the first column of data, the second name is for the second column of data, and so on.
- Positional commas can be used to exclude columns from the input file. For example, `NAME,,ADDRESS` selects the 1st and 3rd columns and inserts them into the `NAME` and `ADDRESS` columns of the target table. The 2nd column is skipped.
For FIXED file formats, header names must be defined with an available header record or a start and length argument.
- A header record provides the position arguments and can be used to format some or all of the input fields with functions enclosed in `()`. For example, `NAME(INITCAP(?))` gets the start position and length of `NAME` from the input header record and then applies the `INITCAP` function when populating the `NAME` column.
- The start argument is an offset from position `0`, which means that the first character on a record starts at position `0`. For example, `NAME(0,30),ADDRESS(30,100)` selects the first 30 characters for the `NAME` column, and the next 100 characters, or until the end of the record, for the `ADDRESS` column.
- The length can be omitted to select until the end of the record. For example, `ADDRESS(30)` selects the characters from position 30 to the end of the record for the `ADDRESS` column.
Data type conversion
For CSV file formats, if the disableColumnTypeConversion parameter is set to false, the input values are automatically converted to the target column data types. However, you can explicitly format them by adding a DB formatting function with arguments in () after the field name. For example, EFFDT(TO_DATE(?,'RRMMDD')) converts the EFFDT field using the RRMMDD format. The ? in the function call is the bind placeholder for the field value.
For FIXED file formats, the default data type conversion can be overridden by defining a formatting function after the length argument, or after the start argument if length is omitted. For example, EFFDT(130,6,TO_DATE(?,'RRMMDD')) selects the date from position 130 for a length of 6 and converts it to a date for the EFFDT column. As another example, EFFDT(130,TO_DATE(?,'RRMMDD')) does not define the length value, which is only acceptable for the last field on the record. Without a specific length, it selects data until the end of the record.
Target column
For CSV and FIXED file formats, a target column can be set to a literal value by specifying the value in round brackets after the column name. For example, EFFDT(TO_DATE('1900-01-01','YYYY-MM-DD')) sets all the rows of the EFFDT column on the target table to the value 1900-01-01.
The literal value can be anything accepted as a database literal, but it cannot be a bare numeric digit because a number is interpreted as a start position. If you need to define a number, enclose it in quotes, for example, AMOUNT('100'). If a number needs additional formatting or transformation, use it with a function, for example, AMOUNT(TO_NUMBER('100')).
joinMatchers syntax and examples
The syntax for the joinMatchers parameter is /regex/+/regex/+..., /regex/+/regex/+..., and so on.
Each regular expression must be specified between //. The /regex/ expressions separated by + filter the records that need to be joined. Multiple join expressions are separated with a ,.
The joinMatchers parameter allows you to upload a record from a file, followed by one or more continuation records, into a single database row. For example, in a BAI file used for cash management balance reporting, a record type of 16 can be followed by one or more type 88 records, which are continuations of the record type 16. Use the /^16,195./+/^88./ regular expression to look for records that start with 16 and join it to any 88 record that immediately follows the 16 record.
Other records could be joined to the 16 and 88 records by appending + and additional regular expressions between //.
Other, completely different record joins can be configured by appending , and the join regular expression. For example, to also join 03 record types with any immediately following 88 records, use the /^16,195./+/^88./, /^03,195./+/^88./ regular expression.
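For illustration, here is a sketch of how such a join behaves. The record contents are made up, and a joinDelimiter of , is assumed. Given the input records:

```
16,195,10000,V,210101
88,CONTINUATION-A
88,CONTINUATION-B
```

the /^16,195./+/^88./ expression uploads the three records as a single row, as if the input had been:

```
16,195,10000,V,210101,88,CONTINUATION-A,88,CONTINUATION-B
```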
columnSelectors syntax
Regular expressions
The syntax for regular expressions is the following:
column/regex/group(SQL-func)|default, ...
| Parameter | Description |
|---|---|
| column/regex | The column name followed by the regular expression, between / characters, that extracts the value from each record. Mandatory: Yes |
| group | Regex group number to use for the value. |
| (SQL-func) | SQL function to apply to the extracted value. |
| \|default | Specifies the default value assigned to the column if a match is not found. |
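As a sketch of the regex form, the following hypothetical selector extracts the digits after AMT= from each record into the AMOUNT column, converts them with TO_NUMBER, and defaults to 0 when there is no match. The field name and pattern are illustrative:

```
columnSelectors = AMOUNT/AMT=(\d+)/1(TO_NUMBER(?))|0
```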
Groovy methods
The syntax for Groovy methods is the following:
column{bindMethod}(SQL-func), ...
| Parameter | Description |
|---|---|
| column | Specifies the name of the column before the first {. |
| {bindMethod} | Specifies the bind method name, enclosed in {}. The record passed to the method is the input record after any joins, as configured in joinMatchers, have been applied. |
| (SQL-func) | SQL function to apply to the returned value. |
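As a sketch, a bind method is coded in Groovy on the Task (see Upload Groovy integration). The method name below is hypothetical, and it assumes the executor passes the joined input record as a String:

```groovy
// Hypothetical bind method for a selector such as AMOUNT{bindAmount}(TO_NUMBER(?)).
// Extracts the digits following "AMT=" from the record; returns null when absent.
def bindAmount(String record) {
    def m = record =~ /AMT=(\d+)/
    return m.find() ? m.group(1) : null
}
```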
File upload data blocks
Data Blocks are defined within the file upload Task and can be used to specify upload parameters for multiple uploads within a single Task. With the selectOnMatch and excludeOnMatch parameters, Data Blocks allow a file to be split into multiple record types or tables.
Without a Data Block, the file upload uses the parameter definitions for a single upload of a file or group of files that match a wildcard.
To perform multiple uploads in a single Task, you can create a Data Block for each upload with its own upload parameters. Any parameters that are not specified in a Data Block will be taken from the parameters defined for the Task. The parameters in the Data Block override the defaults that are specified as Parameters.
Data Blocks are also used to code Groovy methods for specific or complex tasks. For more information, see Upload Groovy integration.
EBCDIC files
For integration interfaces, inbound data files can be sent as binary transferred mainframe files. This means:
- The files are encoded in EBCDIC and not in ASCII.
- The records are laid out in fixed-length blocks, with no end-of-record characters, such as linefeed, between logical records within a block.
- The files can include packed decimal fields with two numeric digits per byte.
Packed decimals
The fields defined as COMP-3 in the COBOL copybooks are in packed decimal format. These cannot be directly converted from EBCDIC to ASCII because they must be interpreted as numeric fields, with each byte containing two digits, except for the right-most byte, which contains a digit and a sign.
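To make the encoding concrete, here is a minimal decoding sketch, independent of the product itself; the method name is illustrative:

```groovy
// Decodes a COMP-3 (packed decimal) byte array into a long.
// Each byte holds two decimal digits, except the last byte, which holds
// one digit in its high nibble and the sign in its low nibble.
long unpackComp3(byte[] bytes) {
    long value = 0
    bytes.eachWithIndex { b, i ->
        int hi = (b >> 4) & 0x0F
        int lo = b & 0x0F
        if (i < bytes.length - 1) {
            value = value * 100 + hi * 10 + lo   // two digits per byte
        } else {
            value = value * 10 + hi              // last digit
            if (lo == 0x0D) value = -value       // 0xD sign nibble marks a negative value
        }
    }
    return value
}

// 0x12 0x34 0x5C decodes to +12345; 0x12 0x34 0x5D would decode to -12345.
assert unpackComp3([0x12, 0x34, 0x5C] as byte[]) == 12345L
```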
Changes required in File Upload Task
The following components can be used in a Data Block.
| Component | Description |
|---|---|
| File Format | For an EBCDIC file to be accepted, set fileFormat = EBCDIC. |
| Character Set | Converts the EBCDIC value to ASCII. If not specified, the default character set is ibm500. |
| Record Length | Specifies the length of a single record in the file. For example, recordLength = 378. |
| CSV Header | Specifies the column names, since a fixed record format is implied in EBCDIC. A field in packed format, such as a packed zip code, is defined with its start and length arguments, as described in csvHeader formats. |
| SelectOnMatch | Uploads specific record types when a file has multiple record types. If you want to match records with a packed decimal field, provide hexadecimal characters in the regular expression, for example by using \xhh escape sequences for the packed bytes. For non-packed fields, a plain regular expression can select records by their field values, for example by account number and deposit category. |
| ExcludeOnMatch | Use the same regular expression form as described above, but to omit the matched records. |