1. Description

Executes the given query for each row of provided CSV file. Allows to read from or write to CSV files.

2. Connection

Defines a connection to a single CSV file.

As an “url” parameter, please use an absolute path to the CSV file.

2.1. Parameters

Name

Mandatory

Default

Description

encoding

No

The system default encoding is used.

Specifies charset encoding of CSV files.

separator

No

The default value is ,.

The delimiter to use for separating entries when reading from or writing to files.

quote

No

The default value is ".

The character to use for quoted elements when reading from or writing to files. Use empty string to suppress quoting.

headers

No

The default value is true (first line contains column names).

Value of true means the first line contains headers.

Only valid for <query> elements.

eol

No

The default value is \n.

End-Of-Line suffix.

Only valid for <script> elements.

trim

No

The default value is true.

Value of true specifies that the leading and trailing whitespaces in CSV fields should be omitted.

skip_lines

No

The default value is 0 (no lines are skipped).

The number of lines to skip before start reading. Please note that if headers=true, the actual number of skipped lines is skipped_lines+1

null_string

By default strings are preserved, i.e. empty strings are not converted to nulls and null variables references are not expanded in the output, i.e. ${nullvalue}.

Specifies string token to represent Java null literal.

When querying a CSV file, cells with content equal to null_string are parsed to Java nulls.
When outputting content, if null_string is set, all the missing variables, or the ones with a null value are substituted with null_string.

Specify an empty string (null_string=) to automatically convert between nulls in memory and empty strings in files. For example: CSV line 1,,5 is parsed to a set of 3 variables with the following values {"1", null, "5"} as opposed to the default behaviour {"1","","5"}.

2.2. Connection Example

<properties>
path_to_file=/some/path/file.csv
</properties>
 
<connection id="csv_file" driver="csv" url="$path_to_file">
separator=;
quote="
</connection>

3. Query

The driver utilizes a query-by-example approach for CSV content filtering. If you want to read the whole CSV simply use an empty query element, otherwise, specify a set of lines containing comma-separated case-insensitive regular expressions.
Example:

,foo
,,^Bar$

This query-by-example filter selects rows where the third column equals "Bar" or the second column contains the "foo" substring.

The columns of the matched row can be referenced by name in nested queries/scripts. It is also possible to reference columns by an index, i.e. $1, $2, ...

IMPORTANT: Always use commas as a column separator inside <script> and <query> elements, regardless of the separator used in files being parsed or produced. This decision allows switching between different formats like tab or semicolon while keeping scripts and queries unchanged.

3.1. Query example

<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>
<description>Read csv example</description>
 
<connection id="csv_file" driver="csv" url="example.csv"/>
<connection id="logInfo" driver="log">
level=INFO
</connection>
 
<query connection-id="csv_file">
<!-- empty query element reads all csv values -->
<script connection-id="logInfo">
[$rownum]: '$id', '$username', '$timestamp'
</script>
</query>
<query connection-id="csv_file">
<!-- given csv structure: id, username, timestamp.
reads all csv lines, where username contains "foo",
or timestamp starts with 2023-->
,foo
,,^2023
<script connection-id="logInfo">
[$rownum]: '$1', '$2', '$3'
</script>
</query>
</etl>

4. Script

Specify a comma-separated set of columns.

IMPORTANT: Always use commas as a column separator inside <script> and <query> elements, regardless of the separator used in files being parsed or produced. This decision allows switching between different formats like tab or semicolon while keeping scripts and queries unchanged.

4.1. Script example

<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>
<description>Write csv example</description>
 
<connection id="csv_write_example" driver="csv" url="example.csv"/>
 
<script connection-id="csv_write_example">
id,username,timestamp
1,admin,2023-01-01 00:00:00
2,read-only,2023-08-02 10:07:50
</script>
</etl>

This script writes 3 lines to the output file.