Table of Contents

1. Description

Executes the given query for each row of the provided MS Excel file. Allows to read from or write to MS Excel files of the "XLS" and "XLSX" formats.

Read files from a local disk or over HTTP, HTTPS, or FTP. Writes only to a local file.

Reading large datasets (more than 100.000 rows) can take a long time. Please consider using CSV format instead, e.g. ETL: CSV to Neo4j.

2. Connection

Defines a connection to a single MS Excel file.

2.1. Parameters

Name

Example

Description

format

mandatory

format=XLSX

MS Excel format. Values: XLS, XLSX.

path

mandatory

path=/path/to/file.xlsx

path=ftp://name:password@some.server/file.xlsx

path=https://some.server/file.xlsx

Path to the imported MS Excel file.

2.2. Example

<connection id="excel" driver="excel">
format=XLSX
path=/path/to/file.xlsx
</connection>

3. Query

Reads from an MS Excel file. Provides values in the form of

  • $row.<column> or

  • $row.<column>.<text|url|formula|formatted>

to be used in subsequent queries/scripts.

Lines containing no values in all columns (from column 0 to max_columns) are skipped.

3.1. Parameters

Name

Example

Description

sheet

mandatory

sheet=Sheet 1

Name of the imported MS Excel sheet.

skip_rows

skip_rows=10

The number of rows in the sheet that will be skipped. Default: 0

max_columns

max_columns=10

The maximal number of columns to read. Default: 255

escape_chars

escape_chars='"

In any output values all characters in this parameter will be escaped using "\". E.g if in your MS Excel file is value Jack O'Neil and you have set escape_chars=' then the output from the query will be Jack O\'Neil

The escape_chars parameter can have multiple characters configured. E.g. using escape_chars='" will escape all single quotation marks and also all double quotes.

3.2. Examples

Query example: Queries an excel file and inserts values into a Gremlin database. 

<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>
<description>Import excel - gremlin example</description>
 
<connection id="gremlin" driver="graphConnection">
project_id=1
</connection>
 
<connection id="excel" driver="excel">
format=XLSX
path=/path/to/file.xlsx
</connection>
<connection id="logInfo" driver="log">
level=INFO
</connection>
 
<query connection-id="excel">
sheet=Nodes
skip_rows=1
max_columns=6
<script connection-id="gremlin">
g.addV('$row.C').property('name', '$row.B').property('url', '$row.F.url')
</script>
 
<script connection-id="logInfo">
Imported value ($rownum): '$row.B', '$row.C', '$row.F.url'
</script>
</query>
</etl>

4. Script

Writes entries into an MS Excel file. The file will be created if not present within the path. The sheet will also be created, if not present in the file.

4.1. Parameters

Name

Example

Description

sheet mandatory

sheet=Sheet 1

Name of the connected MS Excel sheet.

line_number mandatory

line_number=1

line_number=${rownum + 10}

Line number in the MS Excel sheet where the editing of cells will be executed (numbering starts with 1, which is the first line in the Excel sheet). Default: 1

Usually, the line_number parameter is defined as an incrementing number $rownum, which is the number of iterations of the parent query. $rownum numbering starts with 1. To start writing at line 10 and write every iteration into a new line please use ${rownum + 10}.

append_mode xlsx only

append_mode=true

Applicable only for XLSX files.
append_mode=true allows writing data to a new or an existing excel file/sheet. It’s slow, as entire file has to be read in-memory.
append_mode=false is much faster, but allows only writing data to a new file or sheet. It’s possible to write data also into an existing sheet but only outside of area already used by MS Excel (this allows to append new rows into an existing sheet but only if the sheet has not been edited).
When no value is provided the default value is true.

row.X mandatory

row.AB=example value

Contains value to be inserted, where X is the column character in the Excel sheet.

row.X.type

row.AB.type=NUMERIC

Type of the cell to be created. Possible values: STRING, BOOLEAN, NUMERIC, FORMULA, BLANK

4.2. Examples

Script example: Copy values from one MS Excel file (excel_1) into another (excel_2). The sheet will be created if it does not exist yet.

<query connection-id="excel_1">
sheet=input sheet
skip_rows=0
max_columns=3
<script connection-id="excel_2">
sheet=incrementing example
line_number=${rownum + 10}
row.A=$1
row.B=$2
row.C=$3
</script>
</query>

Script example: Insert cells into the 'test sheet' in the connected MS Excel file. The sheet will be created if it does not exist yet.

<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>
<description>Inserts entries into given excel file.</description>
 
<properties>
output=/path/to/excel/file/excel.xlsx
</properties>
<connection id="excel_in" driver="excel">
format=XLSX
path=$output
</connection>        
 
<script connection-id="excel_in">
sheet=test sheet
line_number=0
row.A=Test1
row.B=Test2
row.C=5
row.C.type=NUMERIC
</script>
 
<script connection-id="excel_in">
sheet=test sheet
line_number=1
row.A=Test3
row.B=Test4
row.C=6
row.C.type=NUMERIC
</script>
 
<script connection-id="excel_in">
sheet=test sheet
line_number=0
row.E=SUM(C1:C2)
row.E.type=FORMULA
</script>
 
</etl>