Description

Uploads a CSV file into the SOLR instance configured in the hybrid data model configuration of the project's graph connection.

The hybrid model and the integrated SOLR client need to be enabled in the graph connection on the Application Settings page.

Connection

Defines a connection to a SOLR instance.

Parameters

| Parameter | Description | Default Value | Required |
|---|---|---|---|
| project_id | ID of the project whose hybrid data model configuration is used. | | yes |
| upload_url | Part of the URL used by the Data Import Handler. | /upload | no |
| path_to_csv | Path to the CSV file to be uploaded. The file must be accessible to GL on the filesystem. | | yes |
| clear_data | If true, deletes all documents from the SOLR instance before the import. | false | no |
| type | Specifies which SOLR configuration this driver should use (either "search" or "graph"). | graph | no |
| separator | CSV file separator. | , (comma) | no |
| trim | If true, removes leading and trailing whitespace from values. | false | no |
| header | Set to true if the first line of input contains field names. These are used if the fieldnames parameter is absent. | false | either header or fieldnames |
| fieldnames | Comma-separated list of field names to use when adding documents. | | either header or fieldnames |
| skip | Comma-separated list of field names to skip. | | no |
| skip_lines | Number of lines to discard in the input stream before the CSV data starts, including the header, if present. | 0 | no |
| encapsulator | The character optionally used to surround values in order to preserve characters such as the CSV separator or whitespace. In this standard CSV format, an encapsulator that appears inside an encapsulated value is written by doubling it. | | no |
| escape | The character used for escaping CSV separators or other reserved characters. If an escape character is specified, the encapsulator is not used unless it is also explicitly specified, since most formats use either encapsulation or escaping, not both. | | no |
| keep_empty | Keep and index zero-length (empty) fields. | false | no |
| map | Map one value to another. The format is value:replacement (the replacement can be empty). E.g.: left:right | | no |
| overwrite | If true (the default), check for and overwrite duplicate documents, based on the uniqueKey field declared in the SOLR schema. If you know that the documents you are indexing do not contain any duplicates, setting this to false may speed up the import considerably. | true | no |
| commit_within | Commit the document within the specified number of milliseconds. | 1000 | no |
| rowid | Map the rowid (line number) to the field specified by the parameter's value, for instance if your CSV doesn’t have a unique key and you want to use the row id as such. E.g.: rowid=entity_id | | no |
| rowid_offset | Add the given offset (as an integer) to the rowid before adding it to the document. | 0 | no |
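
As an illustration of the CSV parsing parameters, the following is a minimal sketch of a connection for a semicolon-separated file with a header row. The project ID, the file path, and the chosen parameter values are assumptions made for the example, not defaults of the driver.

```xml
<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>

<description>Load a semicolon-separated CSV with a header row into SOLR</description>

<!-- Assumed values: project_id=1, the file path, and the semicolon separator are illustrative only -->
<connection id="solrImport" driver="solrCsvImport">
project_id=1
upload_url=/update
path_to_csv=/some/accessible/path/some-data.csv
separator=;
header=true
trim=true
keep_empty=true
</connection>

<script connection-id="solrImport" />

</etl>
```

Because header=true, the field names are taken from the first line of the file, so fieldnames is not set in this sketch.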


Query

Not available

Script

1) Deletes all documents from the SOLR instance (if clear_data=true).

2) Uploads and imports the CSV file specified by the path_to_csv parameter.

Examples

Script example: Simple import of some-data.csv.

<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>
 
<description>Load CSV into SOLR instance</description>
 
<properties>
upload_url=/update
path_to_csv=/some/accessible/path/some-data.csv
</properties>
 
<connection id="solrImport" driver="solrCsvImport">
project_id=1
upload_url=$upload_url
path_to_csv=$path_to_csv
</connection>
 
<script connection-id="solrImport" />
 
</etl>
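
For a file without a header row, the field names can be supplied through the fieldnames parameter instead of header. The sketch below assumes a three-column file; the field names entity_id, name, and note are hypothetical, and skip is used here to skip the note field.

```xml
<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>

<description>Load a headerless CSV into SOLR with explicit field names</description>

<!-- Assumed values: the field names below are hypothetical and must match your own SOLR schema -->
<connection id="solrImport" driver="solrCsvImport">
project_id=1
upload_url=/update
path_to_csv=/some/accessible/path/some-data.csv
fieldnames=entity_id,name,note
skip=note
</connection>

<script connection-id="solrImport" />

</etl>
```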


Script example: Clears the SOLR data, then imports some-data.csv. Uses the "search" configuration of the SOLR hybrid model.

<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>
 
<description>Load CSV into SOLR instance</description>
 
<connection id="solrImport" driver="solrCsvImport">
project_id=1
upload_url=/update
path_to_csv=/some/accessible/path/some-data.csv
clear_data=true
type=search
</connection>
 
<script connection-id="solrImport" />
 
</etl>
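
When the CSV file has no unique key column, the rowid parameter can supply one. The following is a minimal sketch, assuming that the uniqueKey field in the SOLR schema is named entity_id (as in the rowid description) and that an offset of 1000 is wanted; both values are illustrative.

```xml
<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>

<description>Load CSV into SOLR instance, using the row id as the unique key</description>

<!-- Assumed values: entity_id as the uniqueKey field and the offset of 1000 are illustrative only -->
<connection id="solrImport" driver="solrCsvImport">
project_id=1
upload_url=/update
path_to_csv=/some/accessible/path/some-data.csv
header=true
rowid=entity_id
rowid_offset=1000
</connection>

<script connection-id="solrImport" />

</etl>
```

The offset is added to each row id before it is written to entity_id, which may help keep the generated keys apart from keys already present in the SOLR instance.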