Description

Uploads a CSV file into the SOLR instance configured in the hybrid data model configuration of the project's graph connection.

The hybrid model and the integrated SOLR client need to be enabled in the graph connection on the Application Settings page.

Connection

Defines a connection to a SOLR instance.

Parameters

Parameter	Description	Default Value	Required
project_id	ID of the project with hybrid data model configuration to be used.
upload_url	A part of the URL used by the Data Import Handler.	/upload
path_to_csv	Path to the CSV file to be uploaded. It must be accessible by GL on the filesystem.
clear_data	If `true`, deletes all documents from the SOLR instance before the import.	false
type	Specifies which SOLR configuration this driver should use (either "search" or "graph").	graph
separator	CSV file separator.	, (comma)
trim	If `true`, remove leading and trailing whitespace from values.	false
header	Set to `true` if the first line of input contains field names. These will be used if the `fieldnames` parameter is absent.	false	either header or fieldnames
fieldnames	Comma-separated list of field names to use when adding documents.		either header or fieldnames
skip	Comma-separated list of field names to skip.
skip_lines	Number of lines to discard in the input stream before the CSV data starts, including the header, if present.	0
encapsulator	The "encapsulator" character is optionally used to surround values to preserve characters such as the CSV separator or whitespace. This standard CSV format handles the encapsulator itself appearing in an encapsulated value by doubling the encapsulator.
escape	The character used for escaping CSV separators or other reserved characters. If an escape is specified, the encapsulator is not used unless also explicitly specified, since most formats use either encapsulation or escaping, not both.
keep_empty	Keep and index zero-length (empty) fields.	false
map	Map one value to another. The format is value:replacement (the replacement can be empty). E.g.: `left:right`
overwrite	If `true` (the default), check for and overwrite duplicate documents, based on the uniqueKey field declared in the SOLR schema. If you know the documents you are indexing do not contain any duplicates, you may see a considerable speed-up by setting this to `false`.	true
commit_within	Commit the document within the specified number of milliseconds.	1000
rowid	Map the `rowid` (line number) to the field named by this parameter's value. This is useful, for instance, if your CSV doesn't have a unique key and you want to use the row ID as one. E.g.: `rowid=entity_id`
rowid_offset	Add the given offset (as an integer) to the `rowid` before adding it to the document.	0
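
As a rough illustration of how the parsing parameters combine, the sketch below assumes a semicolon-separated file whose first line holds the field names. The project ID, the file path and the `internal_note` column are hypothetical, and it is assumed that every parameter is given as a key=value line in the connection body, as in the examples on this page.

```xml
<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>

    <description>Load a semicolon-separated CSV with a header line</description>

    <!-- Hypothetical project ID, path and column name; adjust to your data. -->
    <!-- header=true takes field names from the first line of the file. -->
    <!-- skip drops the listed column; trim strips surrounding whitespace. -->
    <connection id="solrImport" driver="solrCsvImport">
        project_id=1
        upload_url=/update
        path_to_csv=/some/accessible/path/some-data.csv
        separator=;
        header=true
        trim=true
        skip=internal_note
    </connection>

    <script connection-id="solrImport" />

</etl>
```

Either `header` or `fieldnames` has to be supplied, so a file without a header line would need `fieldnames` instead of `header=true`.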

Query

Not available

Script

1) Deletes all documents from the SOLR instance (if clear_data=true).

2) Uploads and imports the CSV file given by the path_to_csv parameter.
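
The two steps can be combined into a full reload. The following is a minimal sketch, not a recommended configuration: it assumes the CSV itself contains no duplicate keys, so that the duplicate check can be switched off once the index has been emptied. The project ID, the path and the `commit_within` value are hypothetical.

```xml
<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>

    <description>Full reload of a SOLR instance from a CSV</description>

    <!-- clear_data=true empties the SOLR instance first (step 1). -->
    <!-- overwrite=false skips the duplicate check; only safe if the CSV
         has no repeated uniqueKey values (an assumption of this sketch). -->
    <!-- commit_within=5000 is an arbitrary illustrative value in milliseconds. -->
    <connection id="solrImport" driver="solrCsvImport">
        project_id=1
        upload_url=/update
        path_to_csv=/some/accessible/path/some-data.csv
        clear_data=true
        overwrite=false
        commit_within=5000
    </connection>

    <script connection-id="solrImport" />

</etl>
```

If the file might contain repeated keys, leaving `overwrite` at its default of `true` is the safer choice.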

Examples

Script example: Simple import of some-data.csv.

<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>
 
    <description>Load CSV into SOLR instance</description>
 
    <properties>
        upload_url=/update
        path_to_csv=/some/accessible/path/some-data.csv
    </properties>
 
    <connection id="solrImport" driver="solrCsvImport">
        project_id=1
        upload_url=$upload_url
        path_to_csv=$path_to_csv
    </connection>
 
    <script connection-id="solrImport" />
 
</etl>

Script example: Clears SOLR data, then imports some-data.csv. Uses the "search" setting of the SOLR hybrid model.

<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>
 
    <description>Load CSV into SOLR instance</description>
 
    <connection id="solrImport" driver="solrCsvImport">
        project_id=1
        upload_url=/update
        path_to_csv=/some/accessible/path/some-data.csv
        clear_data=true
        type=search
    </connection>
 
    <script connection-id="solrImport" />
 
</etl>
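
A further variation, sketched under assumptions: the file has a header line whose names do not match the SOLR schema and no unique key column. The header line is discarded with `skip_lines`, the field names are supplied through `fieldnames`, and the line number is written to a key field through `rowid`. The field names `name`, `city` and `country` are hypothetical, and `entity_id` is assumed to be a field that exists in the target schema.

```xml
<!DOCTYPE etl SYSTEM "https://scriptella.org/dtd/etl.dtd">
<etl>

    <description>Load CSV with explicit field names and a row-based key</description>

    <!-- skip_lines=1 discards the file's own header line. -->
    <!-- fieldnames supplies the names to use instead (hypothetical names). -->
    <!-- rowid=entity_id stores the line number in the entity_id field. -->
    <connection id="solrImport" driver="solrCsvImport">
        project_id=1
        upload_url=/update
        path_to_csv=/some/accessible/path/some-data.csv
        skip_lines=1
        fieldnames=name,city,country
        rowid=entity_id
    </connection>

    <script connection-id="solrImport" />

</etl>
```

When several files are loaded into the same instance, `rowid_offset` can be set per file so that the generated row IDs do not collide.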