As enthusiastic fans of the Big Bang Theory sitcom, we enjoyed this year’s long winter evenings with our favorite TV characters. Again :) After returning to the office we were discussing various situations our fictional Caltech geniuses and their friends were facing. This way we came to the eleventh episode of the seventh season named "The Cooper Extraction", where Sheldon’s girlfriend Amy comes with the theory that none of Sheldon’s friends would be there if it wasn't for Sheldon. She even draws a diagram of the absent Sheldon premise to prove that without Sheldon the sequence of events that led to the creation of a happy relationship between Leonard and Penny or between Bernadette and Wolovitz would never have occurred (https://bigbangtheory.fandom.com/wiki/The_Cooper_Extraction?file=IWL1.png).

And here, we got inspired…

 

Generally, there are two possibilities regarding the source database (underlying Neo4j database):

  1. We can use the existing Neo4j database, that already contains requested data set with nodes and relationships in the defined data schema. Such a database could be easily integrated into a Graphlytic desktop, or with a certain level of expertise into Graphlytic server instance. 
  2. We can create a new Neo4j database with Graphlytic. There are three scenarios:
    • Manually create new characters (as nodes) and draw relationships among them directly in the Graphlytic graph visualization environment. This way we can create a new underlying Neo4j database using Graphlytic.
    • Use ETL job to import data set into Graphlytic. Administration rights are needed for this. ETL jobs can be created by editing XML. For more on how to run ETL jobs please refer to https://graphlytic.biz/doc/latest/ETL_jobs.html.
    • Use manual import of a set of characters and relationships into Graphlytic from any spreadsheet. Data (nodes or relationships) can be imported from CSV or XLS files

In this blog post, we will focus on the manual import.

First, we need to translate Amy’s diagram into a set of nodes that contains displayed characters and any set of properties. For this, we can use a simple spreadsheet. As an example, the following database properties can be mentioned – name, type (person/object/animal), gender, education level, and “works as/at”:

How to manually import nodes into Graphlytic from the spreadsheet.

Once the set of nodes is complete, we have to re-focus on Amy’s diagram. We need to go through depicted relationships carefully. Generally, when creating the set or relationships (arcs or edges), the point here is to reflect the drawing (in other words the reality) as is, not as we think it should be. So, in order to prepare the set of relationships equal to what has been drawn, we must gradually define all relationships shown. By creating relationships that have a designated start node (in our case “source_UID”) and an end (“target_UID) we create a so-called directed graph. In case we decide to add some database properties we need to keep the same point of view for the whole graph. We decided to add just one property - type:

How to manually import relationships into Graphlytic from the spreadsheet.

Having the set of nodes and set of properties prepared we’ll switch to Graphlytic environment to import the data - press button "Import data from file":

Button for manual import to Graphlytic

We start by importing the nodes. We use simple drag-and-drop for importing the spreadsheet:

Drag-and-drop the spreadsheet for importing the data set

This part is really simple. We just need to decide which properties to import. And define the first row to be a header:

Import of nodes into Graphlytic. Definition of properties.

When importing the relationships, we need a bit more manual effort. These steps have to be completed (all shown in attached print screens):

  • Again, we start with a button for manual import ("Import data from file"), plus using drag-and-drop for importing the spreadsheet with relationships into Graphlytic.
  • We click on Import as “Relationships”
  • We need to re-write “source_UID” to “UID” in the Column name. But first, we need to pick “source_UID” in Source node matching / Properties:

Re-writing “source_UID” to “UID” in the Column name

  • The same change will happen automatically in the field “Properties” in Source node matching (the “source_UID” will be changed to “UID”).
  • Now, we have to do the same with the “target_UID” in the Column name - re-write “target_UID” to “UID”. Again, first we pick “target_UID” in Target node matching / Properties:

Re-writing “target_UID” to “UID” in the Column name

Relationship types must be defined. Click at least one checkbox in the column Type.

  • Finally, we click on “Search and Manage Data” in Application Menu, where we have to re-index the full-text search configuration:

How to re-index the fulltext search configuration.

  • We define indexed properties:

Definition of indexed properties for re-indexing the fulltext search configuration.

  • And we get properly re-indexed full-text search:

Properly re-indexed fulltext search.

And here we go! :)

The Cooper Extraction visualization in Graphlytic