When are graph DBs indispensable?
Many fraud detection systems rely on checking for relationships between domain entities.
To fight review fraud, for example, we might want to discard reviews of an item, such as a movie, that were written by the creator of that item. More subtly, we might also want to discard reviews that were written by the creator’s friends.
While the checks above are fixed-length relational queries, which a relational DB handles perfectly well, anti-fraud checks sometimes need variable-length relational queries. For example, we might want to discard any review from reviewers who are connected to the creator within two or more degrees of separation, regardless of the relationship: a friend, a co-writer, a spouse, and so on.
For this last class of queries, graph databases, with their traversal-optimized data structures, are often the only realistic solution, not to mention the ease and flexibility that their graph-centric query languages offer when asking these sorts of questions.
Furthermore, graph databases are good at computing metrics that rank the importance of nodes, such as centrality, along with other useful node and network metrics.
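To make the difference between fixed-length and variable-length queries concrete, here is a minimal Python sketch of a hop-limited traversal. It is only an illustration of the idea, not how Neo4j or Siren Platform implement it; the people, relationship types, and function names are invented for the example. In a relational DB, each extra hop typically costs another self-join or a recursive query, whereas a traversal simply keeps following neighbors until the hop limit is reached.

```python
from collections import deque

# Hypothetical toy graph: each edge is (person_a, relationship, person_b).
# Names and relationship types are invented for illustration only.
EDGES = [
    ("alice", "friend", "bob"),
    ("bob", "co_writer", "carol"),
    ("carol", "spouse", "dave"),
    ("erin", "friend", "frank"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring the relationship type."""
    adjacency = {}
    for a, _relationship, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def within_degrees(adjacency, start, max_hops):
    """Return {person: hop_count} for everyone reachable from start in 1..max_hops hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(current, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    del distances[start]  # the starting person is not "connected to" themselves
    return distances


adjacency = build_adjacency(EDGES)
# Reviews by anyone within two degrees of the item's creator ("alice") would be discarded.
connected = within_degrees(adjacency, "alice", max_hops=2)
print(sorted(connected.items()))
```

Changing `max_hops` is the only thing needed to widen or narrow the check, which is the flexibility that variable-length queries provide.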
Support for graph databases in Siren Platform 10.2: The Neo4j connector
Siren Platform version 10.2 introduces Siren’s first level of support for graph databases.
This might be surprising: Have we not already demonstrated that to get the benefit of knowledge graphs, you don’t need to move all of the data to a graph database?
Yes we have.
In the Siren Platform, you can simply connect the tables of any RDBMS (or Elasticsearch or other back-end table) and, by defining a simple data model, you can enjoy scaled navigation, examination, search, and visual link analysis. (See our videos for more information.)
However, as discussed, there are use cases where graph databases are very useful. And for these we now have a connector.
Now, with Siren Platform you can:
- Connect to a Neo4J database and browse its data in Siren dashboards, with full-speed, set-to-set relational navigation, explorative link analysis, high quality textual discovery, and alerting.
- See Neo4J data interconnected with data of other systems. Do you have your users or assets mapped in Neo4J, your big web or infrastructure logs mapped in Elasticsearch, and your inventory in Oracle Database? Ask questions across data sets without moving the big data from where it is!
- Use Neo4J’s unique power to find shortest paths and cycles and to calculate graph metrics, all while using the power of the Siren back-end system (our supercharged Elasticsearch) for ultra-fast analytics and search.
Since our graph database support is in its early stages, there are some limitations that we’ll discuss toward the end.
How it works
A graph database like Neo4J uses a model known as a property graph. Data is stored in nodes and in the edges that connect them; each node type or edge type can be thought of as a database table, with the tables relationally connected to each other. In Neo4J, nodes and edges can carry properties (key-value pairs), so the database allows complex fields.
Siren Platform version 10.2 includes a Neo4J datasource, which can be used for reflection jobs. Reflection jobs keep Siren Platform in sync with selected slices of Neo4J, where one slice represents a node type or an edge type.
While you could set these reflection jobs manually, we provide a script to fully automate the process.
The procedure does the following:
- Analyzes the Neo4J schema to determine which types and relations exist, and which attributes each type and relation has.
- Creates a series of reflection jobs. These are scheduled jobs that create for each relation or type a corresponding populated index in Siren Platform. These reflection jobs can be executed on a one-off basis or can be scheduled to automatically rerun.
- Creates the data model in Siren Platform:
- It creates index pattern searches for each node or entity type. From here, it is easy to create analytic dashboards and visualizations.
- It populates the relations in the Siren data model with the same relations those entities and node records have in Neo4J. This enables relational navigation across dashboards and in the Link Analysis tool.
Once this procedure is executed, dashboards can be generated in Siren Platform from the data now contained in index pattern searches and graph analytics can be executed.
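The slicing idea behind the procedure can be sketched in a few lines of Python. This is not the code of the script itself, only a rough illustration under stated assumptions: the sample nodes and edges, the lower-cased index names, and the `source_id`/`target_id` field names are all hypothetical. The sketch groups nodes by type and edges by type into flat lists of records (one per slice) and derives the joins that would connect them in a data model.

```python
from collections import defaultdict

# Hypothetical property-graph export; labels, types, and properties are invented.
NODES = [
    {"id": 1, "label": "Person", "properties": {"name": "Alice"}},
    {"id": 2, "label": "Person", "properties": {"name": "Bob"}},
    {"id": 3, "label": "Movie", "properties": {"title": "Example Film"}},
]
EDGES = [
    {"id": 10, "type": "ACTED_IN", "source": 1, "target": 3, "properties": {"role": "Lead"}},
    {"id": 11, "type": "REVIEWED", "source": 2, "target": 3, "properties": {"rating": 4}},
]


def slice_graph(nodes, edges):
    """Group nodes by label and edges by type: one flat list of records per slice."""
    indices = defaultdict(list)
    for node in nodes:
        indices[node["label"].lower()].append({"id": node["id"], **node["properties"]})
    for edge in edges:
        record = {"id": edge["id"], "source_id": edge["source"], "target_id": edge["target"]}
        indices[edge["type"].lower()].append({**record, **edge["properties"]})
    return dict(indices)


def derive_relations(nodes, edges):
    """List the joins that link each edge slice to the node slices it connects."""
    label_of = {node["id"]: node["label"].lower() for node in nodes}
    relations = set()
    for edge in edges:
        edge_index = edge["type"].lower()
        relations.add((edge_index, "source_id", label_of[edge["source"]], "id"))
        relations.add((edge_index, "target_id", label_of[edge["target"]], "id"))
    return sorted(relations)


indices = slice_graph(NODES, EDGES)
relations = derive_relations(NODES, EDGES)
for name, documents in indices.items():
    print(name, documents)
for relation in relations:
    print(relation)
```

Each printed relation reads as "field X of slice A joins field Y of slice B", which is the kind of link that makes relational navigation between dashboards possible.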
To follow this procedure, you need:
- A Neo4J database with data in it.
- A Siren Platform installation, such as the Siren Community Edition. If you’re trying out Siren Platform for the first time, we recommend that you download the “no data no security” package.
- A command line interface with Node and NPM installed. Check that they are installed by typing “node” and “npm” in your command line window. In practice, this means that you are running on Linux or that you are a fairly advanced Windows user.
In this example, we will detect review fraud by using the following schema:
- Download and unzip the NeoNode package – for Windows, for Linux/macOS (OS X).
- Navigate to the directory where you saved the unzipped files.
- Open a command window in the directory as admin/su.
- Run the command npm install. Ensure that you have node and npm installed.
- Download the following JDBC jar files and save them in the Elasticsearch node, in the elasticsearch/config/jdbc-drivers directory:
- Add the following line to the elasticsearch.yml file:
- Restart Elasticsearch.
- Create a new datasource in Siren Platform by connecting to Neo4J with JDBC:
- Open the config.json file (present in the NeoNode package) and set the connection properties for Neo4J and Siren.
Note: Currently, the Neo4J connection data needs to be entered twice; once here for the script and once for the Siren datasource configuration. Siren Platform version 10.3 will include a configuration wizard to remove the need for external command line operation.
- Run the application by using the command node app. Leave the command window open and allow the script to run.
- In Siren Investigate, go to the Data Reflections app and click Datasource reflection jobs to view all created ingestion jobs (this list will be empty if none are set up).
- Run each reflection job by clicking the play button in the Actions column.
- After all jobs are complete, return to the command window where the Neo4j-Siren application is running.
- Type ‘Y’ and press Enter to complete the job.
- Return to Siren Platform and go to the Management app. Click the Data Model tab, where you can see all indices that were created and all of the relationships that were found. From this screen, select icons for each search and enter a label in the Label when visualized in the graph browser field.
- Click on the data model graph tab to see the data model. It is recreated in Siren Platform automatically from the Neo4J structure.
Creating dashboards and navigating the graph with Siren Platform link analysis
Now that index pattern searches and a relational data model are in place, all you have to do is create dashboards and use the Link Analysis navigator.
To learn how to do this, we recommend that you complete our getting started tutorial. However, the quickest way is by using our automatic dashboard creation wizard:
- In the Management app, click the Data Model tab and select the search that you want to create a dashboard for.
- Click the Data tab and either select the fields that you’d like to create widgets for or simply click Autoselect Most Relevant.
- Click Create dashboard.
- Repeat steps 1-3 for other index patterns of interest. You might want to do this for what previously were edges, such as “acted_in”. This will give you the dashboard-to-dashboard navigation capabilities.
Now, you can explore Siren Platform’s unique link analysis capabilities:
And now… Let’s hunt down the fraudsters
As we mentioned earlier, the main reasons to use graph databases are to run arbitrary-length path queries efficiently or to execute graph metrics.
In fraud detection, an investigator might want to detect a suspect circle of users, for example, users who are connected to each other by at least two fraudulent hub nodes.
Neo4J is also efficient at calculating large-scale graph metrics, such as centrality, PageRank, and others. These metrics can be useful for assessing the relative importance of a node.
How can you avail of these capabilities in Siren Platform?
For pattern detection
You create a new data ingestion in which the pattern-detecting query (written in Cypher) is used as the input.
This creates an index in Elasticsearch with a list of suspects. The query can then be re-executed manually or automatically by implementing scheduling.
In the example below, we specified a datasource query to identify “reviewers who are also authors”:
After this special data ingestion is performed, you create an index pattern search for this newly created index. In the data model, you link its keys to the records that have already been ingested. For example, link the field “IDs” of the table “suspect_circles” to the ID field of the table “persons”.
These results are easy to use in Siren Platform, both in dashboard-to-dashboard relational navigation and on the graph. For example, simply import the query result index in the graph and expand it to reveal the records that are involved in the circles.
You can repeat the procedure for as many fraudulent patterns as required. And the data reflection jobs can be scheduled to run at set intervals.
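As a rough illustration of what such a pattern query computes, the following Python sketch finds pairs of users that share at least two hub nodes and emits one record per pair, with an “IDs” field like the one in the “suspect_circles” example above. In Siren Platform the pattern would be expressed in Cypher and evaluated by the graph database; this in-memory version, its sample data, and the `shared_hubs` field are assumptions made purely to show the logic and the shape of the result.

```python
from itertools import combinations

# Hypothetical (user, hub) links, e.g. a user and an already-flagged device or address.
LINKS = [
    ("u1", "hub_a"), ("u1", "hub_b"),
    ("u2", "hub_a"), ("u2", "hub_b"),
    ("u3", "hub_a"),
    ("u4", "hub_c"),
]


def suspect_circles(links, min_shared_hubs=2):
    """Return one record per pair of users sharing at least min_shared_hubs hubs."""
    hubs_by_user = {}
    for user, hub in links:
        hubs_by_user.setdefault(user, set()).add(hub)
    documents = []
    for user_a, user_b in combinations(sorted(hubs_by_user), 2):
        shared = hubs_by_user[user_a] & hubs_by_user[user_b]
        if len(shared) >= min_shared_hubs:
            documents.append({"IDs": [user_a, user_b], "shared_hubs": sorted(shared)})
    return documents


circles = suspect_circles(LINKS)
print(circles)
```

Because each record carries the IDs of the users involved, it can be joined back to the person records, which is what allows the suspects to be expanded on the graph.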
For graph metrics to be shown in Siren Platform
You can modify the reflection jobs that you have created to produce a graph metric result, such as centrality, which is written as an extra field at the reflection phase. This field can then be leveraged in dashboards or on the graph. For example, you can use it in a lens to make a node look bigger.
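The idea of a metric written as an extra field can be sketched as follows. In practice the metric would be computed by the graph database as part of the reflection query; here, as a stand-in, normalized degree centrality (the simplest centrality measure) is computed in plain Python. The sample records, the edge list, and the `degree_centrality` field name are hypothetical.

```python
# Hypothetical reflected person records and the edges between them.
PERSONS = [{"id": "p1"}, {"id": "p2"}, {"id": "p3"}, {"id": "p4"}]
EDGES = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p2", "p3")]


def add_degree_centrality(documents, edges, field="degree_centrality"):
    """Return copies of the documents with normalized degree centrality added."""
    degree = {doc["id"]: 0 for doc in documents}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Normalize by the largest possible degree in a graph of this size.
    max_possible = max(len(documents) - 1, 1)
    return [{**doc, field: degree[doc["id"]] / max_possible} for doc in documents]


enriched = add_degree_centrality(PERSONS, EDGES)
for doc in enriched:
    print(doc)
```

A visualization could then map the extra field to node size, so that the most connected records stand out.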
Conclusion: Using the best tool for the job
In the past two years, graph databases have gained more and more momentum. While, occasionally, one might encounter some very far-reaching graph database projects, it is clear that IT departments won’t be ditching Hadoop, Spark, or Elasticsearch for big data logs anytime soon – nor should they.
One of Siren Platform’s unique and exciting aspects is that it allows navigating knowledge graphs without the need for an extract-transform-load (ETL) process for graph databases.
In the Siren Platform, the graph is in the data you already have and you can also leverage the back-end system that you already use. For more information, see our blog post about Siren Platform’s “virtualization or reflection” capabilities.
For example, the following screenshot shows aggregates on call data records that are stored in a connected Elasticsearch index. In this case, no graph database was used:
In Siren Platform version 10.2, we are excited to share our support for graph databases with the popular Neo4J.
Siren Platform can now employ “the best tool for the job” for financial fraud detection, law enforcement, intelligence, and more.
For the end user, the boundary is invisible: one sees data which is in graph DBs simply connected to that which is in other back ends, both in the link analysis and in Siren’s signature “dashboard to dashboard” navigation.
We have a rich roadmap of improvements with respect to graph database support:
- Siren Platform version 10.3 (Q3 2019):
- The steps described above will be fully integrated in the UI.
- Introduction of experimental support for other graph databases.
- Siren Platform version 10.4 (Q4 2019):
- Ability to use live parametric queries in the UI as part of our forthcoming new generic web service support.
- Ability to use the graph database superpowers on data that was originally in other systems. This is made possible by pushing selected “slices of data”, such as the relational skeleton, into a connected graph database.
Do you like what you see? Would you like to see support for another specific graph database?
We look forward to hearing from you and we support your success in your Enterprise Knowledge Graph project.