An Introduction to Siren Entity Resolution

Published: Thursday, September 15th, 2022

Entity Resolution is the AI capability to recognize that two or more records might refer to the same real-world entity (e.g. a person or company) or be significantly related. Siren ER integrates Senzing entity resolution software into the Siren platform, allowing records from different data sources with different schemas to be resolved in real time. With Siren ER we can build a data model in which real-world entities are represented by one Entity Table, entity2record, whose records connect back to every input record deemed to be the same entity. Entity relations are represented by a second Entity Table, entityrels, whose records each connect two entities that are related by shared details, such as an address, but are not the same.
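To make the data model concrete, here is a minimal Python sketch of what a document in entity2record and a document in entityrels might look like. The class and field names (entity_id, record_ids, shared and so on) and the record ids are illustrative assumptions, not the actual Siren ER schema.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One resolved real-world entity (a document in entity2record)."""
    entity_id: int  # hypothetical field name
    # Ids of the input records that were resolved to this entity.
    record_ids: list[str] = field(default_factory=list)


@dataclass
class EntityRelation:
    """A link between two distinct but related entities (a document in entityrels)."""
    entity_id_a: int
    entity_id_b: int
    # Attributes the two entities have in common.
    shared: list[str] = field(default_factory=list)


# Two customer records resolved to one entity, plus a related second entity.
eddie = Entity(entity_id=1, record_ids=["customer:1001", "customer:1002"])
marsha = Entity(entity_id=2, record_ids=["customer:1003"])
related = EntityRelation(1, 2, shared=["SURNAME", "ADDRESS", "EMAIL"])
```

The point of the shape is that an entity never copies the input records; it only holds their ids, so the original customer index stays the source of truth.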

Siren ER consists of a pipeline of several components that takes records in (or being ingested into) an Elasticsearch index such as customer and represents them as Entities and Entity Relations. Entity resolution is performed by Senzing Stream Loader, which stores all resolution data in a relational database, typically PostgreSQL. To support high throughput, real-time processing and syncing of the database with Elasticsearch, a messaging system is required, typically RabbitMQ. Additionally, two Logstash pipelines are used: one to forward data to RabbitMQ, and a second to sync the most recently updated data from PostgreSQL to Elasticsearch.

  1. When a customer record is ingested into Elasticsearch, we can use an Elasticsearch ingest pipeline to extract a Senzing-compatible JSON object using a script processor, then send that object to Logstash in an HTTP request using the json-ws processor.
  2. A Logstash pipeline, configured with an http input plugin and a rabbitmq output plugin, receives the customer object and forwards it to the RabbitMQ load-queue.
  3. Senzing Stream Loader consumes messages (customer objects) from the RabbitMQ load-queue.
  4. Senzing Stream Loader performs the entity resolution and updates the PostgreSQL database accordingly.
  5. Senzing Stream Loader adds messages about any entities affected during the resolution to the RabbitMQ info-queue or, if the record failed to process, to the RabbitMQ failure-queue.
  6. A Logstash pipeline configured with a rabbitmq input plugin consumes messages from the RabbitMQ info-queue.
  7. The same Logstash pipeline uses a jdbc_streaming filter plugin to query PostgreSQL for the latest data for the resolved entities that were referenced in the info-queue messages.
  8. The Logstash pipeline overwrites Entity records in Elasticsearch with up-to-date data for the entities referenced in the info-queue messages, for example new records for a particular entity. If any entities are missing from PostgreSQL, for example because two entities were merged and one was removed, they are deleted from Elasticsearch.
  9. Entity records contain an array field of record ids, allowing a relation to the original records. Additionally, entities can be associated with other entities by Entity Relation records.
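As an illustration of step 1, the sketch below maps a customer document to a flat Senzing-style JSON record. In the pipeline itself this mapping runs inside an Elasticsearch script processor; Python is used here only to show the shape of the transformation. The customer field names (id, first_name, dob and so on) are hypothetical, and the Senzing attribute names follow Senzing's generic entity specification as we understand it, so they should be checked against the Senzing version in use.

```python
import json


def to_senzing_record(customer: dict, data_source: str = "CUSTOMER") -> dict:
    """Map a customer document to a flat Senzing-style record.

    The keys read from `customer` are hypothetical; adapt them to the index schema.
    """
    mapping = {
        "NAME_FIRST": customer.get("first_name"),
        "NAME_LAST": customer.get("last_name"),
        "DATE_OF_BIRTH": customer.get("dob"),
        "SSN_NUMBER": customer.get("ssn"),
        "ADDR_FULL": customer.get("address"),
        "EMAIL_ADDRESS": customer.get("email"),
    }
    # Every record names its source and carries an id unique within that source.
    record = {"DATA_SOURCE": data_source, "RECORD_ID": str(customer["id"])}
    # Leave out attributes the customer document does not provide.
    record.update({key: value for key, value in mapping.items() if value})
    return record


customer = {"id": 1001, "first_name": "Eddie", "last_name": "Kusha", "email": None}
message = json.dumps(to_senzing_record(customer))  # body forwarded to Logstash
```

Dropping empty attributes keeps the message small and avoids passing blank values to the resolver.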

When data has been processed and synced back into entity2record and entityrels, we can create relations between the appropriate records.
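The sync described in steps 7 and 8 comes down to an overwrite-or-delete decision for each entity named in an info-queue message. The sketch below models that decision in plain Python, with one dict standing in for PostgreSQL and another for the Elasticsearch entity index. The function name and data shapes are assumptions for illustration; the real work is done by the Logstash pipeline.

```python
def sync_entities(affected_ids, database, index):
    """Bring `index` up to date for the entity ids named in an info-queue message.

    `database` stands in for PostgreSQL and `index` for the Elasticsearch
    entity index; both are plain dicts keyed by entity id.
    """
    for entity_id in affected_ids:
        latest = database.get(entity_id)
        if latest is None:
            # Entity no longer exists (e.g. merged into another): delete it.
            index.pop(entity_id, None)
        else:
            # Overwrite the indexed entity with the latest resolved data.
            index[entity_id] = dict(latest)
    return index


# Entity 2 was merged into entity 1, so it is gone from the database.
database = {1: {"record_ids": ["1001", "1002"]}}
index = {1: {"record_ids": ["1001"]}, 2: {"record_ids": ["1002"]}}
sync_entities([1, 2], database, index)
```

After the call, the index holds only entity 1 with both record ids, which mirrors the merge case described in step 8.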

We can now start with one customer document in the Graph Browser (Eddie Kusha), expand to see the Entity it has resolved to, and expand that to see any other customer records (Edward Kusha) that represent the same person. Then we can view related entities and their customer records. For example, an entity with a Marsha Kusha customer record has the same SURNAME, ADDRESS and EMAIL, but a different DOB and SSN.
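The expansion just described can be sketched as a lookup over the two entity tables. This is only a model of what the Graph Browser does when expanding a node, not its implementation; the function name, the record ids and the in-memory data shapes are assumptions.

```python
def expand(record_id, entities, relations):
    """Expand one input record to its entity, sibling records and related entities.

    `entities` maps entity id -> list of record ids (as in entity2record);
    `relations` is a list of (entity_id, entity_id) pairs (as in entityrels).
    """
    owner = next((eid for eid, recs in entities.items() if record_id in recs), None)
    if owner is None:
        return None  # the record has not been resolved to any entity
    # Other input records that resolved to the same entity.
    same = [rec for rec in entities[owner] if rec != record_id]
    # Entities linked to the owner by a relation, whichever side it is on.
    related = sorted({b if a == owner else a for a, b in relations if owner in (a, b)})
    return {"entity": owner, "same_entity_records": same, "related_entities": related}


entities = {1: ["eddie_kusha", "edward_kusha"], 2: ["marsha_kusha"]}
relations = [(1, 2)]
result = expand("eddie_kusha", entities, relations)
```

Here result names entity 1, lists edward_kusha as a record for the same person, and reports entity 2 (the Marsha Kusha record) as related.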

Adding all customer records to the Graph Browser and fully expanding their network, we see that Senzing has detected that 8 customer records are in fact 4 entities, which are related through certain shared attributes.

Take a look at this video from our YouTube channel to see Siren Entity Resolution in action:


Real-time entity resolution is a critical AI capability for investigators who need to identify records that represent the same or related real-world entities.

By resolving user records to entity and entity relation records in Elasticsearch, Siren ER allows entities and their relations to be visualised as part of an existing data model.


Getting Started with Siren ER