Siren Federate™

The Siren Federate plug-in extends the Elasticsearch APIs with high-performance, scalable joins.
It also adds “virtualization”: remote database tables appear as local virtual indexes.

Join and Virtualization plug-in for Elasticsearch

Build enhanced Elasticsearch applications thanks to an extended API which adds high-performance relational join capabilities.

High Performance Joins
Relational join operations added as an extension of the Elasticsearch query language. Multiple patents pending.

Virtualization
Map external JDBC data sources to Elasticsearch “virtual indices”. No ETL required.

Join massive datasets in real time, inside Elasticsearch

Real-time, big data joins critically extend what Elasticsearch can do for your use cases. Developed for the needs of some of the most advanced organizations in the world, Siren Federate is now available to use in your applications too.

An example of a data model created with the Siren Platform and used to coordinate Siren Federate for joining across different Elasticsearch indices (native or virtual)

Extending the Elasticsearch DSL

Siren Federate™ extends the native Elasticsearch query language with semi, inner, and left joins.

The following example shows Siren’s enhanced Elasticsearch syntax executing a join across two indices. Here we retrieve all the “articles” that mention “companies” whose name matches orient. To support this, the Siren Federate plug-in introduces a new Elasticsearch filter named join.

$ curl -H 'Content-Type: application/json' 'http://localhost:9200/siren/articles/_search?pretty' -d '{
  "query" : {
    "join" : {
      "indices" : ["companies"],
      "on" : ["mentions", "id"],
      "request" : {
        "query" : {
          "term" : {
            "name" : "orient"
          }
        }
      }
    }
  }
}'

In this request:
  • "join" is the join query clause.
  • "indices" lists the source indices (companies).
  • "on" specifies the paths for the join keys in both source and target indices.
  • "request" is the search request used to filter the companies.

The command should return the following response with two search hits:

{
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "articles",
      "_type" : "article",
      "_id" : "1",
      "_score" : 1.0,
      "_source":{ "title" : "The NoSQL database glut", "mentions" : ["1", "2"] }
    }, {
      "_index" : "articles",
      "_type" : "article",
      "_id" : "3",
      "_score" : 1.0,
      "_source":{ "title" : "How to determine which NoSQL DBMS best fits your needs", "mentions" : ["2", "4"] }
    } ]
  }
}
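
The same request body can also be assembled programmatically. The following Python sketch builds the join clause shown above as a plain dictionary and serializes it to JSON. The helper name build_join_query is hypothetical and not part of any Siren client library; the sketch only mirrors the structure of the curl example and leaves sending the body to the /siren endpoint to whichever HTTP client you use.

```python
import json


def build_join_query(source_indices, join_keys, source_query):
    """Build a join clause as a plain dict.

    Hypothetical helper: it only mirrors the JSON structure of the curl
    example above and does not talk to a cluster.
    """
    if len(join_keys) != 2:
        raise ValueError("join_keys must hold exactly two field paths")
    return {
        "query": {
            "join": {
                "indices": list(source_indices),
                "on": list(join_keys),
                "request": {"query": source_query},
            }
        }
    }


body = build_join_query(
    source_indices=["companies"],
    join_keys=["mentions", "id"],
    source_query={"term": {"name": "orient"}},
)
print(json.dumps(body, indent=2))
```

Keeping the clause in a small builder like this makes it easier to swap the inner search request without retyping the surrounding join structure.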
								

Find more examples in the docs.

Performance and Scalability

Backed by years of R&D and patent-pending technology, Siren Federate is highly optimized for low-latency, interactive responses.

This enables innovative end user capabilities (Siren Investigate relational drill-downs, for example) as well as large real-time correlations for alerting and detection purposes.

Near-linear scalability with the number of nodes

Near-linear scalability with the number of cores per node

No significant JVM memory taken away from regular Elasticsearch operations (uses off-heap memory)

Advanced multi-level caching of intermediate computations, which shines in multi-user, interactive scenarios

Performance benchmarks

Federate Virtualization: Map remote JDBC systems as “virtual” indexes

Siren Federate also adds “virtualization” capabilities: uniform access to remote data sources via JDBC, together with optimized query pushdown into those systems. This enables true single-pane-of-glass investigative analytics across all of your organization’s data, while keeping the data where it is.

A closer look at Federate

Installs on new or existing clusters

Siren Federate is delivered as an Elasticsearch plug-in that can simply be added to existing deployments. The plug-in adds a new REST endpoint (/siren) that exposes the extended Elasticsearch API, including the new join query operator, and is integrated with both the Search and Scroll APIs.

Because the original APIs are unchanged, you can adopt the new capabilities in your application gradually, moving queries to the new endpoint bit by bit; no major rewrite is needed.
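
Because the join operator lives under its own endpoint, switching a request over can be as small as changing its path. A minimal Python sketch, assuming the endpoint layout seen in the curl example earlier on this page (/siren/<index>/_search); the helper name search_url is hypothetical:

```python
def search_url(base_url, index, use_federate=False):
    """Return the search URL for an index.

    Assumption: the Federate endpoint is the native search path prefixed
    with /siren, as in the curl example earlier on this page.
    """
    prefix = "/siren" if use_federate else ""
    return f"{base_url.rstrip('/')}{prefix}/{index}/_search"


native = search_url("http://localhost:9200", "articles")
federate = search_url("http://localhost:9200", "articles", use_federate=True)
print(native)
print(federate)
```

A flag like use_federate lets an application route only the queries that need joins to the new endpoint while everything else keeps using the native path.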

Distributed join strategies

Multiple distributed join strategies, each catering to different scenarios, are included out of the box. The user can select one manually or fall back to the planner, which automatically chooses the best strategy for the scenario at hand.

Query plan optimization and execution

The Siren Federate query planner allows for multiple optimization steps such as the selection of the best join strategy based on statistical optimization, pushing search and aggregate operations down to the index, pushing join operations down to the remote data sources, and reusing computation across multiple query execution plans. Query operations are executed asynchronously and in parallel for better performance.

Query throttling, termination and cancellation

A priority-based query throttling mechanism is included out of the box to better control how cluster resources are shared under concurrent load. In addition, users can terminate a query early to limit the latency of complex queries and return partial results, and can cancel complex query plans on demand to free up resources.

Memory management

Data is encoded into a columnar memory format and stored off-heap, reducing pressure on the JVM and enabling fast, efficient analytic operations. Data is read directly from off-heap storage and decoded on the fly using zero-serialization and zero-copy memory access; this removes serialization overhead and reduces CPU cycles and memory bandwidth usage.
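
The ideas behind a columnar layout and zero-copy reads can be illustrated in a few lines of Python. This is only an analogy built on the standard library (array and memoryview); it is not Siren Federate's implementation, and the sample values are made up.

```python
from array import array

# Row-oriented records (hypothetical sample data).
rows = [(1, 10.5), (2, 20.0), (3, 7.25), (4, 99.0)]

# Columnar encoding: one contiguous, typed buffer per field.
ids = array("q", (r[0] for r in rows))       # 64-bit signed integers
amounts = array("d", (r[1] for r in rows))   # 64-bit floats

# A memoryview exposes the buffer without copying it; slicing a
# memoryview yields another view over the same memory.
amounts_view = memoryview(amounts)
tail = amounts_view[2:]

# Values are decoded on access, straight from the underlying buffer.
total_tail = sum(tail)

# Writing through the original array is visible through the view,
# which shows that no copy was made.
amounts[3] = 1.0
print(total_tail, tail[1])
```

An analytic operation such as a sum only touches the one column it needs, which is the main appeal of a columnar layout for this kind of workload.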

Siren Federate’s memory management allows granular control of how much off-heap memory can be allocated per node, per query, and per query operator. It can also terminate queries when the memory circuit breaker detects excessive off-heap memory requests. In addition, the garbage collector automatically releases intermediate computational results and reclaims off-heap memory to reduce memory impact.

Semi-join operations and semantic caching

Results of semi-join operations are cached and reused across queries for millisecond response times, even when joining massive datasets. A user can expect sustained performance when incorporating or removing datasets thanks to Siren Federate’s semantic caching model, whereby the system caches an entry associated with a semantic definition of the query. This same definition allows for the detection of query fragments that can be answered with previously cached results, with the model also permitting Siren Federate to detect data changes and automatically discard stale cached results.

Distributed, small footprint and security

Cache storage is distributed across nodes, allowing horizontal scaling in line with the number of available nodes. System efficiency is further enhanced by encoding the results in a compact bit-array structure, enabling thousands of cached semi-join operations with minimal memory overhead. Cache storage is tightly integrated with popular security plug-ins, protecting data while maintaining maximum performance.
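
One way to picture the caching model described above is a cache keyed by a normalized form of the query plus a version of the underlying data, with each semi-join result stored as a bit array over document ordinals. The Python sketch below is purely illustrative: the class name, the version counter, and the use of an integer as a bit set are assumptions, not Siren Federate internals.

```python
import json


class SemiJoinCache:
    """Toy semantic cache for semi-join results (illustrative only)."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(query, data_version):
        # Canonical JSON: queries that differ only in key order
        # share one cache entry.
        return (json.dumps(query, sort_keys=True), data_version)

    def put(self, query, data_version, matching_ordinals):
        bits = 0
        for ordinal in matching_ordinals:
            bits |= 1 << ordinal          # one bit per matching document
        self._entries[self._key(query, data_version)] = bits

    def get(self, query, data_version):
        # A new data_version yields a different key, so results computed
        # against older data are never returned.
        return self._entries.get(self._key(query, data_version))


def ordinals(bits):
    """Decode a bit set back into a sorted list of ordinals."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]


cache = SemiJoinCache()
cache.put({"term": {"name": "orient"}, "index": "companies"}, 1, [0, 2])

hit = cache.get({"index": "companies", "term": {"name": "orient"}}, 1)
stale = cache.get({"index": "companies", "term": {"name": "orient"}}, 2)
print(ordinals(hit), stale)
```

In this toy version a data change simply produces a cache miss; a real system would also evict the outdated entries.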

More on federation and “Virtual Indexes”

Siren Federate, via JDBC connectors, allows mapping of remote tables to local “virtual” Elasticsearch indices.

Using the REST APIs, you can create “data source connections” to RDBMSs such as PostgreSQL, Oracle, Microsoft SQL Server and MySQL. You can also connect to virtual data warehouses like Dremio, Denodo, Spark, Impala, as well as other specialized systems such as Neo4j.

After a connection is established, Federate can, for the majority of backends, create virtual indices from remote tables. These tables are mapped live by rewriting queries and results in real time. This enables front-end clients to use the Elasticsearch APIs to communicate with a multitude of remote data sources in a uniform way.

Optionally: “reflect” remote data in Elasticsearch

Sometimes remote systems are inherently too slow for real-time virtualization. Or you may want to be able to perform Elasticsearch-unique operations on remote data.

In this case Federate provides reflection capabilities: automated, scheduled ingestion of the remote tables within Elasticsearch itself.

When virtualized tables are reflected, you can use all the Elasticsearch/Siren Federate operators, enabling full-text search, fast analytics, and large-scale joins. Read more on “virtualize or reflect”.

Elasticsearch + Federate = unprecedented investigative capabilities

Elasticsearch is arguably unparalleled for interactive search on huge amounts of structured, semi-structured and unstructured data. With Federate, you gain two crucial missing abilities: correlating data across indices in real time, and seeing data across backends.

Use cases

The Federate plug-in is used across industries to build enhanced Elasticsearch-based applications for mission-critical use cases.

Industry: Cybersecurity & operational log monitoring
Federate-enabled use cases:
  • Get alerts if firewall traffic goes to malicious IPs
  • Correlate your LDAP directory with leaked credentials
  • Detect malware via correlation with malicious MD5s
Value: Instant notifications. No preprocessing/materialization. Maximum flexibility.

Industry: Fraud & financial crime
Federate-enabled use cases:
  • Continuously cross-check numbers and credentials
  • Rank cases and users by values of their transactions
  • Join data from reference datasets
Value: Cross-check all your records when new information arrives. Find patterns across indices via correlations.

Industry: Intelligence & law enforcement
Federate-enabled use cases:
  • Callers from area X at time 1 and area Y at time 2
  • Finding connections at scale between indices
  • Before/after event investigations at large scale
Value: Must-have features for moving-target investigations. Interactive correlations enable free-form investigations. Ask much more powerful questions than otherwise possible.

Industry: Enterprise knowledge search and e-discovery
Federate-enabled use cases:
  • Rank results considering connected records
  • From “drill-downs” to “relational drill-downs”
  • Powerful notifications for “new connected content”
Value: Exceptionally valuable features for user-focused, interactive e-discovery and advanced enterprise search.
