Siren Federate™

Join massive datasets in real time inside Elasticsearch.

Real-time, big data joins critically extend what Elasticsearch can do for your use cases. Developed for the needs of some of the most advanced organizations in the world, Siren Federate is now available to use in your applications too.

Extending the Elasticsearch DSL

Siren Federate™ extends the native Elasticsearch query language with semi, inner, and left joins.

The following example shows Siren’s enhanced Elasticsearch syntax executing a join across two indices. Here we would like to retrieve all the “articles” that mention “companies” whose name matches orient. The Siren Federate plugin introduces a new Elasticsearch filter named join.

$ curl -H 'Content-Type: application/json' 'http://localhost:9200/siren/articles/_search?pretty' -d '{
  "query" : {
    "join" : {
      "indices" : ["companies"],
      "on" : ["mentions", "id"],
      "request" : {
        "query" : {
          "term" : {
            "name" : "orient"
          }
        }
      }
    }
  }
}'

  1. "join": the join query clause
  2. "indices": the source indices (companies)
  3. "on": the clause specifying the paths for join keys in both source and target indices
  4. "request": the search request that will be used to filter the companies

{
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "articles",
      "_type" : "article",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : { "title" : "The NoSQL database glut", "mentions" : ["1", "2"] }
    }, {
      "_index" : "articles",
      "_type" : "article",
      "_id" : "3",
      "_score" : 1.0,
      "_source" : { "title" : "How to determine which NoSQL DBMS best fits your needs", "mentions" : ["2", "4"] }
    } ]
  }
}
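
To make the semantics of the request above concrete, here is a minimal in-memory sketch, in Python, of the semi-join it expresses. It does not reflect how Siren Federate executes joins internally. The company records and article 2 are hypothetical sample data, semi_join is an illustrative helper name, and the token match is only a rough stand-in for the term query.

```python
# Hypothetical sample data; not taken from a real index.
companies = [
    {"id": "1", "name": "Acme Data"},
    {"id": "2", "name": "Orient Systems"},
    {"id": "4", "name": "Orient Labs"},
]
articles = [
    {"id": "1", "mentions": ["1", "2"]},
    {"id": "2", "mentions": ["1"]},
    {"id": "3", "mentions": ["2", "4"]},
]


def semi_join(targets, target_field, sources, source_field, source_filter):
    """Return target records whose join key matches at least one filtered source record."""
    keys = {s[source_field] for s in sources if source_filter(s)}
    return [t for t in targets if keys.intersection(t[target_field])]


def name_matches_orient(company):
    # Rough stand-in for the term query on "name".
    return "orient" in company["name"].lower().split()


matched = semi_join(articles, "mentions", companies, "id", name_matches_orient)
print([a["id"] for a in matched])  # prints ['1', '3']
```

Only the articles are returned; the company records act purely as a filter, which is what distinguishes a semi-join from an inner join.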

Performance & Scalability

Employing patented, specialized algorithms, Siren Federate is highly optimized for fully distributed operations on top of Elasticsearch, yielding low-latency, interactive responses.

This enables innovative end-user capabilities (Siren Investigate relational drill-downs, for example) as well as large real-time correlations for alerting and detection purposes.

Nodes

Near-linear scalability with the number of nodes

Cores per node

Near-linear scalability with the number of cores per node

JVM memory

No significant JVM memory taken away from regular Elasticsearch operations (uses off-heap memory)

Advanced caching

Multi-level caching of intermediate computations shines in multi-user, interactive scenarios.

A closer look at Federate – Elasticsearch Plugin

Installs on new or existing clusters

Siren Federate is delivered as an Elasticsearch plug-in which can be simply added to existing deployments. The plug-in adds a new REST endpoint (/siren) where the extended Elasticsearch syntax API is provided with a new join query operator, thoughtfully integrated with both Search and Scroll APIs.

Because the original APIs are unchanged, you can adopt the new capabilities in your application incrementally by moving queries to the new endpoint one at a time; no large rewrite is needed.

Distributed join strategies

Multiple distributed join strategies, each one catering to different scenarios, are included out of the box. The user can manually select which one to use or fall back to the planner, which automatically chooses the best strategy for the scenario at hand.

Query plan optimization and execution

The Siren Federate query planner allows for multiple optimization steps such as the selection of the best join strategy based on statistical optimization, pushing search and aggregate operations down to the index, pushing join operations down to the remote data sources, and reusing computation across multiple query execution plans. Query operations are executed asynchronously and in parallel for better performance.

Query throttling, termination and cancellation

A priority-based query throttling mechanism, included out of the box, controls how cluster resources are shared under concurrent load. Users can also terminate a query early to limit the latency of complex queries and return partial results, and cancel complex query plans on demand to free up resources.

Memory management

Data is encoded into a columnar memory format and stored off-heap, reducing pressure on the JVM and enabling fast, efficient analytic operations. Data is read directly from the off-heap storage and decoded on the fly by means of zero-serialization and zero-copy memory access; this removes any serialization overhead, reducing CPU cycles and memory bandwidth overhead.
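
As a rough analogy for the columnar, zero-copy idea, the following Python sketch stores each field in its own contiguous buffer and reads it through a memoryview, which exposes the buffer without copying or deserializing it. Python has no equivalent of JVM off-heap memory, so this only illustrates the layout and the read path; the rows and field names are hypothetical, and nothing here is Siren Federate's actual format.

```python
from array import array

# Hypothetical rows of (id, amount).
rows = [(1, 10.5), (2, 20.0), (3, 7.25)]

# Columnar layout: one contiguous, fixed-width buffer per field.
ids = array("q", (r[0] for r in rows))      # 64-bit signed integers
amounts = array("d", (r[1] for r in rows))  # 64-bit floats

# A memoryview reads the buffer in place: no copy, no deserialization step.
amounts_view = memoryview(amounts)
window = amounts_view[1:]  # slicing a memoryview shares the same memory

total = sum(amounts_view)  # scan a single column without touching the ids
amounts[2] = 8.0           # a write to the buffer is visible through the view

print(total, window[1])  # prints 37.75 8.0
```

Scanning one column touches only that column's bytes, which is why columnar layouts suit analytic operations such as joins on a single key field.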

Siren Federate’s memory management allows for granular control of how much off-heap memory can be allocated per node, per query, and per query operator, while also having the inherent capability of terminating queries when the memory circuit breaker detects too many off-heap memory requests. In addition, the garbage collector automatically releases intermediate computational results and reclaims off-heap memory for decreased memory impact.
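
The idea of nested limits guarded by a circuit breaker can be sketched as follows. This is a generic illustration in Python under assumed names (MemoryBudget, MemoryLimitExceeded) and assumed byte limits; it is not Siren Federate's implementation or configuration.

```python
class MemoryLimitExceeded(Exception):
    """Raised when a request would push any budget in the chain over its limit."""


class MemoryBudget:
    def __init__(self, name, limit, parent=None):
        self.name = name
        self.limit = limit
        self.used = 0
        self.parent = parent

    def _chain(self):
        budget = self
        while budget is not None:
            yield budget
            budget = budget.parent

    def allocate(self, size):
        # Check every level first so a rejected request reserves nothing.
        for budget in self._chain():
            if budget.used + size > budget.limit:
                raise MemoryLimitExceeded(f"{budget.name} budget exceeded")
        for budget in self._chain():
            budget.used += size

    def release(self, size):
        for budget in self._chain():
            budget.used -= size


# Hypothetical limits, in bytes, for one node, one query, and one operator.
node = MemoryBudget("node", 1000)
query = MemoryBudget("query", 600, parent=node)
operator = MemoryBudget("operator", 400, parent=query)

operator.allocate(300)
rejected = None
try:
    operator.allocate(200)  # 500 bytes would exceed the operator's 400
except MemoryLimitExceeded as exc:
    rejected = str(exc)
operator.release(300)

print(rejected, node.used)  # prints operator budget exceeded 0
```

Checking the whole chain before reserving anything keeps the accounting consistent when a request is rejected part-way up the hierarchy.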

Semi-join operations and semantic caching

Results of semi-join operations are cached and reused across queries for millisecond response times, even when joining massive datasets. A user can expect sustained performance when incorporating or removing datasets thanks to Siren Federate’s semantic caching model, whereby the system caches an entry associated with a semantic definition of the query. This same definition allows for the detection of query fragments that can be answered with previously cached results, with the model also permitting Siren Federate to detect data changes and automatically discard stale cached results.
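
One simple way to picture a cache keyed by the meaning of a query is sketched below in Python. The key is derived from a normalized query definition plus a per-index version number, so equivalent requests share an entry and a data change produces a different key. The class name, the version counter, and the "founded" field are assumptions for illustration; the detection of reusable query fragments described above is not covered, and a real cache would also evict stale entries rather than merely stop matching them.

```python
import hashlib
import json


class SemanticCache:
    def __init__(self):
        self._entries = {}

    @staticmethod
    def key(indices, query, index_versions):
        definition = {
            "indices": sorted(indices),
            "query": query,
            "versions": {name: index_versions[name] for name in indices},
        }
        # sort_keys normalizes key order, so equivalent queries hash alike.
        canonical = json.dumps(definition, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_compute(self, indices, query, index_versions, compute):
        k = self.key(indices, query, index_versions)
        if k not in self._entries:
            self._entries[k] = compute()
        return self._entries[k]


calls = []


def run_join():
    calls.append(1)  # count how often the expensive join actually runs
    return {"2", "4"}


cache = SemanticCache()
versions = {"companies": 7}
q1 = {"range": {"founded": {"gte": 1990, "lte": 2000}}}
q2 = {"range": {"founded": {"lte": 2000, "gte": 1990}}}  # same meaning

cache.get_or_compute(["companies"], q1, versions, run_join)  # miss
cache.get_or_compute(["companies"], q2, versions, run_join)  # hit
versions["companies"] = 8                                    # data changed
cache.get_or_compute(["companies"], q1, versions, run_join)  # miss again

print(len(calls))  # prints 2
```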

Distributed, small footprint and security

Cache storage is distributed across nodes, allowing the cache to scale horizontally with the number of available nodes. System efficiency is further enhanced by encoding the results in a compact bit-array structure, enabling thousands of cached semi-join operations with minimal memory overhead. Cache storage is tightly integrated with popular security plug-ins, protecting data while maintaining maximum performance.
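
To see why a bit array keeps cached results small, consider the following minimal Python sketch, which records which document numbers matched a semi-join using one bit per document. The BitSet class and the document count are illustrative assumptions, not Siren Federate's actual encoding.

```python
class BitSet:
    """Fixed-size set of document numbers stored as one bit per document."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)

    def add(self, doc_id):
        if not 0 <= doc_id < self.size:
            raise ValueError("doc_id out of range")
        self.bits[doc_id >> 3] |= 1 << (doc_id & 7)

    def __contains__(self, doc_id):
        if not 0 <= doc_id < self.size:
            return False
        return bool(self.bits[doc_id >> 3] & (1 << (doc_id & 7)))

    def count(self):
        return sum(bin(byte).count("1") for byte in self.bits)


# Hypothetical shard with one million documents, three of which matched.
matches = BitSet(1_000_000)
for doc_id in (3, 42, 999_999):
    matches.add(doc_id)

print(len(matches.bits), 42 in matches, 43 in matches, matches.count())
# prints 125000 True False 3
```

At one bit per document, the result for a million documents occupies 125,000 bytes regardless of how many documents matched.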

Elasticsearch + Federate = unprecedented investigative capabilities

Elasticsearch is arguably unparalleled for interactive search on huge amounts of structured, semi-structured and unstructured data. With Federate, you gain the crucial missing ability to correlate across indices in real time, along with visibility of data across backends.

Siren Federate use cases

The Federate plug-in is used across industries to build enhanced Elasticsearch-based applications for mission-critical use cases.

Cybersecurity & operational log monitoring

  • Get alerts if firewall traffic goes to malicious IPs
  • Correlate your LDAP directory with leaked credentials
  • Detect malware via correlation with malicious MD5s

Value: Instant notifications. No preprocessing/materialization. Maximum flexibility.

Fraud & financial crime

  • Continuously cross-check numbers and credentials
  • Rank cases and users by values of their transactions
  • Join data from reference datasets

Value: Cross-check all your records when new information arrives. Find patterns across indices via correlations.

Intelligence & law enforcement

  • Callers from area X at time 1 and area Y at time 2
  • Finding connections at scale between indices
  • Before/after event investigations at large scale

Value: Must-have features for moving-target investigations. Interactive correlations enable free-form investigations. Ask much more powerful questions than otherwise possible.

Enterprise knowledge search & e-discovery

  • Rank results considering connected records
  • From “drill-downs” to “relational drill-downs”
  • Powerful notifications for “new connected content”

Value: Exceptionally valuable features for user-focused, interactive e-discovery and advanced enterprise search.