We’re very excited to announce that, starting today, Siren 10 Beta 1 is available.
(Yes, we skipped a few version numbers; let’s see why.)
What’s new in Siren 10 Beta 1
Multiple back-end support, no ETL needed
Chances are that not all of your useful data is in Elasticsearch. The good news is that Siren 10 can work with data where it already lives inside your organization. No ETL needed!
The Siren Federate layer can now map JDBC datasources as “Virtual Indexes”, which can be visualized and relationally analyzed exactly as was previously possible only with Elasticsearch data.
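To make the “virtual index” idea concrete, here is a minimal sketch of a relational pivot from a full-text search on one system to rows in a SQL database. It is not Siren’s implementation or API: an in-memory `sqlite3` database stands in for a JDBC source, a plain Python list stands in for an Elasticsearch index, and `VirtualIndex`, `filter_by_keys` and `full_text_search` are hypothetical names used only for illustration.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical stand-in for an Elasticsearch "reviews" index.
REVIEWS = [
    {"review_id": 1, "item_id": "A", "text": "great battery life"},
    {"review_id": 2, "item_id": "B", "text": "battery died fast"},
    {"review_id": 3, "item_id": "C", "text": "nice screen"},
]

def full_text_search(docs, term):
    """Naive substring match standing in for a real full-text query."""
    return [d for d in docs if term in d["text"]]

@dataclass
class VirtualIndex:
    """Hypothetical mapping of a database table to an index-like name."""
    name: str
    conn: sqlite3.Connection
    table: str
    key: str

    def filter_by_keys(self, keys):
        """Push the relational filter down to the database as WHERE key IN (...)."""
        keys = sorted(keys)
        if not keys:
            return []
        placeholders = ", ".join("?" for _ in keys)
        # table and key come from trusted configuration; only values are bound as parameters.
        sql = f"SELECT * FROM {self.table} WHERE {self.key} IN ({placeholders})"
        cur = self.conn.execute(sql, keys)
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]

# In-memory SQLite stands in for a JDBC source such as Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (item_id TEXT, store_id TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)",
                 [("A", "S1"), ("B", "S2"), ("C", "S1"), ("A", "S3")])
purchases = VirtualIndex("purchases", conn, "purchases", "item_id")

# Pivot: reviews mentioning "battery" -> purchases of the reviewed items.
hits = full_text_search(REVIEWS, "battery")
rows = purchases.filter_by_keys({d["item_id"] for d in hits})
print(sorted((r["item_id"], r["store_id"]) for r in rows))
# [('A', 'S1'), ('A', 'S3'), ('B', 'S2')]
```

The point of the sketch is that only the set of join keys crosses the system boundary; the filtering itself runs inside the database.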
You can, for example:
- Start from a dashboard containing “Reviews” and run a full-text search query (data in Elasticsearch)
- … relationally pivot to the list of items that were purchased (data in Oracle DB)
- … then see which stores (locations) were involved (this time the data is in Cloudera Impala; a complex architecture indeed!)
- … and so on, in link analysis mode too.
All of this happens seamlessly: the analyst does not have to worry about where the data is, because “relational filtering” buttons are automatically generated from the underlying ontology (data model), which connects your records across indexes and across systems. In a picture:
Analytics and join pushdown: your Big Data infrastructure at work
Siren federation technology makes full use of your existing DBs and Big Data infrastructure: it translates analytic and join queries into the language supported by your DB or Big Data system and, transparently, falls back to in-memory joins on the Siren cluster nodes only as a last resort.
This means that, for most analyses, performance and scalability will be as good as those of your native SQL system.
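The pushdown-first, in-memory-last strategy can be sketched as follows. This is a hypothetical and greatly simplified planner, not Siren Federate’s: when both sides of a join live in the same backend it emits a single SQL statement for that backend, and otherwise it falls back to an in-memory hash join. `Relation` and `plan_join` are illustrative names, and SQL dialect translation is ignored.

```python
from dataclasses import dataclass

@dataclass
class Relation:
    """Hypothetical handle on a table living in some backend."""
    backend: str      # e.g. "oracle", "impala"
    table: str
    rows: tuple = ()  # rows fetched for the in-memory fallback (toy data here)

def plan_join(left, right, left_key, right_key):
    """Push the join down when both sides share a backend; otherwise hash-join in memory."""
    if left.backend == right.backend:
        # Same system: let the backend do the work with a single query.
        sql = (f"SELECT * FROM {left.table} l JOIN {right.table} r "
               f"ON l.{left_key} = r.{right_key}")
        return "pushdown", sql
    # Last resort, build phase: hash the right side by its join key.
    index = {}
    for rrow in right.rows:
        index.setdefault(rrow[right_key], []).append(rrow)
    # Probe phase: look up each left row and merge the matches.
    joined = [{**lrow, **rrow}
              for lrow in left.rows
              for rrow in index.get(lrow[left_key], [])]
    return "in_memory", joined

orders = Relation("oracle", "orders", ({"item_id": "A", "qty": 2},))
items = Relation("oracle", "items")
stores = Relation("impala", "stores", ({"item_id": "A", "store": "S1"},
                                       {"item_id": "B", "store": "S2"}))
print(plan_join(orders, items, "item_id", "item_id"))
# ('pushdown', 'SELECT * FROM orders l JOIN items r ON l.item_id = r.item_id')
print(plan_join(orders, stores, "item_id", "item_id"))
# ('in_memory', [{'item_id': 'A', 'qty': 2, 'store': 'S1'}])
```

A real federated planner also has to weigh data sizes and translate each query into the target dialect; the sketch only shows the basic "push down when possible" decision.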
Backends supported out of the box in Beta 1 include:
- Microsoft SQL Server 2017
- Sybase ASE 15.7+
- Oracle 12c+
- Spark SQL 2.2+
Need more? Let us know.
Fully distributed in-cluster Elasticsearch joins: double the nodes, double the performance
With the new, fully cluster-distributed Elasticsearch join technology in Siren Federate, performance now scales with the size of the cluster. By implementing some of the most advanced and recent distributed computation algorithms, Siren Federate reaches a whopping 90% average efficiency in hardware usage as your cluster scales, whether in the number of nodes or in the number of CPUs per node.
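An efficiency figure like “90%” is commonly read with the standard definition of parallel efficiency: the speedup T_1 / T_n divided by the factor n by which resources (nodes or cores) grew, so 1.0 means “double the nodes, double the performance”. The sketch below computes it; the timings in the usage line are made-up illustrative numbers, not Siren benchmark results.

```python
def speedup(t_base: float, t_scaled: float) -> float:
    """How many times faster the scaled-up cluster runs the same job."""
    return t_base / t_scaled

def parallel_efficiency(t_base: float, t_scaled: float, scale_factor: float) -> float:
    """Speedup divided by the growth in resources; 1.0 is perfect linear scaling."""
    if t_base <= 0 or t_scaled <= 0 or scale_factor <= 0:
        raise ValueError("times and scale factor must be positive")
    return speedup(t_base, t_scaled) / scale_factor

# Illustrative only: a join taking 90 s on 1 node and 25 s on 4 nodes.
print(round(parallel_efficiency(90.0, 25.0, 4), 3))  # 0.9
```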
The following picture, from a forthcoming blog post dedicated to this topic, summarizes the results of our join benchmark as the Elasticsearch cluster scales in nodes and cores.
Note that the Siren Federate distributed technology will also be made available as a standalone Elasticsearch plugin for use in your applications.
Improved relational model: OWL and Entity Identifiers
Siren now lets users edit what is, in effect, an OWL ontology describing data links (a bit of theory 🙂)
While building Siren, we’ve always been well aware of graph data formats like RDF and of the OWL vocabulary for representing data models (called ontologies when they also include more advanced restrictions on class relationships).
That said, the Siren Platform steers well clear of asking anyone to convert their data into RDF. Instead, it simply provides a layer that stores the relationships between your existing indexes (anything you connect to Federate, from DBs to Elasticsearch indexes) and coordinates the joins and analytic queries on your data in the format it already has.
On the other hand, using RDF/OWL not as a “record/field” format but simply to describe the relationships between your existing data makes a lot of sense, and this is what Siren 10 does under the hood: it lets users define their data model using (internally) the OWL language. Side bonuses: you can also edit the model with tools outside Siren, it’s easy to convert other data models into a well-known format, etc.
What’s new in practice
The first change is that the new data model introduces the concept of the “Entity Identifier” (EID).
Previously, to join two indexes in Siren you had to specify a direct connection between them. For example, if you had two logs that could be connected by IP value, you would specify a direct connection, thus creating a relational button between the two.
But what if many indexes contain IPs (or anything else: MAC Addresses, UserIDs, URLs, Port Numbers, Transaction IDs, etc.) in multiple roles (Source IP, Destination IP), and it would be useful to join from any of these roles and indexes to any other role and index?
Our new relational model allows this. Automatically.
For example, in this configuration we have defined the IP concept as an EID and tied it to the other indexes where IPs show up. For each connection we specify the name of the relation, that is, the role of the IP in that index (is it the “source” IP in that log, or the “blocked” IP?).
With just this configuration, you get buttons that explore the ontology and show you all possible matches across your data. One click then pivots you to the target dashboard, with the right relational filter applied.
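Here is a minimal sketch of why EIDs scale better than pairwise connections, under assumed names: each index field is linked once to the IP EID together with its role (`EidLink` and all index and field names are hypothetical), and the pivots are derived from those links. With n links you configure n entries and get n(n-1) directed pivots, instead of declaring each pair by hand. `pivot_filter` shows the kind of terms filter a pivot might apply; its output shape is an assumption, not Siren’s query format.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class EidLink:
    """One hypothetical link from an index field to the IP EID, with its role."""
    index: str
    field: str
    relation: str

IP_LINKS = [
    EidLink("firewall-logs", "src_ip", "Source IP"),
    EidLink("firewall-logs", "dst_ip", "Destination IP"),
    EidLink("firewall-logs", "blocked_ip", "Blocked IP"),
    EidLink("apache-logs", "agent_ip", "Agent IP"),
]

def relational_buttons(links):
    """Every ordered pair of distinct links sharing the EID yields one pivot."""
    return list(permutations(links, 2))

def pivot_filter(records, src, dst):
    """Collect identifier values from the current records and target them in dst."""
    values = sorted({r[src.field] for r in records if src.field in r})
    return {"index": dst.index, "terms": {dst.field: values}}

buttons = relational_buttons(IP_LINKS)
print(len(buttons))  # 12 = 4 * 3

# Pivot from "Destination IP" in the current log to "Agent IP" in the Apache logs.
current = [{"dst_ip": "10.0.0.5"}, {"dst_ip": "10.0.0.7"}, {"dst_ip": "10.0.0.5"}]
print(pivot_filter(current, IP_LINKS[1], IP_LINKS[3]))
# {'index': 'apache-logs', 'terms': {'agent_ip': ['10.0.0.5', '10.0.0.7']}}
```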
For example, to see the Apache log records whose Agent IP matches the Destination IP in the current log, just navigate from “Destination IP”, as in the picture:
EIDs are great for anything that identifies “things” across indexes but does not have an index of its own (otherwise you’d pivot to that index): phone numbers, but also tags, labels from standalone indexes, etc. In practice, even a single Excel spreadsheet can be seen as a “knowledge graph” if you treat labels as identifiers that interconnect records. Here is an example with EIDs (Tissue and Organism) in a Life Science deployment.
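The spreadsheet-as-knowledge-graph idea can be sketched in a few lines: treat selected columns as EIDs and connect any two rows that share a value in one of them. The rows and column names below are invented for illustration; they mirror the Tissue and Organism EIDs mentioned above but are not real data, and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical spreadsheet rows; "tissue" and "organism" play the role of EIDs.
ROWS = [
    {"id": "exp1", "tissue": "liver", "organism": "mouse"},
    {"id": "exp2", "tissue": "liver", "organism": "human"},
    {"id": "exp3", "tissue": "brain", "organism": "mouse"},
    {"id": "exp4", "tissue": "heart", "organism": "zebrafish"},
]
EID_COLUMNS = ("tissue", "organism")

def build_label_index(rows, eid_columns):
    """Map each (column, label) pair to the set of record ids carrying it."""
    index = defaultdict(set)
    for row in rows:
        for col in eid_columns:
            index[(col, row[col])].add(row["id"])
    return index

def neighbours(record_id, rows, index, eid_columns):
    """Records reachable from record_id through any shared label."""
    row = next(r for r in rows if r["id"] == record_id)
    linked = set()
    for col in eid_columns:
        linked |= index[(col, row[col])]
    linked.discard(record_id)
    return linked

label_index = build_label_index(ROWS, EID_COLUMNS)
print(sorted(neighbours("exp1", ROWS, label_index, EID_COLUMNS)))  # ['exp2', 'exp3']
```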
What’s coming next?
In the coming weeks you’ll see small updated builds of Beta 1.
We’re aiming for a full release in 4 to 6 weeks, which will include:
- Big Data aggregations on graph edges. Millions of logs summarized instantly in the graph browser, with automatic detection of all possible fast aggregations across indexes.
- High-availability alerting. Siren Alert (previously Sentinl) gains distributed high-availability alerting and can also generate alerts from DBs or any other JDBC source (with cross joins and all).
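Edge aggregation, as described in the first item above, boils down to collapsing many log records into one weighted edge per (source, target) pair. The single-process group-by below is only an illustrative sketch with hypothetical field names and data; at Big Data scale this kind of summary would typically be computed by the backend as an aggregation rather than in client code.

```python
from collections import defaultdict

# Hypothetical raw log records.
LOGS = [
    {"src": "10.0.0.1", "dst": "10.0.0.9", "bytes": 500},
    {"src": "10.0.0.1", "dst": "10.0.0.9", "bytes": 1500},
    {"src": "10.0.0.2", "dst": "10.0.0.9", "bytes": 200},
]

def aggregate_edges(logs, src_field, dst_field, metric_field):
    """Collapse raw records into one edge per (source, target) with a count and a sum."""
    edges = defaultdict(lambda: {"count": 0, "total": 0})
    for rec in logs:
        edge = edges[(rec[src_field], rec[dst_field])]
        edge["count"] += 1
        edge["total"] += rec[metric_field]
    return dict(edges)

edges = aggregate_edges(LOGS, "src", "dst", "bytes")
for (src, dst), agg in sorted(edges.items()):
    print(src, "->", dst, agg)
```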
After this, some of the features coming in the next releases include:
- Optional 1-click caching/ingestion. There are many cases where it is preferable to ingest or cache remote data locally. With 1-click caching/ingestion, selected JDBC data is transparently moved to the local nodes, for example to take pressure off the remote DBs and/or to get best-in-class search from our Elasticsearch-based nodes.
- High-performance molecular search and other components specifically for the Life Science edition. Watch for a blog post on this very soon.
Where to find it?
Starting today, we are making Beta 1 available to early adopters; later next week we’ll roll it out on our support portal. Those on our mailing list will be the first to know when, so sign up (bottom of the page)!
Meet us at Strata in March
Siren will be exhibiting at Strata Data Conference, San Jose (California), March 6th-8th 2018.
See you there.
Also published on Medium.