Runtime Fields in a Cyber Threat Investigation - A Detailed Walkthrough in Siren Federate 30


Cyber threats are on the rise, and it is becoming increasingly challenging to protect critical infrastructure and data. The EU Cybersecurity Agency (ENISA) recently issued an alert about several Advanced Persistent Threat (APT) actors conducting malicious cyber activities against businesses and governments in the EU. According to the latest data from Google, state-sponsored cyber attacks have also spiked, with a 300% increase in attacks targeting users in NATO countries compared to 2020.

Cybersecurity investigators are responsible for analyzing massive amounts of data to identify patterns, detect anomalies, and uncover malicious activity. This requires advanced techniques, tools, and skill sets, as well as the ability to explore and navigate complex data.

One of the most significant challenges investigators face is dealing with data that is not in the form and shape that they need to explore it. In many cases, the data is spread across multiple indices or sources, making it challenging to correlate and analyze. In other cases, the data schema might not match the investigators’ needs, requiring them to transform it to enable further analysis. This transformation can be a time-consuming and resource-intensive process, as it often involves reindexing the data to facilitate exploration.

However, with the introduction of the runtime fields feature in Siren Federate 30, investigators can now transform and join data at runtime, without the need to modify the original data source. This new capability enables investigators to specify a new runtime field and join on it in the same query, even for indices that cannot change their mapping. This feature is particularly useful for cases where time and resources are limited, and the original data source cannot be modified.

In this detailed walkthrough, we showcase a real-world use case where we demonstrate how to use the runtime fields feature to join composite keys and uncover vital information about a cyber attack. We show how a security specialist can explore logs of information about all Linux machines in an enterprise network to detect malicious activity. By using the runtime field feature, the security specialist can join data from multiple indices and sources, without the need to reindex or modify the original data. This enables the security specialist to quickly and easily identify compromised hosts and the processes that were running at the time of the attack, allowing them to respond effectively and mitigate any damage.

Overall, the runtime fields feature in Siren Federate 30 is a powerful tool for cybersecurity investigators, allowing them to transform and join data at runtime, without the need to modify the original data source. This new capability significantly enhances the ability of investigators to detect and respond to cyber threats quickly and effectively, making it an essential tool in the fight against cybercrime.

Improving support for runtime fields

Starting with version 28, Siren Federate has expanded its support for runtime fields, allowing users to specify and use them at query time. This enhancement has a significant impact on the analyst or user experience, as it streamlines the data exploration process and fosters more dynamic and flexible analysis.

In previous versions, using runtime fields in a query required administrative intervention, as the mapping needed to be updated globally for all users. This often led to delays and reduced efficiency, as analysts had to wait for administrators to make the necessary changes before they could proceed with their analysis.

With the ability to specify new runtime fields and join on them in the same request, analysts can now efficiently create and use custom fields without impacting the original data source or requiring any global changes to the mapping. This promotes a more agile and responsive investigative process, as analysts can quickly adapt their queries to meet evolving requirements or focus on specific aspects of the data.
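The difference between the two approaches can be sketched with plain request bodies. The Python snippet below is only an illustration: it reuses the ip:port script that appears later in this walkthrough, and it assumes that the earlier, administrator-driven route corresponds to the standard Elasticsearch mapping update (a "runtime" section sent to the index's _mapping endpoint). The variable names are hypothetical.

```python
import json

# Painless script (taken from the walkthrough) that emits an "ip:port" key.
SCRIPT = 'emit(doc["ip"].value + ":" + doc["port"].value)'
RT_FIELD = {"rt_tcp": {"type": "keyword", "script": {"source": SCRIPT}}}

# Mapping-level definition (assumed here to be sent as
# PUT /tcp_connections/_mapping): it changes the index mapping for every user.
mapping_update_body = {"runtime": RT_FIELD}

# Request-level definition: it travels with a single search request and
# leaves the index mapping untouched.
search_body = {
    "runtime_mappings": RT_FIELD,
    "query": {"match_all": {}},
    "fields": ["rt_tcp"],
}

print(json.dumps(search_body, indent=2))
```

The field definition is identical in both bodies; only the place where it is declared, and therefore who is affected by it, changes.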

Known limitations

While Siren Federate’s runtime fields feature offers significant benefits, it is important to be aware of certain limitations that may impact its usage in specific scenarios. These limitations primarily involve the compatibility of join strategies with runtime fields.

Siren Federate supports three join strategies: HASH_JOIN, BROADCAST_JOIN, and INDEX_JOIN. The HASH_JOIN and BROADCAST_JOIN strategies are fully compatible with runtime fields, allowing users to leverage the feature without any issues. However, the INDEX_JOIN strategy presents a constraint when it comes to runtime fields.

With the INDEX_JOIN strategy, joins involving runtime fields are only supported on the right side of the join. This limitation arises because runtime fields, by definition, have no indexed data. Analysts must take this constraint into account when planning their queries and designing their join operations.
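One way to build intuition for why hash-based strategies cope well with runtime fields is that a hash join only needs a key it can compute for each document. The following is a conceptual Python sketch of a generic hash join with computed keys. It is not Siren Federate's implementation, and the function name and sample documents are hypothetical.

```python
from collections import defaultdict

def hash_join(left_docs, right_docs, left_key, right_key):
    """Join two lists of documents on keys computed at query time."""
    # Build phase: hash the right side on its computed key.
    table = defaultdict(list)
    for doc in right_docs:
        table[right_key(doc)].append(doc)
    # Probe phase: look up the computed key of each left document.
    return [(left, right)
            for left in left_docs
            for right in table.get(left_key(left), [])]

left = [{"ip": "10.0.0.1", "port": "80"}, {"ip": "10.0.0.1", "port": "22"}]
right = [{"dst_ip": "10.0.0.1", "dst_port": "80"}]

pairs = hash_join(
    left, right,
    left_key=lambda d: d["ip"] + ":" + d["port"],
    right_key=lambda d: d["dst_ip"] + ":" + d["dst_port"],
)
print(pairs)
```

Neither side needs a pre-built index on the key, which is the property a computed field lacks.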

Use case: Joining composite keys

In this use case, we demonstrate how you can use runtime fields at query time to perform a join on composite keys. A composite key is a join key that is created by concatenating values from two fields.
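As a minimal sketch of the idea, the Python below builds a composite key from two values and generates a keyword runtime field whose Painless script emits the same key. The helper names are hypothetical, and the field names are borrowed from the scenario that follows.

```python
def composite_key(doc, first, second, sep=":"):
    """Concatenate two field values into a single join key."""
    return f"{doc[first]}{sep}{doc[second]}"

def composite_runtime_field(first, second, sep=":"):
    """Build a keyword runtime field whose script emits the same key."""
    source = f'emit(doc["{first}"].value + "{sep}" + doc["{second}"].value)'
    return {"type": "keyword", "script": {"source": source}}

doc = {"dst_ip": "171.211.213.120", "dst_port": "32"}
key = composite_key(doc, "dst_ip", "dst_port")
field = composite_runtime_field("dst_ip", "dst_port")

print(key)
print(field["script"]["source"])
```

Both sides of a join must build their keys with the same separator and field order, otherwise equal pairs of values produce different keys.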

The scenario

An enterprise has experienced a cyber attack. The enterprise stores logs of information about all Linux machines in the enterprise network. This information is stored across several Elasticsearch indices. They are separate because they are collected by different monitoring systems. The indices used in this scenario are as follows:

firewall_logs: contains TCP connections to internal hosts.
process: contains Linux process information, such as PID, hostname, and user.
tcp_connections: contains the ports opened by processes.

The enterprise is informed that there is an attack on the network. The attack consisted of installing malicious code on the target machines; when the code executed, it opened ports for further connections. The enterprise hired a security specialist to perform the forensic analysis of the attack. The security specialist needs to know which hosts have been compromised and what processes they were running.

The solution

First, declare three indices: one for firewall_logs, one for process, and one for tcp_connections. Then, for the purpose of the use case, add a dummy dataset to the indices.

You can declare the indices in Siren Investigate, or you can use DevTools.

Declare the index for firewall_logs containing at least:

Source IP
Destination IP
Source port
Destination port

You can use DevTools, or a plain REST request, to declare the index as follows:

PUT /firewall_logs
{
  "mappings": {
    "properties": {
      "src_ip":   { "type": "keyword" },
      "dst_ip":   { "type": "keyword" },
      "src_port": { "type": "keyword" },
      "dst_port": { "type": "keyword" }
    }
  }
}

Declare the index for process containing at least:

PID
Hostname
Process owner
Process name

You can use DevTools to declare the mapping of process as follows:

PUT /process
{
  "mappings": {
    "properties": {
      "pid":      { "type": "keyword" },
      "hostname": { "type": "keyword" },
      "user":     { "type": "keyword" },
      "process":  { "type": "keyword" }
    }
  }
}

Declare an index for tcp_connections containing:

PID
Hostname
The hostname IP
Port this process opens


You can use DevTools to declare the mapping for tcp_connections as follows:

PUT /tcp_connections
{
  "mappings": {
    "properties": {
      "pid":      { "type": "keyword" },
      "hostname": { "type": "keyword" },
      "ip":       { "type": "keyword" },
      "port":     { "type": "keyword" }
    }
  }
}
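Because all three mappings follow the same pattern (every field is a keyword), the request bodies can also be generated rather than typed by hand. The following Python sketch shows one way to do that. The helper name is hypothetical, and the script only prints the requests so that they can be pasted into DevTools.

```python
import json

def keyword_mapping(field_names):
    """Return an index-creation body that maps every field as a keyword."""
    properties = {name: {"type": "keyword"} for name in field_names}
    return {"mappings": {"properties": properties}}

# Index names and fields as declared in this walkthrough.
INDICES = {
    "firewall_logs": ["src_ip", "dst_ip", "src_port", "dst_port"],
    "process": ["pid", "hostname", "user", "process"],
    "tcp_connections": ["pid", "hostname", "ip", "port"],
}

for index, fields in INDICES.items():
    print(f"PUT /{index}")
    print(json.dumps(keyword_mapping(fields), indent=2))
```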

Use DevTools to add a dummy dataset into our indices as follows:

POST /_bulk
{ "index" : { "_index" : "process" } }
{"pid": "24", "hostname": "valkyrie.syren.eo", "user": "root", "process": "apache"}
{ "index" : { "_index" : "process" } }
{"pid": "15", "hostname": "valkyrie.syren.eo", "user": "root", "process": "ls" }
{ "index" : { "_index" : "process" } }
{"pid": "58", "hostname": "vishnu.syren.eo", "user": "admin", "process": "ls"}
{ "index" : { "_index" : "process" } }
{"pid": "32", "hostname": "vishnu.syren.eo", "user": "admin", "process": "ls"}


POST /_bulk
{ "index" : { "_index" : "tcp_connections" } }
{"pid": "24", "hostname": "valkyrie.syren.eo", "port":"88", "ip":"171.211.213.120" }
{ "index" : { "_index" : "tcp_connections" } }
{"pid": "24", "hostname": "valkyrie.syren.eo", "port":"80", "ip": "171.211.213.120"}
{ "index" : { "_index" : "tcp_connections" } }
{"pid": "15", "hostname": "valkyrie.syren.eo", "port":"32", "ip":"171.211.213.120" }
{ "index" : { "_index" : "tcp_connections" } }
{"pid": "15", "hostname": "valkyrie.syren.eo", "port":"24", "ip": "171.211.213.120"}
{ "index" : { "_index" : "tcp_connections" } }
{"pid": "58",  "hostname": "vishnu.syren.eo", "port":"22", "ip":"129.11.189.37"}
{ "index" : { "_index" : "tcp_connections" } }
{"pid": "58",  "hostname": "vishnu.syren.eo", "port":"24", "ip":"129.11.189.37"}

POST /_bulk
{ "index" : { "_index" : "firewall_logs" } }
{"src_ip": "106.155.3.18", "dst_ip": "171.211.213.120", "dst_port": "32", "src_port":"4258"}
{ "index" : { "_index" : "firewall_logs" } }
{"src_ip": "106.155.3.18", "dst_ip": "129.11.189.37", "dst_port": "24", "src_port":"4149"}
{ "index" : { "_index" : "firewall_logs" } }
{"src_ip": "126.152.7.13", "dst_ip": "171.211.213.120", "dst_port": "80", "src_port":"4657"}
{ "index" : { "_index" : "firewall_logs" } }
{"src_ip": "156.135.9.34", "dst_ip": "129.11.189.37", "dst_port": "22", "src_port":"6587"}
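A bulk body is newline-delimited JSON in which every action line is followed by exactly one source line, and the body must end with a newline. When the dataset is generated from a script, a small helper keeps that pairing intact. The Python below is a sketch with a hypothetical helper name, using two of the process documents from above.

```python
import json

def bulk_body(index, docs):
    """Serialize docs as NDJSON: one action line, then one source line, per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

body = bulk_body("process", [
    {"pid": "24", "hostname": "valkyrie.syren.eo", "user": "root", "process": "apache"},
    {"pid": "15", "hostname": "valkyrie.syren.eo", "user": "root", "process": "ls"},
])
print(body, end="")
```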

While getting familiar with the existing logs, the security specialist tries to join the firewall logs to the Linux machines. The tcp_connections logs make this possible, but if only the IP of the Linux machine is used as the join key, a single firewall log is linked to many processes. We know that only one process can open a given port on a Linux machine, so the security specialist uses a runtime field to build a single field that combines the machine IP and the port used for the connection: dst_ip and dst_port for firewall_logs, and ip and port for tcp_connections. The resulting request is as follows:

POST siren/tcp_connections/_search
{
  "query": {
    "join": {
      "indices": ["firewall_logs"],
      "on": ["rt_tcp","rt_firewall"],
      "request": {
        "project": [
          {"field": {"name": "src_ip"}},
          {"field": { "name": "dst_port"}}
        ],
        "runtime_mappings": {
          "rt_firewall": {
            "type": "keyword",
            "script": {
              "source": "emit(doc[\"dst_ip\"].value + \":\" + doc[\"dst_port\"].value)"
            }
          }
        },
        "query": {
          "match_all": {}
        }
      }
    }
  },
  "runtime_mappings": {
    "rt_tcp": {
      "type": "keyword",
      "script": {
        "source": "emit(doc[\"ip\"].value + \":\" + doc[\"port\"].value)"
      }
    }
  },
  "script_fields": {
    "src_ip": {
      "script": "doc.src_ip"
    },
    "dst_port": {
      "script": "doc.dst_port"
    }
  },
  "fields": ["pid", "hostname", "port"]
}

We fetch the tcp_connections fields with the fields parameter. To fetch fields from firewall_logs, we list them in the project parameter of the join request and expose them through script_fields. The following is an extract of the response:

"hits": [
      {
        "_index": "tcp_connections",
        "_id": "...",
        "fields": {
          "src_ip": ["126.152.7.13"],
          "hostname": ["valkyrie.syren.eo"],
          "port": ["80"],
          "dst_port": ["80"],
          "pid": ["24"]
        }
      },
      {
        "_index": "tcp_connections",
        "_id": "...",
        "fields": {
          "src_ip": ["106.155.3.18"],
          "hostname": ["valkyrie.syren.eo"],
          "port": ["32"],
          "dst_port": ["32"],
          "pid": ["15"]
        }
      },
      {
        "_index": "tcp_connections",
        "_id": "...",
        "fields": {
          "src_ip": ["156.135.9.34"],
          "hostname": ["vishnu.syren.eo"],
          "port": ["22"],
          "dst_port": ["22"],
          "pid": ["58"]
        }
      },
      {
        "_index": "tcp_connections",
        "_id": "...",
        "fields": {
          "src_ip": ["106.155.3.18"],
          "hostname": ["vishnu.syren.eo"],
          "port": ["24"],
          "dst_port": ["24"],
          "pid": ["58"]
        }
      }
    ]

The security specialist finds some suspicious connections: the source IP 106.155.3.18 connected to two different hosts on two different ports, vishnu.syren.eo:24 and valkyrie.syren.eo:32. We can narrow the request by filtering on src_ip as follows:

POST siren/tcp_connections/_search
{
  "query": {
    "join": {
      "indices": ["firewall_logs"],
      "on": ["rt_tcp","rt_firewall"],
      "request": {
        "project": [
          {"field": {"name": "src_ip"}},
          {"field": { "name": "dst_port"}}
        ],
        "runtime_mappings": {...},
        "query": {
           "term": {"src_ip": {"value":"106.155.3.18"}}
        }
      }
    }
  },
  "runtime_mappings": {...},
  "script_fields": {...},
  "fields": ["pid", "hostname", "port"]
}

As expected, the result is the two suspicious connections only:

  "hits": [
      {
        "_index": "tcp_connections",
        "_id": "...",
        "fields": {
          "src_ip": ["106.155.3.18"],
          "hostname": ["valkyrie.syren.eo"],
          "port": ["32"],
          "dst_port": ["32"],
          "pid": ["15"]
        }
      },
      {
        "_index": "tcp_connections",
        "_id": "...",
        "fields": {
          "src_ip": ["106.155.3.18"],
          "hostname": ["vishnu.syren.eo"],
          "port": ["24"],
          "dst_port": ["24"],
          "pid": ["58"]
        }
      }
    ]
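The value of the composite key can be checked offline against the dummy dataset. The Python sketch below is a plain in-memory comparison, not a Federate query. It counts, for each firewall log, how many tcp_connections documents match when joining on the IP alone and when joining on ip:port. The src_port field is omitted because it plays no part in the join, and the helper name is hypothetical.

```python
tcp_connections = [
    {"pid": "24", "hostname": "valkyrie.syren.eo", "port": "88", "ip": "171.211.213.120"},
    {"pid": "24", "hostname": "valkyrie.syren.eo", "port": "80", "ip": "171.211.213.120"},
    {"pid": "15", "hostname": "valkyrie.syren.eo", "port": "32", "ip": "171.211.213.120"},
    {"pid": "15", "hostname": "valkyrie.syren.eo", "port": "24", "ip": "171.211.213.120"},
    {"pid": "58", "hostname": "vishnu.syren.eo", "port": "22", "ip": "129.11.189.37"},
    {"pid": "58", "hostname": "vishnu.syren.eo", "port": "24", "ip": "129.11.189.37"},
]
firewall_logs = [
    {"src_ip": "106.155.3.18", "dst_ip": "171.211.213.120", "dst_port": "32"},
    {"src_ip": "106.155.3.18", "dst_ip": "129.11.189.37", "dst_port": "24"},
    {"src_ip": "126.152.7.13", "dst_ip": "171.211.213.120", "dst_port": "80"},
    {"src_ip": "156.135.9.34", "dst_ip": "129.11.189.37", "dst_port": "22"},
]

def match_count(log, tcp_key, firewall_key):
    """Count the TCP connection documents whose key equals the log's key."""
    return sum(1 for conn in tcp_connections if tcp_key(conn) == firewall_key(log))

# Join on the IP alone: every process on the machine matches.
by_ip = [match_count(log, lambda c: c["ip"], lambda f: f["dst_ip"])
         for log in firewall_logs]

# Join on the composite ip:port key: one connection per firewall log.
by_ip_port = [match_count(log,
                          lambda c: c["ip"] + ":" + c["port"],
                          lambda f: f["dst_ip"] + ":" + f["dst_port"])
              for log in firewall_logs]

print(by_ip)       # [4, 2, 4, 2]
print(by_ip_port)  # [1, 1, 1, 1]
```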

The security specialist now needs to identify those processes, so they want to join process to tcp_connections, and tcp_connections to firewall_logs. The process index is now the target index. Joining on the host alone (its IP or hostname) is not enough, because it brings back all processes from the machine. Joining on the PID is not enough either, because the same PID can exist on several machines. Therefore, they use runtime fields again. Fortunately, the logs contain everything that is needed.

The specialist now needs four runtime fields:

firewall_logs
- rt_firewall -> dst_ip:dst_port (to join with tcp_connections)
tcp_connections
- rt_tcp_ip_port -> ip:port (to join with firewall_logs)
- rt_tcp_hostname_pid -> hostname@pid (to join with process)
process
- rt_process_hostname_pid -> hostname@pid (to join with tcp_connections)

The resulting request is as follows:

POST siren/process/_search
{
  "query": {
    "join": {
      "indices": ["tcp_connections"],
      "on": ["rt_process_hostname_pid", "rt_tcp_hostname_pid"],
      "request": {
        "project": [
          {
            "field": {"name": "rt_tcp_hostname_pid", "alias": "rt_tcp_hostname_pid_proj"}
          },
          {
            "field": {"name": "rt_tcp_ip_port","alias": "rt_tcp_ip_port_proj"}
          },
          {
            "field": {"name": "rt_firewall_ip_port", "alias": "rt_firewall_ip_port_proj"}
          }
        ],
        "runtime_mappings": {
          "rt_tcp_hostname_pid": {
            "type": "keyword",
            "script": {
              "source": "emit(doc[\"hostname\"].value + \"@\" + doc[\"pid\"].value)"
            }
          },
          "rt_tcp_ip_port": {
            "type": "keyword",
            "script": {
              "source": "emit(doc[\"ip\"].value + \":\" + doc[\"port\"].value)"
            }
          }
        },
        "query": {
          "join": {
            "indices": ["firewall_logs"],
            "on": ["rt_tcp_ip_port", "rt_firewall"],
            "request": {
              "project": [
                {
                  "field": { "name": "rt_firewall", "alias": "rt_firewall_ip_port" }
                }
              ],
              "runtime_mappings": {
                "rt_firewall": {
                  "type": "keyword",
                  "script": {
                    "source": "emit(doc[\"dst_ip\"].value + \":\" + doc[\"dst_port\"].value)"
                  }
                }
              },
              "query": {
                "term": {
                  "src_ip": {"value": "106.155.3.18"}
                }
              }
            }
          }
        }
      }
    }
  },
  "runtime_mappings": {
    "rt_process_hostname_pid": {
      "type": "keyword",
      "script": {
        "source": "emit(doc[\"hostname\"].value + \"@\" + doc[\"pid\"].value)"
      }
    }
  },
  "fields": ["pid", "hostname", "user", "process"],
  "script_fields": {
    "rt_tcp_hostname_pid_proj": {
      "script": "doc.rt_tcp_hostname_pid_proj"
    },
    "rt_tcp_ip_port_proj": {
      "script": "doc.rt_tcp_ip_port_proj"
    },
    "rt_firewall_ip_port_proj": {
      "script": "doc.rt_firewall_ip_port_proj"
    }
  }
}

As before, we use the fields parameter in the request to fetch the process values. To fetch the projected fields coming from firewall_logs or tcp_connections, we use the script_fields parameter. The following is an extract of the result:

 "hits": [
      {
        "_index": "process",
        "_id": "...",
        "fields": {
          "rt_tcp_hostname_pid_proj": ["valkyrie.syren.eo@15"],
          "hostname": ["valkyrie.syren.eo"],
          "process": ["ls"],
          "rt_tcp_ip_port_proj": ["171.211.213.120:32"],
          "rt_firewall_ip_port_proj": ["171.211.213.120:32"],
          "pid": ["15"],
          "user": ["root"]
        }
      },
      {
        "_index": "process",
        "_id": "...",
        "fields": {
          "rt_tcp_hostname_pid_proj": ["vishnu.syren.eo@58"],
          "hostname": ["vishnu.syren.eo"],
          "process": ["ls"],
          "rt_tcp_ip_port_proj": ["129.11.189.37:24"],
          "rt_firewall_ip_port_proj": ["129.11.189.37:24"],
          "pid": ["58"],
          "user": ["admin"]
        }
      }
    ]

The results reveal two instances of an “ls” process receiving TCP connections, which raises suspicion: ls is a short-lived command that lists directory contents and has no reason to accept network connections. This discovery prompts the investigator to broaden their inquiry, considering the possibility of other malicious processes operating within the network.
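The chained join can be cross-checked offline against the dummy dataset. The Python sketch below mimics the three nested requests with plain set lookups. It is a simplified in-memory model (a chain of key filters), not Federate's execution plan, and the variable names are hypothetical.

```python
process = [
    {"pid": "24", "hostname": "valkyrie.syren.eo", "user": "root", "process": "apache"},
    {"pid": "15", "hostname": "valkyrie.syren.eo", "user": "root", "process": "ls"},
    {"pid": "58", "hostname": "vishnu.syren.eo", "user": "admin", "process": "ls"},
    {"pid": "32", "hostname": "vishnu.syren.eo", "user": "admin", "process": "ls"},
]
tcp_connections = [
    {"pid": "24", "hostname": "valkyrie.syren.eo", "port": "88", "ip": "171.211.213.120"},
    {"pid": "24", "hostname": "valkyrie.syren.eo", "port": "80", "ip": "171.211.213.120"},
    {"pid": "15", "hostname": "valkyrie.syren.eo", "port": "32", "ip": "171.211.213.120"},
    {"pid": "15", "hostname": "valkyrie.syren.eo", "port": "24", "ip": "171.211.213.120"},
    {"pid": "58", "hostname": "vishnu.syren.eo", "port": "22", "ip": "129.11.189.37"},
    {"pid": "58", "hostname": "vishnu.syren.eo", "port": "24", "ip": "129.11.189.37"},
]
firewall_logs = [
    {"src_ip": "106.155.3.18", "dst_ip": "171.211.213.120", "dst_port": "32"},
    {"src_ip": "106.155.3.18", "dst_ip": "129.11.189.37", "dst_port": "24"},
    {"src_ip": "126.152.7.13", "dst_ip": "171.211.213.120", "dst_port": "80"},
    {"src_ip": "156.135.9.34", "dst_ip": "129.11.189.37", "dst_port": "22"},
]

SUSPECT = "106.155.3.18"

# Innermost request: firewall logs from the suspect source, keyed dst_ip:dst_port.
firewall_keys = {f["dst_ip"] + ":" + f["dst_port"]
                 for f in firewall_logs if f["src_ip"] == SUSPECT}

# Middle request: TCP connections whose ip:port matches, keyed hostname@pid.
tcp_keys = {c["hostname"] + "@" + c["pid"]
            for c in tcp_connections
            if c["ip"] + ":" + c["port"] in firewall_keys}

# Outer request: processes whose hostname@pid matches.
hits = [p for p in process if p["hostname"] + "@" + p["pid"] in tcp_keys]

for p in hits:
    print(p["hostname"], p["pid"], p["user"], p["process"])
```

The two remaining documents are the same ls processes returned by the query: PID 15 on valkyrie.syren.eo and PID 58 on vishnu.syren.eo.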

Without Siren Federate’s runtime fields feature, the analyst would have to go through a more time-consuming and resource-intensive process to achieve the same results. The key challenges they would face include:

Data Transformation and Re-indexing: The analyst would need to transform the data, possibly using an ETL (Extract, Transform, Load) process or writing custom scripts, to create composite keys by combining fields such as IP addresses and ports. After transforming the data, the analyst would need to re-index it to ensure that the newly created composite keys are indexed and available for querying. Both data transformation and re-indexing can be time-consuming processes, especially for large datasets, and may require additional computing resources.

Administrative Intervention: If the analyst wants to join data from multiple indices or sources without modifying the original data, they may need to request assistance from an administrator to update the mappings or make other changes to the data sources. This adds a dependency on administrative support and potentially delays the investigative process.

Inefficient Querying: Without the ability to specify runtime fields and join them on-the-fly, analysts might have to resort to less efficient query methods, such as performing multiple separate queries and then manually correlating the results. This approach can be cumbersome, slow, and error-prone, ultimately hindering the detection and response to potential threats.

In summary, without Siren Federate’s runtime fields feature, the process of identifying and investigating potential malicious processes would be significantly more difficult, time-consuming, and resource-intensive. The ability to specify runtime fields and join on them in the same request greatly streamlines the data analysis process, allowing analysts to efficiently and effectively uncover potential threats in the network.

Conclusion

In conclusion, the runtime fields feature in Siren Federate 30 provides a valuable solution to the pain points that investigators face when exploring and analyzing data for cyber threat investigations. The ability to join on composite keys at runtime without having to re-index the data is a significant benefit that enables investigators to quickly and easily identify patterns and potential breaches.

The core pain points that the runtime fields feature resolves include the need to transform and re-index data, which can be a time-consuming and resource-intensive process. The feature also enables investigators to join data from multiple indices and sources without the need for an administrator to intervene and update the mapping for changes to be accessible to all users. This makes it easier and more efficient for investigators to explore and navigate complex data, ultimately allowing them to identify and mitigate cyber threats more effectively.

Overall, the runtime fields feature in Siren Federate 30 is a powerful tool that offers significant benefits to investigators across a range of investigative intelligence scenarios, including law enforcement, digital forensics, corporate risk management, and OSINT, in addition to cyber threat investigations. By enabling investigators to transform and join data at runtime, without the need to modify the original data source, this feature significantly enhances their ability to explore and analyze complex data, ultimately allowing them to detect and respond to potential breaches quickly and effectively.