Striking a Balance Between Data & Cost

The LimaCharlie agents can generate a lot of data, and by default not all of it gets sent back to the cloud. This article covers the mechanisms available to select the type of data you want, the points where it can be filtered down, and how to best build detections with this in mind.

Where is the data?

There are several control points for data in LimaCharlie.

  1. The agent is where all events are generated. A subset of these events is sent up to the cloud; the rest are stored locally for a short period of time.
  2. The cloud receives the events from the agents and processes them with various analytic systems (like D&R rules).
  3. The outputs are simply the various forwarding locations you've selected for your data.

How does the agent deal with events?

The agent has a list of events that need to be sent to the cloud. This list is dynamic and controlled by the various "exfil_" commands. Issuing an "exfil_add X" tells the agent to start sending events of type X to the cloud, optionally with a time expiration per event type. Issuing an "exfil_get" returns the list of event types currently being sent back. A default "exfil list" is automatically sent to your agents.

When an event is generated in the sensor, if its type is in the "exfil" list, it is sent to the cloud. If it is NOT in that list, it is kept in memory. For how long? That depends: the buffer is capped at roughly 5,000 events or 10 MB of memory, so a busier host results in a shorter window of history.

These events in memory are not lost. You can access them mainly via the "history_dump" command. When the agent receives it, it will send back to the cloud either the entire content of this memory or, if specified, all events in memory of a specific type.

Filter Events After Analysis

Now that the cloud has received events, it can use them to perform analytics. This is where each event (and stream of events) goes through all the Detection & Response rules you've set up. The next step is to forward those events to the outputs you've configured (like a Syslog endpoint, for example).

Many users of LimaCharlie want to apply D&R rules to data, but not necessarily have that data forwarded to their outputs. Storage can be expensive, and although data from the agent to the cloud is transferred very efficiently, the data from the cloud to your output is much more verbose (and friendly) JSON. This can result in large amounts of data. This JSON compresses EXTREMELY well, but it can still be cumbersome to receive it all.

The solution to this problem is to use the blacklist/whitelist parameters on each output. Adding an event type to the blacklist tells the output NOT to forward any events of that type, while adding an event type to the whitelist means that ONLY events of that type should be forwarded.
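As an illustration, here is a minimal sketch of what an output definition with an event blacklist could look like if you manage outputs as YAML configuration. The output name, destination, and especially the filter parameter name and value format are assumptions made for the sketch; check the outputs documentation for the exact syntax supported by your output module.

    outputs:
      siem-syslog:
        module: syslog      # forward the event stream to a syslog destination
        type: event
        dest_host: siem.example.com:514
        is_tls: "true"
        # Assumed parameter name: event types that should NOT be forwarded.
        event_black_list: MODULE_LOAD NEW_DOCUMENT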

How Should I Select the Relevant Data?

The following are general rules of thumb:

If you have no interest in a particular event type, for storage or for analytics, make sure it's not in your "exfil list" by using a D&R rule on the CONNECTED event that sends the relevant "exfil_del" command to the agent. This stops it at the source.

If you want to use an event type in detections and it is not already present in the default list, similarly use a D&R rule to send an "exfil_add" to your agent.
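A minimal sketch of such a rule follows, in the standard D&R YAML format. The platform operator and the specific event types (MODULE_LOAD and NEW_DOCUMENT) are placeholders chosen for illustration; substitute whatever applies to your deployment.

    # Runs whenever a Windows sensor connects to the cloud.
    detect:
      event: CONNECTED
      op: is windows
    respond:
      # Stop sending a verbose event type we never use (placeholder type).
      - action: task
        command: exfil_del MODULE_LOAD
      # Start sending an event type we want for detections (placeholder type).
      - action: task
        command: exfil_add NEW_DOCUMENT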

Next, use the blacklist on the outputs for the event types you don't want to store yourself. This makes sure they don't get forwarded to you.

Of course, most SOCs have somewhat more complex setups. For example, many will send high-value events to highly-available storage like Splunk or ELK, and send all other events to colder storage like Amazon S3. This keeps the relevant data close at hand for operational purposes while keeping costs down.
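A sketch of that split might look like the following, again with assumed output names and parameter names (particularly the whitelist parameter and the S3 credential fields); the idea is simply one filtered, high-value output and one unfiltered archival output.

    outputs:
      hot-siem:
        module: syslog
        type: event
        dest_host: splunk.example.com:6514
        is_tls: "true"
        # Assumed parameter name: only these high-value event types are forwarded.
        event_white_list: NEW_PROCESS DNS_REQUEST CODE_IDENTITY
      cold-archive:
        module: s3
        type: event
        bucket: example-lc-archive
        key_id: <aws-key-id>
        secret_key: <aws-secret>
        is_compression: "true"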

How do Detections Interact with Events?

The final point of discussion is around the use of events by D&R rules. Some of the events generated by the agent are extremely verbose and it's unlikely you want to send them to the cloud at all times because of bandwidth usage. This is where the power of the D&R rules comes through.

It's often advantageous to write D&R rules in two stages. The first stage is a rule that uses the default events that are always sent to the cloud. Its role is not to determine whether something is "bad" beyond all doubt, but rather to operate as a filter.

This first stage usually uses a "report" Response action with "publish" set to false, indicating the alert is not meant for human consumption. The other Response elements of these first-stage rules issue commands to the agent to gather the extra data necessary. For example, a rule may issue a "history_dump FILE_CREATE" to retrieve all recent file creation events from the agent. If state needs to be carried explicitly between the first and second stage, it is common to apply a Tag to the agent, like "suspicious-exec", which the second stage uses to determine which agents the rule should be evaluated against.
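As a sketch of such a first-stage rule (the trigger condition, detection name, and tag TTL below are illustrative assumptions, not a recommended detection):

    # Stage 1: cheap filter on an event type sent to the cloud by default.
    detect:
      event: NEW_PROCESS
      op: ends with
      path: event/FILE_PATH
      value: \powershell.exe
      case sensitive: false
    respond:
      # Record it, but don't surface it to humans.
      - action: report
        name: stage1-suspicious-exec
        publish: false
      # Mark the agent so the second stage only runs where relevant.
      - action: add tag
        tag: suspicious-exec
        ttl: 300
      # Ask the agent for the verbose data it normally keeps at the edge.
      - action: task
        command: history_dump FILE_CREATE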

The second-stage rule's job is to look through those file creation events (for example), on agents carrying the "suspicious-exec" tag, for whatever specific bad behavior is targeted. If it is found, the Response component uses the "report" action to trigger an alert.
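A matching second-stage sketch, with the same caveat that the file name pattern and detection name are placeholders:

    # Stage 2: only evaluated against the verbose events from tagged agents.
    detect:
      event: FILE_CREATE
      op: and
      rules:
        - op: is tagged
          tag: suspicious-exec
        - op: ends with
          path: event/FILE_PATH
          value: .ps1
          case sensitive: false
    respond:
      # This time the detection is meant for human consumption.
      - action: report
        name: suspicious-exec-dropped-script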

Conclusion

There are many strategies around data in LimaCharlie. It's easy to get started without having to worry about any of this, but there is also a lot of power available to help optimize an MSSP's infrastructure.

We're always happy to discuss these strategies with you so drop us a line!

Happy hunting.