Anomaly detection in cybersecurity


Discussion Overview

The discussion revolves around anomaly detection in cybersecurity, specifically focusing on data analytics techniques and machine learning applications. Participants share their experiences and methodologies for identifying anomalies in various types of cybersecurity data, including logs and network flows.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant discusses their experience using Splunk for anomaly detection, detailing their logic development that includes metrics such as volume, commonality, frequency, and variance.
  • Another participant inquires about the types of events being analyzed and emphasizes that the necessary data and features depend on the specific problem being addressed.
  • A participant lists the types of logs they analyze, including web server logs and internal network flows, and defines "entity" as the actor in an event, such as a user or machine.
  • One participant mentions their project involving a learning algorithm for detecting data exfiltration, noting challenges in feature selection and modeling.
  • Another participant shares their experience with machine learning features, highlighting connection count, upload bytes, and download bytes as useful metrics for identifying anomalies.

Areas of Agreement / Disagreement

Participants express varying perspectives on the metrics and features relevant to anomaly detection, with no consensus on a definitive approach or solution. The discussion remains open-ended with multiple competing views on the effectiveness of different methodologies.

Contextual Notes

Participants acknowledge limitations in their machine learning efforts, including challenges in achieving high accuracy and the dependency on specific problem contexts for feature selection.

stoomart
This question is primarily directed to @bapowell, but I encourage others to please add any thoughts or suggestions.

Brian, I just saw your bio while reading the CMB primers, and thought you may have some ideas on cybersecurity data analytics.

Some background: I've been in cybersecurity since 2000, and have been using Splunk for anomaly detection and investigation for just over a year now. Instead of opting for Splunk's SIEM package, I've been developing our anomaly detection logic from scratch, which has evolved over time to include any combination of the following:

volume (count)
commonality (count distinct entities)
frequency (relative time comparison)
variance (entity or population z-score)
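
For readers who want something concrete, the four views above could be sketched roughly as follows. This is only an illustration in plain Python, not the Splunk logic described in this post: the `(timestamp, entity)` event layout, the sample data, and all function names are assumptions made up for the example.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical events: (timestamp_seconds, entity) pairs, e.g. parsed from logs.
events = [
    (10, "alice"), (20, "bob"), (30, "alice"),
    (3700, "alice"), (3710, "alice"), (3720, "alice"),
    (3730, "bob"), (3740, "carol"), (3750, "alice"),
]

def in_window(events, start, end):
    """Events whose timestamp falls in [start, end)."""
    return [e for e in events if start <= e[0] < end]

def volume(events):
    """Volume: plain event count."""
    return len(events)

def commonality(events):
    """Commonality: number of distinct entities seen."""
    return len({entity for _, entity in events})

def frequency_ratio(events, start, width):
    """Frequency: count in [start, start + width) relative to the preceding window."""
    current = volume(in_window(events, start, start + width))
    previous = volume(in_window(events, start - width, start))
    return current / previous if previous else float("inf")

def population_z_scores(events):
    """Variance: z-score of each entity's count against the whole population."""
    counts = Counter(entity for _, entity in events)
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    return {e: ((c - mu) / sigma if sigma else 0.0) for e, c in counts.items()}

current = in_window(events, 3600, 7200)
print(volume(current), commonality(current))
print(frequency_ratio(events, 3600, 3600))
print(population_z_scores(current))
```

With the sample data, the second hour has twice the volume of the first, and `alice` stands out with the only positive population z-score.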

Am I missing any ways of looking at the data?

Variance detection was the last major evolution in my efforts, and now I am looking for the next one. I will say my research and testing in machine learning was a bit of a dud, since I could only ever achieve ~80% accuracy instead of the high 90s I was hoping for, but this may have been a limitation of my abilities.
 
Hi there. What kinds of events/activities are you analyzing? What is an example of an "entity"? My experience so far has been that the necessary data and interesting features are very much determined by the specific problem you're trying to tackle. I hesitate to make a generic list of metrics for this reason.

What kinds of problems have you tried to solve with machine learning? What's your background, if you don't mind my asking?
 
bapowell said:
Hi there. What kinds of events/activities are you analyzing?
Logs from web servers, perimeter security devices (fw, waf, ips), internal netflow, server logons, database access/audit/alert, endpoint security, software/hardware installs, and others in line with the CIS top 20 controls.
What is an example of an "entity"?
This would be the actor in an event such as an internal user/machine, or external client.

My experience so far has been that the necessary data and interesting features are very much determined by the specific problem you're trying to tackle. I hesitate to make a generic list of metrics for this reason.
I agree; all my triggers are built around the individual data variables and the kind of anomaly I'm interested in. Sorry for the generic nature of this question. I'm hoping I've missed something obvious, but I have a sense that machine learning is the only way to really jump forward from this point.
What kinds of problems have you tried to solve with machine learning?
Most of my experience with machine learning was training DLP to identify proprietary source code files unique to the company running it; that product worked very well. My own efforts were focused on identifying anomalies in network behavior from netflow data using Splunk's machine learning engine.
What's your background, if you don't mind my asking?
I got started in security in high school with a major security vendor (big yellow), supported and administered every type of security product you can think of, got my CISSP somewhere in there, and am now the technical lead on a security team of 4 at an independent state agency.
 
One project I'm working on currently is using a learning algorithm to detect data exfiltration. The data that we're feeding to the classifier are suitably transformed netflows; it's currently not clear which features we need to sufficiently (and minimally) characterize a given flow record, but I'm hoping to make it port/protocol agnostic and perhaps independent of actual amounts of traffic per connection. Preliminary results are promising, but a big part of the challenge is realistically modeling the exfiltration.
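
As a rough illustration of what "port/protocol agnostic" and "independent of actual amounts of traffic" could mean in practice, one option is to describe each flow by ratios rather than raw counts. The sketch below is an assumption made for this example, not the feature set used in the project described above; the function name and its arguments are hypothetical.

```python
def flow_features(bytes_out, bytes_in, packets_out, packets_in):
    """Ratio features that ignore port, protocol, and absolute traffic volume."""
    total_bytes = bytes_out + bytes_in
    total_packets = packets_out + packets_in
    return {
        # Share of bytes leaving the network: 0.5 is symmetric, near 1.0 is upload-heavy.
        "out_byte_fraction": bytes_out / total_bytes if total_bytes else 0.0,
        # Same idea at the packet level.
        "out_packet_fraction": packets_out / total_packets if total_packets else 0.0,
    }

small = flow_features(bytes_out=900, bytes_in=100, packets_out=9, packets_in=1)
large = flow_features(bytes_out=9000, bytes_in=1000, packets_out=90, packets_in=10)
print(small)
print(small == large)  # scaling the flow leaves the ratios unchanged
```

Whether ratios like these carry enough signal for a classifier is exactly the open question raised above; the sketch only shows the shape such features might take.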
 
The features I found most helpful in machine learning were connection count, upload bytes, and download bytes. My variance triggers calculate these three values for each entity (user, client) or object (port, webhost) in their target data set by time buckets (1h, 6h, 1d); the latest bucket for each entity/object is then compared to previous buckets to identify sigma spikes in any of the calculated fields.
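
Outside of Splunk, the bucket-and-compare logic described here might look something like the sketch below. It is a minimal illustration under stated assumptions: the `(timestamp, entity, upload_bytes, download_bytes)` record layout, the 3-sigma threshold, the minimum history length, and all names are hypothetical, and buckets with no traffic are simply absent rather than zero-filled.

```python
from collections import defaultdict
from statistics import mean, stdev

FIELDS = ("connections", "upload_bytes", "download_bytes")

def bucketize(flows, bucket_seconds):
    """Sum the three fields per entity and per time bucket."""
    buckets = defaultdict(lambda: defaultdict(lambda: dict.fromkeys(FIELDS, 0)))
    for ts, entity, up, down in flows:
        b = buckets[entity][ts // bucket_seconds]
        b["connections"] += 1
        b["upload_bytes"] += up
        b["download_bytes"] += down
    return buckets

def sigma_spikes(flows, bucket_seconds=3600, threshold=3.0, min_history=3):
    """Compare each entity's latest bucket to its earlier buckets, field by field."""
    spikes = []
    for entity, by_bucket in bucketize(flows, bucket_seconds).items():
        ordered = [by_bucket[k] for k in sorted(by_bucket)]
        history, latest = ordered[:-1], ordered[-1]
        if len(history) < min_history:
            continue  # not enough earlier buckets to estimate a baseline
        for field in FIELDS:
            values = [b[field] for b in history]
            mu, sigma = mean(values), stdev(values)
            if sigma == 0:
                continue  # flat history: z-score undefined, would need separate handling
            z = (latest[field] - mu) / sigma
            if z >= threshold:
                spikes.append((entity, field, z))
    return spikes

# Hypothetical flows: one connection per hour, with an upload spike in the last hour.
flows = [
    (0, "host-a", 1000, 5000),
    (3600, "host-a", 1200, 5200),
    (7200, "host-a", 1100, 4800),
    (10800, "host-a", 50000, 5100),
]
print(sigma_spikes(flows))
```

On the sample data only `upload_bytes` for `host-a` is flagged; the download volume stays within one sigma of its history, and the connection count is skipped because its history is flat.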
 
