Anomaly detection in cybersecurity

In summary, stoomart has been working in cybersecurity since 2000 and uses Splunk to detect and investigate data anomalies. Brian Powell (@bapowell) is working on a machine learning project to detect data exfiltration.
  • #1
stoomart
This question is primarily directed to @bapowell, but I encourage others to please add any thoughts or suggestions.

Brian, I just saw your bio while reading the CMB primers, and thought you may have some ideas on cybersecurity data analytics.

Some background: I've been in cybersecurity since 2000, and have been using Splunk for anomaly detection and investigation for just over a year now. Instead of opting for Splunk's SIEM package, I've been developing our anomaly detection logic from scratch, which has evolved over time to include any combination of the following:

volume (count)
commonality (count distinct entities)
frequency (relative time comparison)
variance (entity or population z-score)
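As a rough illustration of the four views listed above, here is a minimal Python sketch over a handful of made-up events. The record layout `(hour, entity, destination)`, the function names, and the choice to express frequency as the change in count between two hourly windows are all assumptions for illustration, not the actual Splunk logic described in this thread.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

# Hypothetical event records: (hour, entity, destination). Illustrative data only.
events = [
    (0, "alice", "web01"), (0, "bob", "web01"), (0, "carol", "db01"),
    (1, "alice", "web01"), (1, "alice", "db01"), (1, "alice", "db01"),
    (1, "alice", "ftp01"), (1, "bob", "web01"),
]

def volume(events):
    """Volume: raw event count per entity."""
    return Counter(entity for _, entity, _ in events)

def commonality(events):
    """Commonality: number of distinct entities seen per destination."""
    seen = defaultdict(set)
    for _, entity, dest in events:
        seen[dest].add(entity)
    return {dest: len(entities) for dest, entities in seen.items()}

def frequency(events, current_hour, previous_hour):
    """Frequency (assumed form): per-entity count in one hour minus the count in an earlier hour."""
    cur = Counter(e for h, e, _ in events if h == current_hour)
    prev = Counter(e for h, e, _ in events if h == previous_hour)
    return {e: cur[e] - prev[e] for e in set(cur) | set(prev)}

def population_zscores(counts):
    """Variance: z-score of each entity's count against the whole population."""
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {e: 0.0 for e in counts}
    return {e: (c - mu) / sigma for e, c in counts.items()}

if __name__ == "__main__":
    print(dict(volume(events)))
    print(commonality(events))
    print(frequency(events, current_hour=1, previous_hour=0))
    print(population_zscores(volume(events)))
```

Swapping the population z-score for each entity's own history gives the "entity" variant of the variance check.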

Am I missing any ways of looking at the data?

Variance detection was the last major evolution in my efforts, and now I am looking for the next one. I will say my research and testing in machine learning was a bit of a dud, since I could only ever achieve ~80% accuracy instead of the high 90s I was hoping for, but this may have been a limitation of my abilities.
 
  • #2
Hi there. What kinds of events/activities are you analyzing? What is an example of an "entity"? My experience so far has been that the necessary data and interesting features are very much determined by the specific problem you're trying to tackle. I hesitate to make a generic list of metrics for this reason.

What kinds of problems have you tried to solve with machine learning? What's your background, if you don't mind my asking?
 
  • #3
bapowell said:
Hi there. What kinds of events/activities are you analyzing?
Logs from web servers, perimeter security devices (fw, waf, ips), internal netflow, server logons, database access/audit/alert, endpoint security, software/hardware installs, and others in line with the CIS top 20 controls.
What is an example of an "entity"?
This would be the actor in an event such as an internal user/machine, or external client.

My experience so far has been that the necessary data and interesting features are very much determined by the specific problem you're trying to tackle. I hesitate to make a generic list of metrics for this reason.
I agree; all my triggers are built around the individual data variables and the kind of anomaly I'm interested in. Sorry for the generic nature of this question. I'm hoping I've missed something obvious, but I have a sense that machine learning is the only way to really jump forward from this point.
What kinds of problems have you tried to solve with machine learning?
Most of my experience with machine learning was training DLP to identify proprietary source code files unique to the company running it, and this product worked very well. My own efforts were focused on identifying anomalies in network behavior from netflow data using Splunk's machine learning engine.
What's your background, if you don't mind my asking?
I got started in security in high school with a major security vendor (big yellow), supported and administered every type of security product you can think of, got my CISSP somewhere in there, and am now the technical lead on a security team of 4 at an independent state agency.
 
  • #4
One project I'm working on currently is using a learning algorithm to detect data exfiltration. The data that we're feeding to the classifier are suitably transformed netflows; it's currently not clear which features we need to sufficiently (and minimally) characterize a given flow record, but I'm hoping to make it port/protocol agnostic and perhaps independent of actual amounts of traffic per connection. Preliminary results are promising, but a big part of the challenge is realistically modeling the exfiltration.
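One way to read "port/protocol agnostic and perhaps independent of actual amounts of traffic" is to describe each flow only by direction-relative shares. The sketch below assumes a hypothetical `FlowRecord` with per-direction byte and packet counts; the three ratio features are an illustration, not the feature set used in the project described above, and no classifier is included.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class FlowRecord:
    """Hypothetical netflow record; field names are illustrative, not from a specific exporter."""
    src_port: int
    dst_port: int
    protocol: str
    bytes_out: int  # bytes sent by the internal host
    bytes_in: int   # bytes received by the internal host
    pkts_out: int
    pkts_in: int

def _share(a: float, b: float) -> float:
    """Share of a in a + b; 0.5 (no asymmetry) when both are zero."""
    total = a + b
    return 0.5 if total == 0 else a / total

def flow_features(flow: FlowRecord) -> tuple[float, float, float]:
    """Map a flow to ratio features that ignore ports, protocol, and absolute volume."""
    byte_share_out = _share(flow.bytes_out, flow.bytes_in)
    pkt_share_out = _share(flow.pkts_out, flow.pkts_in)
    # Average bytes per packet in each direction, compared as a share.
    size_out = flow.bytes_out / flow.pkts_out if flow.pkts_out else 0.0
    size_in = flow.bytes_in / flow.pkts_in if flow.pkts_in else 0.0
    size_share_out = _share(size_out, size_in)
    return (byte_share_out, pkt_share_out, size_share_out)

if __name__ == "__main__":
    browsing = FlowRecord(51514, 443, "tcp", 2_000, 50_000, 40, 60)
    upload_heavy = FlowRecord(51515, 443, "tcp", 500_000, 10_000, 400, 200)
    print(flow_features(browsing))
    print(flow_features(upload_heavy))
```

Because every feature is a ratio in [0, 1], multiplying all of a flow's counts by a constant leaves its features unchanged, which is one way to make the representation independent of per-connection traffic volume.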
 
  • #5
The features I found most helpful in machine learning were connection count, upload bytes, and download bytes. My variance triggers calculate these three values for each entity (user, client) or object (port, webhost) in their target data set by time buckets (1h, 6h, 1d); the latest bucket for each entity/object is then compared to previous buckets to identify sigma spikes in any of the calculated fields.
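A minimal Python sketch of the variance trigger described above, assuming records arrive as hypothetical `(epoch_seconds, entity, upload_bytes, download_bytes)` tuples rather than Splunk events. Each entity's latest bucket is compared against its own earlier buckets using a population z-score; the zero-filling of quiet buckets, the `min_history` guard, and the 3-sigma default are assumptions, not details from the post.

```python
from collections import defaultdict
from statistics import mean, pstdev

FIELDS = ("connections", "upload_bytes", "download_bytes")

def bucketize(records, bucket_seconds):
    """Aggregate (ts, entity, up, down) records into per-entity, per-bucket totals."""
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0, 0]))
    for ts, entity, up, down in records:
        row = totals[entity][ts // bucket_seconds]
        row[0] += 1      # connection count
        row[1] += up     # upload bytes
        row[2] += down   # download bytes
    return totals

def sigma_spikes(records, bucket_seconds=3600, threshold=3.0, min_history=3):
    """Flag fields whose latest-bucket value is >= threshold sigma above the entity's history."""
    totals = bucketize(records, bucket_seconds)
    latest = max(ts for ts, *_ in records) // bucket_seconds
    alerts = []
    for entity, buckets in totals.items():
        first = min(buckets)
        # Zero-fill quiet buckets so idle periods count toward the baseline.
        history = [buckets.get(b, [0, 0, 0]) for b in range(first, latest)]
        if len(history) < min_history:
            continue  # not enough history to judge this entity
        current = buckets.get(latest, [0, 0, 0])
        for i, field in enumerate(FIELDS):
            past = [row[i] for row in history]
            mu, sigma = mean(past), pstdev(past)
            if sigma == 0:
                continue  # flat history: z-score undefined
            z = (current[i] - mu) / sigma
            if z >= threshold:
                alerts.append((entity, field, round(z, 2)))
    return alerts

# Illustrative data: alice uploads far more than usual in the latest hour.
records = [
    (0, "alice", 100, 1000), (10, "alice", 120, 900),
    (3600, "alice", 110, 1100),
    (7200, "alice", 90, 1000), (7210, "alice", 100, 950),
    (10800, "alice", 105, 1050),
    (14400, "alice", 50_000, 1000),
] + [(h * 3600 + 5, "bob", 200, 2000) for h in range(5)]

alerts = sigma_spikes(records)
print(alerts)
```

Passing `bucket_seconds=21600` or `86400` gives the 6h and 1d views. Entities with a perfectly flat history are skipped here because their standard deviation is zero; adding a small floor to sigma is one alternative if those should still alert.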
 

What is anomaly detection in cybersecurity?

Anomaly detection in cybersecurity is a technique used to identify unusual or abnormal behavior in a computer system or network. This can include activities such as unauthorized access attempts, unusual network traffic patterns, or changes in system behavior that may indicate a security breach.

How does anomaly detection work?

Anomaly detection typically uses statistical analysis or machine learning to compare current system activity to a baseline of normal behavior. Deviations that exceed a chosen threshold are flagged as potential anomalies for further investigation by security analysts.

Why is anomaly detection important in cybersecurity?

Anomaly detection is important in cybersecurity because it can help identify and prevent potential security threats before they cause harm. By detecting abnormal behavior, it can alert cybersecurity teams to potential attacks or vulnerabilities that need to be addressed.

What are the challenges of anomaly detection in cybersecurity?

One of the main challenges of anomaly detection in cybersecurity is the high rate of false positives, where normal behavior is flagged as anomalous. This can lead to a large number of false alarms, making it difficult for cybersecurity teams to prioritize and investigate potential threats.
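To put rough numbers on the false-positive problem, the arithmetic below uses made-up figures: 1,000,000 events, 100 of them malicious, and a detector that catches 99% of malicious events while misfiring on 1% of benign ones. Even with those rates, fewer than 1 in 100 alerts is a true positive, because benign events vastly outnumber malicious ones.

```python
def alert_breakdown(total_events, malicious_events, detection_rate, false_positive_rate):
    """Return (true alerts, false alerts, precision) for a detector applied to every event."""
    benign_events = total_events - malicious_events
    true_alerts = malicious_events * detection_rate
    false_alerts = benign_events * false_positive_rate
    total_alerts = true_alerts + false_alerts
    precision = true_alerts / total_alerts if total_alerts else 0.0
    return true_alerts, false_alerts, precision

# Hypothetical figures for illustration only.
tp, fp, precision = alert_breakdown(1_000_000, 100, 0.99, 0.01)
print(f"true alerts: {tp:.0f}, false alerts: {fp:.0f}, precision: {precision:.4f}")
# prints: true alerts: 99, false alerts: 9999, precision: 0.0098
```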

How can businesses implement anomaly detection in their cybersecurity strategy?

Businesses can implement anomaly detection in their cybersecurity strategy by using specialized software or tools that utilize machine learning and artificial intelligence to detect abnormal behavior. They can also establish a baseline of normal activity and regularly monitor and analyze system data for any deviations.
