
Anomaly detection in cybersecurity

  1. Jul 27, 2017 #1
    This question is primarily directed to @bapowell, but I encourage others to please add any thoughts or suggestions.

    Brian, I just saw your bio while reading the CMB primers, and thought you may have some ideas on cybersecurity data analytics.

    Some background: I've been in cybersecurity since 2000, and have been using Splunk for anomaly detection and investigation for just over a year now. Instead of opting for Splunk's SIEM package, I've been developing our anomaly detection logic from scratch, which has evolved over time to include any combination of the following:

    volume (count)
    commonality (count distinct entities)
    frequency (relative time comparison)
    variance (entity or population z-score)
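To make the four views above concrete, here is a minimal Python sketch over a toy event list. It is an illustration under stated assumptions, not the poster's Splunk logic: the `(hour_bucket, entity, destination)` event layout, the helper names, and the reading of "frequency" as "this bucket's count relative to the entity's earlier average" are all assumptions chosen for the example.

```python
from statistics import mean, pstdev

# Hypothetical event records: (hour_bucket, entity, destination).
EVENTS = [
    (0, "alice", "web01"), (0, "alice", "web01"), (0, "bob", "web01"),
    (1, "alice", "web02"), (1, "bob", "web01"),
    (2, "alice", "web01"), (2, "alice", "web03"), (2, "alice", "web04"),
    (2, "alice", "web05"), (2, "bob", "web01"),
]

def volume(events, entity, bucket):
    """Volume: number of events for one entity in one time bucket."""
    return sum(1 for b, e, _ in events if b == bucket and e == entity)

def commonality(events, entity, bucket):
    """Commonality: number of distinct destinations the entity touched in the bucket."""
    return len({d for b, e, d in events if b == bucket and e == entity})

def frequency(events, entity, bucket):
    """Frequency (assumed meaning): this bucket's volume divided by the
    entity's mean volume over all earlier buckets."""
    current = volume(events, entity, bucket)
    earlier = [volume(events, entity, b) for b in range(bucket)]
    baseline = mean(earlier) if earlier else 0
    if not baseline:
        # No earlier activity: any activity now is unbounded relative to zero.
        return float("inf") if current else 0.0
    return current / baseline

def population_zscores(events, bucket):
    """Variance (population): z-score of each entity's volume against
    all entities' volumes in the same bucket."""
    entities = sorted({e for _, e, _ in events})
    counts = {e: volume(events, e, bucket) for e in entities}
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    return {e: (c - mu) / sigma if sigma else 0.0 for e, c in counts.items()}
```

For `alice` in bucket 2 this gives a volume of 4, a commonality of 4, a frequency of about 2.67 (4 against an earlier average of 1.5), and a population z-score of 1.0. With only two entities the population z-scores are always plus or minus 1, which is one reason per-entity baselines can be more informative on small populations.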

    Am I missing any ways of looking at the data?

    Variance detection was the last major evolution in my efforts, and now I am looking for the next one. I will say my research and testing in machine learning was a bit of a dud, since I could only ever achieve ~80% accuracy instead of the high 90s I was hoping for, but this may have been a limitation of my abilities.
  3. Jul 28, 2017 #2


    Science Advisor

    Hi there. What kinds of events/activities are you analyzing? What is an example of an "entity"? My experience so far has been that the necessary data and interesting features are very much determined by the specific problem you're trying to tackle. I hesitate to make a generic list of metrics for this reason.

    What kinds of problems have you tried to solve with machine learning? What's your background, if you don't mind my asking?
    Last edited: Jul 28, 2017
  4. Jul 28, 2017 #3
    Logs from web servers, perimeter security devices (firewalls, WAFs, IPSs), internal netflow, server logons, database access/audit/alert data, endpoint security, software/hardware installs, and others in line with the CIS Top 20 controls.
    An entity is the actor in an event, such as an internal user or machine, or an external client.

    I agree; all my triggers are built around the individual data variables and the kind of anomaly I'm interested in. Sorry for the generic nature of the question. I'm hoping I've missed something obvious, but I have a sense that machine learning is the only way to really jump forward from this point.
    Most of my experience with machine learning was training a DLP product to identify proprietary source code files unique to the company running it, and it worked very well. My own efforts focused on identifying anomalies in network behavior from netflow data using Splunk's machine learning engine.
    I got started in security in high school with a major security vendor (big yellow), have supported and administered every type of security product you can think of, got my CISSP somewhere in there, and am now the technical lead on a four-person security team at an independent state agency.
  5. Jul 28, 2017 #4


    Science Advisor

    One project I'm working on currently is using a learning algorithm to detect data exfiltration. The data that we're feeding to the classifier are suitably transformed netflows; it's currently not clear which features we need to sufficiently (and minimally) characterize a given flow record, but I'm hoping to make it port/protocol agnostic and perhaps independent of actual amounts of traffic per connection. Preliminary results are promising, but a big part of the challenge is realistically modeling the exfiltration.
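One way to read "port/protocol agnostic and perhaps independent of actual amounts of traffic per connection" is to describe each flow with ratios rather than port numbers or raw byte counts. The sketch below is an illustrative guess, not the classifier described above: the `FlowRecord` fields and the two ratio features are assumptions chosen for the example, and a real exfiltration model would likely need more features than this.

```python
from dataclasses import dataclass

@dataclass
class FlowRecord:
    # Hypothetical flow fields; real netflow/IPFIX exporters use their own names.
    src_port: int
    dst_port: int
    protocol: str
    bytes_out: int    # bytes sent by the internal host
    bytes_in: int     # bytes received by the internal host
    packets_out: int
    packets_in: int

def flow_features(flow: FlowRecord) -> list[float]:
    """Return [upload byte share, upload packet share] for one flow.

    Ports and protocol are deliberately ignored, and both features are ratios,
    so scaling a connection's traffic up or down leaves them unchanged.
    """
    total_bytes = flow.bytes_out + flow.bytes_in
    total_packets = flow.packets_out + flow.packets_in
    byte_share = flow.bytes_out / total_bytes if total_bytes else 0.0
    packet_share = flow.packets_out / total_packets if total_packets else 0.0
    return [byte_share, packet_share]
```

For example, a TCP/443 flow with 9000 bytes out and 1000 in (30 and 10 packets) and a UDP/53 flow with exactly twice that traffic both map to `[0.9, 0.75]`, which is the kind of invariance described; whether such features keep enough signal to separate exfiltration from ordinary uploads is an open question.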
  6. Jul 29, 2017 #5
    The features I found most helpful in machine learning were connection count, upload bytes, and download bytes. My variance triggers calculate these three values for each entity (user, client) or object (port, webhost) in their target data set by time buckets (1h, 6h, 1d); the latest bucket for each entity/object is then compared to previous buckets to identify sigma spikes in any of the calculated fields.
    Last edited: Jul 29, 2017
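The variance trigger described in this post (connection count, upload bytes, and download bytes per entity per time bucket, with the latest bucket compared against earlier ones) can be sketched roughly as follows. This is a minimal illustration, not the poster's Splunk searches: the `(timestamp_s, entity, bytes_up, bytes_down)` flow layout, the 3-sigma threshold, the minimum-history rule, and the helper names `bucketize` and `sigma_spikes` are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev

FIELDS = ("connections", "bytes_up", "bytes_down")

def bucketize(flows, bucket_seconds):
    """Sum connection count, upload bytes, and download bytes per entity per time bucket.

    flows: iterable of (timestamp_s, entity, bytes_up, bytes_down) tuples (hypothetical layout).
    Buckets with no flows are simply absent; a real trigger might zero-fill them.
    """
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0, 0]))
    for ts, entity, up, down in flows:
        row = totals[entity][int(ts // bucket_seconds)]
        row[0] += 1
        row[1] += up
        row[2] += down
    return totals

def sigma_spikes(totals, threshold=3.0, min_history=2):
    """Flag fields where an entity's latest bucket sits >= threshold sigmas above its own history."""
    alerts = []
    for entity, buckets in totals.items():
        ordered = sorted(buckets)
        if len(ordered) - 1 < min_history:
            continue  # not enough earlier buckets to form a baseline
        latest = buckets[ordered[-1]]
        history = [buckets[b] for b in ordered[:-1]]
        for i, name in enumerate(FIELDS):
            past = [row[i] for row in history]
            mu, sigma = mean(past), pstdev(past)
            if sigma == 0:
                # Flat history: any increase is treated as an unbounded spike.
                z = float("inf") if latest[i] > mu else 0.0
            else:
                z = (latest[i] - mu) / sigma
            if z >= threshold:
                alerts.append((entity, name, z))
    return alerts

# Usage with hourly (1h) buckets on made-up flows.
HOUR = 3600
flows = [
    (0 * HOUR + 5, "h1", 100, 1000),
    (1 * HOUR + 5, "h1", 200, 1200),
    (2 * HOUR + 5, "h1", 300, 800),
    (3 * HOUR + 5, "h1", 5000, 1000),   # large upload in the latest hour
    (0 * HOUR + 9, "h2", 10, 10),
    (1 * HOUR + 9, "h2", 99999, 10),    # only one earlier bucket, so skipped
]
alerts = sigma_spikes(bucketize(flows, HOUR))
print(alerts)
```

Here only `h1`'s `bytes_up` is flagged, with a z-score of roughly 58.8 (5000 against a history of 100, 200, and 300). Running the same functions with 6h or 1d bucket sizes mirrors the multiple time buckets described in the post.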