What Other Data Distributions Can Be Used for Outlier Detection in Python?

  • Context: Python
  • Thread starter: ipmax
  • Tags: Data Distribution

Discussion Overview

The discussion revolves around identifying suitable data distributions for testing outlier detection in Python, particularly in the context of generating synthetic datasets that simulate electrical current signals. Participants explore various distribution options beyond sinusoidal, including their characteristics and applicability.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant seeks examples of distributions like sine, Gaussian, and binomial for outlier detection.
  • Another suggests Zipf and Poisson distributions as potential options.
  • A participant mentions the usefulness of the scipy.stats module for testing various distributions against data.
  • There is a discussion about the appropriateness of different distributions for current variables, with sinusoidal being the initial choice.
  • One participant proposes plotting data to visually assess which distributions might fit best.
  • A later reply clarifies that the goal is to generate a current dataset from a specific distribution and introduce random outliers for detection purposes.
  • Another participant suggests that the choice of distribution may depend on the device being simulated, mentioning that digital currents could resemble the derivative of a square wave.
  • Exponential and sinc distributions are proposed as potential candidates for current datasets, with a question raised about the real-world applicability of sinc currents.

Areas of Agreement / Disagreement

Participants express various viewpoints on suitable distributions for outlier detection, with no consensus reached on a definitive set of distributions. The discussion remains open-ended with multiple competing suggestions.

Contextual Notes

Participants express uncertainty regarding the best distribution for current datasets and the appropriateness of certain distributions for outlier detection. The discussion includes references to specific functions within the scipy.stats module, but no specific recommendations are settled upon.

Who May Find This Useful

Individuals interested in data analysis, particularly in the context of outlier detection and statistical modeling in Python, may find this discussion relevant.

ipmax
Hi folks. I am trying to use outlier detection techniques in Python. I have checked my algorithm on sinusoidally distributed data, and I now need some other kinds of distributions to check that the algorithm works. Can you give me examples of other well-known distributions (like sine, Gaussian, binomial, etc.) that I can use for outlier detection?

IPMAX
 
Zipf, Poisson
 
What type of data do you have/what do you expect to see?

scipy.stats has a whole bunch of distributions you can test against and a bunch of tests for trying to figure out how your data is distributed.
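As a rough illustration of what "testing against" a distribution could look like, here is a minimal sketch that fits a few `scipy.stats` candidates to a sample and compares them with a Kolmogorov-Smirnov statistic. The sample, the candidate list, and the "smallest KS statistic wins" rule are all assumptions made for the example, not something from this thread. Note also that p-values from `kstest` are optimistic when the parameters were fitted on the same data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical sample standing in for real measurements
sample = rng.normal(loc=5.0, scale=2.0, size=1000)

# Candidate distributions to try (an arbitrary short list for illustration)
candidates = {"norm": stats.norm, "expon": stats.expon, "uniform": stats.uniform}

results = {}
for name, dist in candidates.items():
    params = dist.fit(sample)  # maximum-likelihood estimate of the parameters
    ks_stat, p_value = stats.kstest(sample, dist.cdf, args=params)
    results[name] = (ks_stat, p_value)

# Smaller KS statistic means the fitted CDF is closer to the empirical CDF
best = min(results, key=lambda n: results[n][0])
print(best, results[best])
```

Swapping other `scipy.stats` distributions into `candidates` follows the same pattern, as long as they are continuous (the KS test is not meant for discrete ones like Poisson).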
 
I saw the scipy.stats module, but I am confused about which function would be appropriate. I am dealing with currents. What would be a good distribution for a current variable? I tried sinusoidal (that's what I could come up with) :P
 
The way I've done it is to plot my data and then see which distributions it looks like. If you plot yours and post the graph, it may be easier to give you suggestions. Right now, I'd guess that sinusoidal does sound about right.
 
You misunderstood my post. My whole point is to generate a current dataset from a certain distribution, mix random outliers into it, and then detect those outliers. I have tried a sinusoidal distribution as a possible dataset and run the detection on it. Now I need to devise some other distribution for the dataset. I only know of the sinusoidal current dataset. What other data distribution would be suitable for what could be called a current dataset?
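The workflow described here (clean signal, injected outliers, detection, comparison against the known positions) can be sketched as follows. This is only one possible setup: the helper names `inject_outliers` and `detect_outliers`, the rolling-median detector, and every numeric value are assumptions for illustration, not the algorithm the poster actually used. It needs NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np

def inject_outliers(clean, n_outliers, magnitude, rng):
    """Return a copy of `clean` with spikes added at random positions."""
    data = clean.copy()
    idx = rng.choice(data.size, size=n_outliers, replace=False)
    signs = rng.choice([-1.0, 1.0], size=n_outliers)
    data[idx] += signs * magnitude
    return data, np.sort(idx)

def detect_outliers(data, window=11, threshold=6.0):
    """Flag points far from a rolling median, scaled by the MAD of the residuals."""
    half = window // 2
    padded = np.pad(data, half, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    residual = data - np.median(windows, axis=1)
    mad = np.median(np.abs(residual - np.median(residual)))
    # 1.4826 rescales the MAD to a standard deviation for Gaussian noise
    return np.flatnonzero(np.abs(residual) > threshold * 1.4826 * mad)

rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 2000)
# Assumed test signal: 5 Hz sinusoidal "current" with Gaussian measurement noise
clean = 10.0 * np.sin(2 * np.pi * 5 * t) + rng.normal(0.0, 0.5, t.size)
data, true_idx = inject_outliers(clean, n_outliers=20, magnitude=8.0, rng=rng)
found_idx = detect_outliers(data)

recovered = np.intersect1d(true_idx, found_idx).size
print(f"recovered {recovered} of {true_idx.size}, flagged {found_idx.size} in total")
```

Because the injected positions are known, any other base waveform can be dropped in for `clean` and scored the same way.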
 
ipmax said:
I only know of the sinusoidal current dataset. What other data distribution would be suitable for what could be called a current dataset?

Depends on the device/whatever you're trying to simulate: digital currents will likely be the derivative of a square wave (a derivative which is itself a collection of impulse functions), MOSFETs look sort of like http://en.wikipedia.org/wiki/Current%E2%80%93voltage_characteristic, etc. You may need outlier detection for some distros and not others.
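To make the square-wave remark concrete, here is a small sketch of a discrete square wave and its sample-to-sample difference. The sample count, period, and amplitude are arbitrary assumptions. The result is zero everywhere except for a spike at each edge, which suggests why a naive spike detector might flag the legitimate edges of such a signal as outliers.

```python
import numpy as np

n_samples = 1000
half_period = 100  # assumed: the wave flips sign every 100 samples

# Square wave built from integer arithmetic so the edges land on exact samples
k = np.arange(n_samples)
square = np.where((k // half_period) % 2 == 0, 1.0, -1.0)

# Discrete derivative: zero on the flat parts, one spike per edge
current = np.diff(square, prepend=square[0])
edges = np.flatnonzero(current)

print(edges)           # sample indices where the spikes occur
print(current[edges])  # alternating -2 and +2 for a unit-amplitude wave
```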
 
What about exponential current and sinc? Is a sinc current plausible in the real world?
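For reference, both shapes are easy to generate as test waveforms. The sketch below assumes a time constant `tau`, a sinc width `width`, and a peak value `i0`, all chosen arbitrarily. It only produces the waveforms and says nothing about whether a sinc-shaped current occurs in a physical device.

```python
import numpy as np

# Time axis built from integers so that t == 0 falls exactly on a sample
t = np.arange(-1000, 1001) / 1000.0

i0 = 1.0     # assumed peak current
tau = 0.2    # assumed time constant of the exponential decay
width = 0.1  # assumed spacing of the sinc zero crossings

# Exponential decay starting at t = 0 (zero before that)
exp_current = np.where(t >= 0, i0 * np.exp(-t / tau), 0.0)

# np.sinc is the normalized sinc, sin(pi x) / (pi x), with sinc(0) == 1
sinc_current = i0 * np.sinc(t / width)
```

Either array can stand in for the clean signal before outliers are mixed in.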
 
