What Other Data Distributions Can Be Used for Outlier Detection in Python?

AI Thread Summary
The discussion centers on using outlier detection techniques in Python, specifically exploring different data distributions for current variables. The original poster has tested a sinusoidal distribution and seeks additional distribution examples for generating datasets with outliers. Participants suggest various distributions, including Zipf, Poisson, and exponential, while emphasizing the importance of understanding the nature of the data being modeled. The scipy.stats module is recommended for testing distributions, and visualizing data through plots is advised to identify suitable distributions. There is also a mention of using digital currents and the characteristics of specific devices, highlighting that the choice of distribution may depend on the context of the current being simulated. The feasibility of using a sinc function for current modeling is questioned, indicating a need for further exploration of real-world applicability.
ipmax
Hi folks. I am trying to use outlier detection techniques in Python. I have checked my algorithm on a sinusoidal distribution of data, and I need some other kinds of distribution to check that the algorithm works. Can you give me examples of other known distributions (like sine, Gaussian, binomial, etc.) that I can use for outlier detection?

IPMAX
 
Zipf, Poisson
 
What type of data do you have/what do you expect to see?

scipy.stats has a whole bunch of distributions you can test against and a bunch of tests for trying to figure out how your data is distributed.
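As a rough sketch of what "testing against" a distribution in scipy.stats can look like: fit a few candidate distributions to a sample and compare Kolmogorov-Smirnov statistics. The sample below is synthetic and the candidate list is an arbitrary choice for illustration, not something from this thread.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical stand-in for measured data; replace with your own array.
sample = rng.exponential(scale=2.0, size=2000)

# Candidate distributions to compare (an arbitrary illustrative choice).
candidates = {
    "norm": stats.norm,
    "expon": stats.expon,
    "uniform": stats.uniform,
}

results = {}
for name, dist in candidates.items():
    params = dist.fit(sample)  # maximum-likelihood parameter estimates
    ks_stat, p_value = stats.kstest(sample, dist.cdf, args=params)
    # Note: p-values are optimistic when the parameters were fitted
    # on the same sample, so compare the KS statistics instead.
    results[name] = (ks_stat, p_value)

# A smaller KS statistic means the fitted CDF is closer to the data.
best = min(results, key=lambda n: results[n][0])
for name, (ks_stat, p_value) in results.items():
    print(f"{name}: KS statistic = {ks_stat:.3f}")
print("closest candidate:", best)
```

Plotting a histogram of the sample alongside the fitted PDFs is a useful sanity check on whatever this ranking suggests.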
 
I saw the scipy.stats module, but I am confused about which function would be appropriate. I am dealing with currents. What would be a good distribution for a current variable? I tried sinusoidal (that's what I could come up with) :P
 
The way I've done it is to plot my data and then see which distributions it looks like. If you plot yours and post the graph, it may be easier to give you suggestions. Right now, I'd guess that sinusoidal does sound about right.
 
You misunderstood my post. My whole point is to generate a current dataset from a certain distribution, mix random outliers into it, and detect the outliers. I have tried a sinusoidal distribution as a possible dataset and run the detection on it. Now I need to devise some other distribution for the dataset. I only know the sinusoidal current dataset. What other data distribution could plausibly be called a current dataset?
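A minimal sketch of that workflow, under stated assumptions: draw a base dataset from a chosen distribution, overwrite a few random points with values far outside its range, and check how many of them a detector recovers. The helper names `inject_outliers` and `mad_detect` are made up for this example, and the median/MAD rule is only a stand-in for whatever detection algorithm is actually being tested.

```python
import numpy as np

def inject_outliers(base, n_outliers, rng):
    """Overwrite n_outliers random points with values far outside the range."""
    data = np.asarray(base, dtype=float).copy()
    lo, hi = data.min(), data.max()
    spread = hi - lo
    idx = rng.choice(data.size, size=n_outliers, replace=False)
    # Place each outlier one to two ranges beyond the max or the min.
    offset = rng.uniform(1.0, 2.0, size=n_outliers) * spread
    above = rng.random(n_outliers) < 0.5
    data[idx] = np.where(above, hi + offset, lo - offset)
    return data, idx

def mad_detect(data, threshold=3.5):
    """Stand-in detector: modified z-score based on median and MAD."""
    median = np.median(data)
    mad = np.median(np.abs(data - median))
    score = 0.6745 * np.abs(data - median) / mad
    return np.flatnonzero(score > threshold)

rng = np.random.default_rng(42)
n = 1000
# Base datasets; the parameters are arbitrary illustrative values.
bases = {
    "gaussian": rng.normal(loc=5.0, scale=1.0, size=n),
    "exponential": rng.exponential(scale=2.0, size=n),
    "poisson": rng.poisson(lam=20.0, size=n),
}

recall = {}
for name, base in bases.items():
    data, true_idx = inject_outliers(base, n_outliers=10, rng=rng)
    found = mad_detect(data)
    recall[name] = np.isin(true_idx, found).mean()
    false_pos = np.setdiff1d(found, true_idx).size
    print(f"{name}: recall={recall[name]:.2f}, false positives={false_pos}")
```

Expect the skewed exponential base to produce some false positives with a symmetric rule like this one, since its natural tail sits far from the median. That is one reason to test a detector on more than one distribution.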
 
ipmax said:
I only know the sinusoidal current dataset. What other data distribution could plausibly be called a current dataset?

Depends on the device or whatever you're trying to simulate: digital currents will likely be the derivative of a square wave (the derivative itself being a collection of impulse functions), MOSFETs look sort of like http://en.wikipedia.org/wiki/Current%E2%80%93voltage_characteristic, etc. You may need outlier detection for some distributions and not others.
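To illustrate the square-wave remark, here is a small sketch that builds a sampled square wave and takes its discrete difference, which is zero everywhere except for spikes at the edges. The sample count and period are arbitrary values picked for the example.

```python
import numpy as np

n_samples = 1000
half_period = 100  # samples per half cycle (arbitrary choice)

# Square wave: +1 for one half period, -1 for the next, and so on.
k = np.arange(n_samples)
square = np.where((k // half_period) % 2 == 0, 1.0, -1.0)

# Discrete stand-in for the derivative: nonzero only at the edges.
deriv = np.diff(square)
edges = np.flatnonzero(deriv)

print("number of edges:", edges.size)
print("impulse heights:", np.unique(deriv[edges]))
```

In a signal like this the spikes are the signal itself, so a detector tuned to flag large isolated values would likely mark every edge. That may be one way to read the point about some distributions needing outlier detection and others not.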
 
What about an exponential current and sinc? Is a sinc current probable in the real world?
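Whether or not a sinc-shaped current shows up in practice, both shapes are easy to generate as test inputs. A minimal sketch follows, with the amplitude, time constant, and time ranges chosen arbitrarily. Note that `np.sinc(x)` is the normalized form, sin(pi x)/(pi x).

```python
import numpy as np

amplitude = 2.0      # arbitrary peak value
time_constant = 1.0  # arbitrary decay constant for the exponential

# Sinc-shaped waveform centered on t = 0.
t_sinc = np.linspace(-5.0, 5.0, 1001)
sinc_current = amplitude * np.sinc(t_sinc)

# Exponentially decaying waveform starting at t = 0.
t_exp = np.linspace(0.0, 5.0, 501)
exp_current = amplitude * np.exp(-t_exp / time_constant)

print("sinc peak:", sinc_current.max())
print("exponential start and end:", exp_current[0], exp_current[-1])
```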
 