How to Normalize a simulated dataset to fit the actual dataset? 
#1
Feb19-12, 06:46 AM

P: 5

Can someone tell me how I can 'normalize' my dataset?
My scenario is as follows. I have two datasets, A (real-life data) and B (simulated data).

Dataset A contains 4 numerical values (from an actual experiment):
> E.g. 4 leaves from a binary tree, assigned the values 12.5, 13.5, 20.0 and 45.0.

Dataset B contains 40 numerical values (from a computer simulation):
> E.g. 40 leaves from a total of 10 binary trees, where each tree produces 4 leaves and each leaf is assigned a random numerical value.

For both datasets, I have computed the cumulative frequencies and plotted them in MS Excel, e.g. [Cumulative frequencies of leaf values vs. leaf values]. This was to see how similar or different the two datasets are: the smaller the vertical displacement between the two plots, the less the datasets differ.

I was instructed to normalize the data from Dataset B and replot the chart for a better comparison between set A and set B. How can I do this, and why is it important? An example based on the situation described here would help a great deal. Thanks in advance.


#2
Feb19-12, 11:42 AM

Sci Advisor
P: 3,300

Unfortunately, "normalize" is an ambiguous instruction. It might mean converting each data value [itex] v [/itex] to its "z-score" by computing [itex] \frac{v - \mu}{\sigma} [/itex], where [itex] \mu [/itex] is the mean of the sample in question (real or simulated) and [itex] \sigma [/itex] is the standard deviation of that sample.

It could mean something as simplistic as converting each data value [itex] v [/itex] to a sort of ranking by computing [itex] \frac{v - v_{min}}{v_{max} - v_{min}} [/itex], where [itex] v_{max} [/itex] and [itex] v_{min} [/itex] are, respectively, the max and min values in the sample.

We'd have to know more about what the data and the simulation represent to know what makes sense (and we'd have to assume the person who told you to do this gave sensible advice!). If you use z-scores, you can probably defend that choice as a common meaning of "normalize". If both your histograms had a roughly bell-shaped appearance, I'd guess that this was what your advisor meant.
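To make the two readings concrete, here is a minimal Python sketch rather than a definitive recipe. Set A holds the four leaf values from the first post. Set B is a hypothetical stand-in (40 uniform random numbers), since the actual simulated values were not posted. The function names are made up for illustration. The population standard deviation is an assumption; swap in `statistics.stdev` if the sample version is wanted. Each sample is normalized with its own mean and standard deviation (or its own min and max), and the largest vertical gap between the two cumulative frequency curves is then reported.

```python
import random
import statistics


def z_scores(values):
    """Convert each value v to (v - mu) / sigma using the sample's own mu and sigma."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # assumption: population SD; statistics.stdev also works
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Convert each value v to (v - v_min) / (v_max - v_min), giving numbers in [0, 1]."""
    v_min, v_max = min(values), max(values)
    return [(v - v_min) / (v_max - v_min) for v in values]


def cumulative_fraction(values, x):
    """Fraction of the sample that is <= x (the cumulative frequency divided by n)."""
    return sum(1 for v in values if v <= x) / len(values)


def max_vertical_gap(a, b):
    """Largest vertical distance between the two cumulative curves.

    Both curves are step functions, so checking every observed value is enough.
    """
    points = sorted(set(a) | set(b))
    return max(abs(cumulative_fraction(a, x) - cumulative_fraction(b, x)) for x in points)


# Set A: the four leaf values from the question.
A = [12.5, 13.5, 20.0, 45.0]

# Set B: HYPOTHETICAL simulated leaf values (10 trees x 4 leaves), not the poster's data.
rng = random.Random(0)
B = [rng.uniform(0.0, 100.0) for _ in range(40)]

print("raw gap:      ", max_vertical_gap(A, B))
print("z-score gap:  ", max_vertical_gap(z_scores(A), z_scores(B)))
print("min-max gap:  ", max_vertical_gap(min_max(A), min_max(B)))
```

Dividing the cumulative counts by the sample size also matters here: with 4 values in one set and 40 in the other, raw cumulative counts would top out at 4 and 40, so the curves could not line up regardless of how the leaf values are rescaled.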

