
How to Normalize a simulated dataset to fit the actual dataset?

by jothisadhana
Tags: excel, normalisation
jothisadhana
#1
Feb19-12, 06:46 AM
P: 5
Can someone tell me how I can 'normalize' my dataset?

My scenario is as follows.

I have two datasets, A (real-life data) and B (simulated data).

Dataset A contains 4 numerical values (from an actual experiment):
-> E.g. 4 leaves from a binary tree, assigned the values 12.5, 13.5, 20.0 and 45.0.

Dataset B contains 40 numerical values (from a simulation done by the computer):
-> E.g. 40 leaves from a total of 10 binary trees where each tree produces 4 leaves with randomly assigned numerical values for each leaf.

For both datasets, I computed the cumulative frequencies and plotted them in MS Excel (cumulative frequency of leaf values vs. leaf value). The aim was to see how similar the two datasets are: the smaller the vertical displacement between the two plots, the less the datasets differ.
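The comparison described above can be sketched in Python rather than Excel. This is only an illustration under stated assumptions: cumulative frequencies are taken as fractions of each sample (so that 4 values and 40 values share the same vertical scale), `dataset_b` is a made-up stand-in for the simulated values, and the helper names `ecdf` and `max_vertical_gap` are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function F where F(x) is the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def max_vertical_gap(a, b):
    """Largest vertical distance between the two cumulative-frequency curves."""
    fa, fb = ecdf(a), ecdf(b)
    # Both curves are step functions, so checking every observed value is enough.
    points = sorted(set(a) | set(b))
    return max(abs(fa(x) - fb(x)) for x in points)


# Dataset A: the four leaf values quoted in the post.
dataset_a = [12.5, 13.5, 20.0, 45.0]
# Hypothetical stand-in for simulated leaf values (not from the post).
dataset_b = [10.0, 14.0, 21.0, 40.0, 50.0]

gap = max_vertical_gap(dataset_a, dataset_b)
print(f"largest vertical displacement: {gap:.2f}")
```

A smaller gap corresponds to plots that sit closer together, which is the visual criterion used in the charts.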

I was instructed to normalize my data from Dataset B and re-plot the chart for a better comparison between set A and set B.

How can I do this, and why is it important?

An example based on the situation described here will help a great deal. Thanks in advance.
Stephen Tashi
#2
Feb19-12, 11:42 AM
Sci Advisor
P: 3,252
Unfortunately, "normalize" is an ambiguous instruction. It might mean to convert each data value [itex] v [/itex] to its "z-score" by computing [itex] \frac{v - \mu}{\sigma} [/itex] where [itex] \mu [/itex] is the mean of the sample in question (real or simulated) and [itex] \sigma [/itex] is the standard deviation of the sample.

It could mean something as simplistic as converting each data value [itex] v [/itex] to a sort of ranking by computing [itex] \frac{v - v_{min}}{v_{max} - v_{min} } [/itex] where [itex] v_{max} [/itex] and [itex] v_{min} [/itex] are, respectively, the max and min values in the sample.
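As a rough sketch, the two readings of "normalize" above could be computed as follows. The function names `z_scores` and `min_max` are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption; `statistics.stdev` gives the n-1 version if that is what is wanted. Only Dataset A's values come from the thread; each sample (real or simulated) would be passed through the same function separately.

```python
from statistics import mean, pstdev


def z_scores(sample):
    """Convert each value v to (v - mu) / sigma using the sample's own mean and SD."""
    mu = mean(sample)
    sigma = pstdev(sample)  # assumption: population SD; swap in stdev() for n-1
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in sample]


def min_max(sample):
    """Convert each value v to (v - v_min) / (v_max - v_min), which lies in [0, 1]."""
    lo, hi = min(sample), max(sample)
    if hi == lo:
        raise ValueError("all values are identical; min-max scaling is undefined")
    return [(v - lo) / (hi - lo) for v in sample]


# Dataset A: the four leaf values quoted in the first post.
dataset_a = [12.5, 13.5, 20.0, 45.0]

print(z_scores(dataset_a))
print(min_max(dataset_a))
```

After either transformation, both datasets are on a common scale, so their cumulative-frequency plots can be overlaid directly.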

We'd have to know more about what the data and the simulation represent to know what makes sense (and we'd have to assume the person who told you to do this gave sensible advice!). If you use z-scores, you can probably defend that choice as a common meaning of "normalize". If both your histograms had a roughly bell-shaped appearance, I'd guess that this was what your advisor meant.

