How to Normalize a simulated dataset to fit the actual dataset?

jothisadhana · Feb 19, 2012

Can someone tell me how I can 'normalize' my dataset?

My scenario is as follows.

I have two datasets, A (real-life data) and B (simulated data).

Dataset A contains 4 numerical values (from an actual experiment):
-> E.g. 4 leaves from a binary tree, assigned the values 12.5, 13.5, 20.0 and 45.0.

Dataset B contains 40 numerical values (from a simulation done by the computer):
-> E.g. 40 leaves from a total of 10 binary trees where each tree produces 4 leaves with randomly assigned numerical values for each leaf.

For both datasets, I computed the cumulative frequencies and plotted them in MS Excel (cumulative frequency of leaf values vs. leaf value). The aim was to see how similar or different the two datasets are: the smaller the vertical displacement between the two plots, the less the datasets differ.
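For concreteness, the comparison described above can be sketched in Python instead of Excel. This is only an illustration: the function names `ecdf` and `max_vertical_gap` are made up for this sketch, and the `simulated` values are hypothetical placeholders (8 values rather than the 40 from the actual simulation). Only the four values in `real` come from Dataset A.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return F, where F(x) is the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def max_vertical_gap(sample_a, sample_b):
    """Largest vertical distance between the two cumulative frequency curves.

    Both curves are step functions, so it is enough to check the points
    where either curve jumps, i.e. the observed data values.
    """
    f_a, f_b = ecdf(sample_a), ecdf(sample_b)
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(f_a(x) - f_b(x)) for x in points)

real = [12.5, 13.5, 20.0, 45.0]  # Dataset A, from the post
# Hypothetical stand-in for Dataset B (assumption, not actual simulation output)
simulated = [10.0, 14.0, 18.0, 22.0, 30.0, 41.0, 50.0, 55.0]

gap = max_vertical_gap(real, simulated)
print(f"largest vertical displacement: {gap}")
```

Dividing the cumulative counts by the sample size, as `ecdf` does, puts both curves on a 0 to 1 vertical scale even though the datasets have different numbers of values.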

I was instructed to normalize my data from Dataset B and re-plot the chart for a better comparison between set A and set B.

How can I do this, and why is it important?

An example based on the situation described here will help a great deal. Thanks in advance.

Stephen Tashi · Feb 19, 2012

Unfortunately, "normalize" is an ambiguous instruction. It might mean converting each data value $v$ to its "z-score" by computing $\frac{v - \mu}{\sigma}$, where $\mu$ is the mean of the sample in question (real or simulated) and $\sigma$ is the standard deviation of that sample.
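A minimal Python sketch of the z-score option, applied to the four Dataset A values from the question. The function name `z_scores` is made up for this example, and the choice of the population standard deviation (`statistics.pstdev`) is an assumption; `statistics.stdev` (the n - 1 version) is an equally defensible reading of "standard deviation of the sample".

```python
from statistics import mean, pstdev

def z_scores(sample):
    """Convert each value v to (v - mu) / sigma using the sample's own mu and sigma."""
    mu = mean(sample)
    sigma = pstdev(sample)  # assumption: population SD; swap in stdev() for n - 1
    if sigma == 0:
        raise ValueError("all values are identical, so z-scores are undefined")
    return [(v - mu) / sigma for v in sample]

real = [12.5, 13.5, 20.0, 45.0]  # Dataset A, from the question
real_z = z_scores(real)
print(real_z)
```

Each dataset would be converted with its own mean and standard deviation, and the cumulative frequency charts would then be re-plotted from the converted values.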

It could mean something as simple as converting each data value $v$ to a sort of ranking between 0 and 1 by computing $\frac{v - v_{min}}{v_{max} - v_{min}}$, where $v_{max}$ and $v_{min}$ are, respectively, the maximum and minimum values in the sample.
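The min-max version is just as short. Again, this is only a sketch: `min_max_scale` is a name invented for this example, and the input is the four Dataset A values from the question.

```python
def min_max_scale(sample):
    """Map each value v to (v - v_min) / (v_max - v_min), so results lie in [0, 1]."""
    v_min, v_max = min(sample), max(sample)
    span = v_max - v_min
    if span == 0:
        raise ValueError("all values are identical, so the range is zero")
    return [(v - v_min) / span for v in sample]

real = [12.5, 13.5, 20.0, 45.0]  # Dataset A, from the question
scaled = min_max_scale(real)
print(scaled)
```

With this choice the smallest value in each dataset becomes 0 and the largest becomes 1, so a single extreme simulated value can compress all the others toward 0.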

We'd have to know more about what the data and the simulation represent to know what makes sense (and we'd have to assume the person who told you to do this gave sensible advice!). If you use z-scores, you can probably defend that choice as a common meaning of "normalize". If both your histograms had a roughly bell-shaped appearance, I'd guess that this was what your advisor meant.
