How to Normalize a simulated dataset to fit the actual dataset?

jothisadhana · Feb 19, 2012

Can someone tell me how I can 'normalize' my dataset?

My scenario is as follows.

I have two datasets, A (real-life data) and B (simulated data).

Dataset A contains 4 numerical values (from an actual experiment):
-> E.g. 4 leaves from a binary tree, assigned the values 12.5, 13.5, 20.0 and 45.0.

Dataset B contains 40 numerical values (from a simulation done by the computer):
-> E.g. 40 leaves from a total of 10 binary trees where each tree produces 4 leaves with randomly assigned numerical values for each leaf.

For both datasets, I have computed the cumulative frequencies and plotted them in MS Excel (cumulative frequency of leaf values vs. leaf value). The aim was to see how similar or different the two datasets are: the smaller the vertical displacement between the two plots, the less the datasets differ.

I was instructed to normalize my data from Dataset B and re-plot the chart for a better comparison between set A and set B.

How can I do this (and why is this important?)?

An example based on the situation described here will help a great deal. Thanks in advance.

Stephen Tashi · Feb 19, 2012

Unfortunately, "normalize" is an ambiguous instruction. It might mean converting each data value [itex]v[/itex] to its "z-score" by computing [itex]\frac{v - \mu}{\sigma}[/itex], where [itex]\mu[/itex] is the mean of the sample in question (real or simulated) and [itex]\sigma[/itex] is the standard deviation of that sample.
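As a rough illustration of the z-score reading, here is a minimal Python sketch applied to the four Dataset A values from the question. The function name `z_scores` is made up for this example, and using the sample (n - 1) standard deviation is an assumption; the population form (`statistics.pstdev`) would be an equally defensible choice.

```python
import statistics

def z_scores(values):
    """Convert each value v to (v - mean) / stdev, using the sample's own mean and stdev."""
    mu = statistics.mean(values)
    # Assumption: sample (n - 1) standard deviation; swap in pstdev for the population form.
    sigma = statistics.stdev(values)
    return [(v - mu) / sigma for v in values]

# Dataset A from the question.
dataset_a = [12.5, 13.5, 20.0, 45.0]
z_a = z_scores(dataset_a)
print(z_a)
```

The same function would be applied separately to the 40 simulated values, so that each dataset is centered on 0 with unit spread before the two are re-plotted.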

It could mean something as simplistic as converting each data value [itex]v[/itex] to a sort of ranking by computing [itex]\frac{v - v_{min}}{v_{max} - v_{min} }[/itex] where [itex]v_{max}[/itex] and [itex]v_{min}[/itex] are, respectively, the max and min values in the sample.
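A minimal Python sketch of this min-max reading, again using the Dataset A values from the question. The name `min_max_scale` is hypothetical, and raising an error when all values are equal is just one way to handle the zero-range case.

```python
def min_max_scale(values):
    """Map each value v to (v - v_min) / (v_max - v_min), so results lie in [0, 1]."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # The formula divides by zero when every value is identical.
        raise ValueError("all values are equal; min-max scaling is undefined")
    return [(v - v_min) / span for v in values]

# Dataset A from the question.
dataset_a = [12.5, 13.5, 20.0, 45.0]
scaled_a = min_max_scale(dataset_a)
print(scaled_a)
```

The smallest value maps to 0 and the largest to 1, so after scaling both datasets share the same horizontal axis range.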

We'd have to know more about what the data and the simulation represent to know what makes sense (and we'd have to assume the person who told you to do this gave sensible advice!). If you use z-scores, you can probably defend that choice as a common meaning for "normalize". If both your histograms had a roughly bell-shaped appearance, I'd guess that this was what your advisor meant.
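The comparison described in the question (two cumulative frequency curves and the vertical displacement between them) can also be done outside Excel. The sketch below is only an illustration: `ecdf` and `max_vertical_gap` are hypothetical helper names, and `simulated_b` is a randomly generated stand-in, since the thread does not give the 40 simulated values.

```python
import bisect
import random

def ecdf(sample):
    """Return a function x -> fraction of sample values that are <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect.bisect_right(ordered, x) / n

def max_vertical_gap(sample_1, sample_2):
    """Largest vertical distance between the two cumulative frequency curves."""
    f1, f2 = ecdf(sample_1), ecdf(sample_2)
    # Both curves are step functions, so checking every observed value is enough.
    points = sorted(set(sample_1) | set(sample_2))
    return max(abs(f1(x) - f2(x)) for x in points)

# Dataset A from the question.
dataset_a = [12.5, 13.5, 20.0, 45.0]

# Placeholder only: NOT the real simulation output; the range 10-50 is arbitrary.
rng = random.Random(0)
simulated_b = [rng.uniform(10.0, 50.0) for _ in range(40)]

gap = max_vertical_gap(dataset_a, simulated_b)
print(gap)
```

Running this on the raw values and again on the normalized values of both datasets would show whether the chosen normalization actually brings the two curves closer together.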
