Efficient Visualization Techniques for Large Datasets: A Scientific Inquiry

  • Thread starter: lylos

Discussion Overview

The discussion revolves around techniques for visualizing a large dataset consisting of x, y, z coordinates and a function value f(x,y,z). Participants explore various methods and tools to effectively represent this data in a 3D format, focusing on issues related to memory constraints and data representation strategies.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant expresses the need for a visualization method that uses 3D spheres with opacity based on f(x,y,z) values, but has encountered limitations with existing software like VisIt and ParaView.
  • Another participant suggests using an octree for visibility determination due to the large number of data points.
  • A third participant points to a resource related to point cloud representation and suggests that streaming data might be necessary since the entire dataset may not fit in memory.
  • One participant proposes setting a threshold for f(x,y,z) to reduce the number of points generated, focusing only on those with larger values to manage memory usage.
  • Another participant recommends taking the logarithm of f(x,y,z) before display, and suggests sampling a fraction of the data points to check whether a subsample is representative, or identifying clusters and plotting one representative per cluster for a more meaningful representation.
  • A brief mention of Mayavi as a potential tool for visualization is made.

Areas of Agreement / Disagreement

Participants present multiple competing views and suggestions regarding visualization techniques and tools, with no consensus reached on a single approach or solution.

Contextual Notes

Participants note limitations related to memory constraints and the need for effective data representation strategies, but do not resolve these issues.

Who May Find This Useful

This discussion may be useful for researchers and practitioners dealing with large datasets in fields such as data science, computational physics, and engineering, particularly those interested in visualization techniques and tools.

lylos
First of all, let me apologize if this is the wrong section to post to.

I am in need of ideas on how to visualize a large dataset (~4 GB) of x, y, z, f(x,y,z) values. To draw conclusions from the data, it needs to be plotted so that at each (x, y, z) there is a 3D sphere whose opacity is scaled by f(x,y,z).

I have tried a couple of plotting programs, VisIt and ParaView, but I was not able to get either to provide the functionality I need, although they come very close!

I then wrote a Python script that created a VTK file with each individual point defined. However, the VTK file turned out to be so large that no viewer I tried could display it.
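
For reference, the script was essentially the following (the binary input layout here is a stand-in for my actual format):

    import numpy as np

    # Read the x, y, z, f(x,y,z) records; the dtype/layout is a placeholder.
    data = np.fromfile("dataset.bin", dtype=np.float32).reshape(-1, 4)
    n = len(data)

    with open("points.vtk", "w") as out:
        # Legacy ASCII VTK polydata: one vertex cell per point.
        out.write("# vtk DataFile Version 3.0\n")
        out.write("f(x,y,z) point data\nASCII\nDATASET POLYDATA\n")
        out.write(f"POINTS {n} float\n")
        for x, y, z, _ in data:
            out.write(f"{x} {y} {z}\n")
        out.write(f"VERTICES {n} {2 * n}\n")
        for i in range(n):
            out.write(f"1 {i}\n")
        # Attach f as a scalar so the viewer can map it to opacity.
        out.write(f"POINT_DATA {n}\nSCALARS f float 1\nLOOKUP_TABLE default\n")
        for row in data:
            out.write(f"{row[3]}\n")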

Any suggestions would be greatly appreciated. I am just looking for ideas here; I've just about run out.
 
With that many points you need some form of visibility determination. The usual approach for this kind of data is an octree.
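
Schematically, an octree is just a cube that splits into eight children when it fills up, so the renderer can discard whole sub-cubes at once. A minimal Python sketch (the capacity is arbitrary):

    class Octree:
        """Bare-bones point octree: a node covers a cube of half-width
        `half` around `center` and splits into 8 children once it holds
        more than `capacity` points."""

        def __init__(self, center, half, capacity=64):
            self.center, self.half, self.capacity = center, half, capacity
            self.points = []
            self.children = None  # becomes a list of 8 Octrees after a split

        def _child(self, p):
            # Octant index from the sign of p relative to the center.
            cx, cy, cz = self.center
            return (p[0] > cx) + 2 * (p[1] > cy) + 4 * (p[2] > cz)

        def insert(self, p):
            if self.children is not None:
                self.children[self._child(p)].insert(p)
                return
            self.points.append(p)
            if len(self.points) > self.capacity:
                self._split()

        def _split(self):
            h = self.half / 2
            cx, cy, cz = self.center
            self.children = [
                Octree((cx + h * (1 if i & 1 else -1),
                        cy + h * (1 if i & 2 else -1),
                        cz + h * (1 if i & 4 else -1)), h, self.capacity)
                for i in range(8)
            ]
            for q in self.points:  # push the stored points down a level
                self.children[self._child(q)].insert(q)
            self.points = []

Visibility determination is then just a walk of the tree that skips any node whose cube falls outside the view frustum.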
 
Take a look at this:
http://pointclouds.org/

This is based on VTK, though. It sounds to me like the real problem is that the data needs to be streamed in, since it won't all fit in memory.
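
For the streaming part, something like numpy's memmap would let you walk the file in chunks instead of loading all ~4 GB at once (this assumes a flat binary file of float32 x, y, z, f records; adjust the dtype to the real layout):

    import numpy as np

    # Map the file without reading it into RAM.
    data = np.memmap("dataset.bin", dtype=np.float32, mode="r").reshape(-1, 4)

    chunk = 1_000_000  # points per pass; tune to available memory
    for start in range(0, len(data), chunk):
        block = np.asarray(data[start:start + chunk])  # only this slice is read
        # ... aggregate, filter, or hand off to the renderer here ...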

Is there some way you could aggregate the datapoints so that it would still give you meaningful data?
 
The idea of a point cloud representation is similar to what I had in mind.

The f(x,y,z) values span many orders of magnitude. I'm only interested in the points with larger values of f(x,y,z), so perhaps I could set a lower threshold below which a point is not even generated. That should lower the memory footprint while still giving me meaningful data...
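
In code, the filter could stream through the file and only keep the large-f points (the cutoff here is just a placeholder):

    import numpy as np

    CUTOFF = 1e-3  # placeholder; would be chosen from the distribution of f

    data = np.memmap("dataset.bin", dtype=np.float32, mode="r").reshape(-1, 4)
    kept = []
    for start in range(0, len(data), 1_000_000):
        block = np.asarray(data[start:start + 1_000_000])
        kept.append(block[block[:, 3] >= CUTOFF])  # keep only large-f points
    points = np.concatenate(kept)
    print(f"kept {len(points)} of {len(data)} points")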
 
Since your f spans many orders of magnitude, almost nothing will display it meaningfully, so take the log of f before trying to display it.
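
Concretely (the clip just guards against f = 0):

    import numpy as np

    f = np.load("f_values.npy")  # stand-in for however you hold the kept f values
    log_f = np.log10(np.clip(f, np.finfo(np.float32).tiny, None))
    # Rescale to [0, 1] so it can drive an opacity channel directly.
    alpha = (log_f - log_f.min()) / (log_f.max() - log_f.min())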

Since you have a vast supply of data, more than almost anything can visualize meaningfully, try sampling it: randomly select 1/16 of your data points, repeat the process to get a second independent sample, display the two side by side, and see whether there is any substantial difference.
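
In numpy terms, something like this (assuming the points sit in an (N, 4) array called points):

    import numpy as np

    rng = np.random.default_rng()
    n = len(points)  # points: the (N, 4) x, y, z, f array
    sample_a = points[rng.choice(n, n // 16, replace=False)]
    sample_b = points[rng.choice(n, n // 16, replace=False)]
    # Plot sample_a and sample_b side by side; if they look alike,
    # a 1/16 subsample is already representative of the full set.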

If you do see large differences, you might have success identifying clusters within the data and then displaying one representative for each cluster. There are lots of papers describing how to identify clusters, but you will need a program that can cope with that much data.
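
One standard implementation that does cope with data this size is scikit-learn's MiniBatchKMeans, since it fits in mini-batches rather than holding everything at once (the cluster count here is arbitrary, and points is again the (N, 4) array):

    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    # Cluster on position only; f is carried along for the representatives.
    km = MiniBatchKMeans(n_clusters=5000, batch_size=100_000)
    labels = km.fit_predict(points[:, :3])

    # One representative per cluster: the spatial centroid plus the mean f.
    reps = np.array([
        np.append(km.cluster_centers_[k], points[labels == k, 3].mean())
        for k in range(km.n_clusters)
    ])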
 
Mayavi?
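
Its mlab interface can get close to the sphere-with-opacity picture; a rough sketch (ramping the lookup table's alpha channel with the scalar is the part I'd verify first, and it would still need the thresholding/sampling above to stay responsive):

    import numpy as np
    from mayavi import mlab

    # x, y, z, f: 1-D arrays for the (already thresholded/sampled) points.
    pts = mlab.points3d(x, y, z, f, mode="sphere", scale_mode="none")

    # Ramp the colormap's alpha with the scalar so low-f spheres fade out.
    lut = pts.module_manager.scalar_lut_manager.lut.table.to_array()
    lut[:, -1] = np.linspace(10, 255, 256)  # alpha column of the 256-entry LUT
    pts.module_manager.scalar_lut_manager.lut.table = lut
    mlab.show()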
 
