Looking for large dataset of non image-centric physics data

In summary, the thread concerns the search for a large dataset of non image-centric physics data to use as an example in the documentation for the Gallia library. The user wanted a dataset comparable in size and format to the ICGC data for skin cancer. The Sloan Digital Sky Survey and Large Hadron Collider data were suggested as candidates.
  • #1
galliaproject

Hi all,

I'm looking for a good example of a large dataset of non image-centric physics data (e.g. astronomy, particles, ...) so I can add an example to this section of my documentation (formal announcement for the Gallia library: see Scala users forum).

I looked around, for instance on the Awesome public datasets page, but in most cases it doesn't link directly to the data, and I've gotten lost going down rabbit holes too many times already. I think I'm just not familiar enough with the domain. For reference, here's what a great counterpart in bioinformatics would be for what I need: ICGC data for skin cancer (it adds up to ~40GB of data once uncompressed).

It'd be nice if the dataset were similarly large-ish (despite having no images) in terms of "rows", as in: not fitting in the memory of your typical consumer-grade computer. It could be in pretty much any format among json/tsv/csv/avro/parquet and the like (see Gallia's input section), and it doesn't have to be all in one file either. It can't, however, have millions or billions of columns: a single "row" has to fit in memory at this time, as Gallia is not a particularly column-focused data processing tool.
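To make the "rows don't fit in memory, but a single row does" constraint concrete, here is a minimal sketch of row-at-a-time processing. It is written in Python with the standard library only and is not Gallia's API; the function name `count_by_column` and the column names `object_id`, `object_class`, and `magnitude` are made up for illustration.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def count_by_column(lines: Iterable[str], column: str, delimiter: str = "\t") -> Counter:
    """Stream delimited text row by row and count the values in one column.

    Only the current row and the running counts are held in memory,
    so the input can be much larger than RAM as long as each row fits.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    counts: Counter = Counter()
    for row in reader:
        counts[row[column]] += 1
    return counts


# Tiny in-memory stand-in for a multi-gigabyte TSV file (hypothetical columns).
sample = io.StringIO(
    "object_id\tobject_class\tmagnitude\n"
    "1\tstar\t14.2\n"
    "2\tgalaxy\t17.9\n"
    "3\tstar\t12.5\n"
)
class_counts = count_by_column(sample, "object_class")
print(class_counts)
```

For a real file, the same function could be given an open file handle, e.g. `open(path, newline="")`, since file objects yield one line at a time. A dataset with millions of columns would break this approach, because each parsed row is materialized in full.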

Any pointers would be greatly appreciated!

Thanks,

Anthony
 
  • #2


Hello Anthony,

I often use large datasets for my research and have come across a few that may be of interest for your project. One that comes to mind is the Sloan Digital Sky Survey (SDSS), which contains information on over 3 million astronomical objects. It can be queried with SQL or downloaded as FITS files, either through the SDSS website or through the Virtual Observatory. Another that may be useful is the Large Hadron Collider (LHC) data, which records particle collisions from experiments at CERN. This data is available in various formats, including ROOT and HDF5, and can be accessed through the CERN Open Data Portal.

I hope this helps with your search for a suitable dataset. Good luck with your project!
 

1. What is the purpose of looking for a large dataset of non-image centric physics data?

The purpose of looking for a large dataset of non-image centric physics data is to use it for research and analysis. This type of data can provide valuable insights and help scientists understand complex physical phenomena.

2. How can I access a large dataset of non-image centric physics data?

There are various sources for accessing large datasets of non-image centric physics data. Some options include government databases, academic institutions, and online repositories. It is also possible to collect and compile your own dataset through experiments or collaborations with other scientists.

3. What types of data are typically included in non-image centric physics datasets?

Non-image centric physics datasets can include a wide range of data, such as numerical simulations, experimental measurements, theoretical calculations, and observational data. These typically record physical properties such as temperature, pressure, velocity, and energy.

4. How can I ensure the quality and accuracy of a large dataset of non-image centric physics data?

To ensure the quality and accuracy of a large dataset of non-image centric physics data, it is important to carefully select the source of the data and thoroughly review the methodology used to collect and analyze the data. It is also recommended to compare the data with other sources and perform quality checks, such as data cleaning and outlier removal.
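As one illustration of the outlier-removal step mentioned above, here is a minimal Python sketch based on the median absolute deviation (MAD). The choice of method, the function name `remove_outliers_mad`, the cutoff of 3.5, and the sample readings are all assumptions made for this example; a real dataset may call for a domain-specific criterion instead.

```python
import statistics
from typing import List


def remove_outliers_mad(values: List[float], threshold: float = 3.5) -> List[float]:
    """Drop values whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median of the absolute deviations from the median.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # No spread around the median: the score is undefined, keep everything.
        return list(values)
    return [v for v in values if 0.6745 * abs(v - median) / mad <= threshold]


# Made-up measurements with one obviously bad reading.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 55.0]
cleaned = remove_outliers_mad(readings)
print(cleaned)
```

The median-based score is used here instead of a mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself. Note that this version needs the whole column in memory, so for a dataset larger than RAM it would have to be applied per chunk or replaced with a streaming estimate.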

5. Are there any ethical considerations when using a large dataset of non-image centric physics data?

Yes, there may be ethical considerations when using a large dataset of non-image centric physics data. It is important to obtain proper consent and follow ethical guidelines when collecting and using data from human subjects. Additionally, it is important to properly cite the source of the data and give credit to the original researchers.
