CERN releases 300TB of LHC data to public

  • Context: High School 
  • Thread starter: Greg Bernhardt
  • Tags: Cern, Data, Lhc
SUMMARY

CERN has released 300TB of data from the Large Hadron Collider (LHC) to the public, enabling researchers and students to engage with groundbreaking scientific data. This initiative allows universities to utilize the data for educational purposes, encouraging students to analyze and verify findings. The release includes not only raw data but also simulation tools that help in understanding detector responses to various events. This transparency aims to counter misinformation and enhance public engagement with particle physics research.

PREREQUISITES
  • Understanding of particle physics principles
  • Familiarity with data analysis techniques in scientific research
  • Knowledge of Monte Carlo (MC) simulations and their applications
  • Basic skills in using data visualization tools for scientific data
NEXT STEPS
  • Explore the CMS data analysis framework for processing LHC data
  • Learn about Monte Carlo simulations in particle physics
  • Investigate the role of data-driven background estimates in experimental physics
  • Research the impact of public access to scientific data on funding and public opinion
USEFUL FOR

Researchers, educators, and students in the fields of particle physics, data science, and scientific communication will benefit from this discussion, particularly those interested in data analysis and public engagement in science.

Hello Greg
I have to ask if your question is rhetorical? I read the page you linked and a few it linked to, and it looks like due diligence at its most basic. Also, since 300 TB is the total, one is not required to download it all. The article even mentioned that some universities are using the data to task students with plotting and verifying results, so it seems a terrific way to get students involved with some of the most exciting data ever collected.

Additionally, among some, ummm, "less than stringently scientific groups" ; ) there is for some reason a compulsion to misinterpret and even lie about CERN. While some institutions employ the "ignore absurdity and maybe it will go away" method, in all honesty that has not seemed to work out well, at the very least because public opinion does have a powerful effect on funding. Now it is possible to reply to such "villagers with pitchforks" with a simple, "We have released 300 TB of our data and you are welcome to examine and question any part of it. Thank you for your interest." : D
 
Many studies can be done with small subsets of those 300 TB.

Various groups will probably look at the whole dataset - the number of analyses you can do is always limited by manpower, so there are things CMS did not study with that data. Theorists with pet models about physics beyond the standard model that would lead to very obscure signals can now check the data on their own.

From a broader perspective: the public funded the experiments. It already has access to the final published results (all CERN results are open access, and all particle physics gets uploaded to arXiv anyway), but why not also give access to the raw data?
 
It seems they also have some other things for people to download. Some sort of simulated data and simulation tools maybe. I'm not quite sure what they are. Could anyone explain?
 
Simulations of how the detector reacts to given (also simulated) events, yes. You need them for most analyses: you have to know what a possible signal would look like, what the background looks like, and how your detector will react to all those events.
Experiments never blindly trust those simulations and check their accuracy with independent analyses, but it does not work entirely without them.
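
As a rough illustration of that chain, here is a toy sketch (not CMS software; all numbers below are invented): generate "true" signal and background events, smear them with a simple Gaussian detector response, and look at what an analysis would see. The reconstructed distribution depends on both the underlying physics and the assumed detector model.

```python
# Toy sketch of the simulation chain: truth-level events -> detector response
# -> reconstructed observable. Purely illustrative; not CMS software.
import numpy as np

rng = np.random.default_rng(42)

# Truth level: a narrow signal peak on top of a falling background (toy numbers)
signal_true = rng.normal(loc=125.0, scale=0.1, size=500)          # GeV
background_true = 70.0 + rng.exponential(scale=50.0, size=20000)  # GeV
truth = np.concatenate([signal_true, background_true])

# Detector response: smear every true mass by an assumed 2 GeV resolution.
# Real experiments model the response with full detector simulation (e.g.
# Geant4) and then validate it against data.
resolution = 2.0
reco = truth + rng.normal(0.0, resolution, size=truth.size)

# What an analysis sees in a window around the expected signal
window = (reco > 120.0) & (reco < 130.0)
print(f"events in 120-130 GeV window: {window.sum()}")
print(f"  of which true signal:       {window[:signal_true.size].sum()}")
```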
 
It is quite interesting how MC simulations are disfavored versus data-driven background estimates in particle physics... on the other hand, other fields have embraced MC methods (like using MC to simulate the interaction of cosmic rays with the lunar surface, https://arxiv.org/abs/1604.03349).
 
It is known that the MC descriptions (simulations) are not that good, and without data-driven estimates you rarely know how large the deviations from data are.
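
For a feel of what "data-driven" means here, a minimal sketch of a sideband estimate, assuming a simple counting interpolation and invented numbers (a real analysis would fit the sideband shape and assign systematic uncertainties):

```python
# Toy sideband method: estimate the background under a peak from signal-free
# regions of the same data, rather than trusting the MC normalisation.
# Purely illustrative; numbers are invented.
import numpy as np

rng = np.random.default_rng(1)

# "Data": falling background plus a small peak near 125 GeV
background = 80.0 + rng.exponential(scale=40.0, size=10000)
signal = rng.normal(loc=125.0, scale=2.0, size=150)
data = np.concatenate([background, signal])

sr = (data > 120) & (data < 130)      # signal region
sb_low = (data > 110) & (data < 120)  # lower sideband
sb_high = (data > 130) & (data < 140) # upper sideband

# Crude interpolation: average the two sidebands to estimate the background
# in the signal region. (A real analysis would fit the sideband shape.)
bkg_estimate = 0.5 * (sb_low.sum() + sb_high.sum())
print(f"signal region: {sr.sum()} events, "
      f"estimated background: {bkg_estimate:.0f}, "
      f"excess: {sr.sum() - bkg_estimate:.0f}")
```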
 
I've heard that statement made about the MC method before, but I never understood why. Is it because of the dimensionality of the space being considered, or is it more to do with details of the actual detectors?
 
Both.

The description of the detector is not perfect: you never get the radiation length of every component exactly right, you never know the exact asymmetry of your detector's response to kaons versus antikaons, you don't get the exact amount of charge sharing between adjacent channels after radiation damage right, and hundreds of similar details.

The simulation of the proton-proton collisions is not perfect either, mainly due to nonperturbative QCD effects: you don't know the parton distribution functions exactly, and the hadronization description is not perfect. In addition, you have to limit the calculations of some processes to fixed order, and so on. There are processes that can be modeled very well, while others don't have any purely theoretical predictions and rely on experimental data.
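
To make one of those detector-description points concrete, here is a toy example (invented resolutions, not from any real detector): if the simulation assumes a slightly better mass resolution than the detector actually has, it overestimates the efficiency of a narrow mass-window cut. Experiments catch this kind of mismatch by comparing well-known resonances, such as the Z peak, between data and simulation.

```python
# Toy sketch: a mismodelled detector resolution biases the MC-predicted
# efficiency of a mass-window cut. Numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
true_mass = np.full(n, 125.0)  # a narrow resonance, natural width neglected

reco_data = true_mass + rng.normal(0.0, 2.5, n)  # "real" detector: 2.5 GeV
reco_mc   = true_mass + rng.normal(0.0, 2.0, n)  # simulation assumes 2.0 GeV

def window_efficiency(masses, lo=122.0, hi=128.0):
    """Fraction of events reconstructed inside the mass window."""
    return ((masses > lo) & (masses < hi)).mean()

print(f"efficiency in data-like toy:    {window_efficiency(reco_data):.3f}")
print(f"efficiency predicted by toy MC: {window_efficiency(reco_mc):.3f}")
```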
 
