Comp Sci: Why is distributed computing important for big data?

AI Thread Summary
Distributed computing is crucial for managing big data due to its ability to process vast amounts of information across multiple systems, enhancing efficiency and scalability. Traditional data processing methods often fall short when dealing with the volume and variety of data generated from diverse, uncontrolled sources. An example of distributed systems in big data includes processing petabytes of data collected from various client platforms, which require significant computational resources beyond a single location. This approach allows for the integration and analysis of data from disparate sources, ensuring that insights can be derived effectively. Ultimately, distributed systems enable organizations to harness the full potential of big data, facilitating better decision-making and innovation.
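To make the "process across multiple systems" idea concrete, here is a minimal sketch of the map-reduce pattern that frameworks such as Hadoop and Spark are built around. It uses Python's multiprocessing pool as a stand-in for a cluster, and the tiny in-memory "chunks" stand in for data partitions stored on separate machines; everything here is illustrative, not how any particular framework is implemented.

```python
from multiprocessing import Pool
from collections import Counter

def count_words(chunk):
    """Map step: each worker independently counts words in its chunk."""
    return Counter(chunk.split())

if __name__ == "__main__":
    # Stand-in for a dataset too large for one machine: in a real
    # cluster, each chunk would live on (and be processed by) a
    # different node.
    chunks = [
        "big data needs many machines",
        "many machines share the work",
        "the work is merged at the end",
    ]

    with Pool(processes=3) as pool:
        partial_counts = pool.map(count_words, chunks)  # scatter

    total = Counter()
    for c in partial_counts:  # gather / reduce step
        total += c
    print(total.most_common(3))
```

The key property is that each chunk is processed independently, so adding machines adds throughput; only the small partial results have to be brought back together at the end.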
shivajikobardan
Homework Statement
What is the need for distributed computing in big data?
Relevant Equations
none
What is one example of the use of a distributed system in big data?

Here are some notes from my college curriculum. I understand them, of course, but they don't make clear what the role of a distributed system is in big data:


https://dotnettutorials.net/lesson/big-data-distributed-computing-and-complexity/

https://www.dummies.com/article/tec...tributed-computing-basics-for-big-data-166996

https://www.ukessays.com/essays/engineering/distributed-computing-processing-data-5529.php

These are some tutorials that try to explain this topic but, in my opinion, fail to do so. They don't really explain the need for distributed systems in big data.

(I have already studied a subject called Distributed Systems. https://www.ioenotes.edu.np/ioe-syllabus/distributed-system-computer-engineering-712 was our syllabus. I studied it really well, and I still have my hipster PDAs for the subject to refer back to...)
 
"Big Data" means more than a really big data base. At my last job, we collected many petabytes of test data every time we ran a road test campaign. In a 1-month campaign, we might collect 20 petabytes of test data (the application was car radar). But we didn't consider it "Big Data" because we had full control of the process and the test system the whole time. The original collection of the data was done by automobile drivers who were company employees. Most of the data processing was done by dedicated processors in a single room of the building - and the rest was done by other systems in that same building. It wasn't that distributed.

"Big Data", by its nature, commonly originates from sources that are not under complete project control. The data is commonly exchanged with "client" platforms that are not dedicated to the projects purpose - and a substantial amount of processing is required on those remote platforms to support project goals.
 