Why is distributed computing/system important/necessary for big data?

SUMMARY

Distributed computing matters for big data because the data commonly originates from diverse sources that are not under complete project control. Volume alone does not make a workload "big data": in the automotive road-test example from this thread, a one-month campaign could collect 20 petabytes, yet the data was gathered and processed in a controlled environment, mostly within a single building, so it was not treated as big data. By contrast, big data is commonly exchanged with client platforms that are not dedicated to the project, and a substantial share of the processing has to run on those remote platforms. That distinction, rather than raw size, is what makes distributed computing necessary.

PREREQUISITES
  • Understanding of distributed systems architecture
  • Familiarity with big data concepts and terminology
  • Knowledge of data processing frameworks (e.g., Apache Hadoop, Apache Spark)
  • Experience with data storage solutions (e.g., NoSQL databases, distributed file systems)
NEXT STEPS
  • Explore Apache Hadoop for distributed data processing
  • Learn about Apache Spark for real-time big data analytics
  • Investigate the role of NoSQL databases in handling large datasets
  • Study the principles of data governance in distributed environments
USEFUL FOR

Data engineers, big data analysts, and IT professionals involved in large-scale data processing and management will benefit from this discussion.

shivajikobardan
Homework Statement
Why is distributed computing needed for big data?
Relevant Equations
none
What is one example of the use of a distributed system in big data?

Here are the notes from my college curriculum. I understand them, of course, but they don't make clear what role a distributed system plays in big data:


https://dotnettutorials.net/lesson/big-data-distributed-computing-and-complexity/

https://www.dummies.com/article/tec...tributed-computing-basics-for-big-data-166996

https://www.ukessays.com/essays/engineering/distributed-computing-processing-data-5529.php

These are some tutorials that try to explain this topic, but IMO they fail to do so. They don't really explain why big data needs a distributed system.

(I have already studied a subject called Distributed System, with the syllabus at https://www.ioenotes.edu.np/ioe-syllabus/distributed-system-computer-engineering-712 and I studied it really well. I still have hipster PDAs from this subject to refer to...)
 
"Big Data" means more than a really big database. At my last job, we collected many petabytes of test data every time we ran a road test campaign. In a 1-month campaign, we might collect 20 petabytes of test data (the application was car radar). But we didn't consider it "Big Data" because we had full control of the process and the test system the whole time. The original collection of the data was done by automobile drivers who were company employees. Most of the data processing was done by dedicated processors in a single room of the building, and the rest was done by other systems in that same building. It wasn't that distributed.
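
To put the 20-petabyte figure in perspective, here is a rough back-of-the-envelope calculation of the average data rate it implies. It assumes decimal petabytes (10^15 bytes), a 30-day campaign, and data arriving evenly around the clock. None of those details are stated in the post, so treat the result as an order-of-magnitude estimate only.

```python
# Rough average ingest rate for a 20 PB, one-month campaign.
# Assumptions (not from the post): 1 PB = 10**15 bytes, a 30-day month,
# and data arriving at a constant rate around the clock.
PETABYTE = 10**15
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_gb_per_s(petabytes: float, days: float) -> float:
    """Return the average data rate in gigabytes (10**9 bytes) per second."""
    total_bytes = petabytes * PETABYTE
    return total_bytes / (days * SECONDS_PER_DAY) / 10**9


rate = average_rate_gb_per_s(20, 30)
print(f"{rate:.1f} GB/s")  # prints "7.7 GB/s"
```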

"Big Data", by its nature, commonly originates from sources that are not under complete project control. The data is commonly exchanged with "client" platforms that are not dedicated to the project's purpose - and a substantial amount of processing is required on those remote platforms to support project goals.
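
One way to picture "processing on remote platforms" is the sketch below: each client reduces its own raw data to a small summary, and the project side only merges those summaries. This is a minimal illustration, not a description of any particular system. The names `Summary`, `summarize_locally`, and `merge` are hypothetical, the clients are simulated as in-memory lists rather than real remote machines, and every client is assumed to hold at least one reading.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Summary:
    """Small, mergeable result that a client sends instead of raw data."""
    count: int
    total: float
    maximum: float


def summarize_locally(readings: Iterable[float]) -> Summary:
    """Runs on a client platform: reduce raw readings to a summary.

    Assumes the client has at least one reading (max() fails on empty input).
    """
    values = list(readings)
    return Summary(len(values), sum(values), max(values))


def merge(summaries: Iterable[Summary]) -> Summary:
    """Runs on the project side: combine summaries without seeing raw data."""
    parts = list(summaries)
    return Summary(
        count=sum(p.count for p in parts),
        total=sum(p.total for p in parts),
        maximum=max(p.maximum for p in parts),
    )


# Three hypothetical clients, each holding its own raw data.
client_data = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
combined = merge(summarize_locally(d) for d in client_data)
print(combined.count, combined.total / combined.count, combined.maximum)
```

The design point is that only the summaries cross the network, so the raw data can stay on platforms the project does not control. Frameworks such as Apache Hadoop and Apache Spark apply the same split-then-combine idea at much larger scale.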
 
