Why are distributed computing and distributed systems important/necessary for big data?

In summary, big data refers to large amounts of data that are not under complete project control and that require distributed processing on remote platforms. This is where distributed systems play a crucial role: they make it possible to process such vast amounts of data efficiently, to manage its complexity and scale, and to extract meaningful insight and value from it. Without distributed systems, it would be very difficult to handle data at that volume and achieve the project's goals.
  • #1
shivajikobardan
Homework Statement
Need for distributed computing in big data?
Relevant Equations
none
What is one example of the use of a distributed system in big data?

Here are the notes from my college curriculum, which I of course understand, but they don't make clear what the role of a distributed system is in big data:


https://dotnettutorials.net/lesson/big-data-distributed-computing-and-complexity/
https://www.dummies.com/article/tec...tributed-computing-basics-for-big-data-166996

https://www.ukessays.com/essays/engineering/distributed-computing-processing-data-5529.php

These are some tutorials that try to explain this topic, but in my opinion they fail to do so. They don't really explain the need for a distributed system in big data.

(I have already studied a subject called Distributed Systems. https://www.ioenotes.edu.np/ioe-syllabus/distributed-system-computer-engineering-712 was our syllabus. I studied it really well, and I still have hipster PDAs of this subject to refer back to...)
 
  • #2
"Big Data" means more than a really big data base. At my last job, we collected many petabytes of test data every time we ran a road test campaign. In a 1-month campaign, we might collect 20 petabytes of test data (the application was car radar). But we didn't consider it "Big Data" because we had full control of the process and the test system the whole time. The original collection of the data was done by automobile drivers who were company employees. Most of the data processing was done by dedicated processors in a single room of the building - and the rest was done by other systems in that same building. It wasn't that distributed.

"Big Data", by its nature, commonly originates from sources that are not under complete project control. The data is commonly exchanged with "client" platforms that are not dedicated to the projects purpose - and a substantial amount of processing is required on those remote platforms to support project goals.
 

1. Why is distributed computing important for big data?

Distributed computing is important for big data because it allows for the processing and analysis of large datasets that cannot be handled by a single computer. By distributing the workload among multiple computers, the processing time can be significantly reduced, making it possible to handle large amounts of data in a reasonable amount of time.

2. How does distributed computing help with big data?

Distributed computing helps with big data by breaking down the data into smaller chunks and distributing them among multiple computers. This allows for parallel processing, where each computer can work on its assigned chunk of data simultaneously. The results from each computer are then combined to get the final result.
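
As a concrete (hypothetical) illustration of this chunk-process-combine pattern, here is a minimal sketch in Python. It uses the multiprocessing module on a single machine, with worker processes standing in for separate nodes; the word-count task and all names are made up for illustration and are not taken from any particular big-data framework.

from multiprocessing import Pool

def count_words(chunk):
    # "Map" step: each worker counts words in its own chunk of the data.
    return sum(len(line.split()) for line in chunk)

def split_into_chunks(lines, n_chunks):
    # Break the dataset into roughly equal chunks, one per worker.
    size = max(1, len(lines) // n_chunks)
    return [lines[i:i + size] for i in range(0, len(lines), size)]

if __name__ == "__main__":
    # A stand-in "big" dataset; a real system would read from distributed storage.
    lines = ["the quick brown fox jumps over the lazy dog"] * 1_000_000
    chunks = split_into_chunks(lines, n_chunks=4)

    # Each worker processes its chunk in parallel.
    with Pool(processes=4) as pool:
        partial_counts = pool.map(count_words, chunks)

    # "Reduce" step: combine the partial results into the final answer.
    print(sum(partial_counts))

The same split-work-then-combine structure is what frameworks such as Hadoop MapReduce or Spark apply across many machines instead of four local processes.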

3. What are the benefits of using a distributed system for big data?

There are several benefits of using a distributed system for big data. First, it allows for faster processing and analysis of large datasets. It also provides fault tolerance: if one computer fails, the others can continue to work and the data is not lost. Finally, it allows for scalability, as more computers can be added to the system as the data grows.
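
Here is a hedged sketch of the fault-tolerance idea in plain Python, with a simulated worker failure (the failure probability, retry count, and function names are all invented for illustration): the coordinator simply reassigns a failed chunk until it succeeds, so the job still finishes and no data is lost.

import random

def process_chunk(chunk, fail_probability=0.2):
    # Simulate a worker that occasionally crashes mid-task.
    if random.random() < fail_probability:
        raise RuntimeError("worker failed")
    return sum(chunk)

def run_job(chunks, max_retries=5):
    # Coordinator: retry each failed chunk, as if handing it to another worker.
    results = []
    for chunk in chunks:
        for _attempt in range(max_retries):
            try:
                results.append(process_chunk(chunk))
                break
            except RuntimeError:
                continue  # reassign the chunk and try again
        else:
            raise RuntimeError("chunk still failing after all retries")
    return sum(results)

chunks = [list(range(i, i + 10)) for i in range(0, 100, 10)]
print(run_job(chunks))  # almost always prints 4950, the same as summing on one machine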

4. Is distributed computing necessary for handling big data?

Yes, distributed computing is necessary for handling big data. As the amount of data continues to grow, it becomes impractical for a single computer to store, process, and analyze it in a reasonable amount of time. Distributed computing allows for the efficient handling of large datasets and is essential for big data applications.

5. What are the challenges of using distributed computing for big data?

There are several challenges of using distributed computing for big data. One of the main challenges is ensuring data consistency and synchronization among the different computers. Another challenge is managing the communication and coordination between the computers. Additionally, there may be security concerns when transferring data between different nodes in a distributed system.
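
To make the consistency and coordination challenge concrete, here is a minimal single-machine sketch, again with Python's multiprocessing module standing in for separate nodes (the names are illustrative). Several workers update a shared counter; the lock around the read-modify-write is what keeps the final value consistent, and omitting it can silently lose updates.

from multiprocessing import Process, Value

def record_results(counter, n_updates):
    # Each worker reports its results by updating shared state.
    for _ in range(n_updates):
        with counter.get_lock():   # without the lock, concurrent updates can be lost
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)  # shared integer visible to all workers
    workers = [Process(target=record_results, args=(counter, 10_000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # 40000: every update was preserved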
