Multiprocessing with subprocessing running

  • #1
ChrisVer
Hi, I have created the following code:

Python:
import subprocess
import multiprocessing
import time

def run_proc(name, locker):
    #locker.acquire()
    cmd = './Executable -a -b -m %s -p Low -bins=100 --max=%s  -c %s channel%s' % (name[0], name[1], name[2], name[3])
    subprocess.call(cmd, shell=True)
    #locker.release()

def main():
    arguments = ([500, 0.1, 100, 500], [500, 0.1, 150, 500])
    '''
    t1 = time.time()
    for argument in arguments:
        run_proc(argument, 'h')
    print("Conventional way: ", time.time() - t1)
    '''

    print('STARTING MULTIPROCESSING')
    t3 = time.time()
    locker = ''
    #locker = multiprocessing.Lock()
    proc1 = multiprocessing.Process(target=run_proc, args=(arguments[0], locker))
    proc2 = multiprocessing.Process(target=run_proc, args=(arguments[1], locker))
    proc1.start()
    proc2.start()
    proc1.join()
    proc2.join()
    print("Multiprocessing way: ", time.time() - t3)

if __name__ == "__main__":
    main()

Well, the run_proc(name, locker) function is supposed to run an executable file I have, with several input variables determined by the name argument. I also pass the locker for the multiprocessing.
The main() function does the following: it creates two processes (I want to run the executable twice in parallel with different configurations) and starts them.
The problem is that the executable treats similar output in the same way, so I generally get "errors" (e.g. producing the same root files, writing histos or graphs into them, being unable to delete them, etc.) if I don't use the Locks. If I use the Lock, on the other hand, my program becomes as slow as can be... I tested the timing by running run_proc twice sequentially (the commented-out lines between the triple quotes).
Is there a way to get around this? Thanks.
 
  • #2
I'm not sure exactly why things aren't working here but thought this tutorial might help you with some basics:

https://pymotw.com/2/multiprocessing/basics.html

I've written equivalent code in Java and wanted to point out that when you join a thread, the parent stops there, waiting for the thread to complete before executing the next statement (unless the thread has already finished its task).

Your proc1.join() waits for proc1 to complete, and proc2.join() waits for proc2.

When the command is fired off in run_proc, does run_proc wait until the command has completed, or does it return because there's nothing more to do? In Java we might use waitFor() to hold the thread until the process completes. Without the waitFor(), a long-running command process will run freely in the background while your thread quickly exits.
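For what it's worth, Python's subprocess.call does block until the command finishes (the analogue of Java's waitFor()). A small sketch, using a stand-in sleep command in place of the real executable:

```python
import subprocess
import time

start = time.time()
# subprocess.call blocks until the child exits, much like Java's Process.waitFor()
rc = subprocess.call(["sleep", "1"])
elapsed = time.time() - start
print(rc, elapsed >= 1.0)  # call() returns the child's exit code
```

So in the code above the run_proc worker itself does wait for the executable; the blocking happens inside the child process, not the parent.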

Your use of the lock seems to make the program sequential. You start proc1 and it acquires the lock while you're starting proc2. proc2 can't do anything but monitor the lock until proc1 gives it up. Once the lock is free, proc2 can run. Your main thread waits on proc1 via the join, and when proc1 completes it waits on proc2...

To see why you're getting the errors, we'd need to see the errors and the code for the command you're running. If the executable is reading and writing the same files, then you might need to catch the errors and handle recovery or retry yourself, because there are bound to be collisions with two copies of the same program running simultaneously.
 
  • #3
Hi, thanks for the answer...
But I'm confused: are you blaming the Lock() or the join() for the delays?

jedishrfu said:
I'm not sure exactly why things aren't working here
Sorry, the program works; I just consider it as slow as not using multiprocessing... The way I visualize what I want to do: instead of running the exe file 50 times one after the other, create 50 processes that will run the exe in parallel (to save time). My timing of the code, though, indicated that I am not saving any time yet.

jedishrfu said:
join a thread the parent stops there waiting for the thread to complete before
Isn't that against the point of multiprocessing? I had been led to believe that join just links the child back to the parent process once the child has finished.

jedishrfu said:
When the command is fired off in run-proc does run-proc wait until the command completed or does it return because there's nothing more to do?
Hm, I am not quite sure I understand this question... when run_proc starts, it runs the exe file. The exe file takes around 20 seconds to complete (the executable comes from BAT, the Bayesian Analysis Toolkit, and does some work in the background).

jedishrfu said:
To see why you're getting the errors we'd need to see the errors and the code for the command you're running. If executable is reading and writing the same files then you might need to catch the errors and handle recovery or retry yourself because you know there's bound to be collisions with two of the same program running simultaneously.
The errors mainly come from the two processes trying to access the same root file.
Code:
chrisver@n1:scripts$ python Test_run.py >> org.txt
Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1
Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1
SysError in <TFile::TFile>: could not delete limit_tree_exp_signalModel.root (errno: 2) (No such file or directory)
Error in <TFile::WriteTObject>: Directory limit_tree_exp_signalModel.root is not writable
Error in <TFile::WriteTObject>: Directory limit_tree_exp_signalModel.root is not writable
I don't think that retrying can save me.
Would it be possible, during multiprocessing, to tell each process to create a tmp folder and write its output files there? Or does that have to be done within the executable?
 
  • #4
Python's multiprocessing module creates full, independent OS processes, each with its own memory, unlike the threading module, whose threads share memory (the model that C and C++ call threading).
I think @jedishrfu is correct - your processes are spending lots of resources on lock waits (mutex checking).

Comment: So your idea of using multiprocessing has limits you need to understand. If your processes become I/O bound (tied up on disk I/O because of a long I/O request queue), using multiprocessing will make things worse. Plus, your desktop PC does not have 50 CPU cores. If your process uses tons of CPU, then you need to limit the number of worker processes to (CPU core count) * 2, or some other small factor.

There are some ways to see what is happening - for instance, based on the error messages, you seem to have an exclusive read/write lock on a file.
Anyway, find out about instrumenting code for your platform. Linux also has an extensive set of tools for doing that outside your code. Windows has tools as well -- the ones I know about are not free. If you are going to program, this skill is a real boon.

Also consider: multiprocessing and parallel processing are somewhat different. You appear to be invoking a completely separate program. The program code "thinks" it owns everything in its own little world.
 
  • Like
Likes QuantumQuest
  • #5
As I see it through Java (as jedishrfu also points out), and in particular on the Android platform, on which I have worked extensively, you're basically trying to invoke processes, with the limitations of pseudo-parallelism, multiple times. If your machine has several processors, or at least several cores, then the processes, or the sub-processes they spawn (threads, explicit or implicit), can run independently, provided that your OS also supports multiprocessing, in whatever way it does so. The limitations that jim mcnamara points out also apply.

In the more traditional, or maybe better "old-fashioned", case where you need to start, stop, and lock sub-processes, i.e. create mutex-controlled code for one processor, you have to take care of the things that jedishrfu mentions above. For this latter case, there are ways to improve the code through the Python you use: take a look at the Python docs for the recent Python version (you can choose other versions as well, if you need to). The OS usually tries its best regarding CPU/core utilization, but there are also ways to improve this - check the documentation of your particular OS.
 
  • #6
It's a bad idea to multitask programs that write to or modify the same files, as some OSes won't let you write a file if someone else is reading or writing it.

If instead you can make the executable read and write in a different directory, then you can get something out of your parallelization.

In my code I often run identical programs in separate home directories by setting the directory via the process API, since most programs work relative to the directory you're in when you launch them.
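In Python that per-run-directory trick can be sketched with the cwd= argument of subprocess; each run gets its own scratch directory, so identically named outputs (like the colliding .root files above) can no longer clash. An echo command stands in for the real executable:

```python
import os
import subprocess
import tempfile

# a fresh scratch directory for this run
workdir = tempfile.mkdtemp(prefix="run_")

# cwd= launches the child inside workdir, so its relative-path
# output files land there instead of in the launch directory
subprocess.call("echo done > result.txt", shell=True, cwd=workdir)

print(os.path.exists(os.path.join(workdir, "result.txt")))  # True
```

One directory per configuration (e.g. derived from the argument list) keeps the parallel runs fully isolated without touching the executable itself.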
 

1. What is multiprocessing and how does it differ from multitasking?

Multiprocessing is the ability of a computer to simultaneously execute multiple tasks or processes. It differs from multitasking in that multitasking involves switching between different tasks or processes, while multiprocessing allows for the execution of multiple tasks or processes at the same time.

2. What is the role of subprocesses in multiprocessing?

Subprocesses are used in multiprocessing to create and manage additional processes within a single program. They allow for the distribution of tasks across multiple cores or processors, thereby improving performance and efficiency.

3. How do you create and manage subprocesses in Python?

In Python, the subprocess module is used to create and manage subprocesses. The subprocess module provides various functions and classes for creating, managing, and communicating with subprocesses.
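As a small illustration (using subprocess.run, available since Python 3.5; the capture_output flag needs 3.7+):

```python
import subprocess

# run() starts the child process, waits for it to finish, and returns a
# CompletedProcess carrying the exit code and the captured output
result = subprocess.run(["echo", "hello"], capture_output=True, text=True)
print(result.returncode)      # 0
print(result.stdout.strip())  # hello
```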

4. What are the potential benefits of multiprocessing with subprocesses?

Multiprocessing with subprocesses can improve the performance and efficiency of a program by distributing tasks across multiple cores or processors. It can also help with handling complex or resource-intensive tasks more quickly and efficiently.

5. Are there any potential challenges or limitations when using multiprocessing with subprocesses?

Yes, there are a few potential challenges and limitations when using multiprocessing with subprocesses. These include managing shared resources, ensuring proper synchronization between processes, and potential conflicts with other multiprocessing libraries or frameworks.
