Running a limited number of child processes in parallel in bash?

23
I have a large set of files that need some heavy processing.
This processing is single-threaded, uses a few hundred MiB of RAM (on the machine used to start the job) and takes a few minutes to run.
My current use case is to start a Hadoop job on the input data, but I've had this same problem in other cases before.

To fully utilize the available CPU power I want to run several of those tasks in parallel.

However, a very simple shell script like this will thrash the system due to excessive load and swapping:

    find . -type f | while IFS= read -r name
    do
        some_heavy_processing_command "${name}" &
    done

So what I want is essentially similar to what "gmake -j4" does.

I know bash supports the "wait" command, but that only waits until all child processes have completed. In the past I've written scripts that run 'ps' and then grep the child processes out by name (yes, I know ... ugly).

What is the simplest/cleanest/best solution to do what I want?

Edit: Thanks to Fredrik: yes, indeed this is a duplicate of "How to limit number of threads/sub-processes used in a function in bash".
The "xargs --max-procs=4" works like a charm.
(So I voted to close my own question.)







bash parallel-processing






edited May 23 '17 at 12:00 by Community
asked Jul 6 '11 at 8:29 by Niels Basjes

  • 8
    possible duplicate of stackoverflow.com/questions/6511884/… I'd use xargs --max-procs=4 for this...
    – Fredrik Pihl
    Jul 6 '11 at 8:57

  • 4
    it seems like a job for GNU parallel, but I'm not sure it adds extra power to xargs --max-procs, which I didn't know
    – larsen
    Jul 6 '11 at 10:14

  • @Niels: I've been using screen for the purpose, though it's a bit messy this way, especially when started from within another screen session ;)
    – 0xC0000022L
    Jul 6 '11 at 13:38

7 Answers
4

I think I found a handier solution using make:

    #!/usr/bin/make -f

    THIS := $(lastword $(MAKEFILE_LIST))
    TARGETS := $(shell find . -name '*.sh' -type f)

    .PHONY: all $(TARGETS)

    all: $(TARGETS)

    # The recipe line below must start with a tab character.
    $(TARGETS):
    	some_heavy_processing_command $@

    $(THIS): ; # Avoid trying to remake this makefile

Save it as e.g. 'test.mak' and make it executable. If you call ./test.mak, it runs some_heavy_processing_command on the files one by one. If you call ./test.mak -j 4, it runs four subprocesses at once. You can also use it in a more sophisticated way: with ./test.mak -j 5 -l 1.5 it runs at most 5 subprocesses while the system load is under 1.5, and it holds back new subprocesses while the load exceeds 1.5.

It is more flexible than xargs, and make is part of the standard distribution, unlike parallel.






3

The script from the answer that uses trap and max_jobs worked quite well for me.

I noticed one issue: if max_jobs is greater than the number of elements in the array, the script never quits.

To prevent this, I added the following right after the "max_jobs" declaration:

    if [ "$max_jobs" -gt "${#todo_array[@]}" ]
    then
        # max_jobs exceeds the number of array elements, so cap it at the array length
        max_jobs=${#todo_array[@]}
    fi
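
As a minimal sketch of the same guard, the bound can also be folded into the start-up loop condition instead of changing max_jobs. The two-element work list and the started array are made up for illustration, and the jobs are only recorded here rather than launched:

```shell
#!/usr/bin/env bash
# Hypothetical work list that is shorter than max_jobs.
todo_array=(a.txt b.txt)
max_jobs=5
index=0
started=()

add_next_job() {
    # Record the next job (a real script would launch it with &).
    if (( index < ${#todo_array[@]} )); then
        started+=("${todo_array[index]}")
        index=$((index + 1))
    fi
}

# Stop when either the job limit or the end of the list is reached,
# so the loop cannot spin forever on a short list.
while (( index < max_jobs && index < ${#todo_array[@]} )); do
    add_next_job
done

echo "started ${#started[@]} of ${#todo_array[@]} jobs"
```

With this condition the loop ends after two iterations even though max_jobs is 5.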





-1

Another option:

    PARALLEL_MAX=...
    function start_job() {
        # Count the child processes of this shell ($$) and wait while there are too many.
        # --ppid and --no-headers are procps (Linux) ps options.
        while [ "$(ps --no-headers -o pid --ppid=$$ | wc -l)" -gt "$PARALLEL_MAX" ]; do
            sleep .1 # Wait for background tasks to complete.
        done
        "$@" &
    }
    start_job some_big_command1
    start_job some_big_command2
    start_job some_big_command3
    start_job some_big_command4
    ...





-1

Here is a function I have used to control the maximum number of jobs from bash or ksh. NOTE: the - 1 after the pgrep pipeline subtracts the wc -l subprocess.

    function jobmax
    {
        typeset -i MAXJOBS=$1
        sleep .1
        # Block while this shell ($$) has MAXJOBS or more child processes.
        while (( ($(pgrep -P $$ | wc -l) - 1) >= $MAXJOBS ))
        do
            sleep .1
        done
    }

    nproc=5
    for i in {1..100}
    do
        sleep 1 &
        jobmax $nproc
    done
    wait # Wait for the rest





18

    #!/usr/bin/env bash

    # Collect the work list, one path per line (mapfile needs bash 4+;
    # this breaks only on file names that contain newlines).
    mapfile -t todo_array < <(find . -type f)

    index=0
    max_jobs=2

    function add_next_job {
        # If there are still jobs to do, start one.
        # ${#todo_array[@]} is the array length; the hash is not a comment.
        if [[ $index -lt ${#todo_array[@]} ]]
        then
            echo "adding job ${todo_array[$index]}"
            # The trailing & runs do_job as a background subprocess,
            # which is why several jobs run in parallel.
            # Replace do_job with the command you want.
            do_job "${todo_array[$index]}" &
            index=$((index + 1))
        fi
    }

    function do_job {
        echo "starting job $1"
        sleep 2
    }

    # Enable job control: background processes run in separate process groups.
    set -o monitor
    # Run add_next_job whenever a child process completes (CHLD signal).
    trap add_next_job CHLD

    # Add the initial set of jobs.
    # NOTE: this loop never ends if there are fewer files than max_jobs.
    while [[ $index -lt $max_jobs ]]
    do
        add_next_job
    done

    # Wait for all jobs to complete.
    wait
    echo "done"

Having said that, Fredrik makes the excellent point that xargs does exactly what you want...
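
For reference, a minimal sketch of that xargs approach. The temporary directory, the three sample files and the sh -c snippet are stand-ins invented for this example; put some_heavy_processing_command in place of the snippet. -P is the short form of GNU xargs' --max-procs, and -print0 with -0 keeps file names that contain spaces intact:

```shell
#!/bin/sh
# Hypothetical sandbox: three small input files in a temporary directory.
workdir=$(mktemp -d)
for n in 1 2 3; do
    echo "data $n" > "$workdir/input $n.txt"
done

# Run at most 4 jobs at a time, one file per invocation (-n 1).
# The sh -c snippet stands in for: some_heavy_processing_command "$1"
find "$workdir" -type f -name '*.txt' -print0 |
    xargs -0 -n 1 -P 4 sh -c 'wc -c < "$1" > "$1.out"' _

# xargs returns only after every job has finished.
done_count=$(find "$workdir" -type f -name '*.out' | wc -l)
echo "processed $done_count files"
rm -rf "$workdir"
```

Because xargs itself does the waiting, no trap or job bookkeeping is needed in the calling script.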







answered Jul 6 '11 at 9:54 by Dunes

  • I now understand the code, but had to think a bit. Especially the part about why these would run in parallel (well, because they are subprocesses) eluded me. I think it would be worthwhile adding comments for that part into the code as well.
    – 0xC0000022L
    Jul 6 '11 at 18:42

  • Although my current application works great with the xargs --max-procs I'm still giving you the credit for being "the answer" because your script is usable in more situations. Thanks.
    – Niels Basjes
    Jul 7 '11 at 20:34

22

I know I'm late to the party with this answer, but I thought I would post an alternative that, IMHO, makes the body of the script cleaner and simpler. (Clearly you can change the values 2 and 5 to suit your scenario.)

    function max2 {
        # Poll every 5 seconds until fewer than 2 background jobs remain.
        while [ `jobs | wc -l` -ge 2 ]
        do
            sleep 5
        done
    }

    # Feed the loop through process substitution rather than a pipe, so the
    # jobs are children of this shell and the final wait really waits for them.
    while IFS= read -r name
    do
        max2; some_heavy_processing_command "${name}" &
    done < <(find . -type f)
    wait
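
A variant of the same idea that avoids the fixed sleep: with bash 4.3 or newer, wait -n blocks until any one background job exits. This is only a sketch under that version assumption; the run_limited function, the fake_job stand-in and the limit of 2 are invented for illustration and are not part of the answer above:

```shell
#!/usr/bin/env bash
# Requires bash >= 4.3 for `wait -n`.

max_jobs=2                      # hypothetical limit
logfile=$(mktemp)               # each fake job appends one line here

# Stand-in for some_heavy_processing_command.
fake_job() {
    sleep 0.2
    echo "finished $1" >> "$logfile"
}

# Start "$@" in the background, first waiting for a free slot.
run_limited() {
    while (( $(jobs -r -p | wc -l) >= max_jobs )); do
        wait -n                 # returns as soon as any one job exits
    done
    "$@" &
}

for name in a b c d e; do
    run_limited fake_job "$name"
done
wait                            # wait for the remaining jobs

finished=$(wc -l < "$logfile")
echo "finished $finished jobs"
rm -f "$logfile"
```

The slot check still uses jobs, but the shell sleeps inside wait -n instead of polling on a timer, so a new job starts as soon as an old one ends.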






answered Jan 17 '13 at 20:10 by BruceH

  • 2
    Dude, this works brilliantly! Thanks! :)
    – mkgrunder
    Oct 8 '13 at 19:43

  • This worked for me after changing the while syntax to: while [ $(jobs | wc -l) -ge 2 ]
    – Jeffrey Cordero
    Jun 23 '17 at 15:07

9

With GNU Parallel it becomes simpler:

    find . -type f | parallel some_heavy_processing_command {}

Learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Feb 3 '13 at 15:20









                Ole Tange























                    4














                    I think I found a handier solution using make:



                    #!/usr/bin/make -f

                    THIS := $(lastword $(MAKEFILE_LIST))
                    TARGETS := $(shell find . -name '*.sh' -type f)

                    .PHONY: all $(TARGETS)

                    all: $(TARGETS)

                    # The recipe line below must begin with a tab character.
                    $(TARGETS):
                    	some_heavy_processing_command $@

                    $(THIS): ; # Keep make from trying to remake this makefile


                    Save it as, e.g., 'test.mak' and make it executable. Running ./test.mak calls some_heavy_processing_command on the files one by one. Running ./test.mak -j 4 runs four subprocesses at once. For more control, run ./test.mak -j 5 -l 1.5: make then runs at most 5 subprocesses while the system load is below 1.5, and stops starting new ones while the load is above 1.5.



                    It is more flexible than xargs, and make is part of the standard distribution, unlike parallel.














                        edited Nov 25 '15 at 20:01

























                        answered Jul 16 '13 at 12:38









                        TrueY























                            3














                            This code worked quite well for me.



                            I noticed one issue: the script could fail to end.
                            If max_jobs is greater than the number of elements in the array, the script never quits.



                            To prevent this, I added the following right after the "max_jobs" declaration:



                            if [ "$max_jobs" -gt "${#todo_array[@]}" ];
                            then
                                # max_jobs exceeds the number of array elements; cap it at the element count
                                max_jobs=${#todo_array[@]}
                            fi
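For context, here is a rough sketch of how such a clamp might sit inside an array-driven job loop. The script this answer refers to is not reproduced here, so everything except the names max_jobs and todo_array is hypothetical: the work items, the results file, and the sleep/echo group standing in for some_heavy_processing_command. The loop polls with jobs, in the spirit of the `while [ $(jobs | wc -l) -ge 2 ]` comment on another answer.

```shell
#!/usr/bin/env bash
# Hypothetical work list; in practice it might be filled from find.
todo_array=(alpha beta gamma)
max_jobs=8

# The clamp from the answer: never ask for more slots than there are items.
if [ "$max_jobs" -gt "${#todo_array[@]}" ]; then
    max_jobs=${#todo_array[@]}
fi

results=$(mktemp)   # scratch file, one line appended per finished job

for item in "${todo_array[@]}"; do
    # jobs -rp prints one PID per running background job;
    # sleep briefly until fewer than max_jobs are running.
    while (( $(jobs -rp | wc -l) >= max_jobs )); do
        sleep 0.1
    done
    # Stand-in for: some_heavy_processing_command "$item" &
    { sleep 0.2; echo "done $item" >> "$results"; } &
done

wait   # let the last batch finish before the script exits
```

With three items and max_jobs=8, the clamp lowers max_jobs to 3, all three jobs start immediately, and the final wait collects them.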













                                edited Feb 2 '12 at 21:28









                                animuson











                                answered Feb 2 '12 at 17:46









                                masseo























                                    -1














                                    Another option:



                                    PARALLEL_MAX=...
                                    function start_job() {
                                        # Count the direct children of this shell ($$); these are procps (Linux) ps options.
                                        while [ $(ps --no-headers -o pid --ppid=$$ | wc -l) -gt $PARALLEL_MAX ]; do
                                            sleep .1 # Wait for background tasks to complete.
                                        done
                                        "$@" &
                                    }
                                    start_job some_big_command1
                                    start_job some_big_command2
                                    start_job some_big_command3
                                    start_job some_big_command4
                                    ...
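If polling with ps and sleep is undesirable, a similar start_job can be sketched with the `wait -n` builtin, which blocks until any one background job exits. This is a variation, not the code from the answer above, and it assumes bash 4.3 or newer. The limit of 2, the counter `running`, the log file, and `fake_job` (a stand-in for some_big_command) are made up for illustration.

```shell
#!/usr/bin/env bash
# Requires bash 4.3+ for `wait -n`.
PARALLEL_MAX=2      # assumed limit for the demo
running=0           # number of jobs we believe are still running

start_job() {
    # When every slot is taken, block until any one background job exits.
    if (( running >= PARALLEL_MAX )); then
        wait -n
        running=$(( running - 1 ))
    fi
    "$@" &
    running=$(( running + 1 ))
}

log=$(mktemp)
# Hypothetical stand-in for some_big_command.
fake_job() { sleep 0.2; echo "finished $1" >> "$log"; }

for n in 1 2 3 4 5; do
    start_job fake_job "$n"
done
wait   # collect whatever is still running
```

The counter can lag behind reality when several jobs exit at once, but it only ever overestimates the number of running jobs, so the cap is not exceeded.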















                                        answered Nov 18 '14 at 21:31









                                        Jeff Kaufman























                                            -1














                                            Here is a function I use to cap the number of concurrent jobs in bash or ksh. NOTE: the - 1 applied to the pgrep count subtracts the wc -l subprocess.



                                            function jobmax
                                            {
                                                typeset -i MAXJOBS=$1
                                                sleep .1
                                                # Block while the number of child processes is at or above the limit.
                                                while (( ($(pgrep -P $$ | wc -l) - 1) >= $MAXJOBS ))
                                                do
                                                    sleep .1
                                                done
                                            }

                                            nproc=5
                                            for i in {1..100}
                                            do
                                                sleep 1 &        # the job to run in the background
                                                jobmax $nproc    # returns once fewer than $nproc jobs are running
                                            done
                                            wait # Wait for the rest















                                                answered Jan 23 '15 at 0:05









                                                user2709129





























