Calculate Pi algorithm from a tutorial using OpenMP











up vote
1
down vote

favorite












I am studying this tutorial about OpenMP and I came across this exercise, on page 19. It is a pi calculation algorithm which I have to parallelize:



static long num_steps = 100000;
double step;
void main ()
{
int i;
double x, pi
double sum = 0.0;
step = 1.0 / (double)num_steps;

for(i = 0; i < num_steps; i++)
{
x = (I + 0.5) * step;
sum = sum + 4.0 / (1.0 + x*x);
}

pi = step * sum;
}


I can not use, up to this point, #pragma parallel for. I can only use:



#pragma omp parallel {}
omp_get_thread_num();
omp_set_num_threads(int);
omp_get_num_threads();


My implementation looks like this :



#define NUM_STEPS 800

int main(int argc, char **argv)
{
int num_steps = NUM_STEPS;
int i;
double x;
double pi;
double step = 1.0 / (double)num_steps;

double sum[num_steps];

for(i = 0; i < num_steps; i++)
{
sum[i] = 0;
}

omp_set_num_threads(num_steps);
#pragma omp parallel
{
x = (omp_get_thread_num() + 0.5) * step;
sum[omp_get_thread_num()] += 4.0 / (1.0 + x * x);
}

double totalSum = 0;

for(i = 0; i < num_steps; i++)
{
totalSum += sum[i];
}

pi = step * totalSum;

printf("Pi: %.5f", pi);
}


Ignoring the problem by using an sum array (It explains later that it needs to define a critical section for the sum value with #pragma omp critical or #pragma omp atomic), the above impelentation only works for a limited number of threads (800 in my case), where the serial code uses 100000 steps. Is there a way to achieve this with only the aforementioned OpenMP commands, or am I obliged to use #pragma omp parallel for, which hasn't been mentioned yet in the tutorial?



Thanks a lot for your time, I am really trying to grasp the concept of parallelization in C using OpenMP.










share|improve this question






















  • Are you allowed to use #pragma omp atomic?
    – Increasingly Idiotic
    Nov 11 at 2:52










  • @IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
    – Vector Sigma
    Nov 11 at 2:57










  • This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
    – Zulan
    Nov 11 at 7:11










  • @Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
    – Vector Sigma
    Nov 11 at 22:11















up vote
1
down vote

favorite












I am studying this tutorial about OpenMP and I came across this exercise, on page 19. It is a pi calculation algorithm which I have to parallelize:



static long num_steps = 100000;
double step;
void main ()
{
int i;
double x, pi
double sum = 0.0;
step = 1.0 / (double)num_steps;

for(i = 0; i < num_steps; i++)
{
x = (I + 0.5) * step;
sum = sum + 4.0 / (1.0 + x*x);
}

pi = step * sum;
}


I can not use, up to this point, #pragma parallel for. I can only use:



#pragma omp parallel {}
omp_get_thread_num();
omp_set_num_threads(int);
omp_get_num_threads();


My implementation looks like this :



#define NUM_STEPS 800

int main(int argc, char **argv)
{
int num_steps = NUM_STEPS;
int i;
double x;
double pi;
double step = 1.0 / (double)num_steps;

double sum[num_steps];

for(i = 0; i < num_steps; i++)
{
sum[i] = 0;
}

omp_set_num_threads(num_steps);
#pragma omp parallel
{
x = (omp_get_thread_num() + 0.5) * step;
sum[omp_get_thread_num()] += 4.0 / (1.0 + x * x);
}

double totalSum = 0;

for(i = 0; i < num_steps; i++)
{
totalSum += sum[i];
}

pi = step * totalSum;

printf("Pi: %.5f", pi);
}


Ignoring the problem by using an sum array (It explains later that it needs to define a critical section for the sum value with #pragma omp critical or #pragma omp atomic), the above impelentation only works for a limited number of threads (800 in my case), where the serial code uses 100000 steps. Is there a way to achieve this with only the aforementioned OpenMP commands, or am I obliged to use #pragma omp parallel for, which hasn't been mentioned yet in the tutorial?



Thanks a lot for your time, I am really trying to grasp the concept of parallelization in C using OpenMP.










share|improve this question






















  • Are you allowed to use #pragma omp atomic?
    – Increasingly Idiotic
    Nov 11 at 2:52










  • @IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
    – Vector Sigma
    Nov 11 at 2:57










  • This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
    – Zulan
    Nov 11 at 7:11










  • @Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
    – Vector Sigma
    Nov 11 at 22:11













up vote
1
down vote

favorite









up vote
1
down vote

favorite











I am studying this tutorial about OpenMP and I came across this exercise, on page 19. It is a pi calculation algorithm which I have to parallelize:



static long num_steps = 100000;
double step;
void main ()
{
int i;
double x, pi
double sum = 0.0;
step = 1.0 / (double)num_steps;

for(i = 0; i < num_steps; i++)
{
x = (I + 0.5) * step;
sum = sum + 4.0 / (1.0 + x*x);
}

pi = step * sum;
}


I can not use, up to this point, #pragma parallel for. I can only use:



#pragma omp parallel {}
omp_get_thread_num();
omp_set_num_threads(int);
omp_get_num_threads();


My implementation looks like this :



#define NUM_STEPS 800

int main(int argc, char **argv)
{
int num_steps = NUM_STEPS;
int i;
double x;
double pi;
double step = 1.0 / (double)num_steps;

double sum[num_steps];

for(i = 0; i < num_steps; i++)
{
sum[i] = 0;
}

omp_set_num_threads(num_steps);
#pragma omp parallel
{
x = (omp_get_thread_num() + 0.5) * step;
sum[omp_get_thread_num()] += 4.0 / (1.0 + x * x);
}

double totalSum = 0;

for(i = 0; i < num_steps; i++)
{
totalSum += sum[i];
}

pi = step * totalSum;

printf("Pi: %.5f", pi);
}


Ignoring the problem by using an sum array (It explains later that it needs to define a critical section for the sum value with #pragma omp critical or #pragma omp atomic), the above impelentation only works for a limited number of threads (800 in my case), where the serial code uses 100000 steps. Is there a way to achieve this with only the aforementioned OpenMP commands, or am I obliged to use #pragma omp parallel for, which hasn't been mentioned yet in the tutorial?



Thanks a lot for your time, I am really trying to grasp the concept of parallelization in C using OpenMP.










share|improve this question













I am studying this tutorial about OpenMP and I came across this exercise, on page 19. It is a pi calculation algorithm which I have to parallelize:



static long num_steps = 100000;
double step;
void main ()
{
int i;
double x, pi
double sum = 0.0;
step = 1.0 / (double)num_steps;

for(i = 0; i < num_steps; i++)
{
x = (I + 0.5) * step;
sum = sum + 4.0 / (1.0 + x*x);
}

pi = step * sum;
}


I can not use, up to this point, #pragma parallel for. I can only use:



#pragma omp parallel {}
omp_get_thread_num();
omp_set_num_threads(int);
omp_get_num_threads();


My implementation looks like this :



#define NUM_STEPS 800

int main(int argc, char **argv)
{
int num_steps = NUM_STEPS;
int i;
double x;
double pi;
double step = 1.0 / (double)num_steps;

double sum[num_steps];

for(i = 0; i < num_steps; i++)
{
sum[i] = 0;
}

omp_set_num_threads(num_steps);
#pragma omp parallel
{
x = (omp_get_thread_num() + 0.5) * step;
sum[omp_get_thread_num()] += 4.0 / (1.0 + x * x);
}

double totalSum = 0;

for(i = 0; i < num_steps; i++)
{
totalSum += sum[i];
}

pi = step * totalSum;

printf("Pi: %.5f", pi);
}


Ignoring the problem by using an sum array (It explains later that it needs to define a critical section for the sum value with #pragma omp critical or #pragma omp atomic), the above impelentation only works for a limited number of threads (800 in my case), where the serial code uses 100000 steps. Is there a way to achieve this with only the aforementioned OpenMP commands, or am I obliged to use #pragma omp parallel for, which hasn't been mentioned yet in the tutorial?



Thanks a lot for your time, I am really trying to grasp the concept of parallelization in C using OpenMP.







c parallel-processing openmp






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 11 at 1:01









Vector Sigma

1084




1084












  • Are you allowed to use #pragma omp atomic?
    – Increasingly Idiotic
    Nov 11 at 2:52










  • @IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
    – Vector Sigma
    Nov 11 at 2:57










  • This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
    – Zulan
    Nov 11 at 7:11










  • @Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
    – Vector Sigma
    Nov 11 at 22:11


















  • Are you allowed to use #pragma omp atomic?
    – Increasingly Idiotic
    Nov 11 at 2:52










  • @IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
    – Vector Sigma
    Nov 11 at 2:57










  • This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
    – Zulan
    Nov 11 at 7:11










  • @Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
    – Vector Sigma
    Nov 11 at 22:11
















Are you allowed to use #pragma omp atomic?
– Increasingly Idiotic
Nov 11 at 2:52




Are you allowed to use #pragma omp atomic?
– Increasingly Idiotic
Nov 11 at 2:52












@IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
– Vector Sigma
Nov 11 at 2:57




@IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
– Vector Sigma
Nov 11 at 2:57












This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
– Zulan
Nov 11 at 7:11




This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
– Zulan
Nov 11 at 7:11












@Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
– Vector Sigma
Nov 11 at 22:11




@Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
– Vector Sigma
Nov 11 at 22:11












1 Answer
1






active

oldest

votes

















up vote
1
down vote



accepted










You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.



The most simple way is to do something like:



int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

for (int i = tid; i < num_steps; i += n_threads) {
// ...
}


This way the work is split across all threads regardless of the number of threads.



If there were 3 threads and 9 steps:




  • Thread 0 would do steps 0, 3, 6

  • Thread 1 would do steps 1, 4, 7

  • Thread 2 would do steps 2, 5, 8


This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.



In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:



int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;

for (int i = start; i < end; i++) {
// ...
}


Now the 3 threads performing 9 steps looks like:




  • Thread 0 does steps 0, 1, 2

  • Thread 1 does steps 3, 4, 5

  • Thread 2 does steps 6, 7, 8


This approach is actually what is most likely happening when #pragma omp for is used. In most cases the compiler just divides the tasks according to the number of threads and assigns each thread a section.



So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.



Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.






share|improve this answer





















  • Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
    – Vector Sigma
    Nov 11 at 22:09











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














 

draft saved


draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53244922%2fcalculate-pi-algorithm-from-a-tutorial-using-openmp%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
1
down vote



accepted










You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.



The most simple way is to do something like:



int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

for (int i = tid; i < num_steps; i += n_threads) {
// ...
}


This way the work is split across all threads regardless of the number of threads.



If there were 3 threads and 9 steps:




  • Thread 0 would do steps 0, 3, 6

  • Thread 1 would do steps 1, 4, 7

  • Thread 2 would do steps 2, 5, 8


This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.



In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:



int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;

for (int i = start; i < end; i++) {
// ...
}


Now the 3 threads performing 9 steps looks like:




  • Thread 0 does steps 0, 1, 2

  • Thread 1 does steps 3, 4, 5

  • Thread 2 does steps 6, 7, 8


This approach is actually what is most likely happening when #pragma omp for is used. In most cases the compiler just divides the tasks according to the number of threads and assigns each thread a section.



So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.



Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.






share|improve this answer





















  • Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
    – Vector Sigma
    Nov 11 at 22:09















up vote
1
down vote



accepted










You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.



The most simple way is to do something like:



int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

for (int i = tid; i < num_steps; i += n_threads) {
// ...
}


This way the work is split across all threads regardless of the number of threads.



If there were 3 threads and 9 steps:




  • Thread 0 would do steps 0, 3, 6

  • Thread 1 would do steps 1, 4, 7

  • Thread 2 would do steps 2, 5, 8


This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.



In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:



int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;

for (int i = start; i < end; i++) {
// ...
}


Now the 3 threads performing 9 steps looks like:




  • Thread 0 does steps 0, 1, 2

  • Thread 1 does steps 3, 4, 5

  • Thread 2 does steps 6, 7, 8


This approach is actually what is most likely happening when #pragma omp for is used. In most cases the compiler just divides the tasks according to the number of threads and assigns each thread a section.



So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.



Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.






share|improve this answer





















  • Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
    – Vector Sigma
    Nov 11 at 22:09













up vote
1
down vote



accepted







up vote
1
down vote



accepted






You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.



The most simple way is to do something like:



int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

for (int i = tid; i < num_steps; i += n_threads) {
// ...
}


This way the work is split across all threads regardless of the number of threads.



If there were 3 threads and 9 steps:




  • Thread 0 would do steps 0, 3, 6

  • Thread 1 would do steps 1, 4, 7

  • Thread 2 would do steps 2, 5, 8


This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.



In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:



int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;

for (int i = start; i < end; i++) {
// ...
}


Now the 3 threads performing 9 steps looks like:




  • Thread 0 does steps 0, 1, 2

  • Thread 1 does steps 3, 4, 5

  • Thread 2 does steps 6, 7, 8


This approach is actually what is most likely happening when #pragma omp for is used. In most cases the compiler just divides the tasks according to the number of threads and assigns each thread a section.



So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.



Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.






share|improve this answer












You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.



The most simple way is to do something like:



int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

for (int i = tid; i < num_steps; i += n_threads) {
// ...
}


This way the work is split across all threads regardless of the number of threads.



If there were 3 threads and 9 steps:




  • Thread 0 would do steps 0, 3, 6

  • Thread 1 would do steps 1, 4, 7

  • Thread 2 would do steps 2, 5, 8


This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.



In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:



int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();

int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;

for (int i = start; i < end; i++) {
// ...
}


Now the 3 threads performing 9 steps looks like:




  • Thread 0 does steps 0, 1, 2

  • Thread 1 does steps 3, 4, 5

  • Thread 2 does steps 6, 7, 8


This approach is actually what is most likely happening when #pragma omp for is used. In most cases the compiler just divides the tasks according to the number of threads and assigns each thread a section.



So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.



Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.







share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 11 at 5:53









Increasingly Idiotic

2,1282829




2,1282829












  • Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
    – Vector Sigma
    Nov 11 at 22:09


















  • Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
    – Vector Sigma
    Nov 11 at 22:09
















Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
– Vector Sigma
Nov 11 at 22:09




Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
– Vector Sigma
Nov 11 at 22:09


















 

draft saved


draft discarded



















































 


draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53244922%2fcalculate-pi-algorithm-from-a-tutorial-using-openmp%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Full-time equivalent

Bicuculline

さくらももこ