Calculate Pi algorithm from a tutorial using OpenMP
up vote
1
down vote
favorite
I am studying this tutorial about OpenMP and I came across this exercise, on page 19. It is a pi calculation algorithm which I have to parallelize:
static long num_steps = 100000;
double step;
void main ()
{
int i;
double x, pi
double sum = 0.0;
step = 1.0 / (double)num_steps;
for(i = 0; i < num_steps; i++)
{
x = (I + 0.5) * step;
sum = sum + 4.0 / (1.0 + x*x);
}
pi = step * sum;
}
I can not use, up to this point, #pragma parallel for. I can only use:
#pragma omp parallel {}
omp_get_thread_num();
omp_set_num_threads(int);
omp_get_num_threads();
My implementation looks like this :
#define NUM_STEPS 800
int main(int argc, char **argv)
{
int num_steps = NUM_STEPS;
int i;
double x;
double pi;
double step = 1.0 / (double)num_steps;
double sum[num_steps];
for(i = 0; i < num_steps; i++)
{
sum[i] = 0;
}
omp_set_num_threads(num_steps);
#pragma omp parallel
{
x = (omp_get_thread_num() + 0.5) * step;
sum[omp_get_thread_num()] += 4.0 / (1.0 + x * x);
}
double totalSum = 0;
for(i = 0; i < num_steps; i++)
{
totalSum += sum[i];
}
pi = step * totalSum;
printf("Pi: %.5f", pi);
}
Ignoring the problem by using an sum array (It explains later that it needs to define a critical section for the sum value with #pragma omp critical or #pragma omp atomic), the above impelentation only works for a limited number of threads (800 in my case), where the serial code uses 100000 steps. Is there a way to achieve this with only the aforementioned OpenMP commands, or am I obliged to use #pragma omp parallel for, which hasn't been mentioned yet in the tutorial?
Thanks a lot for your time, I am really trying to grasp the concept of parallelization in C using OpenMP.
c parallel-processing openmp
add a comment |
up vote
1
down vote
favorite
I am studying this tutorial about OpenMP and I came across this exercise, on page 19. It is a pi calculation algorithm which I have to parallelize:
static long num_steps = 100000;
double step;
void main ()
{
int i;
double x, pi
double sum = 0.0;
step = 1.0 / (double)num_steps;
for(i = 0; i < num_steps; i++)
{
x = (I + 0.5) * step;
sum = sum + 4.0 / (1.0 + x*x);
}
pi = step * sum;
}
I can not use, up to this point, #pragma parallel for. I can only use:
#pragma omp parallel {}
omp_get_thread_num();
omp_set_num_threads(int);
omp_get_num_threads();
My implementation looks like this :
#define NUM_STEPS 800
int main(int argc, char **argv)
{
int num_steps = NUM_STEPS;
int i;
double x;
double pi;
double step = 1.0 / (double)num_steps;
double sum[num_steps];
for(i = 0; i < num_steps; i++)
{
sum[i] = 0;
}
omp_set_num_threads(num_steps);
#pragma omp parallel
{
x = (omp_get_thread_num() + 0.5) * step;
sum[omp_get_thread_num()] += 4.0 / (1.0 + x * x);
}
double totalSum = 0;
for(i = 0; i < num_steps; i++)
{
totalSum += sum[i];
}
pi = step * totalSum;
printf("Pi: %.5f", pi);
}
Ignoring the problem by using an sum array (It explains later that it needs to define a critical section for the sum value with #pragma omp critical or #pragma omp atomic), the above impelentation only works for a limited number of threads (800 in my case), where the serial code uses 100000 steps. Is there a way to achieve this with only the aforementioned OpenMP commands, or am I obliged to use #pragma omp parallel for, which hasn't been mentioned yet in the tutorial?
Thanks a lot for your time, I am really trying to grasp the concept of parallelization in C using OpenMP.
c parallel-processing openmp
Are you allowed to use#pragma omp atomic
?
– Increasingly Idiotic
Nov 11 at 2:52
@IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
– Vector Sigma
Nov 11 at 2:57
This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
– Zulan
Nov 11 at 7:11
@Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
– Vector Sigma
Nov 11 at 22:11
add a comment |
up vote
1
down vote
favorite
up vote
1
down vote
favorite
I am studying this tutorial about OpenMP and I came across this exercise, on page 19. It is a pi calculation algorithm which I have to parallelize:
static long num_steps = 100000;
double step;
void main ()
{
int i;
double x, pi
double sum = 0.0;
step = 1.0 / (double)num_steps;
for(i = 0; i < num_steps; i++)
{
x = (I + 0.5) * step;
sum = sum + 4.0 / (1.0 + x*x);
}
pi = step * sum;
}
I can not use, up to this point, #pragma parallel for. I can only use:
#pragma omp parallel {}
omp_get_thread_num();
omp_set_num_threads(int);
omp_get_num_threads();
My implementation looks like this :
#define NUM_STEPS 800
int main(int argc, char **argv)
{
int num_steps = NUM_STEPS;
int i;
double x;
double pi;
double step = 1.0 / (double)num_steps;
double sum[num_steps];
for(i = 0; i < num_steps; i++)
{
sum[i] = 0;
}
omp_set_num_threads(num_steps);
#pragma omp parallel
{
x = (omp_get_thread_num() + 0.5) * step;
sum[omp_get_thread_num()] += 4.0 / (1.0 + x * x);
}
double totalSum = 0;
for(i = 0; i < num_steps; i++)
{
totalSum += sum[i];
}
pi = step * totalSum;
printf("Pi: %.5f", pi);
}
Ignoring the problem by using an sum array (It explains later that it needs to define a critical section for the sum value with #pragma omp critical or #pragma omp atomic), the above impelentation only works for a limited number of threads (800 in my case), where the serial code uses 100000 steps. Is there a way to achieve this with only the aforementioned OpenMP commands, or am I obliged to use #pragma omp parallel for, which hasn't been mentioned yet in the tutorial?
Thanks a lot for your time, I am really trying to grasp the concept of parallelization in C using OpenMP.
c parallel-processing openmp
I am studying this tutorial about OpenMP and I came across this exercise, on page 19. It is a pi calculation algorithm which I have to parallelize:
static long num_steps = 100000;
double step;
void main ()
{
int i;
double x, pi
double sum = 0.0;
step = 1.0 / (double)num_steps;
for(i = 0; i < num_steps; i++)
{
x = (I + 0.5) * step;
sum = sum + 4.0 / (1.0 + x*x);
}
pi = step * sum;
}
I can not use, up to this point, #pragma parallel for. I can only use:
#pragma omp parallel {}
omp_get_thread_num();
omp_set_num_threads(int);
omp_get_num_threads();
My implementation looks like this :
#define NUM_STEPS 800
int main(int argc, char **argv)
{
int num_steps = NUM_STEPS;
int i;
double x;
double pi;
double step = 1.0 / (double)num_steps;
double sum[num_steps];
for(i = 0; i < num_steps; i++)
{
sum[i] = 0;
}
omp_set_num_threads(num_steps);
#pragma omp parallel
{
x = (omp_get_thread_num() + 0.5) * step;
sum[omp_get_thread_num()] += 4.0 / (1.0 + x * x);
}
double totalSum = 0;
for(i = 0; i < num_steps; i++)
{
totalSum += sum[i];
}
pi = step * totalSum;
printf("Pi: %.5f", pi);
}
Ignoring the problem by using an sum array (It explains later that it needs to define a critical section for the sum value with #pragma omp critical or #pragma omp atomic), the above impelentation only works for a limited number of threads (800 in my case), where the serial code uses 100000 steps. Is there a way to achieve this with only the aforementioned OpenMP commands, or am I obliged to use #pragma omp parallel for, which hasn't been mentioned yet in the tutorial?
Thanks a lot for your time, I am really trying to grasp the concept of parallelization in C using OpenMP.
c parallel-processing openmp
c parallel-processing openmp
asked Nov 11 at 1:01
Vector Sigma
1084
1084
Are you allowed to use#pragma omp atomic
?
– Increasingly Idiotic
Nov 11 at 2:52
@IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
– Vector Sigma
Nov 11 at 2:57
This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
– Zulan
Nov 11 at 7:11
@Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
– Vector Sigma
Nov 11 at 22:11
add a comment |
Are you allowed to use#pragma omp atomic
?
– Increasingly Idiotic
Nov 11 at 2:52
@IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
– Vector Sigma
Nov 11 at 2:57
This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
– Zulan
Nov 11 at 7:11
@Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
– Vector Sigma
Nov 11 at 22:11
Are you allowed to use
#pragma omp atomic
?– Increasingly Idiotic
Nov 11 at 2:52
Are you allowed to use
#pragma omp atomic
?– Increasingly Idiotic
Nov 11 at 2:52
@IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
– Vector Sigma
Nov 11 at 2:57
@IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
– Vector Sigma
Nov 11 at 2:57
This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
– Zulan
Nov 11 at 7:11
This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
– Zulan
Nov 11 at 7:11
@Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
– Vector Sigma
Nov 11 at 22:11
@Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
– Vector Sigma
Nov 11 at 22:11
add a comment |
1 Answer
1
active
oldest
votes
up vote
1
down vote
accepted
You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.
The most simple way is to do something like:
int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();
for (int i = tid; i < num_steps; i += n_threads) {
// ...
}
This way the work is split across all threads regardless of the number of threads.
If there were 3 threads and 9 steps:
- Thread 0 would do steps 0, 3, 6
- Thread 1 would do steps 1, 4, 7
- Thread 2 would do steps 2, 5, 8
This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.
In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:
int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();
int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;
for (int i = start; i < end; i++) {
// ...
}
Now the 3 threads performing 9 steps looks like:
- Thread 0 does steps 0, 1, 2
- Thread 1 does steps 3, 4, 5
- Thread 2 does steps 6, 7, 8
This approach is actually what is most likely happening when #pragma omp for
is used. In most cases the compiler just divides the tasks according to the number of threads and assigns each thread a section.
So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.
Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.
Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
– Vector Sigma
Nov 11 at 22:09
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.
The most simple way is to do something like:
int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();
for (int i = tid; i < num_steps; i += n_threads) {
// ...
}
This way the work is split across all threads regardless of the number of threads.
If there were 3 threads and 9 steps:
- Thread 0 would do steps 0, 3, 6
- Thread 1 would do steps 1, 4, 7
- Thread 2 would do steps 2, 5, 8
This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.
In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:
int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();
int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;
for (int i = start; i < end; i++) {
// ...
}
Now the 3 threads performing 9 steps looks like:
- Thread 0 does steps 0, 1, 2
- Thread 1 does steps 3, 4, 5
- Thread 2 does steps 6, 7, 8
This approach is actually what is most likely happening when #pragma omp for
is used. In most cases the compiler just divides the tasks according to the number of threads and assigns each thread a section.
So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.
Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.
Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
– Vector Sigma
Nov 11 at 22:09
add a comment |
up vote
1
down vote
accepted
You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.
The most simple way is to do something like:
int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();
for (int i = tid; i < num_steps; i += n_threads) {
// ...
}
This way the work is split across all threads regardless of the number of threads.
If there were 3 threads and 9 steps:
- Thread 0 would do steps 0, 3, 6
- Thread 1 would do steps 1, 4, 7
- Thread 2 would do steps 2, 5, 8
This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.
In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:
int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();
int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;
for (int i = start; i < end; i++) {
// ...
}
Now the 3 threads performing 9 steps looks like:
- Thread 0 does steps 0, 1, 2
- Thread 1 does steps 3, 4, 5
- Thread 2 does steps 6, 7, 8
This approach is actually what is most likely happening when #pragma omp for
is used. In most cases the compiler just divides the tasks according to the number of threads and assigns each thread a section.
So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.
Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.
Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
– Vector Sigma
Nov 11 at 22:09
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.
The most simple way is to do something like:
int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();
for (int i = tid; i < num_steps; i += n_threads) {
// ...
}
This way the work is split across all threads regardless of the number of threads.
If there were 3 threads and 9 steps:
- Thread 0 would do steps 0, 3, 6
- Thread 1 would do steps 1, 4, 7
- Thread 2 would do steps 2, 5, 8
This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.
In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:
int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();
int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;
for (int i = start; i < end; i++) {
// ...
}
Now the 3 threads performing 9 steps looks like:
- Thread 0 does steps 0, 1, 2
- Thread 1 does steps 3, 4, 5
- Thread 2 does steps 6, 7, 8
This approach is actually what is most likely happening when #pragma omp for
is used. In most cases the compiler just divides the tasks according to the number of threads and assigns each thread a section.
So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.
Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.
You will need to find a way to make your parallel algorithm somewhat independent from the number of threads.
The most simple way is to do something like:
int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();
for (int i = tid; i < num_steps; i += n_threads) {
// ...
}
This way the work is split across all threads regardless of the number of threads.
If there were 3 threads and 9 steps:
- Thread 0 would do steps 0, 3, 6
- Thread 1 would do steps 1, 4, 7
- Thread 2 would do steps 2, 5, 8
This works but isn't ideal if each thread is accessing data from some shared array. It is better if threads access sections of data nearby for locality purposes.
In that case you can divide the number of steps by the number of threads and give each thread a contiguous set of tasks like so:
int tid = omp_get_thread_num();
int n_threads = omp_get_num_threads();
int steps_per_thread = num_steps / n_threads;
int start = tid * steps_per_thread;
int end = start + steps_per_thread;
for (int i = start; i < end; i++) {
// ...
}
Now the 3 threads performing 9 steps looks like:
- Thread 0 does steps 0, 1, 2
- Thread 1 does steps 3, 4, 5
- Thread 2 does steps 6, 7, 8
This approach is actually what is most likely happening when #pragma omp for
is used. In most cases the compiler just divides the tasks according to the number of threads and assigns each thread a section.
So given a set of 2 threads and a 100 iteration for loop, the compiler would likely give iterations 0-49 to thread 0 and iterations 50-99 to thread 1.
Note that if the number of iterations does not divide evenly by the number of threads the remainder needs to be handled explicitly.
answered Nov 11 at 5:53
Increasingly Idiotic
2,1282829
2,1282829
Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
– Vector Sigma
Nov 11 at 22:09
add a comment |
Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
– Vector Sigma
Nov 11 at 22:09
Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
– Vector Sigma
Nov 11 at 22:09
Thank you, that's a very useful answer since it helps me to clarify the "inner workings" of parallelization regarding functionalities such as #pragma omp parallel for.
– Vector Sigma
Nov 11 at 22:09
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53244922%2fcalculate-pi-algorithm-from-a-tutorial-using-openmp%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Are you allowed to use
#pragma omp atomic
?– Increasingly Idiotic
Nov 11 at 2:52
@IncreasinglyIdiotic It is explained later the usefulness of #pragma omp atomic, but only regarding sum value. How can I use it to solve the "too many threads" issue? Is there a way, without having to use parallel for?
– Vector Sigma
Nov 11 at 2:57
This tutorial is regularly sending confused learners to StackOverflow. I suggest you look for a learning material that follows a more idiomatic high-level approach instead of explaining OpenMP bottom-up. Maybe it works if you are getting a live-workshop, but it definitely does not when reading/watching the material online.
– Zulan
Nov 11 at 7:11
@Zulan [ad]Stackoverflow: turning confusion to knowledge since 2008 :-D[/ad]
– Vector Sigma
Nov 11 at 22:11