OpenMP* Sample

The OpenMP* sample illustrates how to create and compile multithreaded applications.

See Included Samples for other samples included with the compiler.

Sample files and locations

Source:
    openmp_sample.f90

Locations:
    Linux* and Mac OS* X:  <install-dir>/samples/openmp_samples/
    Windows*:              <install-dir>\samples\openmp_samples\

Description

This sample illustrates combining compiler options and OpenMP* directives to compile and run multithreaded and single-threaded executables. The sample generates a multithreaded executable when you add the -openmp (Linux and Mac OS X) or /Qopenmp (Windows) compiler option to the compilation command. Without that option, the same source code results in a single-threaded executable.

The code in this sample finds all primes in the first 10,000,000 integers, the number of 4n+1 primes, and the number of 4n-1 primes in the same range.

This sample illustrates using OpenMP directives to help increase performance. In the first instance, the sample uses the schedule clause with the OpenMP DO directive. The schedule clause improves performance because in this example the workload in the DO loop increases as the index gets bigger; the default static scheduling does not work well under these conditions. Instead, dynamic scheduling is used to account for the increasing workload. There are tradeoffs in using this method: dynamic scheduling has more overhead than static scheduling, so a "chunk size" of 10 is used to reduce that overhead.
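The scheduling choice described above can be sketched as follows. This is a minimal illustration, not the code from openmp_sample.f90: the module name prime_schedule_sketch and the routines is_prime and flag_primes are hypothetical, and trial division is assumed as the primality test. In Fortran the work-sharing loop directive is spelled "do", so the clause appears on a parallel do directive.

```fortran
module prime_schedule_sketch
  implicit none
contains

  ! Trial division (an assumed test, not taken from the sample).
  ! Its cost grows with n, so later loop iterations do more work.
  pure logical function is_prime(n)
    integer, intent(in) :: n
    integer :: d
    is_prime = .false.
    if (n < 2) return
    if (n < 4) then
      is_prime = .true.          ! 2 and 3
      return
    end if
    if (mod(n, 2) == 0) return
    d = 3
    do while (d * d <= n)
      if (mod(n, d) == 0) return
      d = d + 2
    end do
    is_prime = .true.
  end function is_prime

  ! Marks flags(i) true when i is prime. Threads take chunks of 10
  ! iterations on demand, so no thread is left holding only the
  ! expensive high-index end of the range.
  subroutine flag_primes(limit, flags)
    integer, intent(in)  :: limit
    logical, intent(out) :: flags(limit)
    integer :: i
    !$omp parallel do schedule(dynamic, 10)
    do i = 1, limit
      flags(i) = is_prime(i)
    end do
    !$omp end parallel do
  end subroutine flag_primes

end module prime_schedule_sketch
```

Each iteration writes only its own array element, so no synchronization is needed here. When the file is compiled without OpenMP support, the directive lines are treated as comments and the loop runs serially. Calling flag_primes(100, flags) and then evaluating count(flags) gives 25, the number of primes up to 100.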

In the second case, the reduction clause is used instead of an OpenMP critical directive to eliminate lock overhead. The critical directive would cause excessive lock overhead due to the one-thread-at-a-time update of the shared variables each time through the loop. Instead, the reduction clause gives each thread a private copy of each variable and updates the shared variables only once, at the end of the loop.
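The reduction approach can be sketched like this. Again, this is an illustration under assumptions rather than the sample's source: the module prime_reduction_sketch, the routine count_primes, and the trial-division is_prime are hypothetical names. In this sketch the prime 2 falls into neither the 4n+1 nor the 4n-1 class, so its counts need not match the sample's output exactly.

```fortran
module prime_reduction_sketch
  implicit none
contains

  ! Trial division (an assumed test, not taken from the sample).
  pure logical function is_prime(n)
    integer, intent(in) :: n
    integer :: d
    is_prime = .false.
    if (n < 2) return
    if (n < 4) then
      is_prime = .true.          ! 2 and 3
      return
    end if
    if (mod(n, 2) == 0) return
    d = 3
    do while (d * d <= n)
      if (mod(n, d) == 0) return
      d = d + 2
    end do
    is_prime = .true.
  end function is_prime

  ! Counts all primes in 1..limit, plus those of the form 4n+1 and 4n-1.
  subroutine count_primes(limit, total, n4p1, n4m1)
    integer, intent(in)  :: limit
    integer, intent(out) :: total, n4p1, n4m1
    integer :: i, c_all, c_p1, c_m1

    c_all = 0
    c_p1  = 0
    c_m1  = 0

    ! Each thread accumulates into private copies of the three
    ! counters; OpenMP adds them into the shared variables once,
    ! when the loop ends, so no critical section is needed.
    !$omp parallel do schedule(dynamic, 10) reduction(+:c_all, c_p1, c_m1)
    do i = 1, limit
      if (is_prime(i)) then
        c_all = c_all + 1
        if (mod(i, 4) == 1) c_p1 = c_p1 + 1   ! 4n+1 primes
        if (mod(i, 4) == 3) c_m1 = c_m1 + 1   ! 4n-1 primes
      end if
    end do
    !$omp end parallel do

    total = c_all
    n4p1  = c_p1
    n4m1  = c_m1
  end subroutine count_primes

end module prime_reduction_sketch
```

For a limit of 100, this sketch returns 25 primes in total, 11 of the form 4n+1 and 13 of the form 4n-1. Replacing the reduction clause with a critical region around the three increments would give the same counts but would serialize every update.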

Compile the sample as a multithreaded application

To make proper use of the directives the preprocessor must be enabled; include the -fpp (Linux and Mac OS X) or /Qfpp (Windows) option.
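One common reason a Fortran OpenMP source needs the preprocessor is that it guards OpenMP-specific code with the _OPENMP macro, which OpenMP-enabled compilers define when preprocessing. Whether openmp_sample.f90 does exactly this is an assumption; the sketch below, with the hypothetical module thread_report_sketch and function thread_count, only shows the pattern. It must be compiled with the preprocessor enabled, for example with the -fpp or /Qfpp option named above.

```fortran
module thread_report_sketch
#ifdef _OPENMP
  ! Only available when the compiler is run with OpenMP support.
  use omp_lib, only: omp_get_max_threads
#endif
  implicit none
contains

  ! Returns the number of threads a parallel region may use,
  ! or 1 when the file was compiled without OpenMP support.
  integer function thread_count()
#ifdef _OPENMP
    thread_count = omp_get_max_threads()
#else
    thread_count = 1
#endif
  end function thread_count

end module thread_report_sketch
```

A main program could print the value returned by thread_count() to report how many threads are in use. With this pattern, the same source builds as either a multithreaded or a single-threaded executable depending only on the compiler options.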

Commands by platform:

Linux and Mac OS X:
    ifort -openmp -fpp openmp_sample.f90

Windows:
    ifort /Qopenmp /Qfpp openmp_sample.f90

The compiler generates status messages informing you which defined loops or regions were parallelized. The following examples illustrate typical messages. (The Windows status messages include the linker phase and the path to the source.)

Status messages by platform:

Linux:
    openmp_sample.f90(77): (col. 7) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
    openmp_sample.f90(68): (col. 7) remark: OpenMP DEFINED REGION WAS PARALLELIZED.

Windows:
    C:\openmp_sample.f90(77): (col. 7) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
    C:\openmp_sample.f90(68): (col. 7) remark: OpenMP DEFINED REGION WAS PARALLELIZED.
    Microsoft (R) Incremental Linker Version 8.00.50727.42
    Copyright (C) Microsoft Corporation.  All rights reserved.
    -out:openmp_sample.exe

Run the multithreaded executable.

Commands by platform:

Linux and Mac OS X:
    ./a.out

Windows:
    openmp_sample

The multithreaded executable should generate results similar to the following.

Sample Output

Range to check for Primes:           1    10000000

We are using           2  thread(s)

Number of primes found:      664579

Number of 4n+1 primes found:      332181

Number of 4n-1 primes found:      332398

Note the number of threads reported; at least two threads should have been used.

Linux and Mac OS X: If you get an error similar to "Bad CPU type in executable", the most likely cause is that you did not source the ifortvars.sh file before compiling.

Compile the sample as a single-threaded application

Delete the executable created earlier, and enter the following compilation command. Notice the -openmp (Linux and Mac OS X) or /Qopenmp (Windows) option is not included.

Caution

Linux and Mac OS X: If you've closed the session since the last time you set the stack size, you must set the stack size again.

Commands by platform:

Linux and Mac OS X:
    ifort -fpp openmp_sample.f90

Windows:
    ifort /Qfpp openmp_sample.f90

Notice that the compiler does not generate messages about parallelized loops or regions, because OpenMP support was disabled.

Run the single-threaded executable.

Commands by platform:

Linux and Mac OS X:
    ./a.out

Windows:
    openmp_sample

The executable should generate results similar to the following.

Sample Output

Range to check for Primes:           1    10000000

We are using           1  thread(s)

Number of primes found:      664579

Number of 4n+1 primes found:      332181

Number of 4n-1 primes found:      332398

Notice that only one thread was used.