The OpenMP* sample illustrates how to create and compile multithreaded applications.
See Included Samples for other samples included with the compiler.
Source | Locations |
---|---|
openmp_sample.f90 | |
This sample illustrates combining compiler options and OpenMP* directives to compile and run multithreaded and single-threaded executables. The sample generates a multithreaded executable when you add the -openmp (Linux and Mac OS X) or /Qopenmp (Windows) compiler option to the compilation command. Without that option, the same source code results in a single-threaded executable.
The code in this sample finds all primes in the first 10,000,000 integers and counts how many of them have the form 4n+1 and how many have the form 4n-1.
This sample illustrates using OpenMP directives to help increase performance. In the first instance, the sample uses the schedule clause with the OpenMP DO directive. The clause improves performance because, in this example, the workload in the loop increases as the index gets bigger; the default static scheduling does not balance such a loop well across threads. Dynamic scheduling is used instead to account for the increasing workload. There is a tradeoff in using this method: dynamic scheduling has more overhead than static scheduling, so a chunk size of 10 is used to reduce that overhead.
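The scheduling choice can be sketched in a few lines. This is a minimal illustration and not the code from openmp_sample.f90: the program name, the smaller `limit`, the `flag` array, and the `is_prime` helper are all assumptions made for the sketch. The cost of one iteration grows with the index, which is the situation dynamic scheduling is meant to handle.

```fortran
program schedule_sketch
  implicit none
  integer, parameter :: limit = 100000   ! hypothetical range, smaller than the sample's
  integer :: i, nprimes
  logical, allocatable :: flag(:)

  allocate(flag(limit))

  ! Each thread that becomes idle takes the next chunk of 10 iterations,
  ! so the expensive high-index iterations are spread across threads.
  !$omp parallel do schedule(dynamic, 10)
  do i = 1, limit
     flag(i) = is_prime(i)
  end do
  !$omp end parallel do

  nprimes = count(flag)
  print '(a,i0)', 'Number of primes found: ', nprimes

contains

  ! Trial division: the work grows roughly with sqrt(n).
  pure logical function is_prime(n)
    integer, intent(in) :: n
    integer :: d
    is_prime = n >= 2
    d = 2
    do while (d*d <= n .and. is_prime)
       if (mod(n, d) == 0) is_prime = .false.
       d = d + 1
    end do
  end function is_prime

end program schedule_sketch
```

When OpenMP support is not enabled, the `!$omp` lines are ordinary comments and the loop runs serially with the same result.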
In the second case, the reduction clause is used instead of an OpenMP critical directive to eliminate lock overhead. The critical directive would cause excessive lock overhead because the shared variables would be updated by one thread at a time on each pass through the loop. With the reduction clause, each thread accumulates into its own private copy, and the shared variables are updated only once, at the end of the loop.
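A reduction over the three counters might look like the following sketch. It is not taken from openmp_sample.f90; the program name, the counter names `total`, `plus1`, and `minus1`, the small `limit`, and the `is_prime` helper are hypothetical.

```fortran
program reduction_sketch
  implicit none
  integer, parameter :: limit = 100   ! hypothetical small range
  integer :: i, total, plus1, minus1

  total = 0
  plus1 = 0
  minus1 = 0

  ! Each thread increments private copies of the counters; the copies are
  ! summed into the shared variables once, when the loop finishes.
  ! The alternative would wrap each increment in !$omp critical, which
  ! serializes every update.
  !$omp parallel do schedule(dynamic, 10) reduction(+:total, plus1, minus1)
  do i = 1, limit
     if (is_prime(i)) then
        total = total + 1
        if (mod(i, 4) == 1) plus1 = plus1 + 1    ! primes of the form 4n+1
        if (mod(i, 4) == 3) minus1 = minus1 + 1  ! primes of the form 4n-1
     end if
  end do
  !$omp end parallel do

  print '(a,i0)', 'Number of primes found: ', total
  print '(a,i0)', 'Number of 4n+1 primes found: ', plus1
  print '(a,i0)', 'Number of 4n-1 primes found: ', minus1

contains

  pure logical function is_prime(n)
    integer, intent(in) :: n
    integer :: d
    is_prime = n >= 2
    d = 2
    do while (d*d <= n .and. is_prime)
       if (mod(n, d) == 0) is_prime = .false.
       d = d + 1
    end do
  end function is_prime

end program reduction_sketch
```

The counters must be initialized before the loop, because the reduction combines each thread's partial sum with the value the shared variable already holds.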
To make proper use of the directives, the preprocessor must be enabled; include the -fpp (Linux and Mac OS X) or /Qfpp (Windows) option.
Platform | Commands |
---|---|
Linux and Mac OS X | ifort -openmp -fpp openmp_sample.f90 |
Windows | ifort /Qopenmp /Qfpp openmp_sample.f90 |
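One common reason to enable the preprocessor is that conditionals let the same source build with and without OpenMP. The sketch below assumes the source guards its OpenMP runtime calls with the standard `_OPENMP` macro; whether openmp_sample.f90 is written exactly this way is an assumption, and the program and variable names are hypothetical.

```fortran
program thread_report_sketch
#ifdef _OPENMP
  use omp_lib                        ! only available when OpenMP is enabled
#endif
  implicit none
  integer :: nthreads

#ifdef _OPENMP
  nthreads = omp_get_max_threads()   ! thread count chosen by the OpenMP runtime
#else
  nthreads = 1                       ! serial build: no OpenMP runtime is linked
#endif

  print '(a,i0,a)', 'We are using ', nthreads, ' thread(s)'
end program thread_report_sketch
```

Built with the OpenMP option, this reports the runtime's maximum thread count; built without it, the preprocessor removes the guarded lines and the program reports one thread. The source must be run through the preprocessor in both cases, because the `#ifdef` lines are not valid Fortran.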
The compiler generates status messages informing you which defined loops or regions were parallelized. The following examples illustrate typical messages. (The Windows status messages include the linker phase and the path to the source.)
Platform | Status Messages |
---|---|
Linux | openmp_sample.f90(77): (col. 7) remark: OpenMP DEFINED LOOP WAS PARALLELIZED. openmp_sample.f90(68): (col. 7) remark: OpenMP DEFINED REGION WAS PARALLELIZED. |
Windows | C:\openmp_sample.f90(77): (col. 7) remark: OpenMP DEFINED LOOP WAS PARALLELIZED. C:\openmp_sample.f90(68): (col. 7) remark: OpenMP DEFINED REGION WAS PARALLELIZED. Microsoft (R) Incremental Linker Version 8.00.50727.42 -out:openmp_sample.exe |
Run the multithreaded executable.
Platform | Commands |
---|---|
Linux and Mac OS X | ./a.out |
Windows | openmp_sample |
The multithreaded executable should generate results similar to the following.
Sample Output |
---|
Range to check for Primes: 1 10000000 We are using 2 thread(s) Number of primes found: 664579 Number of 4n+1 primes found: 332181 Number of 4n-1 primes found: 332398 |
Note the number of threads reported; at least two threads should have been used.
Linux and Mac OS X: If you get an error similar to "Bad CPU type in executable", the most likely cause is that you did not source the ifortvars.sh file before compiling.
Delete the executable created earlier, and enter the following compilation command. Notice the -openmp (Linux and Mac OS X) or /Qopenmp (Windows) option is not included.
Linux and Mac OS X: If you've closed the session since the last time you set the stack size, you must set the stack size again.
Platform | Commands |
---|---|
Linux and Mac OS X | ifort -fpp openmp_sample.f90 |
Windows | ifort /Qfpp openmp_sample.f90 |
Notice that the compiler does not generate messages about parallelized loops or regions, because OpenMP support is disabled.
Run the single-threaded executable.
Platform | Commands |
---|---|
Linux and Mac OS X | ./a.out |
Windows | openmp_sample |
The executable should generate results similar to the following.
Sample Output |
---|
Range to check for Primes: 1 10000000 We are using 1 thread(s) Number of primes found: 664579 Number of 4n+1 primes found: 332181 Number of 4n-1 primes found: 332398 |
Notice that only one thread was used.