Compiler Optimization Sample

The sample illustrates how to use common compiler options to quickly change application performance.

See Included Samples for other samples included with the compiler.

Sample file and locations

Source        | Locations
---------------------------------------------------------------------
int_sin.f90   | Linux* and Mac OS* X: <install-dir>/samples/optimize/
              | Windows*: <install-dir>\samples\optimize\

Description

The sample is a math program that integrates the absolute value of a sine curve over one cycle of 2 pi radians. Because the calculations involved are relatively complex, the sample shows clear performance differences between automatic optimization levels.

The following figure shows the method used for calculation. The method successively adds the areas of rectangles with a height centered on the curve. As the number of rectangles increases (and the slice width decreases), the calculated area approaches four (4.0). The figure shows what is being calculated for 24 interior points and the first eight slices of a 25 interior point calculation.
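
The rectangle-sum method described above can be sketched in a few lines. The following Python snippet is an illustration only and is not the source of int_sin.f90. The function name integrate_abs_sin is hypothetical, and the slice width (2 pi divided by the number of interior points) is an assumption inferred from the description and the sample output, not taken from the sample itself.

```python
import math

def integrate_abs_sin(n_points):
    """Approximate the integral of |sin(x)| over one cycle (0 to 2*pi).

    Hypothetical helper: sums rectangles of width h whose heights are
    read from the curve at the evenly spaced points x = i * h.
    """
    h = 2.0 * math.pi / n_points  # slice width (assumed definition)
    total = 0.0
    for i in range(n_points):
        total += abs(math.sin(i * h)) * h  # area of one rectangle
    return total

if __name__ == "__main__":
    for n in (4, 8, 16, 1024):
        print(f"{n:8d} | {integrate_abs_sin(n):.7E}")
```

Under these assumptions the result for 4 points is pi, and the values move toward 4.0 as the number of points grows.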

Compiling without Optimizations

Establish a performance baseline by compiling the source code without enabling optimizations.

Platform            | Example Commands
------------------------------------------
Linux and Mac OS X  | ifort int_sin.f90 -O0
Windows             | ifort int_sin.f90 /Od

The compiled program is located in the same directory as the source. Execute the program as follows:

Platform            | Example Commands
------------------------------------------
Linux and Mac OS X  | ./a.out
Windows             | int_sin.exe

For each calculation, the computed integral value nears or equals 4.0, and the execution time consumed generally increases with the number of interior points. The following sample output illustrates typical results:

Sample Output

   Number of     | Computed Integral |
Interior Points  |                   |
--------------------------------------
           4     |   3.1415927E+00   |
--------------------------------------
           8     |   3.7922378E+00   |
--------------------------------------
          16     |   3.9484632E+00   |
--------------------------------------
          32     |   3.9871407E+00   |
--------------------------------------
          64     |   3.9967867E+00   |
--------------------------------------
         128     |   3.9991968E+00   |
--------------------------------------
         256     |   3.9997992E+00   |
--------------------------------------
         512     |   3.9999498E+00   |
--------------------------------------
        1024     |   3.9999875E+00   |
--------------------------------------
        2048     |   3.9999969E+00   |
--------------------------------------
        4096     |   3.9999992E+00   |
--------------------------------------
        8192     |   3.9999998E+00   |
--------------------------------------
       16384     |   4.0000000E+00   |
--------------------------------------
       32768     |   4.0000000E+00   |
--------------------------------------
       65536     |   4.0000000E+00   |
--------------------------------------
      131072     |   4.0000000E+00   |
--------------------------------------
      262144     |   4.0000000E+00   |
--------------------------------------
      524288     |   4.0000000E+00   |
--------------------------------------
     1048576     |   4.0000000E+00   |
--------------------------------------
     2097152     |   4.0000000E+00   |
--------------------------------------
     4194304     |   4.0000000E+00   |
--------------------------------------
     8388608     |   4.0000000E+00   |
--------------------------------------
    16777216     |   4.0000000E+00   |
--------------------------------------
    33554432     |   4.0000000E+00   |
--------------------------------------
    67108864     |   4.0000000E+00   |
--------------------------------------

CPU Time = <output time>

Note the time reported in the final line of the output: CPU Time = <output time>. Use this as the number for comparing subsequent runs.
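
One simple way to compare runs is to divide the baseline CPU time by the CPU time of a later build. The Python sketch below shows that arithmetic. The function name speedup is hypothetical, and the times in the usage lines are placeholders for illustration, not measured results.

```python
def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized run is than the baseline."""
    if baseline_seconds <= 0 or optimized_seconds <= 0:
        raise ValueError("CPU times must be positive")
    return baseline_seconds / optimized_seconds

if __name__ == "__main__":
    # Placeholder values for illustration only, not measured results.
    baseline = 8.0   # CPU Time noted for the unoptimized build
    optimized = 2.0  # CPU Time noted for a later, optimized build
    print(f"speedup: {speedup(baseline, optimized):.2f}x")
```

A value above 1.0 means the later build ran faster than the baseline. A value below 1.0 means it ran slower.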

Compiling with Optimizations

The performance enhancement realized by using some of the optimization options of the compiler can be significant. Other options allow you to enhance operation or performance in different areas.

Compile the source file with the default optimization level (the example commands are equivalent):

Platform            | Example Commands
------------------------------------------
Linux and Mac OS X  | ifort int_sin.f90
                    |   or
                    | ifort int_sin.f90 -O2
Windows             | ifort int_sin.f90
                    |   or
                    | ifort int_sin.f90 /O2

Execute the optimized program.

Platform            | Example Commands
------------------------------------------
Linux and Mac OS X  | ./a.out
Windows             | int_sin.exe

Compare the time reported on the final line with the time you noted for the unoptimized program. Then compile the source again using each of the following automatic optimization options, one option per compile.

Platform            | Suggested Options
------------------------------------------
Linux and Mac OS X  | -O1, -O3, -fast
Windows             | /O1, /O3, /fast

You should notice a difference in execution times (reported as CPU Time), and in some cases in executable file size, as you experiment with more and less aggressive optimization levels.

While the improvement in execution time (unoptimized to optimized program) in this sample application might not be typical for all programs, you should be able to improve the execution time for programs by compiling your sources using automatic optimization options.
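
To keep track of the experiment, it can help to list every compile command up front, one option per compile. The Python sketch below only builds and prints the command lines. It does not invoke the compiler. The option spellings come from this section, while the dictionary keys and the function name compile_commands are hypothetical.

```python
SOURCE = "int_sin.f90"

# Option spellings as listed in this section, unoptimized baseline first.
OPTIONS = {
    "linux_mac": ["-O0", "-O1", "-O2", "-O3", "-fast"],
    "windows": ["/Od", "/O1", "/O2", "/O3", "/fast"],
}

def compile_commands(platform):
    """Return one ifort command line per optimization option."""
    return [f"ifort {SOURCE} {option}" for option in OPTIONS[platform]]

if __name__ == "__main__":
    for command in compile_commands("linux_mac"):
        print(command)
```

Run each printed command, execute the resulting program, and note the CPU Time line before moving on to the next option.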

Limitations

Some of the optimization options shown here increase performance by enabling options that impose minimum architectural levels for the application. For example, if you specify -fast (Linux and Mac OS X) or /fast (Windows) during compilation but execute the application on an Intel® Pentium® 4 processor, the application might generate an error similar to the following:

Run-time Error

Fatal Error: This program was not built to run on the processor in your system.

The allowed processors are: Intel(R) Core(TM) Duo processors and compatible Intel processors with supplemental Streaming SIMD Extensions 3 (SSSE3) instruction support.