Intel C/C++ and FORTRAN Compilers
White Paper
The Intel C/C++ and FORTRAN Compiler plug-ins are designed to let software
take advantage of the full potential of Intel Architecture processors. They
are the first compilers to offer processor-specific optimizations with the
introduction of each new Intel processor generation, which allows developers
to take immediate advantage of each new processor's improvements. Support of
MMX technology is a vivid example of the Intel compilers’
evolution in parallel with the latest line of Intel Architecture processors.
Feature Summary
Intel compilers have been developed to be compatible with Microsoft* Visual C++*.
To ensure ease of use, the Intel C/C++ and FORTRAN Compiler plug-ins are usable
as first-generation plug-ins to the Microsoft Developer Studio*. This
capability combines the power of Intel compilers with the features of Microsoft's
Integrated Development Environment (IDE).
The following summarizes the features for each of the compilers:
C/C++ plug-in:
- Provides MMX™ technology support through the use of C "intrinsics."
This allows C developers to take advantage of MMX technology using C
function call syntax rather than having to manually code assembly language statements.
- Supports in-line assembly language insertions for C developers who have particular
performance demands.
- Offers profile-guided optimizations, which allow the compiler to adjust
the flow of the program to achieve optimum performance based on previous
executions with the same data set.
- Provides a "blended" code optimization switch that allows you to
generate code with optimal performance for any Intel Architecture processor.
Also, for developers with more specific targets, the compilers provide
processor-specific optimizations that maximize performance for a specific Intel processor.
- Provides maximum floating-point instruction throughput by using
the full power of the floating-point stack.
FORTRAN plug-in:
- Supports dynamic COMMON
- Supports thread-safe code generation for multi-threaded applications
System Requirements
The Intel C/C++ and FORTRAN compilers run under the
Microsoft Windows* 95 or Windows NT* operating systems.
Language Support
The Intel C/C++ Compiler is a plug-in to Microsoft Visual C++* version 4.x/5.0,
which provides the development and run-time environments plus the MFC* libraries. The Intel
FORTRAN Compiler is a plug-in only to the Visual C++ version 4.x IDE.
NOTE: The Intel C/C++ Compiler Plug-in (version 2.4) will operate
with either Visual C++ 4.x or Visual C++ 5.0. However, some new features of Visual C++ 5.0, such as native COM, are
not supported in this version of the Intel C/C++ Compiler.
The following FORTRAN languages and extensions are supported:
- Full support for ANSI FORTRAN 77 (X3.9-1978) and ISO 1539:1980
- Many extensions popularized by DEC* (VMS*)
Microsoft Visual C++ 4.x/5.0 Compatibility
The Intel C/C++ Compiler is compatible with Microsoft Visual C++ 4.x/5.0 in
the following areas:
- Compilation switches
- Makefile support
- In-line assembly language syntax
- Object module, library, and DLL formats
- Debug and symbol formats
If Microsoft Visual C++ 4.x/5.0 is on your system when you install the Intel C/C++ Compiler,
the installation procedure automatically integrates the Intel C/C++ Compiler
into the Tools menu of the Visual C++ IDE. This gives you the
choice of using the Intel C/C++ Compiler to compile the projects that you
create in Visual C++. Click "Tools," then click "Select Compiler,"
and the selection window provided by Intel appears.
Application Support
The Intel C/C++ Compiler plug-in is particularly efficient in support of the
applications described in the sections that follow.
Graphics / Multimedia Applications
MMX technology adds 57 powerful new assembly instructions to the Intel
Architecture instruction set. These instructions are designed
to efficiently manipulate and process video, audio, and graphical data.
The Pentium® and Pentium Pro processors with MMX technology include these
new instructions to enhance performance of multimedia applications.
The Intel C/C++ Compiler plug-in supports these new MMX instructions in C/C++
programs by using special compiler intrinsics that are coded using C
function call syntax.
The compiler allows you to use C language variables in place of hardware
registers, which frees you from managing these registers.
The compiler generates the corresponding MMX instructions and reorders
them to maximize performance through the Pentium processor’s dual
instruction pipeline. In addition, the compiler also handles the
loads and stores of the C variables to and from memory. Here is
an example of an Intel C Compiler intrinsic and a description of
its function:
__m64 _m_pmaddwd (__m64 m1, __m64 m2)
This intrinsic multiplies four 16-bit values in m1
by four 16-bit values in m2 to produce four
32-bit intermediate results, which are then summed by pairs to
produce two 32-bit results.
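The arithmetic performed by this intrinsic can be modeled in portable C. The sketch below is only a scalar reference model, not the intrinsic itself: the function name pmaddwd_model and the array-based representation of the 64-bit operands are assumptions made for illustration, and the 16-bit values are treated as signed.

```c
#include <stdint.h>

/*
 * Scalar model of the packed multiply-add described above.
 * pmaddwd_model is a hypothetical name; m1 and m2 each stand in for
 * one 64-bit operand holding four signed 16-bit values.
 */
void pmaddwd_model(const int16_t m1[4], const int16_t m2[4], int32_t result[2])
{
    for (int pair = 0; pair < 2; pair++) {
        /* Two 16x16 -> 32-bit signed intermediate products. */
        int32_t lo = (int32_t)m1[2 * pair]     * (int32_t)m2[2 * pair];
        int32_t hi = (int32_t)m1[2 * pair + 1] * (int32_t)m2[2 * pair + 1];

        /* Sum the adjacent pair; unsigned addition wraps on overflow. */
        result[pair] = (int32_t)((uint32_t)lo + (uint32_t)hi);
    }
}
```

With m1 = {1, 2, 3, 4} and m2 = {5, 6, 7, 8}, the model yields 1*5 + 2*6 = 17 and 3*7 + 4*8 = 53.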
The Intel C/C++ Compiler plug-in also provides a
rounding control option, which optimizes floating-point-to-integer
conversions. The system default rounding mode is round-to-nearest.
Because the C language requires that floating-point-to-integer
conversions be truncated, the compiler must generate additional
instructions to change the rounding mode to truncation before each
conversion and then change it back afterwards.
With the -Qrcd switch you can optimize
your code by eliminating the additional overhead of instructions
required to change the rounding mode back and forth. This option
has no effect on floating-point calculations, but conversions to
integer are rounded in the current rounding mode instead of being
truncated, so they do not conform to C semantics. Graphics applications
that use floating-point data as input into their rendering operations
can benefit from this type of optimization.
Consider the following example:
int a;
float f;
void func()
{
a = f;
}
The following is the standard code generation that would take place:
    fld     DWORD PTR _f[0+eax*4]
    fnstcw  [esp+24]
    mov     DWORD PTR [esp+20], eax
    mov     eax, DWORD PTR [esp+24]
    or      eax, 3072
    mov     DWORD PTR [esp+16], eax
    mov     eax, DWORD PTR [esp+20]
    fldcw   [esp+16]
    fistp   DWORD PTR _a[0+eax*4]
    fldcw   [esp+24]
Notice that it takes ten instructions to complete this function.
Using the rounding control option -Qrcd, the
optimized code looks like this:
    fld     DWORD PTR _f[0+eax*4]
    fistp   DWORD PTR _a[0+eax*4]
You can see that it takes only two instructions to complete the function.
This has reduced the total number of instructions by 80%.
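The trade-off behind this switch is that the shorter sequence converts in whatever rounding mode is currently in effect instead of truncating. The following sketch models both behaviors in portable C, assuming the default round-to-nearest mode with ties going to the even integer. The helper names convert_truncate and convert_nearest are hypothetical, and the second function only models the result; it is not what the compiler emits.

```c
/* C semantics: the fractional part is discarded (truncation toward zero). */
int convert_truncate(float f)
{
    return (int)f;
}

/*
 * Hypothetical model of a conversion performed in round-to-nearest mode
 * (ties go to the even integer). Valid only for values well within int range.
 */
int convert_nearest(float f)
{
    int t = (int)f;               /* truncated value */
    float frac = f - (float)t;    /* leftover fraction, same sign as f */

    if (frac > 0.5f || (frac == 0.5f && t % 2 != 0))
        return t + 1;
    if (frac < -0.5f || (frac == -0.5f && t % 2 != 0))
        return t - 1;
    return t;
}
```

For f = 2.7f the truncating conversion gives 2 while the round-to-nearest model gives 3, which is the kind of difference that code compiled with -Qrcd has to tolerate.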
Scientific / Engineering Applications
The Intel C/C++ Compiler plug-in provides analysis for interprocedural optimizations
to assist you with programs that contain many small or medium-sized frequently
used functions, especially for programs that contain calls within loops.
Potential optimizations around calling points are normally inhibited due
to a lack of information about what happens in the called procedure.
Interprocedural analysis examines the relationship between calling and
called procedures and enables the following optimizations:
- Function inlining
- Passing arguments in registers
- Interprocedural constant propagation
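As an illustration of what these optimizations aim for, the sketch below shows a small function called inside a loop, followed by a hand-written version of the same loop after inlining and constant propagation. All function names are hypothetical; the second loop only approximates by hand the kind of result that interprocedural analysis makes possible and is not actual compiler output.

```c
#include <stddef.h>

/* Small, frequently called function: a typical inlining candidate. */
static int scale(int value, int factor)
{
    return value * factor;
}

/* Original form: one call per iteration, always with the constant 4. */
int sum_scaled(const int *data, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += scale(data[i], 4);
    return total;
}

/* Hand-written equivalent after inlining scale() and propagating factor = 4. */
int sum_scaled_inlined(const int *data, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i] * 4;
    return total;
}
```

Without knowledge of scale(), the compiler would have to treat each call as opaque; once the call is inlined and the constant is known, the loop body becomes a single multiply-accumulate.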
In addition, the Intel C/C++ Compiler plug-in exploits the use of the floating-point (FP)
stack by implementing code generation optimizations that allow FP instructions
to execute more efficiently. Most floating-point operations require that one
operand and the result use the top of stack. This makes each FP instruction
dependent on the previous one and inhibits overlapping the instructions.
The compiler breaks this dependency by allowing a program to arrange for
one of the inputs for the next operation to always be at the top of stack.
It provides this capability by effective use of the FXCH
instruction, which comes at almost no additional cost on the
Pentium® Pro processor.
Consider the following expression:
a = ((b + c) * b) + ((d + e) * d);
In the original figure (not reproduced here), this expression is shown as two
instruction sequences side by side: a serial instruction sequence and a
parallel instruction sequence.
The serial instruction sequence depicts instructions executed one
at a time with no overlapping because of the top-of-stack dependency.
The parallel instruction sequence uses the FXCH instruction, which
provides the following gains:
- Overlapping instructions can place their results in different stack
registers rather than always at the top of the stack
- More parallelism is achieved
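The independence that the parallel sequence exploits is visible at the source level. In the sketch below (function names are hypothetical), the first function follows the expression as written, and the second interleaves the two independent subexpressions, which is the ordering that FXCH makes practical on the floating-point stack. Source code does not dictate the instruction order a compiler emits; the sketch shows only the dependence structure.

```c
/* Expression as written: each half is evaluated to completion in turn. */
double eval_serial(double b, double c, double d, double e)
{
    double left  = (b + c) * b;
    double right = (d + e) * d;
    return left + right;
}

/* Interleaved form: the two additions are independent of each other,
 * as are the two multiplications, so each pair can overlap. */
double eval_interleaved(double b, double c, double d, double e)
{
    double sum1  = b + c;       /* independent of sum2 */
    double sum2  = d + e;
    double prod1 = sum1 * b;    /* independent of prod2 */
    double prod2 = sum2 * d;
    return prod1 + prod2;
}
```

Both functions perform the same five operations; only the final addition depends on both halves.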
Database Server Applications
The Intel C/C++ Compiler plug-in has been proven to assist large database
applications through a combination of interprocedural analysis and profile-guided
optimization. Profile-guided optimization provides detailed
information on program execution. You can therefore concentrate
optimization on the performance-critical areas of a large application,
where most of the execution time is spent. Profile-guided optimizations
can help eliminate instruction cache thrashing by reorganizing
code layout, shrinking code size, and reducing branch mispredictions.
Information collected during program execution can be fed back into
the compiler to allow a higher degree of optimization. For example,
profile-guided optimization might find that a particular section of
code is rarely executed. This code would then be moved to the end
of the module, so that the processor fetches instructions more
efficiently. The following are the three phases of profile-guided
optimization that, when completed, provide the data that can significantly
improve the performance of large applications:
- Phase 1: Instrumentation Compilation. The compiler
inserts code into your program to produce profile information. The resulting
code is said to be instrumented by the compiler.
- Phase 2: Instrumented Execution. When you execute the
instrumented program, it creates a dynamic information file. This file
contains data that represents the actual behavior of the program
during execution.
- Phase 3: Feedback Compilation. When you compile your
program a second time, the compiler uses the data in the dynamic information
file to help optimize your program. This data helps the compiler determine
the most heavily traveled paths through the program and optimize along
these paths. You can use additional optimization switches during this
phase so that other compilation optimization routines can also benefit
from the dynamic information.
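Conceptually, the instrumentation inserted in Phase 1 amounts to counters on the paths through the program. The following hand-written sketch imitates that idea. The counter variables, function names, and workload are all hypothetical; the compiler's real instrumentation and the format of the dynamic information file are not shown here.

```c
/* Hypothetical counters standing in for compiler-inserted instrumentation. */
unsigned long hot_path_count;
unsigned long cold_path_count;

/* Function with a common path and a rarely executed error path. */
int process(int value)
{
    if (value >= 0) {
        hot_path_count++;      /* recorded on every normal execution */
        return value * 2;
    }
    cold_path_count++;         /* rarely taken */
    return -1;
}

/* Stand-in for an instrumented run over a representative data set. */
void run_workload(void)
{
    for (int i = 0; i < 1000; i++)
        process(i);
    process(-1);               /* a single rare error case */
}
```

After run_workload(), the counters show 1000 executions of the common path and 1 of the error path. This is the kind of evidence a feedback compilation could use to keep the common path contiguous and move the error path out of the way.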
Application Optimizations Summary
The following table summarizes the optimizations that the compiler
applies to your program for each optimization switch. The entry "any"
in the Option column means that the compiler automatically performs
this optimization, even when optimizations are disabled.
Optimization | Affected Aspect of the Program | Option
optimized code selection | instruction selection / addressing modes | any
global register allocation | register use | -O1 / -O2
instruction scheduling | instruction reordering | -O1 / -O2
register variable detection | register use | -O1 / -O2
common subexpression elimination | constants and expression evaluation | -O1 / -O2
dead-code elimination | instruction sequencing | -O1 / -O2
variable renaming | register use | -O1 / -O2
loop-invariant code movement | instruction sequencing | -O1 / -O2
copy propagation | constants and expression evaluation | -O1 / -O2
constant propagation | constants and expression evaluation | -O1 / -O2
strength reduction / induction variable simplification | instruction selection / sequencing; constants and expression evaluation | -O1 / -O2
tail recursion elimination | calls, further optimization | -O1 / -O2
in-line function expansion | calls, jumps, branches, and loops | -Qip / -Qipo
interprocedural constant propagation | arguments, global variables, and return values | -Qip / -Qipo
passing arguments in registers | calls, register usage | -Qip / -Qipo
monitoring module-level static variables | further optimizations, loop-invariant code | -Qip / -Qipo
multifile optimization | affects the same aspects as -Qip, but across multiple files | -Qipo
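Two of the entries in this table, loop-invariant code movement and strength reduction, can be illustrated by hand. The sketch below uses hypothetical function names and shows the source-level effect of the transformations, not the compiler's actual output.

```c
#include <stddef.h>

/* Before: scale * offset is recomputed and i is multiplied by stride
 * on every iteration. */
long fill_naive(long *out, size_t n, long scale, long offset, long stride)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = scale * offset + (long)i * stride;
        total += out[i];
    }
    return total;
}

/* After: the invariant product is hoisted out of the loop and the
 * multiplication by i is replaced with a running addition. */
long fill_optimized(long *out, size_t n, long scale, long offset, long stride)
{
    long base = scale * offset;   /* loop-invariant code movement */
    long step = 0;                /* strength reduction: tracks i * stride */
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = base + step;
        total += out[i];
        step += stride;
    }
    return total;
}
```

Both versions fill the array with the same values; the second simply does less work per iteration.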
Future Enhancements
The following list summarizes the enhancements expected to be added
to forthcoming releases of the Intel compiler products:
- Support for FORTRAN 90 / FORTRAN 95 / MIL-STD-1753
- Many extensions popularized by Cray*, IBM*, Sun*, and Microsoft*
- Multi-threading support (including SGI*-compatible SMP directives)
- Improved optimizer that requires less memory and runs faster
- Automatic MMX™ technology code generation for vector operations
- Global pointer tracking for improved alias detection
- Improved dependence analysis for threading and loop transformations
- Optimizations in presence of exception handling
- Enhanced code and data layout optimizations to improve cache efficiency
- Code coverage tool with a Graphical User Interface (GUI)
- Interprocedural pointer analysis that includes knowledge of library functions
Conclusion
Intel is dedicated to providing a suite of software performance
products to assist developers with creating the most powerful
applications that run on Intel Architecture processors. The Intel
C/C++ and FORTRAN compilers make up a part of this suite,
and as Intel’s microprocessor technology evolves, our advanced
compiler technology will be right there alongside our newest
high-performance processors to let you benefit from every
performance gain.