Numerical Algorithms Group

Performance Tips for NAGWare f95

General Performance Tips

  • Use -O3 or -O4 instead of just -O. This will lengthen compile time (sometimes substantially with -O4), but runtime performance is usually improved.

  • If you use assumed-shape arrays and you know that the actual arguments are always contiguous (i.e. you do not pass array slices using section notation), use -Oassumed=always_contig. With this option, a runtime error occurs if a non-contiguous actual argument is detected (so it is also useful for discovering whether you use such array sections).

    If you are not 100% sure, but you think that this is true all or almost all of the time, use -Oassumed. With this option, non-contiguous actual arguments will be accepted though access to them will be slow.
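
    As a rough illustration of the distinction, the sketch below passes one contiguous and one non-contiguous actual argument to the same assumed-shape dummy. The names scale_mod, scale_by and contig_demo are made up for this example. Going by the description above, the second call would be expected to raise the runtime error under -Oassumed=always_contig, and to be accepted (with slower access) under -Oassumed.

    ```fortran
    MODULE scale_mod
      IMPLICIT NONE
    CONTAINS
      SUBROUTINE scale_by(a, factor)
        REAL, INTENT(INOUT) :: a(:)     ! assumed-shape dummy argument
        REAL, INTENT(IN)    :: factor
        a = a*factor
      END SUBROUTINE scale_by
    END MODULE scale_mod

    PROGRAM contig_demo
      USE scale_mod
      IMPLICIT NONE
      REAL :: v(10)
      v = 1.0
      CALL scale_by(v, 2.0)           ! whole array: contiguous
      CALL scale_by(v(1:10:2), 2.0)   ! strided section: non-contiguous
      PRINT *, v
    END PROGRAM contig_demo
    ```

    Compiling a test build with -Oassumed=always_contig and running the usual test cases is one way to find calls like the second one before choosing between the two options.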

Specific Platforms

Performance tips for Intel Linux

  • -Wc,-malign-double
    This compiler option may provide a worthwhile speed-up on this platform, but it has a pitfall.

    It may give incorrect results when a common block or derived type has a double precision entity following an odd number of single precision entities, since that entity then does not start on an 8-byte boundary (assuming 4-byte single precision entities). For example:
      COMMON/c/x(3),d
      INTEGER x
      DOUBLE PRECISION d
    
    or
      TYPE t
        DOUBLE PRECISION value1
        LOGICAL flag
        DOUBLE PRECISION value2
      END TYPE
    
    You can often avoid these problems by ensuring that double precision entities are at the beginning of common blocks and structures, e.g.
      COMMON/c/d,x(3)
    
    and
      TYPE t
        DOUBLE PRECISION value1,value2
        LOGICAL flag
      END TYPE
    
    If your code has no common blocks or derived types laid out in this way, a good speed-up may be expected on many programs.

Performance tips for DEC Alpha running Unix

  • -ieee=nonstd
    This typically speeds up an application by a factor of three at the cost of losing IEEE gradual underflow. Speed-ups of more than a factor of 100 (that is not a typo!) have been seen in some cases.

  • -Ounsafe
    This option increases speed yet further over -ieee=nonstd. However, some numerically unsafe optimisations are done, and floating-point exceptions are sometimes reported later than expected.
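
    To see whether a program is still getting gradual underflow, a small test along the following lines may help. It is only a sketch: the program name underflow_demo and the variable names are invented here. It halves the smallest normalised double precision value until the result is zero, counting the nonzero (denormalised) values produced on the way.

    ```fortran
    PROGRAM underflow_demo
      IMPLICIT NONE
      DOUBLE PRECISION :: x
      INTEGER :: below
      x = TINY(x)              ! smallest positive normalised value
      below = 0
      DO
        x = x/2.0d0
        IF (x == 0.0d0) EXIT
        below = below + 1      ! nonzero value below TINY: denormalised
      END DO
      PRINT *, 'nonzero values reached below TINY:', below
    END PROGRAM underflow_demo
    ```

    With gradual underflow the count should be positive; if denormalised results are flushed to zero, as would be expected once gradual underflow is lost, the count should be zero.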

Performance tips for IBM RISC System/6000

  • -ieee=full
    The floating-point hardware on the RS/6000 is much slower when floating-point traps are enabled. By default, NAGWare f95 enables these traps (because doing so greatly eases debugging). With -ieee=full, floating-point operations run several times faster.
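
    The debugging trade-off can be illustrated with a small division by zero. This is a sketch only: the program name trap_demo is invented, and the behaviour described assumes that -ieee=full runs with the traps disabled, as the paragraph above implies. With traps enabled, the division would be expected to stop the program with an error report at the offending operation; without them, y would be expected to become an IEEE infinity and execution to continue to the PRINT statement.

    ```fortran
    PROGRAM trap_demo
      IMPLICIT NONE
      CHARACTER(LEN=3) :: s = '0.0'
      REAL :: x, y
      READ (s, *) x        ! obtain zero at run time, not as a literal
      y = 1.0/x            ! floating-point division by zero
      PRINT *, 'y =', y
    END PROGRAM trap_demo
    ```

    A reasonable workflow is to develop and test with the default setting, then switch to -ieee=full for production runs once the code is known to be free of unintended exceptions.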

Performance tips for Sun SPARC running Solaris

  • -ieee=nonstd
    If your application makes significant use of denormalised numbers, but does not rely on them for accurate results, this option can improve performance substantially. (This does not apply to all SPARC processors; the switch matters only if a significant fraction of execution time is "system" rather than "user" time.)