
The HLRN Quickstart Guide

Chapter 7. Running Parallel Programs

Bernd Kallies(1)
Revision History:
Revision 2.3, Published 2006/10/27 10:45:03 (UTC) by Bernd Kallies

Table of Contents

7.1. Introduction
7.2. MPI Applications
7.3. SMP Applications
7.4. Further Reading

7.1. Introduction

Parallelized applications use at least one parallel programming paradigm: message passing (MPI, LAPI, PVM) or shared-memory parallelism (SMP). The steps for compiling and running an application depend on the paradigm used. In addition, each paradigm has specific hardware requirements. Running a parallel application interactively also differs from running it in batch.

In particular, applications that use message passing can run on more than one node and communicate via a network. On the other hand, shared-memory parallel applications can run on one node only. They do not need a network.

Note

If you are not sure which type of parallel program your application is, consult the program's manual, then follow the instructions given below.

You should also know the current configuration of the HLRN machines and at least understand terms such as node, task, thread, and network.

7.2. MPI Applications

Parallel programs that use message passing (MPI, LAPI, PVM) are handled by the Parallel Operating Environment (POE) on IBM SP machines.

The POE consists of

  • a number of compiler wrapper scripts, such as mpcc or mpxlf, to generate MPI executables.
  • the central poe command to run an MPI executable.
  • a number of tools for debugging and profiling.

7.2.1. Running programs using poe

The poe command enables the user to load and execute programs on different nodes. It plays the same role as mpirun or mpprun on other platforms.

When you start a program with poe, you load a number of instances of it (tasks) onto the resources you requested. The resources may be on one node or on different nodes. To do this, follow the poe command with the program name and the program's own options, followed by any poe command line options. If the program was compiled and linked with the mpXXX compiler wrappers, the poe command can be omitted. Thus, the following two commands are equivalent for these programs:

$ poe a.out [options to a.out ...] [poe options ...]
$ a.out [options to a.out ... ] [poe options ...] 

7.2.2. poe options

The poe command takes command line options that define resources such as the number of MPI tasks or the network. These options are given on the command line when running an MPI program interactively. Alternatively, they can be set via environment variables. POE environment variables share the common prefix MP_. Options that are not given take default values. When running an MPI program in batch using LoadLeveler, many of the poe options are overridden by corresponding LoadLeveler keywords.
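
The relation between command line flags and MP_ environment variables can be illustrated with a short shell sketch. The function name flag_to_env is hypothetical and not part of POE, and the mapping (strip the dash, convert to upper case, prefix with MP_) is only assumed to hold for the options listed in Table 7.1.

```shell
#!/bin/sh
# Hypothetical helper (not part of POE): derive the environment variable
# name from a poe command line flag, following the pattern in Table 7.1.
flag_to_env() {
    name=${1#-}    # strip the leading dash: "-procs" -> "procs"
    printf 'MP_%s\n' "$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')"
}

# Print the flag/variable pairs for the options of Table 7.1.
for flag in -rmpool -nodes -procs -tasks_per_node -euidevice -euilib; do
    printf '%-16s %s\n' "$flag" "$(flag_to_env "$flag")"
done
```

Setting the variable once with export saves repeating the flag on every interactive call; a flag given on the command line is still the more visible choice for one-off runs.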

Table 7.1 shows the most important poe options to get started.

Table 7.1. Important poe options

Command line flag  | Environment variable    | LoadLeveler keyword | Example                      | Description
-rmpool            | MP_RMPOOL               |                     | -rmpool 0                    | The pool ID that is used for node allocation when running interactively
-nodes             | MP_NODES                | #@ node             | -nodes 2                     | Number of different nodes (LPARs)
-procs             | MP_PROCS                | #@ total_tasks      | -procs 4                     | Total number of MPI tasks
-tasks_per_node    | MP_TASKS_PER_NODE       | #@ tasks_per_node   | -tasks_per_node 4            | Number of tasks to start on each node (used as an alternative to -procs)
-euidevice -euilib | MP_EUIDEVICE, MP_EUILIB | #@ network.mpi      | -euidevice sn_all -euilib us | Adapter set and communication subsystem library to use for message passing
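
As a rough sketch of how -procs relates to -nodes and -tasks_per_node, the following lines build two command lines that request the same number of tasks. It assumes that the total task count is the product of nodes and tasks per node; the values and the executable name ./a.out are illustrative, and the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch: two ways of requesting the same number of MPI tasks.
# Values are illustrative; poe itself is not called here.
nodes=2
tasks_per_node=4
total_tasks=$((nodes * tasks_per_node))   # assumed relation: 2 * 4 = 8

# Variant 1: give only the total number of tasks.
cmd_procs="poe ./a.out -procs $total_tasks"
# Variant 2: give the layout explicitly.
cmd_layout="poe ./a.out -nodes $nodes -tasks_per_node $tasks_per_node"

echo "$cmd_procs"
echo "$cmd_layout"
```

The second form fixes how many tasks land on each node, while the first leaves the distribution to the system.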

7.2.3. Examples

The following examples comment on the use of poe interactively or from within a batch job. Chapter 10, Examples gives additional working examples including short source codes for sample programs.

Example 7.1 shows how to invoke instances of the uname command on different nodes. Three variants are given, all of which do the same thing. The first calls poe interactively with command line flags. The second calls poe interactively with environment variables. The third calls poe from within a LoadLeveler script.

Example 7.1. Usage of poe to start instances of a non-parallel program

Interactive, poe command line flags:

$ poe uname -n -s -rmpool 0 -nodes 1 -tasks_per_node 2 -labelio yes -stdoutmode ordered

Interactive, poe environment variables:

$ export MP_RMPOOL=0
$ export MP_NODES=1
$ export MP_TASKS_PER_NODE=2
$ export MP_LABELIO=yes
$ export MP_STDOUTMODE=ordered
$ poe uname -n -s

LoadLeveler script:

#!/bin/ksh
# @ job_type       = parallel
# @ node           = 1
# @ tasks_per_node = 2
# @ resources      = ConsumableCpus(1) ConsumableMemory(16 mb)
# @ output         = poe_ex1.llout
# @ error          = $(output)
# @ class          = cdev
# @ queue

poe uname -n -s -labelio yes -stdoutmode ordered

The output (variants 1 and 2: stdout; variant 3: the file poe_ex1.llout) looks something like this:

0:AIX hreg02a-en0
1:AIX hreg02a-en0
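
With -labelio yes, each output line is prefixed with the ID of the task that wrote it, as the output above shows. Such output can be post-processed with standard tools. The following is a minimal sketch that counts how many tasks reported each host name; the function name tasks_per_host is hypothetical, and the sample text is copied from the output above.

```shell
#!/bin/sh
# Sample labelled output as shown above; in practice this would come
# from poe or from the LoadLeveler output file.
labelled_output='0:AIX hreg02a-en0
1:AIX hreg02a-en0'

# Hypothetical helper: strip the task label, then count how many
# tasks reported each host name.
tasks_per_host() {
    awk '{
        sep = index($0, ":")            # position of the first colon
        payload = substr($0, sep + 1)   # text after the task label
        split(payload, field, " ")      # field[1]=system, field[2]=host
        count[field[2]]++
    }
    END { for (host in count) print host, count[host] }'
}

printf '%s\n' "$labelled_output" | tasks_per_host
```

For the sample above this prints one line, hreg02a-en0 2, which matches the request for two tasks on one node.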

Example 7.2 shows how to invoke an MPI program with a given total number of tasks. LoadLeveler controls how the tasks are allocated to nodes. The HPS adapters are requested. The environment settings shown are critical for performance, and the input shown is appropriate for the majority of MPI applications. The executable is assumed to be named a.out and to take an input file called input.in as its argument.

Example 7.2. Usage of poe to start an MPI program

Interactive:

$ export MP_SHARED_MEMORY=yes
$ export MP_WAIT_MODE=poll
$ export MP_SINGLE_THREAD=yes
$ a.out input.in -rmpool 0 -procs 6 -euidevice sn_all -euilib us

LoadLeveler script:

#!/bin/ksh
# @ job_type         = parallel
# @ total_tasks      = 6
# @ blocking         = unlimited
# @ network.mpi      = sn_all,,us
# @ resources        = ConsumableCpus(1) ConsumableMemory(1968 mb)
# @ output           = poe_ex2.llout
# @ error            = $(output)
# @ environment      = MEMORY_AFFINITY=MCM; \
#                      MP_SHARED_MEMORY=yes; MP_WAIT_MODE=poll; \
#                      MP_SINGLE_THREAD=yes; MP_TASK_AFFINITY=MCM
# @ wall_clock_limit = 70,60
# @ node_usage       = shared
# @ queue

./a.out input.in

7.3. SMP Applications

Shared-memory parallel programs usually spawn a number of POSIX threads. The number of threads and their behaviour are usually defined at runtime via environment variables or application-specific options.

Note

Consult the documentation of your application to find out how to set this up.

7.3.1. Environment variables for OpenMP programs

Most SMP applications follow the OpenMP standard. Their runtime behaviour depends on the setting of some environment variables. The OpenMP standard defines the OMP_XXXX variables. On AIX, a set of XLSMPOPTS settings serves the same purpose. If both are set, the OpenMP variable takes precedence.
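
A minimal sketch of setting the thread count for an OpenMP program is given below. The thread count of 4 and the executable name ./a.out are illustrative assumptions; OMP_NUM_THREADS is the standardized variable, and parthds is assumed here to be the corresponding XLSMPOPTS setting of the IBM XL runtime.

```shell
#!/bin/sh
# Sketch: request four threads for an OpenMP executable.
# The thread count and the name ./a.out are illustrative.
export OMP_NUM_THREADS=4        # standardized OpenMP variable

# AIX-specific counterpart (IBM XL runtime, assumed setting name).
# As stated above, the OpenMP variable takes precedence if both are set.
export XLSMPOPTS="parthds=4"

# Run the program only if it exists, so the sketch is safe to try.
if [ -x ./a.out ]; then
    ./a.out
fi
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS XLSMPOPTS=$XLSMPOPTS"
```

In practice only one of the two variables needs to be set; both appear here to show the two forms side by side.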

7.4. Further Reading

Please see the following documents for a more detailed description of POE and LoadLeveler:




2003-2008 © Norddeutscher Verbund für Hoch- und Höchstleistungsrechnen (HLRN)