Accessing the Code
The cluster manager consists of two distinct parts:
- base infrastructure that includes process launch and monitoring,
reliable multicast and unicast messaging, and fault response subsystems.
This capability is included in the OpenRTE open source project, a part
of the Open MPI project, and is distributed as part of that code.
- code specific to the Open Resilient Cluster Manager (OpenRCM, or ORCM)
project. This includes the plug-and-play messaging system, configuration
manager interface, and a variety of ORCM-specific tools. The code is built
upon ORTE, and is an official OMPI sub-project with its own mailing lists
and code distribution.
The two codes can be accessed in several ways:
- ORTE can be acquired either as a tarball or by subversion checkout from this
web site, or cloned from a public Mercurial repository on the Bitbucket site.
The Bitbucket repository frequently contains more advanced code in various
stages of development. At times, the ORCM code will depend upon these
advancements and may not build with the standard code available from the OMPI
web site. We therefore suggest that developers ask on the ORCM developers
mailing list for write access to the Bitbucket repository and for a subscription
to the automated mailing list that reports commits to that code base. Note that
this access permits commits only to the Bitbucket repository, not to the
official OMPI repository.
- Similarly, ORCM is available via subversion checkout from this site, or from
a public Bitbucket Mercurial repository. In this case, we recommend using the
subversion checkout to create your own Mercurial (or git, or subversion,
whichever you prefer) branch for development.
Building the Code
Building ORTE:
- In the top source directory, execute the following:
  - ./autogen.sh -no-ompi   (skip this step when building from a tarball)
  - Configure the ORTE code:
    - On a Mac:
      - ./configure --prefix=<prefix> --with-platform=contrib/platform/cisco/macosx-dynamic
    - On a Linux box:
      - ./configure --prefix=<prefix> --with-platform=contrib/platform/cisco/ebuild/native
  - make clean all install > /dev/null
- Add <prefix>/bin to PATH, <prefix>/lib to LD_LIBRARY_PATH, and <prefix>/share/man to MANPATH,
  where <prefix> is the installation directory given to configure
- Compile any applications using the ortecc command
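The ORTE steps above can be collected into a small script. This is a sketch under stated assumptions, not something shipped with ORTE: the function names are hypothetical, choosing the platform file from `uname -s` is this sketch's own convenience, and "building from a tarball" is detected by the heuristic that a tarball already contains an executable `configure`. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Sketch of the ORTE build steps listed above (function names are hypothetical).
# Source this file so that add_to_env changes the current shell.
# With DRY_RUN=1, commands are printed to stderr instead of executed.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "$*" >&2
    else
        "$@"
    fi
}

# Map an OS name (default: output of uname -s) to the platform file above.
orte_platform_file() {
    case "${1:-$(uname -s)}" in
        Darwin) echo contrib/platform/cisco/macosx-dynamic ;;
        Linux)  echo contrib/platform/cisco/ebuild/native ;;
        *)      return 1 ;;
    esac
}

# Build and install ORTE into PREFIX; run from the top source directory.
build_orte() {
    prefix="$1"
    if [ -z "$prefix" ]; then
        echo "usage: build_orte PREFIX" >&2
        return 1
    fi
    platform=$(orte_platform_file) || {
        echo "build_orte: no platform file known for $(uname -s)" >&2
        return 1
    }
    # Heuristic: a tarball ships a generated configure script, so autogen
    # only runs when configure is missing.
    if [ ! -x ./configure ]; then
        run ./autogen.sh -no-ompi || return 1
    fi
    run ./configure --prefix="$prefix" --with-platform="$platform" || return 1
    # Discard make's stdout as the list does; errors still reach stderr.
    run make clean all install > /dev/null
}

# Prepend PREFIX's bin, lib and man directories to the search paths.
add_to_env() {
    prefix="$1"
    PATH="$prefix/bin${PATH:+:$PATH}"
    LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    # If MANPATH was unset, the trailing colon keeps man's default search
    # path (man-db behaviour).
    MANPATH="$prefix/share/man:${MANPATH:-}"
    export PATH LD_LIBRARY_PATH MANPATH
}
```

For example, from the top ORTE source directory: `. ./build-orte.sh && build_orte "$HOME/orte" && add_to_env "$HOME/orte"`, where `build-orte.sh` is whatever name you saved the sketch under.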
Building ORCM:
- Be sure to build ORTE first. Then, in the top source directory for ORCM, execute the following:
  - ./autogen.pl
  - ./configure --prefix=<prefix> --with-orte=<orte-prefix>
    where <orte-prefix> is the prefix used when configuring ORTE (see above)
  - make clean all install > /dev/null
- Add <prefix>/bin to PATH, <prefix>/lib to LD_LIBRARY_PATH, and <prefix>/share/man to MANPATH
- Compile any applications using the orcmcc command
Running the Code
Building and running applications based on ORTE is relatively simple. Applications can be
compiled using the ortecc wrapper compiler, which supplies all the required include and
library paths. The resulting binaries can then be executed using the orterun program; see
orterun -h for the full range of supported options.
Building ORCM applications is similarly easy via the orcmcc wrapper compiler. However,
executing ORCM applications is a little trickier, as they require that the ORCM distributed
virtual machine (DVM) be running. Thus, running an ORCM application involves the following steps:
- Initiate the ORCM DVM using the orcm-vm command
- Start your application using the orcm-start command. See orcm-start -h for a list of options.
- Stop your application using the orcm-stop command. Again, see orcm-stop -h for a list of options.
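The three steps above might be wrapped as below. This is a sketch built on assumptions this page does not confirm: that orcm-vm stays in the foreground (so the sketch backgrounds it) and that a fixed delay is enough for the DVM to come up. The helper names are hypothetical, and arguments to orcm-start and orcm-stop are passed through untouched, so use whatever options their -h output lists.

```shell
# Sketch of the ORCM workflow: orcm-vm, then orcm-start, then orcm-stop.
# Assumptions: orcm-vm runs in the foreground, and a fixed delay is enough
# for it to start. Helper names are hypothetical.
# With DRY_RUN=1, commands are printed to stderr instead of executed.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "$*" >&2
    else
        "$@"
    fi
}

# Step 1: bring up the DVM in the background.
start_dvm() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '+ %s\n' "orcm-vm &" >&2
    else
        orcm-vm &
    fi
    # Crude readiness wait; the 5-second default is arbitrary.
    run sleep "${DVM_STARTUP_DELAY:-5}"
}

# Step 2: start an application (options as listed by orcm-start -h).
start_app() { run orcm-start "$@"; }

# Step 3: stop it again (options as listed by orcm-stop -h).
stop_app() { run orcm-stop "$@"; }
```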
Useful Tools
Each of the two code bases is packaged with a set of tools to aid users and developers.
Besides the wrapper compilers and the executables mentioned above, two tools from each
code base merit mention. A list of options for any tool can be found by executing the
tool with the -h option.
ORTE
- orte-ps - prints a list of ORTE jobs, their processes, where each process is located,
and the state of each process
- orte-info - prints a list of MCA (Modular Component Architecture) parameters for use in controlling ORTE behavior.
ORCM
- orcm-ps - prints a list of ORCM applications being executed, plus where each replica
is located, its state, and the number of times it has been restarted
- orcm-info - prints a list of MCA parameters for use in controlling ORCM behavior.
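Since every tool accepts -h, a quick way to confirm that both installations are on PATH and to browse the tools' options is a loop like the one below. It is a sketch with a hypothetical function name; it only calls the four tools listed above with -h.

```shell
# Sketch: print the -h help of each tool listed above and report any that
# are not on PATH (usually a sign that <prefix>/bin was not added).

show_tool_help() {
    missing=0
    for tool in orte-ps orte-info orcm-ps orcm-info; do
        if command -v "$tool" > /dev/null 2>&1; then
            echo "== $tool =="
            "$tool" -h
        else
            echo "$tool: not found on PATH" >&2
            missing=$((missing + 1))
        fi
    done
    # Exit status is the number of tools that could not be found.
    return "$missing"
}
```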