A Guide to Running AMBER at SDSC
(DATASTAR)



This page provides an end user guide to running the AMBER molecular dynamics simulation software on the various High Performance Computing (HPC) resources available at San Diego Supercomputer Center (SDSC).

Running on SDSC's DATASTAR Machine

DataStar has 272 (8-way) P655+ and 7 (32-way) P690 compute nodes. The 1.5 GHz 8-way nodes (176 in number) have 16 GB, the 1.7 GHz 8-way nodes (96 in number) have 32 GB, while the 32-way nodes have 128 GB of memory. There is also one 32-way node with 256 GB of memory for applications requiring unusually large memory space. DataStar has a nominal theoretical peak performance of 15.6 TFlops.

For the latest news on allocations, queues, resources, etc., please see the DataStar documentation.

If you have any specific questions about running AMBER at SDSC, please contact consulting@sdsc.edu. General questions concerning AMBER should be directed to the AMBER mailing list (amber@scripps.edu).

Amber 9 Installation / Available Codes
The recommended version of AMBER to run on Datastar is AMBER 9.

AMBER 9 is installed in /usr/local/apps/amber9

This directory contains an exe directory holding the executables and a dat directory holding the force field files. The main executables that you will need on DataStar are as follows:
 

Executable (aliases) : Description

pmemd.MPI (pmemd) : Recommended executable for running Molecular Dynamics simulations in parallel on the DataStar architecture. Supports Particle Mesh Ewald (PME) and Generalized Born (GB) simulations.

pmemd.1cpu : Serial version of the pmemd.MPI executable. Use for single cpu runs.

sander.MPI (sander) : Dynamics engine similar to PMEMD but supporting many more options. If you plan to run a simulation type that is not supported by pmemd (e.g. QM/MM) then you should use this executable. Note: Parallel scaling will not be as efficient as with pmemd. Test the performance for your chosen simulation before submitting long jobs.

sander.1cpu : Serial version of the sander.MPI executable. Use for single cpu runs.

sander.PIMD.MPI (sander.PIMD) : Supports Path Integral MD simulations and Nudged Elastic Band simulations. Parallel version.

sander.PIMD.1cpu : Serial version of the sander.PIMD.MPI executable. Use for single cpu runs.

sander.LES.MPI (sander.LES) : Supports Locally Enhanced Sampling (LES) MD. Parallel version.

sander.LES.1cpu : Serial version of the sander.LES.MPI executable. Use for single cpu runs.

Other executables are present (e.g. nmode for normal mode analysis) but do not support parallel execution.

Amber 9 Performance and Scaling
The best performance and scaling will typically be obtained with the PMEMD executable, so if PMEMD supports your type of simulation you should use it. The scaling behaviour depends strongly on the type and size of your job. Implicit solvent GB simulations typically scale better than explicit solvent PME simulations, but GB systems often have far fewer residues, which limits the maximum number of cpus: for GB you need at least 1.01 times as many residues as processors, and for PME you need at least 4.0 times as many residues as processors. For both GB and PME simulations you will generally find that the scaling improves as the number of atoms increases. This follows from the underlying theory.
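The residue limits quoted above translate into an upper bound on the cpu count for a given system. The following is a minimal sketch, assuming the limits mean residues >= 1.01 x cpus for GB and residues >= 4.0 x cpus for PME; max_cpus is a hypothetical helper and not part of AMBER:

```shell
#!/bin/sh
# Upper bound on the cpu count implied by the residue limits.
# Assumption: the limits are residues >= 1.01 * cpus (GB)
# and residues >= 4.0 * cpus (PME).
# max_cpus is a hypothetical helper, not an AMBER tool.
max_cpus() {
    method=$1      # "gb" or "pme"
    residues=$2    # number of residues in the system
    case "$method" in
        gb)  echo $(( residues * 100 / 101 )) ;;  # residues / 1.01, rounded down
        pme) echo $(( residues / 4 )) ;;          # residues / 4.0, rounded down
        *)   echo "unknown method: $method" >&2; return 1 ;;
    esac
}

max_cpus pme 1000   # prints 250
max_cpus gb 1000    # prints 990
```

This only gives a hard ceiling. The cpu count at which a job actually runs efficiently is usually lower and has to be measured.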

The graph below shows the expected scaling for three PME simulations (Cellulose [408K atoms], JAC [23.5K atoms] and FactorIX [91K atoms]) and for a medium size GB simulation (gb_mb [2.49K atoms]):
 

[Figure: scaling plots for the four simulations, with panels Ps/day and Speedup]

As you can see from the graph, every simulation has a region where the scaling is acceptable, after which it tails off. Caution: Going to very large numbers of cpus can often result in your job taking longer. The exact scaling you see will depend on the size and type of job you are running, so before burning too much cpu time you should test the scaling with the simulation you plan to run. Typically the optimum point on DataStar is between 128 and 256 cpus, but if your simulation is small you may need to use fewer cpus.
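One way to judge where scaling tails off for your own system is to run the same short simulation at several cpu counts and compare the throughput against a baseline run. The sketch below computes parallel efficiency relative to a baseline; parallel_efficiency is a hypothetical helper and the throughput numbers are made up for illustration, not measured on DataStar:

```shell
#!/bin/sh
# parallel_efficiency BASE_CPUS BASE_PSDAY CPUS PSDAY
# Prints the efficiency (in percent) of a run on CPUS cpus relative to
# a baseline run on BASE_CPUS cpus: (speedup) / (ideal speedup) * 100.
# Hypothetical helper; ps/day values must come from your own test runs.
parallel_efficiency() {
    awk -v c0="$1" -v r0="$2" -v c="$3" -v r="$4" \
        'BEGIN { printf "%.1f\n", (r / r0) / (c / c0) * 100 }'
}

# Illustrative numbers only: 100 ps/day on 8 cpus, 180 ps/day on 16 cpus.
parallel_efficiency 8 100 16 180   # prints 90.0
```

When the efficiency drops well below 100 as cpus are added, the extra cpus are mostly being billed rather than used.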

Required Environment Options

The Amber 9 installation in /usr/local/apps/amber9 was compiled with xlf90 and linked against IBM's MASS and MPI libraries. The libraries are loaded dynamically at runtime but should all be in /usr/lib/, which should be searched by default. Hence, unlike on the TeraGrid cluster, you do not need to specify which compilers and MPI to use.

You should, however, edit your .cshrc file and add:

setenv AMBERHOME /usr/local/apps/amber9
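The setenv line applies to csh-type shells. If your login shell is ksh or bash instead (an assumption about your account setup), the equivalent would be an export line, typically placed in ~/.profile. A minimal sketch with an optional sanity check:

```shell
# ksh/bash equivalent of the csh setenv line.
export AMBERHOME=/usr/local/apps/amber9

# Optional check: warn if the exe directory is not visible from this machine.
if [ ! -d "$AMBERHOME/exe" ]; then
    echo "warning: $AMBERHOME/exe not found" >&2
fi
```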

Example Job Submission Scripts
To run the AMBER software via the queuing system, all you require are your mdin files, inpcrd/restart files and prmtop files. Note that you should copy these files to either /gpfs/mydir or /gpfs-wan/mydir and do all reading and writing there. When your job is done you can copy the output files to your local machine using scp.
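Staging the inputs can be scripted so that a missing file is caught before the job is queued. The following is a minimal sketch, assuming the inputs are named mdin, inpcrd and prmtop and sit in the current directory; stage_inputs is a hypothetical helper:

```shell
#!/bin/sh
# stage_inputs RUNDIR
# Copies mdin, inpcrd and prmtop from the current directory into RUNDIR.
# Returns non-zero without copying anything if any input file is missing.
# Hypothetical helper; the file names are assumptions.
stage_inputs() {
    rundir=$1
    for f in mdin inpcrd prmtop; do
        if [ ! -f "$f" ]; then
            echo "missing input file: $f" >&2
            return 1
        fi
    done
    mkdir -p "$rundir" || return 1
    for f in mdin inpcrd prmtop; do
        cp "$f" "$rundir/" || return 1
    done
}

# Example, run from the directory holding the inputs:
# stage_inputs /gpfs/mydir/directory_to_run_in
```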

The following is an example job submission script for a PMEMD run. The backslashes (\) act as line continuation characters; if you prefer, you can put all of the options on a single line.

Script (pmemd_datastar_8cpu.x):
#!/usr/bin/ksh

#@environment = COPY_ALL;\
#AIXTHREAD_COND_DEBUG=OFF;\
#AIXTHREAD_MUTEX_DEBUG=OFF;\
#AIXTHREAD_RWLOCK_DEBUG=OFF;\
#AIXTHREAD_SCOPE=S;\
#MP_ADAPTER_USE=dedicated;\
#MP_CPU_USE=unique;\
#MP_EAGER_LIMIT=64K;\
#MP_EUIDEVELOP=min;\
#MP_LABELIO=yes;\
#MP_POLLING_INTERVAL=100000;\
#MP_PULSE=0;\
#MP_SHARED_MEMORY=yes;\
#MP_SINGLE_THREAD=yes;\
#RT_GRQ=ON;\
#SPINLOOPTIME=0;\
#YIELDLOOPTIME=0;\
#MP_CSS_INTERRUPT=no;

#@network.MPI = sn_all, shared, US
#@job_type = parallel
#@job_name= job.$(jobid)
#@output = LL_out.$(jobid)
#@error = LL_err.$(jobid)
#@notification = error
#@node_usage = not_shared

#@notify_user = myemail@sdsc.edu
#@account_no = use320
#@class = normal
#@node = 1
#@tasks_per_node = 8
#@wall_clock_limit = 00:30:00
#@initialdir = /gpfs/mydir/directory_to_run_in/

#@queue

poe /usr/local/apps/amber9/exe/pmemd -O -i mdin \
-o mdout \
-p prmtop \
-c inpcrd \
-r restrt \
-x mdcrd
 
Explanation of the script keywords:

#@notify_user : Enter the email address that should be notified of an error.

#@account_no : Enter the account code that you want to be charged for this run.

#@class : Enter the queue that you want the job to run in. See the DataStar documentation for more information.

#@node : Enter the number of nodes you want. Total cpus = nodes * tasks_per_node

#@tasks_per_node : Enter the number of tasks per node. If you are running on the 8-way P655 nodes then this should normally be 8: the nodes are not shared, so you will be billed in multiples of 8 regardless.

#@wall_clock_limit : Enter the length of time you expect your job to take. Maximum = 18:00:00 (18 hours). Specifying a smaller value may get you through the queue more quickly due to backfill opportunities.

#@initialdir : Enter the directory in which the job should run. Typically this should be a directory on gpfs.
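Since total cpus = nodes * tasks_per_node and the 8-way nodes are billed whole, a target cpu count has to be rounded up to a whole number of nodes. A minimal sketch of that arithmetic, assuming 8 tasks per node; nodes_for and billed_cpus are hypothetical helpers:

```shell
#!/bin/sh
# Assumes the 8-way P655 nodes with tasks_per_node = 8.
TASKS_PER_NODE=8

# nodes_for CPUS: number of whole nodes needed for CPUS cpus (rounded up).
nodes_for() {
    cpus=$1
    echo $(( (cpus + TASKS_PER_NODE - 1) / TASKS_PER_NODE ))
}

# billed_cpus CPUS: cpus actually billed, since nodes are not shared.
billed_cpus() {
    echo $(( $(nodes_for "$1") * TASKS_PER_NODE ))
}

nodes_for 128     # prints 16
nodes_for 100     # prints 13
billed_cpus 100   # prints 104
```

The value printed by nodes_for is what would go into the #@node line.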

You can submit this job to the queue using llsubmit.
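A submission can be wrapped in a small check first. The sketch below assumes the script is saved as pmemd_datastar_8cpu.x in the current directory and that the LoadLeveler commands llsubmit and llq are on your path; has_queue_statement is a hypothetical helper that only confirms the file contains a #@queue line like the one in the example script:

```shell
#!/bin/sh
# has_queue_statement FILE
# Succeeds if FILE contains a "#@queue" line (whitespace around "@" allowed).
# Hypothetical helper, not a LoadLeveler command.
has_queue_statement() {
    grep -Eq '^#[[:space:]]*@[[:space:]]*queue[[:space:]]*$' "$1"
}

script=pmemd_datastar_8cpu.x   # assumed file name

# Submit only if the script exists, has a queue line, and llsubmit is available.
if [ -f "$script" ] && has_queue_statement "$script" \
        && command -v llsubmit >/dev/null 2>&1; then
    llsubmit "$script"   # place the job in the queue
    llq -u "$USER"       # list your queued and running jobs
fi
```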

KNOWN LIMITATIONS
The Divcon QM/MM interface in sander fails the tests and should not be used on DataStar. This interface is accessed when idc>0. Leaving idc at its default of 0 uses the built-in QM/MM interface when ifqnt=1. The built-in interface works correctly and should be used for all QM/MM simulations on DataStar.
