Using Grid Engine

Castor uses Grid Engine as a grid computing computer cluster software system. Basically, the software can be divided into two parts; the scheduler and the queuing system. In order to send a job to our cluster, users must submit their job-script via the submission program. The Scheduler will read the script, prepare resource regarding the user requirements and calculate the priority from the requirements. Jobs which require fewer resources or less run time will receive higher priority. Then the job will be scheduled queuing system and job with a high priority will be executed first.

Grid Engine can be used in two ways: command line interface (hereafter CLI) and graphic user interface (GUI). In this tutorial, you will learn how to submit your jobs via CLI, request for available resources, choose the parallel environment, delete failed or unwanted jobs, and inspect errors when the job failed. The following text is the introduction message displayed when the command qmonis run and then crtl + a is pressed.

Welcome User,

You are using Open Grid Scheduler/Grid Engine 2011.11p1 in cell ‘default’.

Grid Engine is based on code donated by Sun Microsystems. The copyright is owned by Sun Microsystems and other contributors. It has been made available to the open source community under the SISSL license.

The Open Grid Scheduler project is the maintainer of open source Grid Engine. For further information: http://gridscheduler.sourceforge.net/

Basic Job Submission

The command qsub followed by a job subscription file is used to send a job to the cluster. The general format is as follows:

The command below shows an example of job submission and the server response after successful submission.

The script.sh is the script containing the commands we wish to run. It can be any kind of standard shell script, e.g. bash or csh. Flags, variables with a minus (-) sign, can be interpreted as:

-V passes all environment variables to the job

-b y allows the command to be a binary file

-cwd run the job at the current working directory

The following is a qsub command which uses more optional parameters:

Grid Engine has a better way to deal with commands using multiple optional parameters. Optional parameters can be added after extra comments, starting with #$, in the script.sh file:

Once we define the number of processes to use, it will be stored at NSLOTS (in this case, 16). We can see the variable NSLOTS has been called to specify the number of the process for mpiexec to use.

Single Process Job

For task arrays, see Advance Grid Engine: Task Arrays

To submit a single process job, the script should be written without parallel environment (-pe) flag:

Another way is to use the -pe smp flag but limit the number of process to 1:

Parallel Job

Here, we provide a brief explanation to submit a single process job and a multi-process (parallel) jobs. For a full explanation see Advance Grid Engine: Job Types. To submit a job with multiple processes, it is necessary to declare a parallel environment (-pe) flag followed by the parallel type.

Shared Memory

Shared memory job runs multiple processes which share memory together on one machine. We simply define the parallel environment flag smp and the number of processes to use.

Distributed Memory

For distributed memory, each process has its own memory and does not share with any others. A distributed memory job can run across a compute node but requires additional set up how to scatter the processes over compute nodes. Suppose we want to run a job with 16 processes which spawn 4 processes on each compute node, we use:

To see the eligible parallel environment, run qconf -sql. To see a parallel environments description, run qconf -sq <pe-name>.

Hybrid Shared/Distributed Memory

Hybrid parallel jobs use distributed memory for each Compute Node. In each compute-node, the processes share memory together. This is the most complex type of parallel job. First, we need to define OMP_NUM_THREADS, the environment variable, to specify how many processes on each compute node will be spawned. Suppose we want to run a hybrid job with 16 processes each compute node gets 4 processes which share memory together, we shall write:

Commonly Used Flags

Here we list commonly used flags to add to the script before submitting a job in the table below. Another important flag, -l, is used to request for any available resource on our cluster. We will explain in the next topic.

FlagArgumentDescription
-NnameName of the job
-VAll environment variables active within the qsub utility be exported to the context of the job. This option is activated by default.
-cwdExecute the job from the current working directory. If -cwd was not specified then the job will be run from user's home directory.
-by[es]|n[o]Gives the user the possibility to indicate explicitly whether command should be treated as binary or script.
  • If the value of -b is y, then command may be a binary or script.
  • If the value of -b is n the command needs to be a script and it will be handle as script.
-shelly[es]|n[o]With -shell n no command shell will be executed for the job. This option only applies when -b y is also used. Without -b y, -shell n has no effect.
-SpathSpecifies the interpreting shell for the job.
-qwc_queue_listDefines or redefines a list of cluster queue, queue domains or queue instances which may be used to execute this job.
ry[es]|n[o]Indentifies the ability of a job to be rerun or not.
  • If the value of -r is y, yjr job will be rerun if the job was aborted without leaving a consistent exit state.
  • If -r is n, the job will not be rerun under any circumstances.
-epathDefines or redefines the path used for the standard error stream of the job.
-opathDefines or redefines the path used for the standard output stream of the job.
-jy[es]|n[o]Specifies whether or not the standard error stream of the job is merged into the standard output stream.
  • If both the -j y and the -e options are present, Grid Engine sets but ignores the error-path attribute.
-peparallel_environmentParallel programming environment (PE) to instantiate.
-Memail_addressDefines or redefines the list pf users to which the server that executes the job has to send email, if the server sends mail about the job.
-mb|e|a|s|nDefines or redefines under which circumstances mail is to be sent to the job owner or to the users defined with the -M option described below.The option arguments have the following meaning:
  • b Mail is sent at the beginning of the job.
  • e Mail is sent at the end of the job.
  • a Mail is sent when the job is aborted or rescheduled.
  • s Mail is sent when the job is suspended.
  • n No mail is sent.
  • Currently no mail is sent when a job is suspended.

Other available options and detailed explanations can be read by running the command manqsub. You might be noticed that someone using .sge as the file extension (or suffix) of the script and define the shell environment at the first line. This extension helps them to distinguish job submission files from other standard shell scripts.

Resource & Queue Configuration

Memory Usage

Another custom configuration is a memory requirement for a job. Users can ensure that enough memory is allocated to the job before it is deployed. By default, memory is limited to 1 gigabyte per process. We allow the user to override this option by adding the flag, -l h_vmem=<size_of_memory><unit>:

This command will request a total memory of 4 gigabytes per process. Below is  another example

This command will request a total memory of 16 gigabytes shared across 8 processes.

Querying Queue

After submitting a job, users can check the status and other information of the job by using the command, qstat. It prints an output regarding the job details:

The state column indicates the progress of a job. After submitting a job, the system will determine the priority and put it on the queue with a wait (qw) state. Once the resource is available and the scheduler runs the job. Its state will be changed to running (r). The simple states are listed as follow:

  • qw pending for available resource to start
  • t transferring to compute node(s)
  • r running on the Compute node(s)

There are additional state other than above including;

  • d deleting a job, see job deletion for more details
  • S queue is suspended
  • E pending error, see troubleshooting for more details

We can also add the following flags to retrieve more information:

  • -u <username> shows the job status of a specific user, using this flag with wildcard, -u "*", will show job of all users.
  • -f shows the full output of each compute node. If -u "*" is added, shows full output for all user.
  • -q <queue_name> print information on a given queue.

To list all options, use qstat -help. The manual for qstat can be accessed by running the command,man qstat.

Job Modification

User can alter the resource requirement only when a job is pending in queue (qw). The command is qalter; it has the format as follows:

job_task_list is job id’s (and task id’s) of jobs. For example, we have a job with the job id 17061. We want to change the required memory to 64 gigabytes and merge stdout and stderr stream together. We use

To delete the job, use command qdelfollows by job id.