DRM4G Tutorial

About DRM4G

DRM4G is an open platform, based on GridWay, used to define, submit, and manage computational jobs. It is a Python (2.6+, 3.3+) implementation that provides a single point of control for computing resources without installing any intermediate middleware. As a result, a user can run any job on laptops, desktops, workstations, clusters, supercomputers, and grid infrastructures.

Start Guide

To install DRM4G, follow the installation instructions; a typical setup is sketched below.
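
For orientation only, here is a minimal installation sketch; it assumes the package is published on PyPI under the name drm4g and that a Python 2.6+/3.3+ environment with pip is available. Follow the official installation guide if your setup differs.
    # assumption: DRM4G can be installed from PyPI with pip
    [user@mycomputer~]$ pip install --user drm4g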

Start to run DRM4G

  1. Start up DRM4G :
    [user@mycomputer~]$ drm4g start
    Checking DRM4G local configuration ...
      Creating a DRM4G local configuration in '/home/user/.drm4g'
      Copying from '/home/user/drm4g/etc' to '/home/user/.drm4g/etc'
    Starting DRM4G .... 
      OK
    Starting ssh-agent ...
      OK
    
  2. Show information about all available resources, their hosts and their queues :
    [user@mycomputer~]$ drm4g resource list
    RESOURCE            STATE               
    localmachine        enabled
    
    [user@mycomputer~]$ drm4g host list
    HID ARCH       JOBS(R/T) LRMS       HOST                
    0   x86_64           0/0 fork       localmachine
    
    [user@mycomputer~]$ drm4g host list 0
    HID ARCH       JOBS(R/T) LRMS       HOST                
    0   x86_64           0/0 fork       localmachine        
    
    QUEUENAME      JOBS(R/T) WALLT CPUT  MAXR  MAXQ 
    default              0/0 0     0     1     1               
    

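Once the service is running, it can later be checked or shut down from the same command-line tool. The subcommand names below are assumed to follow the usual start/stop/status pattern of the drm4g command; verify them against the built-in help of your installation.
    # assumed service subcommands; check your DRM4G version's help
    [user@mycomputer~]$ drm4g status
    [user@mycomputer~]$ drm4g stop
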
My first job

  1. Create a job template :
    [user@mycomputer~]$ echo "EXECUTABLE=/bin/date" > date.job
    
  2. Submit the job :
    [user@mycomputer~]$ drm4g job submit date.job
    ID: 0
    
  3. Check the progress of the job :
    [user@mycomputer~]$ drm4g job list 0
    JID DM   EM   START    END      EXEC    XFER    EXIT NAME            HOST                                          
    0   pend ---- 19:39:09 --:--:-- 0:00:00 0:00:00 --   date.job        --                        
    
    If you run drm4g job list 0 repeatedly, you will see the job pass through the following states:
    • pend: The job is waiting for a host to run on.
       JID DM   EM   START    END      EXEC    XFER    EXIT NAME            HOST                                          
       0   pend ---- 19:39:09 --:--:-- 0:00:00 0:00:00 --   date.job        --                                            
      
    • prol: The frontend is being prepared for execution.
       JID DM   EM   START    END      EXEC    XFER    EXIT NAME            HOST                                          
       0   prol ---- 19:39:09 --:--:-- 0:00:00 0:00:00 --   date.job        --                                                            
      
    • wrap pend: The job has been successfully submitted to the frontend and is pending in the remote queue.
       JID DM   EM   START    END      EXEC    XFER    EXIT NAME            HOST                                          
       0   wrap pend 19:39:09 --:--:-- 0:00:00 0:00:00 --   date.job localhost/fork                                                                         
      
    • wrap actv: The job is running in the remote queue.
       JID DM   EM   START    END      EXEC    XFER    EXIT NAME            HOST                                          
       0   wrap actv 19:39:09 --:--:-- 0:00:05 0:00:00 --   date.job localhost/fork                                                         
      
    • epil: The job has finished in the remote queue and the results are being retrieved.
       JID DM   EM   START    END      EXEC    XFER    EXIT NAME            HOST                                          
       0   epil ---- 19:39:09 --:--:-- 0:00:10 0:00:00 --   date.job localhost/fork
      
    • done: The job is done.
       JID DM   EM   START    END      EXEC    XFER    EXIT NAME            HOST                                          
       0   done ---- 19:39:09 19:39:27 0:00:10 0:00:01 0    date.job localhost/fork         
      
  4. With this job template, the results are the job's standard output (stdout) and standard error (stderr); both files are placed in the directory from which the job was submitted:
    [user@mycomputer~]$ cat stdout.0
    Mon Jul 28 12:29:43 CEST 2014
    
    [user@mycomputer~]$ cat stderr.0
    
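Beyond drm4g job list, the job command group can also be used to inspect or remove a job. The subcommands below are assumptions based on DRM4G's command-line help and may vary between versions.
    # assumed job subcommands; check your DRM4G version's help
    [user@mycomputer~]$ drm4g job log 0       # detailed log of job 0
    [user@mycomputer~]$ drm4g job cancel 0    # cancel job 0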

How to configure a TORQUE/PBS resource

Before starting, configure a public/private key pair for your ssh connection:

  1. Generate a public/private key pair without password :
    [user@mycomputer~]$ ssh-keygen -t rsa -b 2048 -f $HOME/.ssh/meteo_rsa -N ""
    
  2. Copy the new public key to the TORQUE/PBS resource :
    [user@mycomputer~]$ ssh-copy-id -i $HOME/.ssh/meteo_rsa.pub user@ui.macc.unican.es
    
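Before configuring the resource in DRM4G, it is worth checking that the key-based login works; this is plain OpenSSH and independent of DRM4G:
    [user@mycomputer~]$ ssh -i $HOME/.ssh/meteo_rsa user@ui.macc.unican.es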

DRM4G uses the EDITOR environment variable to select the editor used for configuring resources. By default, the editor is nano.

To configure a TORQUE/PBS cluster accessed through the SSH protocol, follow these steps:

  1. Configure the meteo resource :
    [user@mycomputer~]$ drm4g resource edit
    
    [DEFAULT]
    enable           = true
    communicator     = local
    frontend         = localhost
    lrms             = fork
    
    [localmachine]
    max_jobs_running = 1
    
    [meteo]
    enable            = true
    communicator      = ssh
    username          = user
    frontend          = ui.macc.unican.es
    private_key       = ~/.ssh/meteo_rsa
    lrms              = pbs
    queue             = grid
    max_jobs_running  = 1
    max_jobs_in_queue = 2
    
  2. List and check if the resource has been created successfully :
    [user@mycomputer~]$ drm4g resource list
    RESOURCE            STATE
    localmachine        enabled               
    meteo               enabled
    
    [user@mycomputer~]$ drm4g host list
    HID ARCH       JOBS(R/T) LRMS       HOST
    0   x86_64           0/0 fork       localmachine
    1   x86_64           0/0 pbs        meteo
    
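Once DRM4G has contacted the new frontend, its queues can be inspected in the same way as for the local machine (here meteo has HID 1). The resource check subcommand is an assumption and may not be available in every version.
    [user@mycomputer~]$ drm4g host list 1
    # assumed subcommand to verify that the configured frontends are reachable
    [user@mycomputer~]$ drm4g resource check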

That's it! Now you can submit jobs to both resources (see the template sketch below for pinning a job to a specific one).
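
By default the scheduler may place a job on any enabled host. To pin a job to the new cluster, a REQUIREMENTS expression can be added to the job template; the keyword and syntax below are assumed from DRM4G's GridWay heritage, so check the job-template documentation of your version.
    # hypothetical template pinning the job to the meteo resource (syntax assumed)
    EXECUTABLE   = /bin/date
    REQUIREMENTS = HOSTNAME = "meteo"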

User Scenarios

This section describes how to take advantage of DRM4G to calculate the number Pi. To do that, three types of jobs will be used: single, array, and MPI.

Single Job

  • DRM4G job template :
    EXECUTABLE  = pi.sh
    ARGUMENTS   = 0 1 100000000
    STDOUT_FILE = stdout_file.${JOB_ID}
    STDERR_FILE = stderr_file.${JOB_ID}
    INPUT_FILES = pi_serial, pi.sh
    
  • pi.sh script :
    #!/bin/bash
    chmod +x ./pi_serial
    ./pi_serial $@
    
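The pi_serial binary used above is provided as an attachment to this page. As a reference for what it is assumed to compute (a GridWay-style midpoint-rule integration of 4/(1+x^2) over [0,1], split across tasks), a hypothetical stand-in script, not part of the attachments, could look like this:
    #!/bin/bash
    # pi_chunk.sh - hypothetical stand-in for the pi_serial attachment
    # Usage: ./pi_chunk.sh <task_id> <total_tasks> <n_intervals>
    # Each task integrates its slice of 4/(1+x^2) over [0,1] and prints the partial sum.
    awk -v t="$1" -v T="$2" -v n="$3" 'BEGIN {
        h = 1.0 / n; chunk = n / T; sum = 0
        for (i = t*chunk + 1; i <= (t+1)*chunk; i++) {
            x = h * (i - 0.5)
            sum += 4.0 / (1.0 + x*x)
        }
        printf "%0.12g\n", h*sum
    }'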

Array Job

  • DRM4G job template :
    EXECUTABLE  = pi.sh
    ARGUMENTS   = ${TASK_ID} ${TOTAL_TASKS} 100000000
    STDOUT_FILE = stdout_file.${TASK_ID}
    STDERR_FILE = stderr_file.${TASK_ID}
    INPUT_FILES = pi_serial, pi.sh
    
  • pi.sh script :
    #!/bin/bash
    chmod +x ./pi_serial
    ./pi_serial $@
    
  • Sum the results inside each file :
    $ awk 'BEGIN {sum=0} {sum+=$1} END {printf "Pi is %0.12g\n", sum}' stdout_file.*
    
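The template above defines the per-task command; the number of tasks is chosen at submission time. The --ntasks flag and the pi.job file name below are assumptions (the template is assumed to be saved as pi.job, and the flag name may differ between DRM4G versions), so check the command-line help of drm4g job submit.
    # assumed array-submission flag; verify against your DRM4G version
    [user@mycomputer~]$ drm4g job submit --ntasks 5 pi.job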

MPI Job

  • DRM4G job template :
    EXECUTABLE    = pi_mpi.sh
    STDOUT_FILE   = stdout.${JOB_ID}
    STDERR_FILE   = stderr.${JOB_ID}
    INPUT_FILES   = pi_mpi.sh, pi_parallel
    NP            = 2
    
  • pi_mpi.sh script :
    #!/bin/bash
    # load_use/use set up the MPI environment on the meteo cluster (site specific)
    source /software/meteo/use/load_use
    use openmpi14intel
    chmod +x pi_parallel
    mpirun -np 2 ./pi_parallel
    
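On a system that provides Environment Modules instead of the meteo-specific use tool, a hedged equivalent of pi_mpi.sh might look like this (the module name is an assumption and depends on the site):
    #!/bin/bash
    # hypothetical variant for a cluster using Environment Modules
    module load openmpi
    chmod +x pi_parallel
    mpirun -np 2 ./pi_parallel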
