DQS Accounting

Introduction

The Distributed Queueing System (DQS) can manage a large, heterogeneous collection of workstations with a massive throughput of jobs. It is often desirable to track that throughput, for reasons of accountability. DQS provides for accountability by recording job and queue information in accounting and statistics files (see section Accounting Files).

DQS's accounting support software consists of the programs qacct and qusage. DQS Queue Accounting (qacct) is a UNIX shell command-line interface to the analysis of DQS accounting files. DQS Queue Usage (qusage) is an X Window System graphical interface to the analysis of DQS accounting files.

Measuring DQS Utilization

We're interested in measurements of queue usage during a given interval of time. The finest time resolution is one second -- typically, however, the interval is divided into some number of coarser usage bins.

Queue Utilization

The queue utilization measure provides an indication of the number of queues that are running jobs at a particular moment (i.e. during a bin). It is given by the sum of the job execution times during a bin divided by the bin size (length of time of the bin). Another measure, CPU utilization, will be described in a moment.

Consider the following figure. The interval is bounded by the times denoted IStart and IStop. The interval has been divided into eight usage bins, enumerated 0 to 7, with pound symbols (#) marking the end of each usage bin. The bin size is 6.

Also illustrated are five jobs (lettered) whose begin and end times are marked with carets (^). (Note that the execution time for each job is one time unit less than you may infer from the diagram; e.g. job A's execution time is 8 time units, not 9.) Jobs B, C, and D fall completely within the interval, while jobs A and E are only partially contained.

  IStart                                        IStop
    |---------------------------------------------|
      0  #  1  #  2  #  3  #  4  #  5  #  6  #  7 #

  ^---A---^
           ^B-^
                ^------------C----------^
                                ^--D---^
                        ^------------E----------------^

At the time of analysis, bin 0 has seen one job (A), which ran for the entire bin (6 time units). The queue utilization during that bin is therefore 6 time units divided by a bin size of 6 time units, or 1.

Bin 1 involves job B, which lasted 3 time units. The queue utilization, then, is 3 / 6, or 0.5.

Bin 5 has seen jobs C, D, and E. During that bin, job C persisted for 6 time units, D for 5, and E for 6. Accordingly, the queue utilization during the bin is (6 + 5 + 6) / 6, or 2.83.
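The arithmetic above can be sketched in C. This is only an illustration, not code from the DQS sources: the function name queue_utilization and its parameters are hypothetical, and it assumes the time each job spent inside the bin has already been worked out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of DQS): queue utilization for one bin.
 * time_in_bin[i] is the execution time of job i that falls inside the
 * bin; the result is the sum of those times divided by the bin size. */
double queue_utilization(const unsigned long *time_in_bin, size_t njobs,
                         unsigned long bin_size)
{
    unsigned long total = 0;
    size_t i;

    assert(bin_size > 0);
    for (i = 0; i < njobs; i++)
        total += time_in_bin[i];
    return (double)total / (double)bin_size;
}
```

Called with the in-bin times 6, 5, and 6 and a bin size of 6, it reproduces the bin 5 figure worked out above.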

CPU Utilization

The CPU utilization measurement indicates the usage of the processors in the cluster, broken down into user and system categories. (The quality of this information is entirely dependent on the facilities that your machines' operating systems provide. Documentation for the UNIX system call getrusage(2) might be a good starting point.)

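CPU usage for a bin is given by summing, over the jobs in the bin, each job's execution time during the bin multiplied by that job's average CPU time (user or system) per unit of execution time, and then dividing by the bin size. A job's average CPU time is its total CPU time divided by its total execution time.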

For instance, recalling the above figure, job B lasted 3 time units, all of them within bin 1. Suppose the user time for that job, reported by the operating system, was 1. The CPU usage (user) for bin 1 is therefore (3 * (1 / 3)) / 6, or 0.17.

Bin 5 has seen jobs C, D, and E. During that bin, job C persisted for 6 time units, D for 5, and E for 6. The total time for jobs C, D, and E is 24, 7, and 31 time units, respectively. Assume that the system times for those jobs are 19, 4, and 29, respectively. The CPU usage (system) for bin 5, then, is given by:

    (6 * (19 / 24)) + (5 * (4 / 7)) + (6 * (29 / 31))
    -------------------------------------------------
                            6

or 2.2.
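The same calculation, written as a minimal C sketch. The struct job_share and the function cpu_utilization are names invented here for illustration; they do not come from the DQS sources. The sketch assumes the per-bin execution times are already known.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record (not part of DQS) describing one job's share of a bin. */
struct job_share {
    unsigned long time_in_bin;  /* execution time that falls inside the bin */
    unsigned long total_time;   /* job's whole execution time (end - start) */
    unsigned long cpu_time;     /* user or system time reported for the job */
};

/* CPU utilization for one bin: each job's in-bin time is weighted by the
 * job's average CPU time per unit of execution time, the weighted times
 * are summed, and the sum is divided by the bin size. */
double cpu_utilization(const struct job_share *jobs, size_t njobs,
                       unsigned long bin_size)
{
    double sum = 0.0;
    size_t i;

    assert(bin_size > 0);
    for (i = 0; i < njobs; i++) {
        if (jobs[i].total_time == 0)
            continue;           /* a zero-length job contributes nothing */
        sum += (double)jobs[i].time_in_bin *
               ((double)jobs[i].cpu_time / (double)jobs[i].total_time);
    }
    return sum / (double)bin_size;
}
```

Feeding it the three jobs of bin 5 (in-bin times 6, 5, 6; total times 24, 7, 31; system times 19, 4, 29) gives roughly 2.2, in agreement with the figure above.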

qacct

qusage

Starting qusage

Using qusage

The top-level window presents the following options:

[Usage...] enters the Usage Options submenu, allowing the specification of parameters for the analysis of an accounting file.

[Help...] invokes the on-line help facility.

[About...] displays version, copyright and author information.

[Quit] exits the program.

Usage Options

The following fields are available to specify options for the analysis of the accounting file. The logic behind the matching of entries in the accounting file is roughly: "If any part of the job occurs within the specified interval (Start and Days) and fits any of the specified parameters (Queue, Host, Complex, Group, Owner, and Job), then a match has occurred."

[Use Defaults/Remove Defaults] either inserts the aforementioned default values into the fields, or removes them.

[Accept] signifies approval of the options entered--the accounting file is analyzed and graphed.

[Help...] invokes the on-line help facility (see section Help).

[Cancel] returns to the top-level window.

Usage

[CPU/Queue] toggles between CPU and queue utilization.

[Line/Bar] toggles between a line or bar plot style.

[Grid/No Grid] toggles between a grid or no grid on the plot.

[Legend/No Legend] toggles the plot legend on and off.

[Print...] enters the Usage Print submenu, allowing the writing of the current plot to a PostScript file.

[Help...] invokes the on-line help facility (see section Help).

[Done] exits the plot, returning to the Usage Options window.

Usage Print

This window allows the specification of some options for the writing of a plot to a PostScript file.

WARNING: Existing files will be overwritten.

Help

About

This window displays version, copyright and author information. Refer to this information when reporting bugs, suggesting enhancements, etc.

Quit

Internals

This chapter is intended for those who wish to understand and perhaps modify the source code for the DQS accounting programs. It is also relevant for users who are asking "Why in hell is this program not giving me what I think it should!?"

The high-level algorithm for the DQS accounting programs is as follows:

(1.0)  get user's request (i.e. interval start and stop, queue complex
         specification, job ID, etc.)

(2.0)  for each job (i.e. line in the accounting file)
(2.1)     if job matches user's request
(2.2)       tally job's usage into the usage bins

(3.0)  display the usage bins

Portions 1.0 and 3.0 are related to user interface and are discussed in the "qacct Internals" and "qusage Internals" sections.

Portions 2.1 and 2.2 are covered in the "Commonalities" section.

Portion 2.0 is covered by the section entitled "Accounting Files".

Accounting Files

Accounting

The DQS accounting file (act_file) contains a line for each DQS job that has completed execution. Each line contains the following fields, separated by colons.

    char     *qname;          /* name of queue */

    char     *hostname;       /* name of host */

    u_long32 master;          /* master node? (true/false) */

    char     *complex;        /* queue complex resource string */
                              /* (comma sep'd)                 */

    char     *group;          /* name of accounting group */

    char     *owner;          /* user name of owner */

    char     *job_name;       /* name of job (perhaps NULL) */

    char     *dqs_job_name;   /* job identifier */

    u_long32 job_number;      /* job identifier */

    u_long32 submission_time; /* time of receipt by qmaster (in sec) */

    u_long32 start_time;      /* time execution began (in sec) */

    u_long32 end_time;        /* time execution finished (in sec) */

    u_long32 exit_status;     /* exit value returned */

    u_long32 ru_wallclock;    /* time taken to execute (in sec) */

    u_long32 ru_utime;        /* user time used */

    u_long32 ru_stime;        /* system time used */

    u_long32 ru_maxrss;       /* maximum resident set size */

    u_long32 ru_ixrss;        /* integral shared text size */

    u_long32 ru_ismrss;       /* integral shared memory size */

    u_long32 ru_idrss;        /* integral unshared data size */

    u_long32 ru_isrss;        /* integral unshared stack size */

    u_long32 ru_minflt;       /* page reclaims */

    u_long32 ru_majflt;       /* page faults */

    u_long32 ru_nswap;        /* swaps */

    u_long32 ru_inblock;      /* block input operations */

    u_long32 ru_oublock;      /* block output operations */

    u_long32 ru_msgsnd;       /* messages sent */

    u_long32 ru_msgrcv;       /* messages received */

    u_long32 ru_nsignals;     /* signals received */

    u_long32 ru_nvcsw;        /* voluntary context switches */

    u_long32 ru_nivcsw;       /* involuntary context switches */

The fields whose labels begin with "ru_" contain information gathered from the UNIX system call getrusage(2). It is important to note that the quality of this information is entirely dependent on the facilities that your machines' operating systems provide. Your vendor's documentation for getrusage(2) might be a good starting point.
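As a rough illustration of reading one accounting line, here is a C sketch. It assumes that the fields appear on each line in exactly the order listed above and that an absent job name is written as an empty field; neither assumption is confirmed here, so check them against a real act_file. The names act_split, act_number, and the ACT_ constants are hypothetical, not taken from the DQS sources.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ACT_MAX_FIELDS 32   /* the listing above has 31 fields */

/* Assumed field positions, counting from 0 in the order of the listing. */
enum {
    ACT_QNAME = 0, ACT_HOSTNAME = 1, ACT_OWNER = 5,
    ACT_START_TIME = 10, ACT_END_TIME = 11,
    ACT_RU_UTIME = 14, ACT_RU_STIME = 15
};

/* Split one line in place at each ':' and store pointers to the fields.
 * Empty fields are kept as empty strings. Returns the number of fields
 * stored (at most max_fields; anything beyond that is dropped). */
size_t act_split(char *line, char *fields[], size_t max_fields)
{
    size_t n = 0;
    char *p = line;

    assert(line != NULL && fields != NULL && max_fields > 0);
    line[strcspn(line, "\n")] = '\0';   /* drop a trailing newline */
    while (n < max_fields) {
        fields[n++] = p;
        p = strchr(p, ':');
        if (p == NULL)
            break;
        *p++ = '\0';                    /* terminate field, step past ':' */
    }
    return n;
}

/* Read a numeric field as an unsigned long; 0 if the field is missing. */
unsigned long act_number(char *const fields[], size_t nfields, size_t index)
{
    return index < nfields ? strtoul(fields[index], NULL, 10) : 0;
}
```

The split is done by hand rather than with strtok(3) because strtok skips empty fields, which would shift every later field out of position.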

Statistics

The DQS statistics file (stat_file) contains a line for each DQS job currently running. The statistics are repeatedly written to the statistics file at the end of an interval (usually every ten minutes, but this is adjustable). Each line contains the following fields, separated by colons.

    u_long32 now;             /* time (in secs) that stats were logged */

    char     *hostname;       /* name of host */

    char     *qname;          /* name of queue */

    u_long32 load_avg;        /* load average */

    u_long32 qty;             /* number of said queues */

    u_long32 qty_active;      /* number of said queues with active jobs */

    char     *complex;        /* queue complex resource string */
                              /* (comma sep'd)                 */

    char     *states;         /* One or more of (concat'ed): */
                              /*   'a' ALARM           */
                              /*   'c' SUSPEND_ON_COMP */
                              /*   'd' DISABLED        */
                              /*   'e' ENABLED         */
                              /*   'h' HELD            */
                              /*   'm' MIGRATING       */
                              /*   'q' QUEUED          */
                              /*   'r' RUNNING         */
                              /*   's' SUSPENDED       */
                              /*   't' TRANSISTING     */
                              /*   'u' UNKNOWN         */
                              /*   'w' WAITING         */
                              /*   'x' EXITING         */

Commonalities

The qacct and qusage programs necessarily share a good deal of functionality. Essentially, the only difference is the user interface: qacct is textual and driven from the shell command line, while qusage is graphical and point-and-click.

This section covers the internals of the code common to both programs. Recall portions 2.X of the high-level algorithm stated above (see section Internals).

(2.0)  for each job (i.e. line in the accounting file)
(2.1)     if job matches user's request
(2.2)       tally job's usage into the usage bins

Portion 2.0 has been covered above (See section Accounting). Portion 2.1 is handled below in "Matching a Job to a Request" and portion 2.2 in "Calculating DQS Usage".

Matching a Job to a Request

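The user has specified a request that constrains us to match a subset of the jobs in the accounting file. A match has occurred if both of the following are true: some part of the job falls within the requested interval, and the job fits the requested parameters (queue, host, complex, group, owner, and job), as outlined in section Usage Options.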

All of the request's parameters may contain wildcard characters.

If these conditions are met then the matched job is factored into the usage.
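A minimal sketch of such a test in C follows. It rests on several assumptions that the text does not settle: wildcards are taken to be shell-style patterns handled by POSIX fnmatch(3), a job is taken to fit the request if it fits any specified parameter (following the rough logic quoted in section Usage Options), and only three of the parameters are shown. The struct request and the function names are hypothetical, not taken from the DQS sources.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical request (not the DQS structure): an interval plus one
 * pattern per parameter. A NULL pattern means "not specified". */
struct request {
    unsigned long istart, istop;        /* inclusive interval bounds */
    const char *queue, *host, *owner;   /* wildcard patterns or NULL */
};

/* True if any part of the job's run [start, end) lies in the interval. */
static int overlaps(const struct request *r,
                    unsigned long start, unsigned long end)
{
    return start <= r->istop && end > r->istart;
}

/* True if the parameter was specified and its pattern matches the value.
 * ASSUMPTION: shell-style wildcards, as implemented by fnmatch(3). */
static int param_matches(const char *pattern, const char *value)
{
    return pattern != NULL && fnmatch(pattern, value, 0) == 0;
}

/* ASSUMPTION: parameters are combined with "any"; the real programs may
 * combine them differently. */
int job_matches(const struct request *r, const char *qname,
                const char *hostname, const char *owner,
                unsigned long start, unsigned long end)
{
    assert(r != NULL);
    if (!overlaps(r, start, end))
        return 0;
    return param_matches(r->queue, qname) ||
           param_matches(r->host, hostname) ||
           param_matches(r->owner, owner);
}
```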

Calculating DQS Usage

The sections that follow consider the following figure, reproduced from a previous section (See section Queue Utilization), only here using more specific timing details.

The interval is bounded by the times 10 and 56, denoted IStart and IStop, respectively. With a bin size of 6, the interval has been divided into eight usage bins, enumerated 0 to 7, with pound symbols (#) marking the end of each usage bin.

Also illustrated are five jobs (lettered) whose begin and end times are below the carets (^). The execution time for each job is defined to be the job's end time minus its start time; e.g. job A's execution time is 16 - 8 = 8 time units. (Note that this is one time unit less than you may infer from the diagram's caret markers.) Jobs B, C, and D fall completely within the interval, while jobs A and E are only partially contained.

  IStart                                        IStop    <
    1         2         3         4         5     5      |
    0         0         0         0         0     6      | Time Interval
    |---------------------------------------------|      <
      0  #  1  #  2  #  3  #  4  #  5  #  6  #  7 #      < Usage Bins

  ^---A---^                                              <
  8       16                                             |
                                                         |
           ^B-^                                          |
          17  20                                         |
                                                         |
                ^------------C----------^                |
               22                       46               | Jobs
                                                         |
                                ^--D---^                 |
                               38      45                |
                                                         |
                        ^------------E----------------^  |
                       30                             61 <

The sections that follow provide an overview of the algorithm for calculating usage and an example of its use.

The Usage Calculation Algorithm

The basic idea for calculating a job's usage is to walk through the job a bin at a time, accruing usage in each bin. The job may begin and/or end inside of a bin, so we handle those as special cases. Here's the algorithm.

adjust job start and stop times if either falls outside the interval

if job begins inside a bin
  calc job's usage up to the start of the next bin

if job has any "middle" bins
  for each middle bin
    calc job's usage for that bin

if job ends inside a bin
  calc job's usage from the end of the previous bin
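The outline above could be written in C roughly as follows. This is a sketch under stated assumptions, not the code from the DQS sources: the function name tally_job and its parameters are hypothetical, IStop is treated as inclusive (so a job's stop time is clipped to IStop + 1), and a job that begins and ends in the same bin is handled as one extra special case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (not the DQS code): add one job's execution time,
 * bin by bin, to the running totals in bins[]. The job runs over
 * [start, stop); the interval is [istart, istop] with istop inclusive. */
void tally_job(unsigned long *bins, size_t nbins,
               unsigned long istart, unsigned long istop,
               unsigned long bin_size,
               unsigned long start, unsigned long stop)
{
    unsigned long limit = istop + 1;    /* first time unit past the interval */
    size_t first, last, b;

    assert(bin_size > 0 && istop >= istart);
    assert(nbins >= (limit - istart + bin_size - 1) / bin_size);

    /* adjust job start and stop times if either falls outside the interval */
    if (start < istart)
        start = istart;
    if (stop > limit)
        stop = limit;
    if (start >= stop)
        return;                         /* no part of the job is inside */

    first = (start - istart) / bin_size;       /* bin holding the first unit */
    last  = (stop - 1 - istart) / bin_size;    /* bin holding the last unit */

    if (first == last) {                /* job begins and ends in one bin */
        bins[first] += stop - start;
        return;
    }

    /* job's usage from its start up to the start of the next bin */
    bins[first] += istart + (first + 1) * bin_size - start;

    /* each "middle" bin is covered completely */
    for (b = first + 1; b < last; b++)
        bins[b] += bin_size;

    /* job's usage from the end of the previous bin up to its stop time */
    bins[last] += stop - (istart + last * bin_size);
}
```

The totals stay as whole time units here; dividing by the bin size is left until all jobs have been tallied.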

Example of the Usage Calculation Algorithm

Job E from the above figure provides an interesting case for our usage calculation algorithm. Job E begins inside of bin 3 and proceeds through bins 4 to 7.

Let's follow the algorithm. We first adjust the job's start and stop times according to how they fall within the bounds of the interval. Job E's stop time falls outside the interval, so we adjust the stop time to IStop + 1, or 57.

The job begins inside bin 3 at time 30. The next bin (4) begins at time 34, so the usage during bin 3 is 34 - 30 = 4 units.

The "middle" bins are 4 to 6, so we add a usage of 6 (the bin size) to each of those bins.

The final bin is 7 which begins at time 52. The usage for bin 7 is therefore the job's stop time, 57, minus the bin's begin time, 52, equalling 5.

Note that the final usage values will be divided through by the bin size 6. This is done after the entire accounting file has been processed and just before reporting the usage. For example, the usage for bin 7 in the above example would be reported as 5 / 6, or 0.83.

That takes care of job E. Jobs A through D will also factor into the usage calculations--they were presumably processed prior to job E. Consider bin 5, which has seen jobs C, D, and E. During that bin, job C persisted for 6 time units, D for 5, and E for 6. Accordingly, the usage during the bin is (6 + 5 + 6) / 6, or 2.83.
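As a cross-check on these numbers, the reported usage can also be computed bin by bin, by measuring how much of each job overlaps each bin. The C sketch below uses that overlap formulation rather than the walk-through algorithm described earlier; it is an illustration only, with hypothetical names (struct job, overlap, queue_usage), and it again assumes that IStop is inclusive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job record: the job runs over [start, end). */
struct job {
    unsigned long start, end;
};

/* Number of time units of [start, end) that fall inside [lo, hi). */
static unsigned long overlap(unsigned long start, unsigned long end,
                             unsigned long lo, unsigned long hi)
{
    unsigned long s = start > lo ? start : lo;
    unsigned long e = end < hi ? end : hi;

    return e > s ? e - s : 0;
}

/* Fill usage[b] with the reported queue usage of bin b: the summed
 * overlap of every job with the bin, divided by the bin size. The last
 * bin is cut off at istop + 1 but is still divided by the full bin size. */
void queue_usage(const struct job *jobs, size_t njobs,
                 unsigned long istart, unsigned long istop,
                 unsigned long bin_size, double *usage, size_t nbins)
{
    unsigned long limit = istop + 1;
    size_t b, j;

    assert(bin_size > 0);
    for (b = 0; b < nbins; b++) {
        unsigned long lo = istart + b * bin_size;
        unsigned long hi = lo + bin_size < limit ? lo + bin_size : limit;
        unsigned long total = 0;

        for (j = 0; j < njobs; j++)
            total += overlap(jobs[j].start, jobs[j].end, lo, hi);
        usage[b] = (double)total / (double)bin_size;
    }
}
```

With the five jobs from the figure (8 to 16, 17 to 20, 22 to 46, 38 to 45, and 30 to 61), an interval of 10 to 56, and a bin size of 6, bin 5 comes out at about 2.83 and bin 7 at about 0.83, matching the values above.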

qacct Internals

qusage Internals

Conclusion