Nov 16, 2009

Windows HPC Server 2008 and how to run your jobs

A twelve-hour flight to Los Angeles is a very long time that can be spent in a couple of ways: sleeping, using the on-board entertainment system (see on the right, so not sleeping), or reading.

One job you come across every couple of years (or even months, as I did lately) as a Windows developer is running computational work that lasts for minutes, hours or even days. Microsoft Windows HPC Server may look promising as a replacement for a lot of your own distributed runtime infrastructure code. Its major component is a job scheduler, about which I will make some notes here.


HPC Server 2008 brings high performance computing in a cluster environment to the Windows platform. A so-called job may contain several tasks, each of which is basically a single executable (a simple sequential program or an already parallel one) you plan to run. There are a lot of ways to (1.) submit your job:

  • Job management console
  • CLI & PowerShell
  • SOA APIs (WCF)
  • COM APIs
  • .NET APIs or
  • WS-HPC Basic Profile (Web service)

Job scheduling (2.) can get quite sophisticated, using first-come-first-served, exclusive scheduling or resource optimization. Job execution (3.) goes through the states Queued, Running, Finished, Failed or Cancelled. The scheduler decides according to the defined requirements which job to run. Failed jobs can automatically be re-run. Jobs can run in parallel on several nodes or on several local CPUs. Tasks are usually not designed to communicate with each other (I may come back to this in another post). Jobs may even get preempted (killed or slowed down to let other jobs run first).
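The lifecycle above can be modelled as a small state machine. The following is a conceptual Python sketch only (not the actual HPC API); the state names follow the list above, and the automatic re-run of failed jobs is modelled as a simple retry counter, which is an assumption for illustration:

```python
# Conceptual sketch of the HPC job lifecycle described above.
# Not the real API: the states mirror the blog text, and the
# retry policy (max_retries) is a made-up illustration.

QUEUED, RUNNING, FINISHED, FAILED, CANCELLED = (
    "Queued", "Running", "Finished", "Failed", "Cancelled")

class Job:
    def __init__(self, name, max_retries=1):
        self.name = name
        self.state = QUEUED
        self.max_retries = max_retries  # failed jobs can be re-run automatically
        self.runs = 0

def run(job, succeeds):
    """Advance a queued job through Running to a terminal state."""
    while job.state == QUEUED:
        job.state = RUNNING
        job.runs += 1
        job.state = FINISHED if succeeds else FAILED
        # automatic re-run: put a failed job back into the queue
        if job.state == FAILED and job.runs <= job.max_retries:
            job.state = QUEUED
    return job.state

job = Job("render-frame", max_retries=1)
print(run(job, succeeds=False), job.runs)  # Failed 2
```

A cancelled or preempted job would simply be forced into the Cancelled state from outside this loop.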

The head node (which can also be a compute node) acts as the central job management point. Jobs are stored in an underlying SQL Server (Express?) database and execute under the submitting user's account. There is only one queue per cluster.

(source: Microsoft)

Job Submission

Jobs get priorities, allocated resources (nodes, CPUs and run time), dependency information or node exclusivity upon submission. A job can be submitted from the command line like this:

job new [options] [/f:jobdescriptionxmlfile] [/scheduler:host]

The two primary client user interface tools are the Job Management Console and the Administration Console. Jobs defined with the new-job wizard can be saved to a job description XML file (which can be reused or even generated automatically). Message Passing Interface (MPI) executables must be prefixed with mpiexec.

(source: Microsoft)

Job input (data) and output (results) can reside on the local node or on a file share. Large and static data should be copied to the nodes, whereas small and dynamically updated data should be placed centrally. Data should be cut into pieces (fragmented) and processed in a so-called “sweep”, which works like a loop index (for (int i = 0; i < 100; i++)).
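The fragment-and-sweep idea can be sketched in a few lines. This is a plain Python illustration of the concept, not an HPC sweep job; in a real sweep, the same executable would run once per index on the cluster, and the function names here are made up:

```python
# Conceptual sketch of a parametric "sweep": the input data is cut
# into independent fragments, and each sweep index processes one
# piece. Illustration only; not the HPC sweep mechanism itself.

def fragments(data, n_pieces):
    """Split data into n_pieces roughly equal, independent chunks."""
    size = -(-len(data) // n_pieces)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n_pieces)]

def sweep(data, n_pieces, work):
    # The list index plays the role of the sweep variable i = 0..n-1;
    # on a cluster, each iteration would be a separate task.
    return [work(piece) for piece in fragments(data, n_pieces)]

results = sweep(list(range(100)), 4, work=sum)
print(results)  # [300, 925, 1550, 2175]
```

Because the pieces are independent, the per-piece results can be combined afterwards (here, the four partial sums add up to the total).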

Jobs can also be submitted using the .NET API and the Microsoft.Hpc.Scheduler namespace classes: Scheduler (to connect and submit), ISchedulerJob job = scheduler.CreateJob(), job.AddTask(t), etc.

Another way is to call the scheduler using Web Services and the HPCBPClient class (hpcbp.CreateActivity(jsdl), hpcbp.GetActivityStatuses(), etc.).
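The submission sequence behind both APIs (connect, create a job, add tasks, submit) can be mocked to show the call pattern. This is a plain Python mock whose class and method names only mimic the shape of the .NET calls; none of it is the real Microsoft.Hpc API, and the host name is hypothetical:

```python
# Mock of the submit sequence described above: connect to the head
# node, create a job, add tasks, submit. Names mimic the shape of
# the .NET calls; this is not the real HPC API.

class MockTask:
    def __init__(self, command):
        self.command = command

class MockJob:
    def __init__(self):
        self.tasks = []
        self.state = "Configuring"
    def add_task(self, task):
        self.tasks.append(task)

class MockScheduler:
    def __init__(self):
        self.queue = []          # a single queue per cluster
    def connect(self, head_node):
        self.head_node = head_node
    def create_job(self):
        return MockJob()
    def submit_job(self, job):
        job.state = "Queued"
        self.queue.append(job)

scheduler = MockScheduler()
scheduler.connect("cluster-head")               # hypothetical host name
job = scheduler.create_job()
job.add_task(MockTask("mpiexec mysolver.exe"))  # MPI tasks go through mpiexec
scheduler.submit_job(job)
print(job.state, len(scheduler.queue))          # Queued 1
```

In the real .NET API, submission additionally passes the user's credentials, which the head node stores securely until the job runs.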

Job Scheduling

The simplest scheduling is a FIFO queue. “Backfill” is a method in which small jobs run in the time windows left open while larger submitted jobs wait for their required resources. With “resource matchmaking”, the scheduler matches jobs to nodes according to compute, network and application requirements. “Job templates” are jobs predefined by the administrator or system owner that high performance computing client users can reuse.
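Backfill is easiest to see in a toy simulation. The following Python sketch is my own simplification, not the HPC scheduler's algorithm: the head of the FIFO queue reserves its start time, and a later, smaller job is moved forward only if it fits into the idle window before that reservation:

```python
# Minimal backfill sketch: jobs are (name, nodes_needed, runtime).
# The head of the FIFO queue reserves its start time; a later,
# smaller job may be moved forward if it both fits on the currently
# idle nodes and finishes before the reservation. Illustration only.

def backfill_order(queue, free_nodes, nodes_free_at):
    """Return the order in which queued jobs start.

    free_nodes:    nodes idle right now
    nodes_free_at: time at which enough nodes for the head job are free
    """
    order = []
    head = queue[0]
    if head[1] > free_nodes:  # head job must wait until nodes_free_at
        for name, nodes, runtime in queue[1:]:
            # backfill: the small job fits now and ends before the
            # head job's reservation, so it does not delay the head job
            if nodes <= free_nodes and runtime <= nodes_free_at:
                order.append(name)
    order += [j[0] for j in queue if j[0] not in order]
    return order

queue = [("big", 8, 60), ("small", 2, 10), ("medium", 4, 120)]
print(backfill_order(queue, free_nodes=2, nodes_free_at=30))
# ['small', 'big', 'medium']
```

The "small" job slips into the gap because it needs only the two idle nodes and finishes before the "big" job's nodes become free; "medium" does not fit and stays behind in FIFO order.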

Microsoft seems to have done a good job when it comes to security. As mentioned above, jobs execute under the submitting user's account. Credentials are passed during submission and stored securely (encrypted, etc.) to be used when the job needs to run. They are passed over secured .NET Remoting channels to the nodes and deleted after the job has run.

Picture: The fail-over topology from Microsoft

Please note that HPC Server 2008 supports the MPI standard through its own MS-MPI implementation.

HPC Server 2008 looks reasonably priced at around $500 per node (as I found out after a quick search). For that money you get quality-of-service features you probably cannot code yourself. So have a look yourself the next time you are defining IJob or IJobScheduling interfaces in your own project.
