Condor for Debian ================= The documentation for Condor can be found on the World Wide Web at: http://www.cs.wisc.edu/condor/manual At that web site, you will find both an on-line version of manual and directions for downloading your own copy in a variety of formats. Package upgrades ---------------- When the Condor package is upgraded it will restart the Condor daemons. If this happens on the central manager it will result in jobs being killed on execute nodes. To avoid this, it is best to gracefully shutdown the Condor pool before upgrading. The Debian package does not perform a graceful shutdown on upgrades, because there is a variety of possible procedures, where subjective appropriateness depends on the configuration and utilization of a particular Condor pool. Please see this page for more information on graceful shutdowns: https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToShutDownCondor Snapshotting of vanilla universe jobs ------------------------------------- The Debian package comes with built-in support for snapshotting of vanilla universe jobs with DMTCP. This is useful for migrating running jobs between machines of a condor pool without having to relink the respective binaries with Condor's standard universe LIBC wrapper. The main advantage of DMTCP over Condor's native mechanism is that it also works for scripts (interpreter sessions like Python), and closed-source systems (e.g. Matlab). The necessary shim script (a wrapper around the actual job executable) is available at /usr/lib/condor/shim_dmtcp. The condor-doc package comes with example submit files for self-compiled executables and system-wide installed software. A submit configuration for a job with on-demand snapshotting on a Debian cluster could look like this: # DMTCP checkpointing related boilerplate (should be applicable in most cases) universe = vanilla executable = /usr/lib/condor/shim_dmtcp should_transfer_files = YES when_to_transfer_output = ON_EXIT_OR_EVICT kill_sig = 2 output = $(CLUSTER).$(PROCESS).shimout error = $(CLUSTER).$(PROCESS).shimerr log = $(CLUSTER).$(PROCESS).log dmtcp_args = --log $(CLUSTER).$(PROCESS).shimlog --stdout $(CLUSTER).$(PROCESS).out --stderr $(CLUSTER).$(PROCESS).err dmtcp_env = DMTCP_TMPDIR=./;JALIB_STDERR_PATH=/dev/null;DMTCP_PREFIX_ID=$(CLUSTER)_$(PROCESS) # job-specific setup, uses arguments and environment defined above arguments = $(dmtcp_args) some_command some_arg1 some_arg2 environment = $(dmtcp_env) queue 1 -- Michael Hanke Fri, 09 Mar 2012 8:19:25 +0200