Locating error codes for Condor and Globus

Condor

Local Condor errors

Local errors are echoed to standard output and are of the form:

ERROR: this is the error message

Examples include:

ERROR: Can't find address of local schedd

or

ERROR: proxy has expired
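
If the output of a command is captured by a script, lines of this form can be picked out mechanically. The following is a minimal Python sketch, assuming each error sits on a single line that begins with the "ERROR:" prefix shown above. The function name extract_errors and the sample text are illustrative only and are not part of Condor.

```python
from typing import List


def extract_errors(output: str) -> List[str]:
    """Return the message part of every line that starts with 'ERROR:'."""
    prefix = "ERROR:"
    messages = []
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            # Drop the prefix and any surrounding whitespace.
            messages.append(stripped[len(prefix):].strip())
    return messages


# Sample text built from the two example messages in this section,
# with one unrelated line in between (made up for illustration).
sample = (
    "ERROR: Can't find address of local schedd\n"
    "some other output\n"
    "ERROR: proxy has expired\n"
)

for message in extract_errors(sample):
    print(message)
```

The same filter could be applied to the standard output of globusrun or globus-job-run, provided their errors follow the same one-line pattern, which is an assumption here.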

Remote Condor Holds

If Condor encounters a problem that is not a fatal error, it will often put the job on hold and assign it a hold code to let the user know why the job was held. There are several ways to find the hold code.

$ condor_q -hold

will display information for all held jobs (or use the -all flag for information on all jobs).
You can also check the Condor log file for each job. (If you are running a DAG, the <job>.dagman.out file is particularly useful.)
Frequently you will receive a hold code of 2, which means that a Globus error was responsible for the job being put on hold. The Globus error will also be specified, and you can look at our list of Globus errors to try to identify the problem.
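
For scripted checks, the hold code can also be read directly from the job attributes. The sketch below is one possible approach, not a definitive recipe. It assumes a Condor version whose condor_q supports the -constraint and -autoformat options, and it relies on the job attributes JobStatus (5 means held), HoldReasonCode and HoldReason. The names HeldJob, parse_held_jobs and query_held_jobs are hypothetical helpers introduced for this example.

```python
import subprocess
from typing import List, NamedTuple


class HeldJob(NamedTuple):
    job_id: str     # "cluster.proc"
    hold_code: int  # value of HoldReasonCode
    reason: str     # value of HoldReason


def parse_held_jobs(text: str) -> List[HeldJob]:
    """Parse lines of the form '<cluster> <proc> <code> <reason...>'."""
    jobs = []
    for line in text.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # skip blank lines and anything unexpected
        reason = fields[3].strip() if len(fields) == 4 else ""
        jobs.append(HeldJob(f"{fields[0]}.{fields[1]}", int(fields[2]), reason))
    return jobs


def query_held_jobs() -> List[HeldJob]:
    """Ask condor_q for held jobs (assumes condor_q is on the PATH)."""
    command = [
        "condor_q",
        "-constraint", "JobStatus == 5",  # 5 is the held state
        "-autoformat", "ClusterId", "ProcId", "HoldReasonCode", "HoldReason",
    ]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_held_jobs(result.stdout)
```

Calling query_held_jobs() on a submit host and keeping only the entries whose hold_code equals 2 would give the jobs held because of a Globus error. The reason field then carries the text to compare against the list of Globus errors.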

Globus

If you are running globusrun or globus-job-run, the errors will be echoed to standard output. When using Condor, issues that cause a job to be held often have a Globus error associated with them. See the information above on finding Globus errors when using Condor.