gLite Error Handling
Steve Fisher for JRA1-UK
Introduction
• Good error handling is appreciated by users
• Bad error handling incurs the wrath of Stephen Burke
Errors - Brno
Examples from Stephen B
• From long experience I think there's a hierarchy of bad error messages:
  1) Crash, core dump etc.
  2) No error message, so you think it worked when it didn't
     • maybe worse than 1
  3) Something which might indicate an error or might not, e.g. “No results returned” from R-GMA.
  4) A catch-all error which translates to “something went wrong”, e.g. “ERROR: Failed to instantiate Consumer” from R-GMA.
  5) An error which assumes a particular cause when in fact there are many causes, e.g. “invalid argument” from the lcg-* tools.
  6) A message which can only be translated to the real cause by the initiated.
Examples - continued
  7) A message which almost tells you what happened, but leaves out some vital information: “couldn't open file” - but which file?!
  8) A 50-line dump of everything the code can find, which has the real error buried somewhere in it, e.g. “expired host certificates” with GSI.
  9) A message which tells you what went wrong in a way which makes it clear that the code could have recovered itself but didn't bother, e.g. edg-rm giving up when the first replica fails even when there might be 30 others to try.
• I would include 10), a helpful error message which tells you exactly what went wrong and what to do about it, but I don't think I've ever seen one of those ...
So…
• Good error handling is most important when one gLite component calls another
• The error finally passed back to the user must be
  • Comprehensible
  • Comprehensive
• It must be easy for the API user to take appropriate action
  • i.e. don't expect the user to do pattern matching on an error message
• 4 areas where errors must be handled
  • Internal to a service
  • The service interface (WSDL)
  • gLite API
  • Displayed by a gLite-provided tool
Internal to a service
• There is no reason to suggest any rules
• Services can preserve their autonomy
• For R-GMA we use a moderately deep exception hierarchy
In the WSDL
• Use a small number of WSDL faults:
  <element name="UnknownResourceException" type="rgma:UnknownResourceException"/>
  <complexType name="UnknownResourceException">
    <sequence>
      <element name="errMsg" type="xsd:string" minOccurs="0"/>
      <element name="errNo" type="xsd:int"/>
    </sequence>
  </complexType>
  <wsdl:message name="UnknownResourceExceptionMessage">
    <wsdl:part name="fault" element="rgma:UnknownResourceException"/>
  </wsdl:message>
  <wsdl:operation name="setTerminationInterval" parameterOrder="resourceId terminationInterval">
    …
    <wsdl:fault name="UnknownResourceException" message="impl:UnknownResourceExceptionMessage">
    </wsdl:fault>
    …
  </wsdl:operation>
R-GMA set of faults
• RGMAException
  • xsd:string errMsg(0..1)
  • xsd:int errNo
  • xsd:string trace(0..1)
• UnknownResourceException
  • xsd:string errMsg(0..1)
  • xsd:int errNo
• RGMASecurityException
  • xsd:string errMsg(0..1)
  • xsd:int errNo
Could generalise
• ServiceException
  • xsd:string errorMessage(0..1)
  • xsd:int errorNumber
  • xsd:string trace(0..1)
• UnknownResourceException
  • xsd:string errorMessage(0..1)
  • xsd:int errorNumber
• AuthException
  • xsd:string errorMessage(0..1)
  • xsd:int errorNumber
errorMessage is a free-format string.
errorNumber is a “small” integer.
trace is a free-format string.
The name is Auth rather than Security to avoid clashes with java.lang.SecurityException.
If one service calls another which returns an exception, it is the responsibility of the caller to generate a decent message and error number. Information from the underlying problem can be added to the trace.
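The caller's responsibility described above could look roughly like the following Python sketch. The class fields mirror the generalised faults (renamed to Python style), while `inner_lookup`, `outer_copy`, the messages and the error numbers are hypothetical and invented purely for illustration.

```python
class ServiceException(Exception):
    """Generalised fault: free-format message, small integer, optional trace."""

    def __init__(self, error_message, error_number, trace=None):
        super().__init__(error_message)
        self.error_message = error_message
        self.error_number = error_number
        self.trace = trace


# Hypothetical error numbers, invented for this sketch.
DATABASE_UNAVAILABLE = 3
REPLICA_LOOKUP_FAILED = 12


def inner_lookup(name):
    """Stand-in for an underlying service that fails."""
    raise ServiceException("catalogue database unavailable", DATABASE_UNAVAILABLE)


def outer_copy(name):
    """Calling service: produce its own message and number, keep the cause in the trace."""
    try:
        return inner_lookup(name)
    except ServiceException as cause:
        trace = f"caused by [{cause.error_number}] {cause.error_message}"
        if cause.trace:
            trace += "\n" + cause.trace
        raise ServiceException(
            f"Cannot copy '{name}': replica lookup failed",
            REPLICA_LOOKUP_FAILED,
            trace=trace,
        ) from cause
```

The user sees a message that names the file and the failed step, while the underlying detail is preserved in the trace for debugging.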
API view
• Errors get passed from the service back to the user in a style appropriate to the language.
• For Java, C++ and Python use exceptions matching the WSDL
• For C we use an object-like thing:
  /* pp is the primary producer handle, insert is the statement to execute */
  if (RGMAPrimaryProducer_insert(pp, insert) != 0) {
      fprintf(stderr, "Failed to insert.\n");
      fprintf(stderr, "<%s>\n", RGMA_getException(pp)->errorMessage);
      exit(1);
  }
API Errors
• Additionally some errors can be generated by the API itself:
  • RemoteException
    • unable to contact the service
  • AuthException
    • the same exception the service returns, but raised here because of an authentication problem
  • ServiceException
    • the user does not know what is in the API and what is in the service
    • from a user perspective the API is the service
• Each API should provide a set of symbolic constants for the error numbers.
• Changing the error numbers introduces an incompatibility
• No attempt should be made to interpret the value of the number
• The error messages are for humans and are subject to change
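A minimal Python sketch of these points follows, assuming an API that exposes symbolic constants and converts transport failures into RemoteException. The constant names and values, `next_action` and `call_service` are hypothetical; they are not taken from any gLite or R-GMA API.

```python
# Hypothetical symbolic constants; names and values are invented for this sketch.
TABLE_NOT_FOUND = 1
INVALID_QUERY = 2


class ServiceException(Exception):
    def __init__(self, error_message, error_number):
        super().__init__(error_message)
        self.error_message = error_message
        self.error_number = error_number


class RemoteException(Exception):
    """Raised by the API itself when the service cannot be contacted."""


def next_action(exc):
    """Decide what to do by comparing with the constants, never by parsing the message."""
    if exc.error_number == TABLE_NOT_FOUND:
        return "create the table"
    if exc.error_number == INVALID_QUERY:
        return "fix the query"
    return "report: " + exc.error_message


def call_service(send, request):
    """API layer: translate a low-level connection failure into RemoteException."""
    try:
        return send(request)
    except OSError as cause:  # ConnectionError and socket errors derive from OSError
        raise RemoteException(f"Unable to contact the service: {cause}") from cause
```

Because callers compare only against the named constants, the message wording can change freely without breaking them.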
The 4 types of exception
• AuthException
  • The user should ensure that he is authenticated and has the right authorization. He should not get back much information.
• RemoteException
  • Unable to contact the service. You might want to try again.
• UnknownResourceException
  • Try remaking the resource, though you want to wait a little while first or limit the number of attempts
• ServiceException
  • This may be an invalid interaction with the service or it could be a faulty service. Consult the error message.
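The recommended reactions to the four exception types can be sketched in Python as below. The helper `call_with_recovery`, its parameters and the default attempt limit are assumptions made for this illustration, not part of any gLite API.

```python
import time


class AuthException(Exception):
    pass


class RemoteException(Exception):
    pass


class UnknownResourceException(Exception):
    pass


class ServiceException(Exception):
    pass


def call_with_recovery(operation, remake_resource, max_attempts=3,
                       delay_seconds=1.0, sleep=time.sleep):
    """Run operation(), retrying only where the slide suggests it may help."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RemoteException:
            # Service unreachable: wait, then try again a limited number of times.
            if attempt == max_attempts:
                raise
            sleep(delay_seconds)
        except UnknownResourceException:
            # Resource has gone: wait a little, remake it, then try again.
            if attempt == max_attempts:
                raise
            sleep(delay_seconds)
            remake_resource()
        # AuthException and ServiceException are deliberately not caught:
        # retrying cannot fix credentials or an invalid interaction.
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays.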
CLI
• The CLI will normally trap and handle errors
• Unexpected errors should result in the error message being printed, but not the trace, unless the CLI is being run in debug mode.
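The CLI rule above might be implemented along these lines in Python. The function `run_cli`, the `debug` flag, the "ERROR:" prefix and the exit codes are hypothetical choices for this sketch only.

```python
import sys


class ServiceException(Exception):
    def __init__(self, error_message, error_number, trace=None):
        super().__init__(error_message)
        self.error_message = error_message
        self.error_number = error_number
        self.trace = trace


def run_cli(command, debug=False, err=None):
    """Run a command; on failure print the message, and the trace only in debug mode."""
    if err is None:
        err = sys.stderr
    try:
        command()
    except ServiceException as exc:
        print(f"ERROR: {exc.error_message}", file=err)
        if debug and exc.trace:
            print(exc.trace, file=err)
        return 1  # non-zero exit status signals failure to the shell
    return 0
```

A real tool would pass the return value to `sys.exit` and set `debug` from a command-line option.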
Conclusion
• Most of the issues about errors are non-technical
• Error handling needs to be taken seriously with full attention to the messages:
  • Comprehensibility
  • Comprehensiveness
• We should try to agree upon the principles