OSF DCE SIG | R. Salz (OSF) | |
Request For Comments: 24.1 | April 1993 |
This document details OSF requirements for the serviceability support in DCE 1.1. It describes a message-reporting facility to be provided by OSF. All technology providers will be required to modify their code to use this facility at their existing message code-points. Additional messaging points will have to be added, so this document also presents criteria that providers can use to determine where these code points should be.
This document is OSF's response to the SIG requirements as detailed in [RFC11]. During the course of several meetings and discussions within OSF we prioritized the requirements and reviewed the design presented here so that reasonable goals could be set for DCE 1.1.
This document is not intended to be a complete description of the APIs and facilities. The definitive reference is the manual pages and source that will be part of DCE 1.1; these are not yet available.
A number of changes have been made since this document was originally published [RFC24]. Most of the changes were made in response to comments made by suppliers. This section gives a high-level overview of the changes that were made, in no particular order:
The goal of serviceability is to make DCE easy to install, configure, and administer. The last area requires that DCE programs provide enough information so that an administrator can isolate faults and correct errors.
The primary method of accomplishing this is by having DCE programs detect error conditions and generate a message. All messages must be generated by calling a single routine. All components must be modified so that, e.g., \*(sBprintf()\*(sE calls are replaced by slightly different \*(sBdce_svc_printf()\*(sE calls. By requiring that all messages use this API, licensees can easily replace DCE serviceability facilities with their own, native, facilities.
All DCE messages must be consistent, providing a standard minimum amount of information. In order to guarantee this, all message calls will use a provided macro as their first argument. This macro will expand to several arguments. By using a macro, licensees can add additional information with minimal code changes to either DCE or customer source code.
All messages must meet internationalization requirements, as specified in [RFC23] (see also [Ogura], [XPG4], and [RFC34]). As a summary, however, all messages must be stored in message catalogs, they must not use fragmentation, and they may be displayed using the XPG4 language-independent \*(sBprintf()\*(sE format string conventions. Messages will be uniquely identified by message numbers assigned from the status code space. No component should use the last 100 message numbers in its number space; these are reserved for licensee use. In general, messages should not be re-used unless the exact same error occurs in different places in the code. Because of the practical difficulty in achieving this, this is a guideline rather than a requirement.
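As an illustration of why the XPG4 language-independent conventions matter, the \*(sB%n$\*(sE positional conversions let a translated catalog string re-order its arguments without any change to the call site. The following sketch uses the POSIX positional-argument extension to \*(sBsnprintf()\*(sE; the function name and the message texts are invented for the example:

```c
#include <stdio.h>
#include <string.h>

/* Format a message under an arbitrary catalog string.  Because both
 * catalog strings below use XPG4 positional conversions (%1$s, %2$s),
 * a translation may reorder the arguments freely while the call site
 * stays the same. */
void svc_format_msg(const char *catalog_fmt, const char *server,
                    const char *file, char *buf, size_t len)
{
    snprintf(buf, len, catalog_fmt, server, file);
}
```

For example, an English catalog entry might read \*(sB"%1$s: cannot open %2$s"\*(sE while a translation reads \*(sB"cannot open %2$s on %1$s"\*(sE; both are filled by the identical call.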
Each message must have external explanatory text. This will be done using the message file described in [RFC23]. Each component will have a separate chapter in the Problem Determination Guide that is constructed primarily from the explanatory texts.
Only one process will write to each file, although the interface will be encapsulated so that licensees can extend this to allow different processes to write to the same file.
For DCE 1.1, we believe it is sufficient to provide only local log files written in human-readable or native binary format. We will provide an API to read the binary logs, along with a sample log viewer. Remote Notification as described in [RFC11] is a non-goal for DCE 1.1. We might move to IDL network representation (that is, pickling) after it is available. We expect to eventually migrate to EVS as described in [RFC36].
Because messages are written under the control of an arbitrary format string, the data within the message is, essentially, opaque. This makes it very difficult to do any pre-generation filtering (as opposed to post-mortem data reduction tools such as \*(sBgrep\*(sE). A publicized hook is available so that licensees can add whatever run-time filtering they wish.
It must be possible to specify the disposition of messages at a fairly fine level of detail. In addition, it must be possible to do this consistently across all components, providing boot-time defaults with an override, as well as dynamic remote reconfiguration.
While we want to detect and report faults as soon as possible, we recognize that this can have performance implications. For example, before-the-fact data capture is not a goal for this release. Because error recovery in the simple case is difficult, and automated distributed recovery is a black art, cascading error processing and extended error processing as outlined in [RFC11] is not a goal. It is sufficient to generate an accurate, detailed message, and terminate.
There have been requests for expanded \*(sBprintf\*(sE specifiers such as %m to expand to the text of a DCE message, and %p to print a pointer value. Unfortunately, our I18N goals require that the XPG/4 %n$ format be supported, and we do not have enough resources to provide an XPG/4 implementation that also understands additional format specifiers. A %b specifier is provided for binary log files; using this specifier on text log files results in undefined behavior, since detecting it would require a run-time check that will, perhaps unfortunately, not be performed.
We will meet these goals by providing a new DCE serviceability component, and by requiring all existing code to use it. The functional outline and requirements it imposes are detailed below.
In order to meet the DCE 1.1 serviceability goals, all providers will have to add messaging calls to their code. The following criteria should be used to determine these points:
It is not possible to use these criteria to assign an absolute serviceability value to a piece of software. Nevertheless, we expect the providers to use them while making a best effort. At a minimum, if all existing message points have been converted, all C library routine failures have been captured, and all program exits and administrative requests have been captured, then the requirements have been met. Additional guidelines and macros may be made available during DCE 1.1 development to help make the conversion easier.
We partition DCE into pieces known as components. This is similar to the standard use of the term within the DCE development group, but at a finer grain. The DCE library is divided into the following components (name given in parentheses):
Component names are limited to three characters chosen from the DCE status code rad40 alphabet. (Source licensees can refer to the implementation of \*(sBdce_error_inq_text()\*(sE for the definition of rad40.) The names of all components are set by OSF, and are not subject to I18N requirements. Note that not all parts of the DCE API have serviceability requirements. For example, there seems to be little need to include the \*(sButc_*()\*(sE functions. It is also possible that providers may identify additional components not listed above; this would be handled as described in the next paragraph.
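To make the three-character limit concrete, the following toy shows radix-40 packing of a component name into an integer, in the spirit of the status-code scheme. The alphabet used here is illustrative only; the real rad40 alphabet is defined by the implementation of \*(sBdce_error_inq_text()\*(sE:

```c
#include <string.h>

/* ILLUSTRATIVE alphabet: 26 letters + 10 digits + 4 punctuation
 * characters = 40 symbols.  The real DCE alphabet may differ. */
const char rad40_alphabet[] = "abcdefghijklmnopqrstuvwxyz0123456789-_. ";

/* Pack a short name into an integer, base 40.  Returns -1 if the name
 * contains a character outside the alphabet. */
long rad40_encode(const char *name)
{
    long value = 0;
    for (int i = 0; name[i] != '\0'; i++) {
        const char *p = strchr(rad40_alphabet, name[i]);
        if (p == NULL)
            return -1;                      /* not in the alphabet */
        value = value * 40 + (long)(p - rad40_alphabet);
    }
    return value;
}
```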
Some executables may share large portions of code that is in a private library and not part of the general DCE library. Providers can create a new component to capture this common code, after consultation with OSF to determine the name and scope of the component. For example, \*(sBcdsd\*(sE and \*(sBcdsclerk\*(sE might share a \*(sBcut\*(sE (CDS utilities) component.
In addition to the library, each server executable has at least one component.
Component-level logging does not provide enough serviceability control. Components are divided into sub-components. Each sub-component corresponds to an architecturally-distinct part of the component. The number of sub-components will vary greatly by component, and will be determined by the provider. For example, threads might have a mutex sub-component and a general sub-component, and little else. The RPC runtime, on the other hand, currently uses a 28-element debug table, where each entry would become a sub-component. Most components should fall between these two endpoints.
The same sub-component name may appear in more than one component -- for example, every component should have a catch-all sub-component named general. The sub-component should be a single-word mnemonic chosen by the provider. It is not subject to I18N requirements. Each sub-component also has a (short, one-line) description that is stored in a message catalog. OSF will also review all the sub-components that are defined, and determine a standard name if any overlap is found.
Each message is uniquely part of a sub-component within a component.
Each message has a severity level attribute. The level is determined when the code-point is created and must be chosen from the following list (the manifest constant to be used is shown in parentheses):
In addition, some levels will be reserved for licensee-value-added use.
Within a given executable, messages can be routed according to their severity level. The reference implementation will only route to local files. This reasonably addresses end-user needs: send all fatal error messages to the console.
Debug-level messages in an executable can be directed at the component level. For example, send all RPC runtime debugging messages to \*(sB/opt/dcelocal/adm/secd/foo\*(sE and all CMA debugging messages to \*(sB/opt/dcelocal/adm/secd/bar\*(sE. This is an important capability for DCE developers and licensees.
It is also possible to vary the amount of debug information that is generated. This is done by attaching an active level to each sub-component. If a message is generated with a higher debug level than the current active level of the sub-component, then no output is generated.
The following \*(sBtypedef\*(sE defines a sub-component:
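The typedef itself does not survive in this copy of the document. The following sketch is inferred from the field names used in the next paragraph; the \*(sBsc_name\*(sE and \*(sBsc_level\*(sE fields, and the placeholder types, are assumptions rather than the definitive DCE 1.1 declaration:

```c
#include <string.h>

/* Placeholder types for the sketch; the real definitions live in the
 * DCE serviceability headers (assumed). */
typedef unsigned long unsigned32;
typedef unsigned32 dce_svc_msgid_t;

/* Plausible sub-component descriptor.  Only sc_descr and
 * sc_descr_msgid are named in this document; the rest is guesswork. */
typedef struct dce_svc_subcomp_s_t {
    char            *sc_name;        /* mnemonic, e.g. "general" (assumed) */
    char            *sc_descr;       /* short, one-line description */
    dce_svc_msgid_t  sc_descr_msgid; /* catalog ID for sc_descr; 0 = untranslated */
    unsigned32       sc_level;       /* current active debug level (assumed) */
} dce_svc_subcomp_t;
```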
The \*(sBsc_descr_msgid\*(sE field is used to translate the \*(sBsc_descr\*(sE field of the sub-component. It may be zero to indicate that the message should not be translated.
These sub-components are collected into the serviceability table.
By convention this is named \*(sBcomp__svc_table\*(sE, where comp is the name of the component. The table is expected to be globally known throughout the component.
For example (where \*(sBNNN\*(sE is the message code):
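The example table has also been elided from this copy. A sketch of what one might look like for a hypothetical \*(sBmem\*(sE component follows; the struct layout is restated here so the example is self-contained, and the names, message codes, and the NULL terminator convention are all assumptions:

```c
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for the sub-component descriptor. */
typedef struct {
    char          *sc_name;
    char          *sc_descr;
    unsigned long  sc_descr_msgid;   /* NNN, the message code (invented) */
    unsigned long  sc_level;         /* active debug level */
} dce_svc_subcomp_t;

/* Hypothetical serviceability table for a "mem" component; by the
 * naming convention above it is called mem__svc_table. */
dce_svc_subcomp_t mem__svc_table[] = {
    { "general", "general memory services", 100, 0 },
    { "alloc",   "allocation and free",     101, 0 },
    { NULL, NULL, 0, 0 }                 /* terminator (assumed) */
};
```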
\*(sBSams\*(sE, the message-text tool mentioned in [RFC23], will be able to generate this table from a higher-level specification.
All components will be required to register their table with the serviceability facility using the following function:
For example:
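Neither the function signature nor the example registration call survives in this copy. The following is a guessed sketch of the shape of such a call, with a stub body so it compiles; everything here is an assumption, not the DCE 1.1 API:

```c
/* Guessed shape of the registration interface (not the real API). */
typedef struct dce_svc_handle_s_t {
    const char *component;     /* three-character component name */
    void       *table;         /* the component's serviceability table */
} dce_svc_handle_t;

typedef unsigned long error_status_t;

/* Stub: a real implementation would also consult the routing
 * configuration file and environment variables at this point. */
error_status_t dce_svc_register(void *table, const char *component,
                                dce_svc_handle_t *handle)
{
    handle->component = component;
    handle->table = table;
    return 0;                  /* error_status_ok */
}
```

A component would register once at start-up, e.g. \*(sBdce_svc_register(mem__svc_table, "mem", &mem_svc_handle)\*(sE (names invented).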
A compile-time macro is also available to create a global handle variable from a serviceability table:
A companion call, \*(sBdce_svc_unregister\*(sE, can be called to free the resources associated with a handle. It is normally not necessary to call this routine as the regular process exit handling will, e.g., close all open files.
Once the table has been registered, the following routine is used for all service-related messages:
Note that no status code is returned. The \*(sBtable_index\*(sE specifies the offset into the service table associated with the registered \*(sBhandle\*(sE parameter. \*(sBSams\*(sE can be used to create boilerplate macros that expand to the initial fixed argument list.
\*(sBDCE_SVC\*(sE is an opaque macro that provides initial arguments. Its first argument is the registered handle; the second describes the print arguments. The expansion of this macro is variable; for example, it might expand to the text string handle, __FILE__, __LINE__. The ... arguments indicate that this is a \*(sBprintf()\*(sE-style routine. A sample call would be:
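The sample call itself is missing from this copy. The following toy models only the call shape described above; the macro expansion, the function signature, and the message ID are invented for the illustration and are not taken from the DCE 1.1 sources:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Invented expansion: handle plus call-site information, as the text
 * suggests (__FILE__, __LINE__), plus the print-argument descriptor. */
#define DCE_SVC(handle, subcomp) (handle), __FILE__, __LINE__, (subcomp)

char svc_last_msg[256];                 /* captured output for the demo */

void dce_svc_printf(const char *handle, const char *file, int line,
                    const char *subcomp, unsigned long msgid,
                    unsigned long attributes, const char *fmt, ...)
{
    va_list ap;
    char body[128];

    (void)attributes;
    va_start(ap, fmt);
    vsnprintf(body, sizeof body, fmt, ap);
    va_end(ap);
    snprintf(svc_last_msg, sizeof svc_last_msg, "%s %s:%d %s.%lu %s",
             handle, file, line, subcomp, msgid, body);
}
```

A call such as \*(sBdce_svc_printf(DCE_SVC("sec", "general"), 1234, 0, "cannot contact %s", "rgy")\*(sE then carries the handle, call site, sub-component, and message ID along with the formatted text.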
This could generate a message like the following (a terminal \e denotes line continuation; the wrapped output would normally appear as a single line):
Additional actions can also be performed by OR'ing any of the following into the attributes parameter:
In addition, specialized routing can be used by OR'ing in any of the following:
For example, the following will log an error message, display it to the user, and then exit:
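The example has been elided here. As a sketch of the idea, invented stand-ins for two attribute bits are shown below; the real manifest constants are defined by the serviceability headers and are not reproduced in this document:

```c
/* Invented stand-ins for the action and routing attribute bits. */
enum {
    svc_c_action_exit_bad = 0x001,   /* terminate with a failure status */
    svc_c_route_stderr    = 0x100    /* also display the message to the user */
};

/* Check whether an attribute word requests a given action or route.
 * A call would OR the bits into the attributes parameter, e.g.:
 *   dce_svc_printf(DCE_SVC(handle, "general"), msgid,
 *                  svc_c_action_exit_bad | svc_c_route_stderr, ...);
 * (names hypothetical). */
int svc_attr_has(unsigned attrs, unsigned bit)
{
    return (attrs & bit) != 0;
}
```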
For debugging messages, the following two macros can also be used:
These macros can turn into null code in a production environment; in a debugging environment they decide if the message should be generated. Because the C pre-processor does not allow macros with varying numbers of arguments, the double parentheses trick must be used here. The first version takes a text string, while the second resembles \*(sBdce_svc_printf()\*(sE in that it takes a message ID and a format specifier. Note that \*(sBdce_svc_printf()\*(sE can also be used for debugging statements.
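The double-parentheses trick works by making the entire argument list a single macro argument, which the debugging expansion then pastes after a function name. A minimal sketch follows; the macro name matches the convention described here, but the back-end function and its behavior are invented:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define DCE_DEBUG 1        /* defined here so the demo takes the debug branch */

char svc_debug_buf[128];   /* captured output for the demo */

/* Invented back-end that a debugging build would expand to. */
void svc_debug_backend(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(svc_debug_buf, sizeof svc_debug_buf, fmt, ap);
    va_end(ap);
}

#ifdef DCE_DEBUG
/* args carries its own parentheses -- DCE_SVC_DEBUG(("x %d", 1)) --
 * so pasting the name in front yields an ordinary function call. */
#define DCE_SVC_DEBUG(args) svc_debug_backend args
#else
#define DCE_SVC_DEBUG(args) ((void)0)   /* compiles away in production */
#endif
```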
It is very important that debug messages not be used for errors that can occur during ordinary operation. For example, if the RPC runtime receives a corrupted packet this should be reported as a correctable error.
It is possible to add per-component filtering. This is done using the following data types and API:
The \*(sBfilter_function\*(sE points to a function that takes a pointer to a prolog structure followed by the \*(sBprintf()\*(sE format string and arguments. A structure is used so that licensee-made changes in the macro need not affect filtering routine sources.
An additional hook is used to provide remote control of filtering by registering a control function. This function takes an array of bytes and parses it according to the licensee-supplied semantics to control the filtering.
Each non-debug level can be routed separately. When a component first calls \*(sBdce_svc_register()\*(sE, a file is consulted to determine where each level is routed. Then an environment variable is consulted (see below).
In addition to these initialization mechanisms, the following routine can be called to specify where messages get routed:
The \*(sBwhere\*(sE parameter is divided into three parts, separated by colons: the level, a routing identifier, and a routing parameter. (A single parameter is used, rather than three, because it will typically come from a single command-line flag or an environment variable.) OSF will implement at least the routing identifier \*(sBFILE\*(sE, where the parameter specifies the filename, as in \*(sBFILE:/tmp/log\*(sE. Licensees can provide value-added features such as \*(sBIPC:syslog\*(sE to indicate that the local \*(sBsyslog\*(sE daemon should be used. A full specification would be \*(sBFATAL:FILE:/dev/console\*(sE. For this routine only, if the first part is omitted, then the routing is set for all levels.
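The three-part \*(sBwhere\*(sE parameter can be split with ordinary string handling. A sketch follows; the convention that an omitted level means "all levels" comes from the text, while the error handling and function name are assumptions:

```c
#include <stdio.h>
#include <string.h>

/* Split "LEVEL:IDENTIFIER:PARAMETER" into its three parts.  If the
 * level is omitted ("FILE:/tmp/log"), *level is set to "" meaning the
 * routing applies to all levels.  Returns 0 on success, -1 on a
 * malformed specification. */
int svc_parse_route(const char *spec, char *level, char *ident,
                    char *param, size_t len)
{
    const char *c1 = strchr(spec, ':');
    const char *c2 = c1 ? strchr(c1 + 1, ':') : NULL;

    if (c1 == NULL)
        return -1;
    if (c2 == NULL) {                        /* two parts: level omitted */
        snprintf(level, len, "%s", "");
        snprintf(ident, len, "%.*s", (int)(c1 - spec), spec);
        snprintf(param, len, "%s", c1 + 1);
    } else {                                 /* full three-part form */
        snprintf(level, len, "%.*s", (int)(c1 - spec), spec);
        snprintf(ident, len, "%.*s", (int)(c2 - c1 - 1), c1 + 1);
        snprintf(param, len, "%s", c2 + 1);
    }
    return 0;
}
```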
All components will have a specified environment variable that can be used as the default \*(sBwhere\*(sE parameter. The variable will be named \*(sBSVC_COMP\*(sE, where COMP is the name of the component in all uppercase. All server executables should have a standard flag -- we propose \*(sB-w\*(sE -- that can be specified multiple times to set multiple routing requirements. For example:
Debug-level messages are routed using the following routines:
The first routine sets both the debugging levels and the routing. The second routine sets only the levels, typically after the routing has been set. These routines are typically used to process command-line flags.
Debug flags are specified as a comma-separated list of sub-component name and numeric level separated by a period. An asterisk indicates all sub-components. The settings are parsed in order, so \*(sB*.1,cnauth.9\*(sE could be used to obtain minimal debugging within the RPC component, with extensive detail for the AUTH/CN sub-component. Again, after the config file an environment variable \*(sBSVC_COMP_DBG\*(sE, where COMP is the name of the component, specifies the level and disposition of debugging messages.
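The in-order parse of a specification like \*(sB*.1,cnauth.9\*(sE can be sketched as follows; the sub-component representation and the skip-on-malformed behavior are assumptions:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct subcomp { const char *name; int level; };

/* Apply a comma-separated debug specification such as "*.1,cnauth.9"
 * to an array of sub-component levels.  Entries are applied in order,
 * so a later entry overrides an earlier wildcard. */
void svc_apply_debug_spec(const char *spec,
                          struct subcomp *subs, int nsubs)
{
    char copy[256];
    snprintf(copy, sizeof copy, "%s", spec);

    for (char *tok = strtok(copy, ","); tok; tok = strtok(NULL, ",")) {
        char *dot = strrchr(tok, '.');
        if (dot == NULL)
            continue;                        /* malformed entry: skip */
        int level = atoi(dot + 1);
        *dot = '\0';                         /* tok is now just the name */
        for (int i = 0; i < nsubs; i++)
            if (strcmp(tok, "*") == 0 || strcmp(tok, subs[i].name) == 0)
                subs[i].level = level;       /* later entries win */
    }
}
```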
The remote interface (described below) will call these routines to dynamically change message disposition.
All servers will be required to export the OSF-written management routines that allow for queries and redirection of messages. If GDS is not converted to use threads then GDS will not be required to support the remote interface. A reasonable alternative would be to have an external program that exports the interface and forwards all requests to GDS using appropriate IPC.
All components will be under a single ACL that must be implemented by the manager. If a server already has the concept of a management ACL, then that can be used to control this interface; otherwise OSF will provide a default ACL manager that servers can use.
Note that this puts an additional requirement on all servers: each server must change to its own working directory upon start-up. (Related servers such as \*(sBcds\*(sE and \*(sBcdsclerk\*(sE can share a directory.) The default directory should be something like \*(sB/opt/dcelocal/var/server\*(sE.
The interface will include the following:
Control programs should probably be enhanced to access this interface, although this is not a requirement. The DCE administrative shell [RFC30] will probably have commands to access this interface.
Using this API, a new command, \*(sBdcesvc\*(sE, will be provided. It can be used to remotely get and set the disposition of all messages for any DCE server. Typical uses are as follows:
Entities are named within the namespace. All servers will be required to export a UUID and enter it into an OSF-specified place within the namespace. This UUID must have an ACL attached to it.
The following list is intended as a guide for providers so that they can estimate the amount of work that will be involved in converting their offering to meet the requirements described here:
The following library routine is intended to show how the various parts of the serviceability component are used. The \*(sBmem_s_*\*(sE values are message IDs defined elsewhere.
Rich Salz | Internet email: rsalz@osf.org | |
Open Software Foundation | Telephone: +1-617-621-7253 | |
11 Cambridge Center | ||
Cambridge, MA 02142 | ||
USA |