   OSF DCE SIG                                               Sue Kline (HP)
   Request For Comments: 39.0                             Alex McLeod (IBM)
                                                       Makoto Nishino (IBM)
                                                       David Obermann (IBM)
                                                     Francis X. Rojas (IBM)
                                                       Arne Thormodsen (HP)
                                                                 March 1993


           AN INTERNATIONALIZED DCE CHARACTER HANDLING PROPOSAL --
          INTERCHANGE OF CODED CHARACTERS CONVENTIONS AND MECHANISMS


   1. INTRODUCTION

      In today's global computer marketplace, numerous character set
      encodings are available, designed to support various national and
      international market sectors.  Many of these codesets are based on
      standards; some have become industry "de facto" standards through
      their acceptance within a marketplace; still others are vendor
      proprietary.  To be successful in the global marketplace, DCE
      must provide a framework whereby applications can interoperate in
      regional and global networks, even though differing codesets may be
      in use.  In other words, DCE should provide the enabling services to
      properly transport encoded character data from one system to another
      and the enabling facilities for the receiving system to decode the
      data into an acceptable representation for local processing.

   1.1. Abstract

      This paper proposes to:

        (a) Enable the interchange of data within a global heterogeneous
            networking environment.

        (b) Enable the interchange of data within a large variety of
            existing regional, relatively homogeneous, environments.

        (c) Define the conventions to be used to interchange coded
            character data of different encodings.

        (d) Provide solutions for the DCE 1.1 timeframe in addition to
            looking at possible longer-term architectural solutions.

        (e) Provide solutions that retain backwards compatibility with
            existing DCE applications and do not modify the current RPC
            protocol.


   Kline, McLeod, Nishino, Obermann, Rojas, Thormodsen               Page 1


   DCE-RFC 39.0          DCE I18N Character Handling             March 1993


   1.2. Problems with Existing DCE Support for Character Data

      DCE was not originally designed with provisions to support the vast
      number of character encodings that exist in the global marketplace.
      The concept of a "char" within IDL is inherently restricted to
      ASCII and (a single codepage of) EBCDIC, thereby precluding support
      for all other codesets.  This existing IDL "char" behavior must be
      preserved so that existing DCE applications are not broken.
      However, for DCE to be successful in the global market, it must
      also be able to support other (non-ASCII/non-EBCDIC) encodings.

   1.3. Regional Interoperability Requirements

      Regional interoperability requires that a solution be provided
      whereby systems within the network can successfully interchange
      character data that is in the same language, but may or may not use
      the same character set or encodings.  In this environment, the number
      of character sets and encodings is small, and they are well-known
      across the network.  In these cases it is possible to define
      conversions that result in little or no data loss under most
      circumstances.  Examples of such cases are numerous throughout the
      world, but to name a few specific "must-solve" cases:

        (a) Japan -- Shift-JIS and EUC encodings of Kanji characters.

        (b) Western Europe -- ISO 8859-1 standard character set and a wide
            variety of proprietary ones.

      Ideally, a regional interoperability solution will allow for the most
      efficient transfer of characters if a single codeset is in use
      throughout a network.  The emphasis is to avoid unnecessary data
      conversions at either the data sender or receiver ends.  For example,
      it should be possible to transfer data between two identical machines
      using the same local encoding (either standards-based or vendor
      proprietary) without any data conversion.  When conversions are
      necessary within a regional network, they should be optimal
      conversions.  A case in point is the Japanese market, where
      Shift-JIS and EUC conversions are commonplace within applications
      today and consequently are highly optimized.

   1.4. Global Interoperability Requirements

      Global interoperability requirements involve the transfer of data
      between systems which may have no direct ability to process or
      convert the character sets and encodings used on other systems.  The
      character sets may be designed to support different languages, and so
      only share a small subset of common characters, or perhaps no
      characters at all.  In some cases proprietary sets may be in use, and
      only the local system they are used on can convert from these
      encodings to other, more widely used, encodings.  The best example of
      these problems is found in a heterogeneous, global network of
      systems, such as is evident in most multi-national corporations.

      Global interoperability requires a more sophisticated solution than
      the regional case.  DCE must be able to handle any character encoding
      that may be in use, and also identify those cases where
      interoperability is not possible (for example, when two entirely
      non-intersecting character sets are in use on different systems).

      One single, universal, encoding must be provided by DCE for on-the-
      wire representation of character data.  This allows interoperability
      between systems which otherwise cannot convert directly to each
      other's encodings.  It also allows systems which use encodings not in
      use on a given network to be integrated into the network anyway,
      provided that they convert to and from this universal encoding when
      accessing other systems.  Fulfilling the global interoperability
      requirements may trade some performance for functionality, since
      conversions to and from this universal data representation would be
      required in most cases.  Also, as
      mentioned, there will be cases where interoperability cannot be
      achieved.  These cases must be reliably identified, preferably before
      client-server communication is attempted.

   1.5. Overview of this Proposal

      This proposal provides solutions for addressing both the regional and
      global problem areas by focusing on enhancements within the RPC
      component and on related RPC-based services.

      The remainder of the paper is broken into the following Sections:

        (a) Terminology used in this paper.

        (b) A description of the functionality and implementation
            requirements.

        (c) A description of various proposed codeset conversion models.

        (d) A description of proposed extensions to the DCE NSI and some
            associated support APIs.

        (e) A description of a proposed codeset identification scheme and
            some associated support APIs.

        (f) Dependencies and assumptions.


   2. TERMINOLOGY

      The following terminology will be used throughout the remainder of
      this proposal:

        (a) "character set" -- A collection of symbols used to represent
            information, typically a written human language.

        (b) "codeset" -- A character set with assigned numeric codes.

        (c) "codeset_context" -- A set of information including the local
            codeset of a given client/server and the set of related codeset
            converters locally available.  This is explained in more detail
            in Section 5.7.

        (d) "codeset tag" -- Identifies the codesets involved in the
            request/reply round trip.  The codeset tag consists of two
            parts, the "transmit_id" and the "response_id".  This tag is
            also referenced as a "codeset_t" type in this paper.

        (e) "cs_type" -- The transmit_id and response_id (below) are of
            type "cs_type".

        (f) "local codeset" -- The character type and encoding in use on a
            given client/server.  The possibilities include single-byte,
            multibyte and wide character types.

        (g) "reply" -- A data transmission issued by the server in answer
            to a specific client request.

        (h) "request" -- A data transmission issued by the client to a
            server.

        (i) "response_id" -- The expected codeset of the next incoming
            data transmission.

        (j) "transmit_id" -- The codeset of the current outgoing data
            transmission.


   3. DESCRIPTION OF REQUIRED FUNCTIONALITY

      In order to satisfy the goals outlined in Section 1.0, several
      enhancements or additions to the existing DCE are needed.  These
      additions include extensions to the RPC to support codeset
      identification and conversion, runtime library support for this
      extended functionality, and name service (NSI) extensions to allow
      clients to determine the codesets supported by a server.

      Below is a high-level description of the requested functionality,
      followed by a lower-level description of the requirements for
      implementing this functionality.

   3.1. High-level View of New DCE I18N Functionality

      One of the major goals of this proposal is to help clarify whether a
      given feature is expected to be implemented as a feature of the IDL
      compiler, to be provided as a library function with the DCE product,
      or to be implemented by actual changes or additions to the logic of
      an application.  This proposal attempts to minimize this last item.

      Our goal is to permit a distributed application to be
      internationalized to some useful degree with only changes to the
      interface definition.  At the same time we recognize that some
      functionality can only be implemented by making changes in an
      application, for example by changing the logic used by a client to
      select a server.

      Therefore, as a guideline to implementors, each area of functionality
      is marked to indicate whether it should be implemented by changes in
      one or more of the following areas:

        (a) "APP" -- The actual client and/or server implementations
            ("manager code").

        (b) "STUB" -- Logic flows in the IDL-generated stub code.

        (c) "LIB" -- Runtime support library functions, which may be called
            from application code or stub code.

      The areas of new functionality required are:

        (a) A mechanism for clients and servers to determine their own
            local codesets, and a mechanism to make this information
            visible to the RPC stub code.  It is not anticipated that more
            than two codesets (one for each "end") will need to be
            supported per client/server connection.  (APP, STUB, LIB)

        (b) A mechanism, supported via the DCE NSI, for a client to
            determine if a given server supports particular codesets.  See
            Section 5.0 for more details.  (APP, LIB)

        (c) A specific protocol for a client to indicate to a server, and
            for a server to indicate to a client, what codeset is in use
            "on-the-wire".  (STUB, LIB)

        (d) A specific protocol which allows a client optionally to
            indicate to a server how and where conversions should be
            handled.  In particular, support for at least the conversion
            models discussed in Section 4 should be provided.  (APP, STUB,
            LIB)

        (e) A mechanism to do actual codeset conversions.  (LIB)

        (f) A mechanism for clients to handle codeset conversion errors
            which occur either locally or at the server.  This support
            should be integrated into existing RPC error-handling
            mechanisms.  (APP, STUB, LIB)
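
      As an illustration of item (a) above, the following is a minimal
      sketch of how a client or server might determine its own local
      codeset.  It is written in Python rather than against any DCE
      interface, and the function name "local_codeset" is hypothetical; the
      standard "locale" and "codecs" modules merely stand in for whatever
      runtime library support DCE would provide.

```python
import codecs
import locale


def local_codeset() -> str:
    """Return a normalized name for the codeset of the current locale.

    Hypothetical helper: it models the "determine the local codeset"
    step only, not any actual DCE library routine.
    """
    # getpreferredencoding(False) reports the locale's encoding without
    # changing the process locale as a side effect.
    name = locale.getpreferredencoding(False)
    # codecs.lookup() raises LookupError for a codeset that this runtime
    # cannot convert, which is the "unknown character encoding" case.
    return codecs.lookup(name).name


if __name__ == "__main__":
    print(local_codeset())
```

      The value returned depends on the environment in which the sketch is
      run, so no particular output is implied.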

   3.2. Implementation Requirements

      The functionality outlined above translates into several specific
      implementation requirements.  These are listed below, with no
      prioritization:

        (a) Character data parameters appearing as scalars, fixed-length
            arrays and null-terminated strings must all be handled
            properly.  Characters of various sizes, for example ISO 8859
            8-bit characters, UNICODE 16-bit characters and ISO 10646
            32-bit characters, must all be handled properly.

            (NOTE: This paper does not address the issue of different
            character data sizes being used on clients and servers.  This
            is not an internationalization issue, and handling such
            conversions is outside of the scope of the basic DCE RPC
            mechanism.)

        (b) Character conversions which result in a change in the number of
            character elements in a data structure must be handled
            properly.  Note that in certain cases (e.g., character data
            embedded in a fixed-size structure) generation of an error may
            be the appropriate action.

        (c) The protocol used for clients and servers to indicate their
            local codeset to each other should not be visible at the level
            of an RPC call within an application.

        (d) The mechanism used for clients and servers to do actual codeset
            conversions (including data size issues, as in item (b) above)
            should not be visible at the level of an RPC call within an
            application.

        (e) Client and server applications must have a mechanism to
            indicate to each other what local codesets are in use, and what
            encoding is being used for "on-the-wire" data.

            A specific protocol must be provided for indicating the
            identity of codesets (see Section 6).  OSF should supply a
            process for the provision of these ids.  This process must
            provide specific identifiers for the codesets in Section 7.3.2
            as well as other sets found to be of general interest to the
            DCE community.  It must also allow for private user and
            vendor-specific extensions.

        (f) A single server must be able to communicate with clients which
            support different conversion models (see Section 4).

        (g) The DCE source, as delivered, must define a set of internal
            interfaces that can be used by implementors to allow access to
            their own set of interoperability routines (e.g., codeset
            converters and conversion policy controls).  The DCE source
            must also provide at least one set of interoperability routines
            which may be used "as-is" by DCE applications, and which
            support the conversion models described in this proposal (see
            Section 4).  At a minimum, these should support the OSF/1-
            supported codesets as listed in Section 7.3.2.

            (NOTE: Conversions to/from the above mentioned codesets and a
            UCS encoding must be provided by DCE independently of the
            underlying OS.  This is because these conversions must be
            supported in all implementations of DCE 1.1 to assure
            conformance to the interoperability goals of this proposal.)

        (h) A mechanism must be provided by DCE, presumably through the
            name services, to allow a client to determine the codesets and
            conversions supported by a server.  Furthermore, the client
            should be able to accept/reject binding to a server based on
            this information.  This facility, which is intended to provide
            a method of ensuring data integrity for client-server
            connections ("dynamic model"), is discussed in Sections 4.4 and
            5.0.

        (i) A method must be provided to allow servers which support this
            new internationalized behavior to be backwards compatible with
            older clients.  This will presumably be handled through the
            provision of multiple interfaces to a given server.

        (j) A mechanism must be provided for conversion errors at both
            client and server to be reported to a client.  Such errors may
            include (but are not limited to):  unknown character encoding,
            invalid character encoding, and buffer overflow during
            conversions.

            (NOTE: This proposal does not address the error handling issue,
            as it was felt to be highly implementation dependent.)
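
      Requirements (b) and (j) above can be illustrated with a minimal
      sketch, given here in Python.  The names "convert" and
      "ConversionError" are hypothetical and are not part of any DCE
      interface; Python's built-in codecs stand in for the codeset
      converters, and "max_len" models a fixed-size destination buffer.

```python
import codecs


class ConversionError(Exception):
    """Hypothetical error type standing in for an RPC conversion status."""


def convert(data: bytes, src: str, dst: str, max_len: int) -> bytes:
    """Convert data from codeset src to dst, bounded by max_len bytes."""
    try:
        codecs.lookup(src)
        codecs.lookup(dst)
    except LookupError as exc:
        raise ConversionError("unknown character encoding") from exc
    try:
        text = data.decode(src)
    except UnicodeDecodeError as exc:
        raise ConversionError("invalid character encoding") from exc
    try:
        out = text.encode(dst)
    except UnicodeEncodeError as exc:
        raise ConversionError("character not representable in " + dst) from exc
    if len(out) > max_len:
        # The converted data no longer fits the fixed-size structure.
        raise ConversionError("buffer overflow during conversion")
    return out


# Three half-width katakana characters: one byte each in Shift-JIS,
# two bytes each in EUC (Japanese).
sjis = "\uff71\uff72\uff73".encode("shift_jis")
euc = convert(sjis, "shift_jis", "euc_jp", max_len=16)
print(len(sjis), len(euc))  # prints "3 6"
```

      The same three characters occupy twice as many bytes after
      conversion, so a destination sized for the Shift-JIS form (for
      example, max_len=3) would be reported as an overflow rather than
      silently truncated.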


   4. DESCRIPTION OF PROPOSED CONVERSION MODELS

      Three proposed conversion models are discussed in this Section:
      Universal Character Set (universal), Receiver Makes it Right (RMIR)
      and Dynamic (dynamic).  Three models were found to be needed in order
      to accommodate the various efficiency and functionality concerns
      which arose while forming this proposal.  The particular model is
      determined by the nature of the client.  Only one kind of server is
      described here; it supports all conversion models.

      This last point is important to note.  The conversion models are
      determined not by the codeset converter itself, but by the values of
      the codeset tags (see below).  This important feature allows for a
      great deal of flexibility, and extensibility, within a simple
      framework.

      Conversion models other than the ones discussed below are possible.
      The models below were selected because they seemed to be the
      simplest set which could cover all of the design criteria.  Since the
      models are implemented in libraries external to both application and
      stub code, other models are possible within the framework of this
      proposal.

      Key to all of these models is the codeset tag, which consists of two
      parts as illustrated in the following diagram:

                         +---------------------+--------------------+
            codeset tag: |     transmit_id     |     response_id    |
                         +---------------------+--------------------+

      The "codeset tag" is a conceptual model for the information which
      must be passed between a client and a server to implement the models
      described below.  This tag consists of two parts, each of type
      "cs_type".  The "transmit_id" indicates the codeset of the current
      outgoing transmission (assuming there is one) while the "response_id"
      indicates the desired (and expected) codeset of the next incoming
      data transmission.  In most cases these values will be the same;
      however, as discussed below, the response_id may take on the special
      value "no value" to indicate that the client has no preferred
      response codeset (i.e., it assumes it can convert anything it
      receives to its own local codeset).

      It is the intent of this proposal that this tag, in whatever form it
      takes in an actual implementation, not be visible at the level of an
      RPC call.  It should only appear in stub-stub communications, and
      should only be specified in ".idl" (and/or ".acf") files.

      To illustrate, assume that a client is sending requests in codeset
      "X", and requesting replies from the server in codeset "Y" (not a
      highly realistic situation).  The codeset tag for outgoing, client-
      generated request data would then be:

            client       +---------------------+--------------------+
            codeset tag: |         X           |          Y         |
                         +---------------------+--------------------+

      Correspondingly, the server in this case would be transmitting the
      reply back to the client in codeset "Y", and requesting data from the
      client in codeset "X".  The codeset tag for all outgoing,
      server-generated replies to the client would be:

            server       +---------------------+--------------------+
            codeset tag: |         Y           |          X         |
                         +---------------------+--------------------+

            (NOTE: The "response_id" may be set to "no value", indicating
            that the transmitting side does not have a codeset preference
            for data it will receive.  This is used by the "RMIR"
            conversion model, see Section 4.3.)

      This model is conceptual only.  For efficiency, any actual
      implementation should minimize, where possible, the amount of tag
      data exchanged between clients and servers.  As an example, for an
      operation with only "[in]" parameters, there is clearly no need for
      the server to send back any tag information, since in this case no
      character data moves from the server to the client.  Similarly, there
      is no need, in most cases, for the server to send back a
      "response_id" unless it is known that more data will be sent from the
      client, and that it is possible for the client to convert to the
      server's preference.  These and similar issues are not discussed
      below, for the sake of simplicity.

      Also, note that in all cases the codeset "conversions" described may
      be "no-op" conversions, if the on-the-wire encoding is the same as
      the local codeset.  This situation is not described as a special case
      in any of the models.
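
      The codeset tag exchange described above can be sketched as follows.
      This is an illustration in Python only: the class "CodesetTag" and
      the method "reply_tag" are hypothetical names, plain strings stand in
      for "cs_type" values, and None models the special value "no value".
      The rule used to build the server's reply tag is an assumption
      inferred from the "X"/"Y" example and the NOTE above, not a
      specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodesetTag:
    """Conceptual codeset tag; both fields play the role of "cs_type"."""
    transmit_id: str              # codeset of the current outgoing data
    response_id: Optional[str]    # codeset expected next; None = "no value"

    def reply_tag(self, server_local: str) -> "CodesetTag":
        """Build the tag a server would attach to its reply (assumed rule)."""
        if self.response_id is None:
            # The client stated no preference: reply in the server's local
            # codeset and express no preference in return.
            return CodesetTag(server_local, None)
        # Honor the requested reply codeset, and ask for further data in
        # the codeset the client is already transmitting.
        return CodesetTag(self.response_id, self.transmit_id)


request = CodesetTag(transmit_id="X", response_id="Y")
reply = request.reply_tag(server_local="Z")
print(reply)
```

      For the request tag (X, Y) the sketch yields the reply tag (Y, X),
      matching the two diagrams above; the server's own local codeset only
      matters when the client's response_id is "no value".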

   4.1. Determination of Conversion Models

      There are three conversion models discussed below.  A natural
      question which may arise is how the client determines which model to
      use.  In the case of the "dynamic" model (Section 4.4) the actual
      interaction is determined at runtime.  However, for the other two
      models there are several possibilities:

        (a) The conversion model may be statically determined at the "IDL"
            compile time by specifying attributes.

        (b) The conversion model may be determined at runtime by some
            information external to a client, such as a configuration file
            or environment variable.

        (c) The conversion model may be indicated by the client via an API
            which sets some global value which is in turn used by the stub
            code to determine the model.

      Each of these approaches has advantages and disadvantages.  We were
      unable to come up with a clear answer as to which might be the best.
      All three approaches would work, and all three can be made extensible
      to new conversion models (although in case 'a' this will require a
      well-conceived design on the part of the IDL developers).  This issue
      needs to be resolved during the design and implementation process.

   4.2. Universal Character Set Model -- 'universal'

      The Universal Character Set (UCS) model is the simplest model to
      understand and implement, but not necessarily the most efficient.  In
      this case, local codesets are always converted to a UCS encoding
      before transmission.  Despite the possible performance impacts, this
      model is essential; it provides the only way to integrate a client
      which supports a particular codeset with a server which does not.  A
      high-level description of this model is below.

        (a) CLIENT:

              (i) Any character data sent to the server is converted from
                  the local codeset to the UCS encoding prior to
                  transmission.

             (ii) Both id fields in the codeset tag are set to the "UCS"
                  id.

            (iii) Any character data returned from the server is converted
                  from the codeset identified (which will be UCS) to the
                  local codeset.  This conversion (UCS to local) is
                  required to be available to the client.

             (iv) Any errors encountered while converting are returned to
                  the client.

        (b) SERVER:

              (i) Converts any client character data from the codeset
                  specified by the client's "transmit_id" (which will be
                  UCS) to the local codeset.  As with the client, this
                  conversion to the local codeset must be guaranteed.

             (ii) Any character data returned to the client is converted
                  from the local codeset to the UCS encoding.  (As with the
                  client, both id fields in the codeset tag will be set to
                  the UCS id.)

            (iii) Any errors encountered while converting are returned to
                  the client.
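
      The data flow of the universal model can be sketched as follows, in
      Python.  Several things here are assumptions made for illustration:
      the function names "ucs_send" and "ucs_receive" are hypothetical, the
      tag is modeled as a simple (transmit_id, response_id) pair, and
      Python's "utf_16_be" codec stands in for the UCS encoding, which this
      Section does not pin down.

```python
from typing import Tuple

# Assumption: a 16-bit big-endian encoding stands in for "the UCS encoding".
UCS = "utf_16_be"

Tag = Tuple[str, str]  # (transmit_id, response_id)


def ucs_send(data: bytes, local: str) -> Tuple[bytes, Tag]:
    """Sender side: convert local -> UCS and set both tag ids to UCS."""
    wire = data.decode(local).encode(UCS)
    return wire, (UCS, UCS)


def ucs_receive(wire: bytes, tag: Tag, local: str) -> bytes:
    """Receiver side: convert from the tagged codeset to the local one."""
    transmit_id, _response_id = tag
    return wire.decode(transmit_id).encode(local)


# A Shift-JIS client calls an EUC server; neither converts directly to
# the other's codeset, yet the text survives the round trip.
text = "\u65e5\u672c\u8a9e"
request = text.encode("shift_jis")
wire, tag = ucs_send(request, "shift_jis")       # client stub
at_server = ucs_receive(wire, tag, "euc_jp")     # server stub
wire, tag = ucs_send(at_server, "euc_jp")        # server reply
at_client = ucs_receive(wire, tag, "shift_jis")  # client stub
print(at_client == request)
```

      Every transfer costs two conversions, one at each end, which is the
      performance impact noted at the start of this Section.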

   4.3. Receiver Makes it Right Model -- 'RMIR'

      The Receiver Makes it Right (RMIR) model is closely analogous to the
      existing conversion mechanism which the DCE RPC uses for integer and
      floating point data types.  It is expected to be the predominant
      mechanism in a well-characterized network of similar-powered machines
      supporting a limited number of codesets.  This situation is
      encountered, for example, in the US, Western Europe and Japan with
      well-characterized networks of PCs and workstations.  The RMIR model
      performs the minimum number of conversions, and distributes these
      evenly across all communicating machines.

        (a) CLIENT:

              (i) Any character data sent to the server is sent in the
                  local codeset, and identified in the client's
                  "transmit_id".  Any character data returned from the
                  server is converted from the codeset indicated by the
                  server's "transmit_id" to the local codeset (this
                  conversion may fail if the appropriate converter is not
                  available).

             (ii) Any errors encountered while converting are returned to
                  the client.

        (b) SERVER:

              (i) Attempts to convert from the codeset specified by the
                  client's "transmit_id" to the local codeset (this
                  conversion may fail if the appropriate converter is not
                  available).

             (ii) Any reply data back to the client is sent in the local
                  codeset with the "transmit_id" of the reply indicating
                  the local codeset identifier.

            (iii) Any errors encountered while converting are returned to
                  the client.

      (NOTE: The "response_id" of both client and server codeset tags is
      always "no value" for the RMIR model.  Also, the converters at either
      end are guaranteed at least to be able to convert the UCS encoding to
      the local codeset, allowing UCS clients and/or servers to always
      interoperate under this model.)
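
      A minimal Python sketch of the RMIR flow follows.  The function
      names "rmir_send" and "rmir_receive" are hypothetical, None models
      the "no value" response_id, and the "converters" argument is an
      assumed representation of the set of codesets a receiver is able to
      convert from.

```python
from typing import Optional, Set, Tuple

NO_VALUE = None  # models the "no value" response_id

Tag = Tuple[str, Optional[str]]  # (transmit_id, response_id)


def rmir_send(data: bytes, local: str) -> Tuple[bytes, Tag]:
    """Sender transmits in its local codeset; no conversion happens here."""
    return data, (local, NO_VALUE)


def rmir_receive(wire: bytes, tag: Tag, local: str,
                 converters: Set[str]) -> bytes:
    """Receiver converts from the sender's codeset, or reports failure."""
    transmit_id, _response_id = tag
    if transmit_id == local:
        return wire  # "no-op" conversion
    if transmit_id not in converters:
        raise LookupError("no converter from %s to %s" % (transmit_id, local))
    return wire.decode(transmit_id).encode(local)


text = "\u65e5\u672c\u8a9e"
wire, tag = rmir_send(text.encode("shift_jis"), "shift_jis")
# An EUC server that has a Shift-JIS converter "makes it right".
at_server = rmir_receive(wire, tag, "euc_jp", converters={"shift_jis"})
print(at_server == text.encode("euc_jp"))
```

      When both ends share a codeset the data passes through untouched,
      and when the receiver lacks the converter the failure surfaces as an
      error instead of corrupted text.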

   4.4. Dynamic Model -- 'dynamic'

      The dynamic model attempts to perform conversions in the most
      efficient manner for all cases.  It is especially suited to
      asymmetric cases, such as many clients accessing a single server.  In
      this particular case, this model will ideally result in clients doing
      all necessary conversions, thus offering the highest server
      performance.

      The dynamic model depends on the client having access to the server's
      local codeset and conversion capabilities as part of the binding
      process.  A determination can then be made as to which conversion
      policy should be utilized at RPC runtime (see Section 5.0).

      Because of this feature, it is impossible to describe a definite data
      flow between client and server, as was possible for the two models
      above.  This model is discussed in detail in the following
      Subsections.

   4.4.1. Rationale for dynamic model

      The dynamic model's prime purpose is to determine the best codeset
      for a specific binding.  All operations within an interface are
      expected to share the results of this determination; hence, the
      determination need only occur once per binding.

      In order to provide enough information so that the data conversion
      can be fully optimized, the following information is needed:

        (a) Codeset information to identify both the client's and the
            server's respective local codeset.

        (b) Conversion capability information to identify what type of
            optimizations can be done respectively by the client and
            server.

      The dynamic model must clearly result in no conversions in a
      homogeneous network.  At the same time, the goal is to optimize the
      conversions in heterogeneous environments for all the situations
      described in the table which follows.  In this table 'X' and 'Y'
      refer to two different codesets, and 'knows' implies the ability to
      convert to the local codeset.

            +---+------+------+------+------+---------------------------+
            |   |client|client|server|server| note                      |
            |   |using |knows |using |knows |                           |
            +---+------+------+------+------+---------------------------+
            | 1 |  X   |  X   |  X   |  X   | using same one.           |
            | 2 |  X   |  X,Y |  Y   |  X,Y | both have X<->Y converter |
            | 3 |  X   |  X,Y |  Y   |  Y   | only client has X<->Y     |
            | 4 |  X   |  X   |  Y   |  X,Y | only server has X<->Y     |
            | 5 |  X   |  X   |  Y   |  Y   | neither has X<->Y         |
            +---+------+------+------+------+---------------------------+

      Given all of these possible situations, the dynamic model needs to
      arrive at the optimal conversion policy for each case.

      There are several possibilities for conversion policy:


            +-----------------+ +--------------------+
            | Configuration   | |    Dynamic Model   |
            +---+------+------+ +--------------------+
            |   |client|server| |   Possible Models  |
            |   |knows |knows | |  A   |   B  |  C   |
            +---+------+------+ +------+------+------+
            | 1 |  X   | X    | | HOMO | HOMO | HOMO |
            | 2 |  X,Y | X,Y  | | SMIR | RMIR | CMIR |
            | 3 |  X,Y | Y    | | CMIR | CMIR | CMIR |
            | 4 |  X   | X,Y  | | SMIR | SMIR | SMIR |
            | 5 |  X   | Y    | | UCS  | UCS  | UCS  |
            +---+------+------+ +------+------+------+

            Key: HOMO := Homogeneous network, no conversion
                 RMIR := Receiver makes it right conversion
                 SMIR := Server makes it right conversion
                 CMIR := Client makes it right conversion
                 UCS  := Universal Character Set conversion

      After careful analysis, model 'C' was found to be the best choice.
      While the analysis is not included in this discussion, it is safe to
      state that the main reason for choosing 'C' is that it offers (on
      average) the best server performance by offloading conversions to the
      client whenever possible.  This issue is of importance in
      configuration "2" above, where multiple possible conversion models
      exist.
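
      As an illustration only, the selection rule for model 'C' in the
      table above can be sketched in C.  The names used here
      (sketch_cs_id, sketch_cs_context, sketch_select_model_c) are
      hypothetical and are not defined by this proposal; the "knows"
      column is assumed to be the local codeset plus a list of codesets
      that can be converted to and from it.

```c
#include <stddef.h>

/* Hypothetical stand-in for the proposal's codeset identifier type. */
typedef unsigned long sketch_cs_id;

typedef enum {
    POLICY_HOMO,   /* homogeneous, no conversion      */
    POLICY_CMIR,   /* client makes it right           */
    POLICY_SMIR,   /* server makes it right           */
    POLICY_UCS     /* both sides convert to/from UCS  */
} sketch_policy;

/* Assumed shape of a codeset_context: local codeset plus converters. */
typedef struct {
    sketch_cs_id        local;   /* "using" column                  */
    size_t              n_conv;  /* number of convertible codesets  */
    const sketch_cs_id *conv;    /* "knows" column, minus the local */
} sketch_cs_context;

/* Return 1 if ctx can convert between its local codeset and cs. */
static int sketch_knows(const sketch_cs_context *ctx, sketch_cs_id cs)
{
    size_t i;

    if (ctx->local == cs)
        return 1;
    for (i = 0; i < ctx->n_conv; i++)
        if (ctx->conv[i] == cs)
            return 1;
    return 0;
}

/* Model 'C': offload conversion to the client whenever it is able. */
sketch_policy sketch_select_model_c(const sketch_cs_context *client,
                                    const sketch_cs_context *server)
{
    if (client->local == server->local)
        return POLICY_HOMO;                  /* configuration 1        */
    if (sketch_knows(client, server->local))
        return POLICY_CMIR;                  /* configurations 2 and 3 */
    if (sketch_knows(server, client->local))
        return POLICY_SMIR;                  /* configuration 4        */
    return POLICY_UCS;                       /* configuration 5        */
}
```

      Checking the client's converters before the server's is what
      distinguishes model 'C' from models 'A' and 'B' in configuration 2.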


   5. MECHANISM FOR EXCHANGE OF CODESET AND CONVERTER INFORMATION

      For the dynamic model to work optimally, the client must know the
      codeset and the conversion capabilities of the server (i.e., the
      server's codeset_context).  While there are various ways to do this,
      this proposal suggests that extensions be provided to NSI to allow
      the server to announce its local codeset and set of related codeset
      converters.  Once the binding between a given client and server
      occurs, the conversion policy information, as determined as part of
      the binding process, should be made available to the RPC runtime via
      the server's binding handle.

      For each binding handle being used by a client for an RPC, the client
      queries the NSI to obtain the server's codeset_context.  The client
      then makes decisions using this information in conjunction with its
      own local codeset_context to determine the on-the-wire codeset for
      both directions.  The client must append the derived conversion
      policy as specified via a transmit_id and response_id (codeset_t) to
      the server binding handle.  This allows the RPC runtime to be
      informed of the conversion policy decision made at binding time.

      (NOTE: In this proposal the binding handle is used to store the
      conversion policy information because it is available, persistent,
      and unique per RPC binding.  Other implementations are possible, but
      are not discussed here.)

      The next few Sections outline a set of extensions to the DCE NSI,
      along with some associated API's, which could be used to provide such
      support.  For the sake of clarity these Sections discuss specific
      API's.  It is expected that the actual implementation may depart from
      these specific API's; however, the information discussed below must
      somehow be available to a client.

   5.1. Enhancements to NSI to allow the optional exportation of server
        codeset and available codeset converter information to NSI

      Enhancements to NSI are requested to facilitate the exportation of
      the server's codeset and set of available codeset converters.  This
      information will be used to support the "dynamic" model only.  It is
      useful for those applications which wish to ensure successful
      character data transfer as part of the client/server binding process.
      In some cases, only the client will be able to convert its codeset
      to the server's codeset; other cases exist where only the server can
      convert.  To enable a decision as to the optimal conversion policy,
      the following enhancements must be provided:

        (a) At the time when the server is advertised to NSI, the server
            must be able to export its codeset_context to NSI.  (See
            Section 5.2.)

        (b) NSI must be enhanced to accommodate storing the exported server
            information regarding the server's codeset and related set of
            codeset converters (together forming the "codeset_context").
            (See Section 5.7.)

        (c) When the client is engaged in selecting a suitable server to
            bind with, the client must be able to query NSI to obtain the
            server's exported codeset_context.  (See Section 5.3.)

        (d) If the server's codeset_context information is available, a
            determination is made by the client as to whether the client
            and server can bind, based on their respective local codesets
            and set of supported converters, and from that decision, what
            conversion policy is appropriate.  (See Section 5.4.)

        (e) If an acceptable server is found, the server's binding handle
            must be updated to reflect the derived conversion policy, which
            will be used in the subsequent (specified) RPC calls.  (See
            Section 5.5.)

      To implement these functions, the following set of API's is
      proposed.  (NOTE: The following API names are only suggestions.
      Actual names and the type/number of parameters depend on the
      implementation of NSI enhancements.)


   5.2. void rpc_ns_extensions_codeset_add()

      This API allows the server to export its codeset and the list of
      codeset converters to NSI.

      The API will utilize the server's codeset_context obtained from a
      call to "rpc_local_inq_codeset()" (see Section 6.4).  Specifically,
      the following information will be exported to NSI:

        (a) Value of the server's codeset, identified by a (cs_type)
            codeset value.

        (b) Number of related converters supported by the server.

        (c) Each converter, identified by a (cs_type) codeset value.

   5.3. void rpc_ns_extensions_codeset_inq()

      This API allows the client to query NSI to obtain the server's
      codeset and related set of codeset converters.

      This API will retrieve the codeset_context exported to NSI by the
      server via the above mentioned "rpc_ns_extensions_codeset_add()"
      call.  This information includes:

        (a) Value of the server's codeset, identified by a (cs_type)
            codeset value.

        (b) Number of related converters supported by the server.

        (c) Each converter, identified by a (cs_type) codeset value.

      This API will allocate client-side storage at runtime to store the
      server-side information obtained and return a pointer to this storage
      to the caller.

   5.4. void rpc_local_resolve_encoding()

      This API allows the client to determine whether a binding with a
      given server can occur, based on the codeset_context of the client
      and of the server.

      This API will utilize the client's codeset_context, obtained via a
      call to "rpc_local_inq_codeset()", and the server's codeset_context,
      obtained by the previously described call to NSI,
      "rpc_ns_extensions_codeset_inq()".  The function of this API is to
      resolve whether the client or the server is capable of carrying out
      any required codeset conversions and to establish the conversion
      policy for subsequent RPC operations between this client and server.


      This routine accepts a "resolution_level" parameter, which indicates
      the granularity of resolution required by the client.  It will return
      a status (success or failure) and, in the case of success, the
      appropriate codeset tag settings through pointers.  If the
      codeset_context of the client and server match, then this routine
      will always return a success status.  Three resolution levels are
      suggested for an application to indicate the desired action when the
      codeset_contexts do not match.

      (NOTE: The numeric level values presented are purely arbitrary and
      are used for proposal purposes only.)

        (a) Level 0 -- Indicates that if the client and the server are not
            using the same codeset then a status should be returned
            indicating that no binding is possible.  A no binding status
            may also be returned if (unspecified) error conditions exist.

            Level 0 is intended for those applications which require an
            absolute guarantee of data integrity and would rather not bind
            with a server if the server is not using the same codeset as
            the client.

        (b) Level 1 -- Indicates that if neither client nor server can
            directly convert to the other's codeset then a status should be
            returned indicating that no binding is possible for the
            codeset_context provided.  A no binding status may also be
            returned if (unspecified) error conditions exist.

            Level 1 is intended for those applications which want an
            assurance that either the client or the server has the
            necessary converters to convert the data to a local codeset.
            This assurance might be needed in cases where the overhead of a
            UCS conversion (see level 2 below) is not acceptable.  It might
            also be used as part of an optimized search for a server, where
            binding is sought first at a low resolution level, then at
            higher levels if lower ones fail.

            As data loss may occur during conversion between codesets, this
            level offers less guarantee of data integrity than Level 0.
            This data loss should be documented as part of the
            specification of the converter(s).

            (NOTE: Usage of this level indicates a willingness by the
            application to accept possible data loss in cases where the
            local character sets of the client/server do not perfectly
            match.  As mentioned, this data loss should be well
            characterized as part of the specification of the converters.)

        (c) Level 2 -- Indicates that if neither client nor server can
            directly convert to the other's codeset, the client desires
            ISO-10646 UCS-2 network encoding as the conversion policy.  A
            status indicating that binding is possible will always be
            returned when this resolution level is used unless
            (unspecified) error conditions exist.

            Use of Level 2 indicates that the client is more concerned with
            connectivity than with conversion efficiency or with data
            integrity.  In cases where there are major character set
            mismatches (say between Arabic and Japanese) then only the
            "portable character set" as defined by OSF and DCE may be
            exchanged without data loss.  In cases where there are only
            "minor" character set mismatches, this will behave the same as
            Level 1 (i.e., possible data loss) but with potential dual
            conversion to/from ISO 10646.

            (NOTE: Usage of this level indicates a willingness by the
            application to accept possible data loss in cases where the
            local character sets of the client/server do not perfectly
            match.  Unlike Level 1, this data loss cannot usually be
            characterized in advance as it depends on the interaction
            between two converters, one at each end of a data
            transmission.)

      In general, based on the logic defined, if a conversion policy can be
      successfully established, the codeset_t parameter is set to indicate
      the derived conversion policy.  This codeset_t parameter will also be
      annotated to the server binding handle by a subsequent call to
      rpc_binding_set_codeset_info(), described in the next Subsection,
      5.5.
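
      The effect of the three resolution levels can be sketched as
      follows.  This is a simplified illustration, not the proposed
      interface: the function name, the status and policy enumerations,
      and the use of three precomputed flags in place of full
      codeset_contexts are all assumptions.  Direct conversion by the
      client is preferred over conversion by the server, as in model 'C'.

```c
typedef enum { RESOLVE_OK = 0, RESOLVE_NO_BINDING = 1 } sketch_status;

typedef enum {
    POLICY_HOMO, POLICY_CMIR, POLICY_SMIR, POLICY_UCS
} sketch_policy;

/*
 * level            : 0, 1 or 2 (arbitrary values, as in the text)
 * same_codeset     : nonzero if client and server use the same codeset
 * client_converts  : nonzero if the client can convert to/from the
 *                    server's codeset
 * server_converts  : nonzero if the server can convert to/from the
 *                    client's codeset
 * policy           : set only when RESOLVE_OK is returned
 */
sketch_status sketch_resolve(int level, int same_codeset,
                             int client_converts, int server_converts,
                             sketch_policy *policy)
{
    /* Matching codesets always succeed, whatever the level. */
    if (same_codeset) {
        *policy = POLICY_HOMO;
        return RESOLVE_OK;
    }

    /* An unknown level is treated as an (unspecified) error. */
    if (level < 0 || level > 2)
        return RESOLVE_NO_BINDING;

    /* Level 0: any mismatch means no binding. */
    if (level == 0)
        return RESOLVE_NO_BINDING;

    /* Levels 1 and 2: use a direct converter if either side has one. */
    if (client_converts) {
        *policy = POLICY_CMIR;
        return RESOLVE_OK;
    }
    if (server_converts) {
        *policy = POLICY_SMIR;
        return RESOLVE_OK;
    }

    /* Level 1 stops here; Level 2 falls back to UCS on the wire. */
    if (level == 1)
        return RESOLVE_NO_BINDING;

    *policy = POLICY_UCS;
    return RESOLVE_OK;
}
```

      A client searching for a server could call such a routine first
      with Level 0, then with Level 1, and finally with Level 2, stopping
      at the first level that reports success.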

   5.5. void rpc_binding_set_codeset_info()

      This API allows the client to set the derived conversion policy for a
      server binding handle.

      This will enable RPC calls specified with a "dynamic" model within
      the ACF "codeset_convert" attribute to access the per binding handle
      information of the derived conversion policy.  This policy was
      determined via a preceding call to rpc_local_resolve_encoding(), and
      is specified by the codeset_t parameter.  This API annotates the
      codeset_t parameter to the server's binding handle.

      It is important to note that this information must be stored in the
      selected binding handle as a client may choose to bind with a number
      of servers, and subsequently have a mixture of conversion policies
      established with these varied servers.

      A companion inquiry API, rpc_binding_inq_codeset_info(), could also
      be provided, although the usefulness of this API is not apparent.
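
      The per-binding storage described above might look like the
      following sketch.  The structure layouts and the sketch_ function
      names are hypothetical; a real binding handle is opaque and the
      actual mechanism is left to the implementation.

```c
/* Hypothetical stand-in for the proposal's codeset identifier type. */
typedef unsigned long sketch_cs_id;

/* Stand-in for codeset_t: one tag for each direction of transfer. */
typedef struct {
    sketch_cs_id transmit_id;   /* codeset of client-to-server data */
    sketch_cs_id response_id;   /* codeset of server-to-client data */
} sketch_codeset_info;

/* Only the fields relevant here; a real handle holds much more. */
typedef struct {
    int                 has_codeset_info;
    sketch_codeset_info info;
} sketch_binding;

/* Annotate one binding with the policy derived for that server. */
void sketch_binding_set_codeset_info(sketch_binding *b,
                                     const sketch_codeset_info *ci)
{
    b->info = *ci;
    b->has_codeset_info = 1;
}

/* Companion inquiry; returns 0 on success, -1 if nothing was set. */
int sketch_binding_inq_codeset_info(const sketch_binding *b,
                                    sketch_codeset_info *out)
{
    if (!b->has_codeset_info)
        return -1;
    *out = b->info;
    return 0;
}
```

      Because the information lives in each handle, a client bound to
      two servers can hold a different conversion policy for each.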


   5.6. void rpc_free_codeset_ptr()

      This API is needed to free the allocated local storage created by the
      call to rpc_ns_extensions_codeset_inq().

   5.7. Internals of Proposed NSI Extensions

      The NSI database must be able to store the server's exported
      codeset_context, and possibly other internationalization-related
      information that may be queried by the client.

      The following describes the possible format of such a structure:

                 +-------+----------------------------------+
            1)   |  ##   | version number of this structure |
                 +-------+----------------------------------+
            2)   |       | reserved ( 32 bits )             |
                 +-------+----------------------------------+
            3)   |  Y    | the local server codeset         |
                 +-------+----------------------------------+
            4)   |       | List of Conversions to/from Y    |
                 | ###   | number of codeset identifiers    |
                 | X1    | codeset identifier               | } optional
                 | X2    | codeset identifier               | } optional
                 | XN    | codeset identifier               | } optional
                 +-------+----------------------------------+

      Field 1 designates the version number of this structure, for
      backwards compatibility.

      Field 2 is reserved for future use.

      Field 3 designates the local codeset of the server and is a codeset
      identifier type (cs_type).

      Field 4 is a list of codesets for which the server can support
      two-way (round-trip) conversion with its own codeset as designated by
      Field 3.  This field consists of a count followed by the set of
      codeset identifiers.  It is possible that the list is empty, and
      thereby the count would be set to zero.

      Each entry in the list is a codeset identifier type (cs_type).  No
      assumptions may be made about the conversions' ability to preserve
      invertibility (i.e., Xn->Y->Xn is not guaranteed to preserve all
      characters).  The conversions are, however, guaranteed not to fail
      due to an inability to convert particular characters.
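
      One possible in-memory and on-the-wire form of this structure is
      sketched below.  Only the 32-bit width of the reserved field comes
      from the description above; the 32-bit width of the other fields,
      the big-endian byte order, and all names are assumptions made for
      the sake of the example.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t        version;        /* field 1                        */
    uint32_t        reserved;       /* field 2, 32 bits               */
    uint32_t        local_codeset;  /* field 3, "Y"                   */
    uint32_t        count;          /* field 4, number of identifiers */
    const uint32_t *codesets;       /* field 4, X1..XN (may be empty) */
} sketch_nsi_entry;

#define SKETCH_HEADER_BYTES 16u

static void put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

static uint32_t get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns the number of bytes written, or 0 if buf is too small. */
size_t sketch_entry_encode(const sketch_nsi_entry *e,
                           unsigned char *buf, size_t buflen)
{
    size_t need = SKETCH_HEADER_BYTES + 4u * (size_t)e->count;
    uint32_t i;

    if (buflen < need)
        return 0;
    put_u32(buf + 0, e->version);
    put_u32(buf + 4, 0u);               /* reserved is written as zero */
    put_u32(buf + 8, e->local_codeset);
    put_u32(buf + 12, e->count);
    for (i = 0; i < e->count; i++)
        put_u32(buf + SKETCH_HEADER_BYTES + 4u * (size_t)i,
                e->codesets[i]);
    return need;
}

/* Returns 0 on success, -1 if the buffer is short or list too small. */
int sketch_entry_decode(const unsigned char *buf, size_t buflen,
                        sketch_nsi_entry *e,
                        uint32_t *list, size_t list_max)
{
    uint32_t i;

    if (buflen < SKETCH_HEADER_BYTES)
        return -1;
    e->version       = get_u32(buf + 0);
    e->reserved      = get_u32(buf + 4);
    e->local_codeset = get_u32(buf + 8);
    e->count         = get_u32(buf + 12);
    if (e->count > list_max ||
        e->count > (buflen - SKETCH_HEADER_BYTES) / 4u)
        return -1;
    for (i = 0; i < e->count; i++)
        list[i] = get_u32(buf + SKETCH_HEADER_BYTES + 4u * (size_t)i);
    e->codesets = list;
    return 0;
}
```

      An entry with an empty converter list encodes to the 16-byte header
      alone, with the count set to zero.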


   6. A DCE SUPPLIED CODESET ALIASING MECHANISM

      On stand-alone machines today, there is no standard method for
      identifying codesets or codeset converters.  Hence, each system
      vendor decided on its own designations.  This has led to a multitude
      of names being used for a single codeset.  For example, ISO-88591
      could be referenced by all of the following valid names:

            ISO-88591, ISO88591, ISO-LATIN1, ISO8859-1, 88591, LATIN-1,
            iso88591, iso-88591, iso-latin1, iso8859-1, 8859-1, latin-1,
            etc...

      To be useful in a distributed environment, these names must all map
      to one value.  Hence, it is recommended that a DCE-supported codeset
      aliasing mechanism be developed, whereby all these local names could
      be mapped to one codeset value supported, and understood, by DCE.
      This codeset value should be able to be transferred over the network,
      such as when being used by the IDL codeset_t tag field elements
      specified earlier.

      One proposal is that this mechanism could be a table which consists
      of a DCE-supported codeset value and a codeset string field.  The DCE
      codeset field should be the same type (cs_type) as the two codeset_t
      fields, transmit_id and response_id.  The codeset string field must
      be permitted to be modified by DCE licensees to their appropriate
      string namings.  In addition, it might also be beneficial to allow a
      "comment" field for each entry to state the proper name of this
      codeset.

      It is recommended that all OSF supported codesets be specified in
      this mechanism as part of the initial DCE interoperability offering.
      Also, it is expected that OSF will manage this list of codesets and
      encourage submissions of vendor-specific codesets.

      This proposal also mandates that a value be reserved to indicate no
      (cs_type) codeset value.  This is used by the codeset tag to indicate
      no conversions are specified by a selected conversion policy, such as
      the RMIR model when transmitting outgoing data.

      For this mechanism to be useful at runtime, several API's must be
      provided.

   6.1. cs_type rpc_codeset_lookup_id()

      This API performs the lookup of a codeset value, when supplied with
      the local system's codeset string value.  In the error case where the
      codeset_string has no equivalent codeset value the "no codeset" value
      must be returned.


   6.2. char * rpc_codeset_lookup_string()

      This API performs the lookup of a local codeset string value when
      supplied with the codeset value.  In the error case where the codeset
      value has no string value associated with it, a "NULL" codeset_string
      value should be returned.
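
      A table-driven form of the two lookups in Sections 6.1 and 6.2 is
      sketched below.  The numeric codeset values, the choice of zero as
      the reserved "no codeset" value, the table contents, and the
      sketch_ names are all invented for illustration; real values would
      come from the OSF-managed list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for cs_type. */
typedef unsigned long sketch_cs_type;

#define SKETCH_CS_NONE   0UL   /* assumed reserved "no codeset" value */
#define SKETCH_CS_LATIN1 1UL   /* invented value                      */
#define SKETCH_CS_SJIS   2UL   /* invented value                      */

struct sketch_alias {
    sketch_cs_type id;     /* DCE-supported codeset value      */
    const char    *name;   /* local codeset string (an alias)  */
};

/* Several local names may map to one value; the first is preferred. */
static const struct sketch_alias sketch_aliases[] = {
    { SKETCH_CS_LATIN1, "ISO8859-1"  },
    { SKETCH_CS_LATIN1, "ISO88591"   },
    { SKETCH_CS_LATIN1, "ISO-LATIN1" },
    { SKETCH_CS_LATIN1, "iso8859-1"  },
    { SKETCH_CS_LATIN1, "iso-latin1" },
    { SKETCH_CS_SJIS,   "SJIS"       }
};

#define SKETCH_N_ALIASES \
    (sizeof sketch_aliases / sizeof sketch_aliases[0])

/* String to value; returns SKETCH_CS_NONE if the name is unknown. */
sketch_cs_type sketch_codeset_lookup_id(const char *name)
{
    size_t i;

    for (i = 0; i < SKETCH_N_ALIASES; i++)
        if (strcmp(sketch_aliases[i].name, name) == 0)
            return sketch_aliases[i].id;
    return SKETCH_CS_NONE;
}

/* Value to the first matching local string; NULL if there is none. */
const char *sketch_codeset_lookup_string(sketch_cs_type id)
{
    size_t i;

    for (i = 0; i < SKETCH_N_ALIASES; i++)
        if (sketch_aliases[i].id == id)
            return sketch_aliases[i].name;
    return NULL;
}
```

      The comparison here is case-sensitive, so upper-case and lower-case
      spellings are listed as separate aliases, as in the example list of
      names given earlier in this Section.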

   6.3. void rpc_local_set_codeset()

      This API allows an application to set the value of the local codeset
      retrieved by "rpc_local_inq_codeset()".

   6.4. cs_type * rpc_local_inq_codeset()

      This API performs the lookup of a local codeset either as set by
      "rpc_local_set_codeset()" or through some default mechanism if
      "rpc_local_set_codeset()" has not been called.


   7. DEPENDENCIES AND ASSUMPTIONS

      This Section identifies the set of dependencies required by this
      proposal.

   7.1. Encoding of the 'universal' Model

      The proposed codeset to be used with the "universal" model is ISO-
      10646 UCS-2.  At least conversion to and from characters specified in
      Level 2 of this standard shall be provided.

   7.2. Local Codeset Query

      The XPG-specified nl_langinfo(CODESET) function will be used in the
      OSF/1 implementation as the default means to determine the local
      codeset.
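
      The default mechanism can be sketched as a pair of routines in the
      spirit of Sections 6.3 and 6.4.  The sketch_ names are
      hypothetical, and for brevity the sketch works with the local
      codeset string rather than a cs_type value; nl_langinfo() and
      CODESET are the standard interface from <langinfo.h>.

```c
#include <langinfo.h>
#include <stddef.h>

/* Explicitly set local codeset name, or NULL if none has been set. */
static const char *sketch_override = NULL;

/* Set (or, with NULL, clear) the local codeset name. */
void sketch_local_set_codeset(const char *name)
{
    sketch_override = name;
}

/*
 * Return the local codeset name: the explicitly set value if there is
 * one, otherwise whatever the current locale reports.  The string
 * returned by nl_langinfo() may be overwritten by later calls, so a
 * caller that keeps it should copy it.
 */
const char *sketch_local_inq_codeset(void)
{
    if (sketch_override != NULL)
        return sketch_override;
    return nl_langinfo(CODESET);
}
```

      The value reported by nl_langinfo(CODESET) reflects the program's
      current locale, so an application would normally call
      setlocale(LC_ALL, "") before relying on the default.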

   7.3. Conversion API -- libiconv.a

      It is expected that any required character conversion utilities will
      use the iconv API to invoke the needed conversion.  The from and to
      codeset names are expected to be determined by the codeset tag field
      supplied to the RPC.

      XPG4 defines a set of functions that can be used for conversion.
      The functions are defined as:

        (a) Initialize a conversion descriptor -- iconv_t iconv_open().

        (b) Invoke conversion on an input string -- size_t iconv().

        (c) Free conversion descriptor -- int iconv_close().


      It is expected that libiconv.a and any DCE conversions will be
      made available to all DCE licensees.  If it is acceptable to make
      the OSF/1 implementation a dependency, this will be done.
      Otherwise, the iconv library and conversions should be provided
      within the DCE deliverable via some mechanism.
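
      A minimal use of the iconv interface (iconv_open(), iconv(),
      iconv_close()) is sketched below.  The wrapper name sketch_convert
      is hypothetical.  Codeset names accepted by iconv_open() are
      implementation-defined, so the names a caller passes (for example
      "ISO-8859-1" or "UTF-8" on many current systems) are assumptions
      about the local iconv implementation.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Convert inlen bytes of 'in' from codeset 'from' to codeset 'to'.
 * Returns the number of bytes stored in 'out', or (size_t)-1 if the
 * converter is unavailable, the input is invalid, or 'out' is full.
 */
size_t sketch_convert(const char *to, const char *from,
                      const char *in, size_t inlen,
                      char *out, size_t outlen)
{
    iconv_t cd = iconv_open(to, from);
    char   *inp = (char *)in;     /* iconv() does not modify the input */
    char   *outp = out;
    size_t  inleft = inlen;
    size_t  outleft = outlen;
    size_t  rc;

    if (cd == (iconv_t)-1)
        return (size_t)-1;        /* no such converter on this system */

    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        /* Flush any pending shift sequence for stateful codesets. */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);

    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return outlen - outleft;
}
```

      A failure from iconv_open() corresponds to the case in which a side
      does not "know" the other's codeset, which is the information the
      codeset_context is meant to advertise in advance.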

   7.3.1. Iconv converters available on OSF/1

            SJIS      <-> AJEC
            ISO8859-1 <-> PC Code (IBM-850)
            ISO8859-1 <-> EBCDIC Code (IBM-500)

      Vendors are expected to supply their own iconv conversion modules for
      any proprietary codesets.

   7.3.2. UCS conversions needed

            From                To
            ==================  ==================
            ISO8859-1           ISO10646.1993-2
            ISO8859-7           ISO10646.1993-2
            ISO8859-9           ISO10646.1993-2
            SJIS                ISO10646.1993-2
            AJEC                ISO10646.1993-2
            eucKR               ISO10646.1993-2
            eucTW               ISO10646.1993-2
            ------------------  ------------------
            ISO10646.1993-2     ISO8859-1
            ISO10646.1993-2     ISO8859-7
            ISO10646.1993-2     ISO8859-9
            ISO10646.1993-2     SJIS
            ISO10646.1993-2     AJEC
            ISO10646.1993-2     eucKR
            ISO10646.1993-2     eucTW
            ------------------  ------------------
            ISO8859-1           ISO10646.1993-UTF2
            ISO8859-7           ISO10646.1993-UTF2
            ISO8859-9           ISO10646.1993-UTF2
            SJIS                ISO10646.1993-UTF2
            AJEC                ISO10646.1993-UTF2
            eucKR               ISO10646.1993-UTF2
            eucTW               ISO10646.1993-UTF2
            ------------------  ------------------
            ISO10646.1993-UTF2  ISO8859-1
            ISO10646.1993-UTF2  ISO8859-7
            ISO10646.1993-UTF2  ISO8859-9
            ISO10646.1993-UTF2  SJIS
            ISO10646.1993-UTF2  AJEC
            ISO10646.1993-UTF2  eucKR
            ISO10646.1993-UTF2  eucTW
            ==================  ==================


   7.4. Testing

      All interoperability testing will be done using OSF/1 systems running
      different codesets but within a common character set, e.g., JIS
      characters in SJIS and EUC (UJIS) encodings.


   8. ACKNOWLEDGEMENTS

      Many people's ideas are incorporated here; we cannot hope to cite all
      of them.  However, we would in particular like to acknowledge the
      inputs from Dick Mackey (OSF), Sandra Martin (OSF), Nat Mishkin
      (formerly at HP, now at Atria) and Tony Hinxman (DEC).  Our apologies
      to anyone omitted.


   AUTHORS' ADDRESSES

   Sue Kline                                                Internet email:
   Hewlett-Packard                                    kline_s@apollo.hp.com
   300 Apollo Drive                              Telephone: +1-508-436-4960
   Chelmsford, MA  01824
   USA

   Alex McLeod                                              Internet email:
   International Business Machines            mcleod@nlsarch.austin.ibm.com
   MC 9652                                       Telephone: +1-512-838-8183
   11400 Burnet Rd
   Austin, TX 78758
   USA

   Makoto Nishino                                           Internet email:
   International Business Machines          nishino@ymtl01.yamato.ibm.co.jp
   (Please write c/o Francis X. Rojas)

   David Obermann                                           Internet email:
   International Business Machines                 obie@ausvm1.vnet.ibm.com
   ZIP 9340                                      Telephone: +1-512-838-0099
   11400 Burnet Road
   Austin, TX 78758

   Francis X. Rojas                                         Internet email:
   International Business Machines           fxrojas@nlsarch.austin.ibm.com
   MC 9652                                       Telephone: +1-512-838-8183
   11400 Burnet Rd
   Austin, TX 78758
   USA


   Arne Thormodsen                                          Internet email:
   Hewlett-Packard                                         arnet@cup.hp.com
   19447 Pruneridge Ave                          Telephone: +1-408-447-4798
   Cupertino, CA 95014
   USA

