   OSF DCE SIG                                           A. Thormodsen (HP)
   Request For Comments: 13.0                                   August 1992


                  DCE 1.1 INTERNATIONALIZATION REQUIREMENTS


   1. INTRODUCTION AND SUMMARY

      This paper explains in detail the high-priority internationalization
      requirements for the OSF DCE 1.1, as recently determined by the DCE
      SIG.  It also provides some background on the motivations behind
      these requirements and on the relationship between these requirements
      and the generic internationalization requirements for all OSF
      components.

   1.1. Summary of Requirements

      The internationalization requirements on the DCE 1.1 are in two
      groups: mandatory base-level requirements and DCE-specific
      requirements.

      The base-level requirements are not prioritized.  All must be met to
      bring the DCE to a minimum level of internationalization.  These are:

        (a) REQT: 8-bit/multibyte "clean", no corruption of non-ASCII data,
            no unnecessary restrictions of text data to ASCII.

        (b) REQT: XPG-3-style message catalog support included for all
            user-visible text.  Messages should be in a common, agreed-upon
            format, and default messages should be supplied for ease of
            serviceability.

        (c) REQT: Support for internationalization functionality provided
            with a single source/single binary model.

        (d) REQT: DCE components individually tested both for functionality
            and performance with non-ASCII, and in particular multibyte,
            character data.

      The DCE 1.1 specific requirements, as prioritized by the DCE SIG,
      are:

        (a) REQT: Homogeneous Interoperability (one language/char
            set/encoding)

        (b) REQT: Follow Internationalization Standards

        (c) REQT: Provide a Portable Character Set


   Thormodsen                                                        Page 1


   DCE-RFC 13.0           DCE 1.1 I18N Requirements             August 1992


        (d) REQT: Support Standard Locales

        (e) REQT: Support a Universal Character Set and Encoding

        (f) REQT: Support Character Set and Encoding Independence

      Based on OSF's investigation of the DCE source code (see [Ogur]), it
      is apparent that the base-level requirements have not yet been met.
      It is particularly important that the message catalogs and testing
      requirements be met, since without these it is not clear that the DCE
      will be acceptable in the international marketplace.


   2. BACKGROUND

      Until recently, support for more than one language or character set
      and encoding in an operating system or application was regarded as an
      "exotic" requirement.  This is no longer true.  The computer
      industry, and its customers, have become global enterprises.  Most
      large software systems are sold world-wide today; in fact, they must
      be in order to return enough profit.

      The goal of world-wide sales has often been accomplished by designing
      multiple systems, one for each country or language.  A more efficient
      approach, however, is to provide support for multiple languages,
      character sets and encodings within one software system.  This
      approach is what is meant by "internationalization".

      The DCE provides some unique challenges for internationalization.
      The DCE is intended to allow multiple computer systems to efficiently
      interoperate.  From the standpoint of internationalization there are
      at least four possible generic situations encountered in a network of
      interoperating computer systems:

        (a) All systems share the same character set, encoding (single or
            multibyte) and language.

        (b) Systems use different character sets or encodings to represent
            the same language.

        (c) Systems use the same character set and encoding to represent
            different languages.

        (d) Systems use different character sets or encodings to represent
            different languages.
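
      As a rough illustration, the four situations can be told apart by
      comparing two properties of a pair of systems.  The sketch below is
      not part of the DCE; the SystemProfile type, its fields, and the
      codeset names are hypothetical.

```python
from typing import NamedTuple


class SystemProfile(NamedTuple):
    """Hypothetical description of one networked system."""
    language: str   # human language, e.g. "ja" or "fr"
    codeset: str    # character set and encoding, e.g. "eucJP"


def classify(a: SystemProfile, b: SystemProfile) -> str:
    """Return the letter (a)-(d) of the situation a pair of systems is in."""
    same_language = a.language == b.language
    same_codeset = a.codeset == b.codeset
    if same_language and same_codeset:
        return "a"   # homogeneous: same language, same codeset
    if same_language:
        return "b"   # same language, different codesets
    if same_codeset:
        return "c"   # same codeset, different languages
    return "d"       # different languages and codesets
```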

      The technology to address some of the issues raised by the situations
      described above will not be available for years.  Other issues,
      especially those posed by the first two situations, can be, and need
      to be, addressed today.

      The requirements discussed in this paper are motivated primarily by
      the first situation above, in which all systems under consideration
      are operating using the same (human) language, character set and
      encoding.  (NOTE: It is also assumed in this case that all networked
      systems are running in the same locale; this assumption is discussed
      in more detail in section 3 below.)

      The second situation, in which the same language is represented with
      different sets, is partially addressed by the requirements discussed
      in this paper, but it isn't anticipated that a full solution to this
      problem will be needed until after the DCE 1.1 timeframe.  The
      "portable character set" and "universal character set" requirements
      (see 4c and 4e) will provide a framework to allow an application
      developed under DCE 1.1 to function in a network which supports
      different character sets and encodings.  The cost is a slightly more
      complex application design.

      The last two situations described above present unique difficulties
      which are not presented by the first two.  In particular, the DCE
      will need to deal meaningfully with data from different systems which
      may be using completely different linguistic and cultural assumptions
      about character data handling.  As a specific example, imagine a DCE
      application attempting to merge collated data from multiple systems,
      each of which is using a fundamentally different collation order.
      Despite the difficulties, some form of multilingual support will
      almost certainly be needed in the future, especially in economically
      important regions such as Western Europe.  Today's DCE designs should
      not make assumptions which would prevent multilingual support in the
      future.
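
      The collation example can be made concrete with a small sketch.  It
      assumes two hypothetical sites, one sorting by code point and one
      sorting without regard to case; neither rule is taken from the DCE.

```python
import heapq

# Each hypothetical site sorted its own data with its own collation rule.
site_a = sorted(["apple", "Zebra"])                       # code point order
site_b = sorted(["Cherry", "banana"], key=str.casefold)   # case-insensitive

# A naive merge assumes both inputs follow one shared order; they do not.
naive = list(heapq.merge(site_a, site_b))

# Re-collating everything under one agreed rule gives a consistent order.
agreed = sorted(site_a + site_b, key=str.casefold)
```

      Here the naive merge yields Zebra, apple, banana, Cherry, which is
      ordered under neither rule.  A consistent result requires all sites
      to agree on one collation rule before merging.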

      Appendix B contains a much more in-depth report on the issues
      surrounding data interchange in an internationalized DCE environment.


   3. REQUIRED BASE-LEVEL FUNCTIONALITY

      It is assumed that the DCE components will provide certain minimal
      internationalization functionality as a matter of course.  These
      requirements are:

        (a) REQT: 8-bit/multibyte "clean", no corruption of non-ASCII data,
            no unnecessary restrictions of text data to ASCII.

        (b) REQT: XPG-3-style message catalog support included for all
            user-visible text.  Messages should be in a common, agreed-upon
            format, and default messages should be supplied for ease of
            serviceability.

            (Recently messaging has become a more critical issue due to
            concerns about serviceability in a networked environment.  A
            particular concern is the tracing and logging of messages
            arriving from various remote sites, possibly using different
            languages and character sets or encodings.  This is discussed
            in more detail in Section 6.)

        (c) REQT: Support for internationalization functionality provided
            with a single source/single binary model.

        (d) REQT: DCE components individually tested both for functionality
            and performance with non-ASCII, and in particular multibyte,
            character data.

      These requirements are derived from [Klin].
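
      To illustrate what "8-bit clean" means in requirement (a), the
      sketch below contrasts a filter that assumes 7-bit ASCII with one
      that carries bytes unchanged.  Both functions are hypothetical and
      are not taken from the DCE source.

```python
def strip_high_bit(data: bytes) -> bytes:
    """NOT 8-bit clean: treats every byte as 7-bit ASCII."""
    return bytes(b & 0x7F for b in data)


def pass_through(data: bytes) -> bytes:
    """8-bit clean: every byte is carried unchanged."""
    return bytes(data)


original = "caf\u00e9".encode("iso8859_1")   # final byte is 0xE9 (e-acute)
damaged = strip_high_bit(original)           # 0xE9 & 0x7F == 0x69, an "i"
intact = pass_through(original)
```

      The first filter silently turns the ISO 8859-1 text into "cafi";
      multibyte data would be corrupted in the same way.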

      There is also an important requirement on the locale data used by the
      various systems connected via the DCE:

        (a) REQT: The NLS locale data on various systems connected via the
            DCE must be consistent.  Currently this must be done
            "manually", presumably by system administrators.

      This requirement is necessary because this is the only method
      currently available to ensure that character data is handled the same
      on all DCE networked systems.  In practice, this requirement would
      probably only be important for applications which do extensive,
      distributed, text processing (such as so-called "groupware"
      applications).  In this case the application installation process
      itself would probably specify, or even provide, synchronized locales.
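
      One way an administrator or installation process might check this
      consistency is to compare a fingerprint of the locale data on each
      host.  The sketch below assumes that locale data can be represented
      as a simple dictionary; the function names and the sample fields are
      hypothetical.

```python
import hashlib
import json


def locale_fingerprint(locale_data: dict) -> str:
    """Hash a locale definition in canonical form so hosts can compare it."""
    canonical = json.dumps(locale_data, sort_keys=True, ensure_ascii=True)
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()


def inconsistent_hosts(locales_by_host: dict) -> list:
    """Return hosts whose locale data differs from the alphabetically first host."""
    if not locales_by_host:
        return []
    hosts = sorted(locales_by_host)
    reference = locale_fingerprint(locales_by_host[hosts[0]])
    return [h for h in hosts[1:]
            if locale_fingerprint(locales_by_host[h]) != reference]
```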

      In the future it is anticipated that the DCE itself may be able to
      resolve the complexities of locales in a distributed environment.
      Currently the technology to do this does not exist, although X/Open-
      UNIFORUM is currently examining and addressing some of these
      requirements.


   4. HIGH PRIORITY DCE 1.1 REQUIREMENTS

      Below are the top six internationalization requirements for DCE 1.1,
      in priority order, with very brief descriptions.  The entire
      prioritized list of requirements, with full descriptions, can be
      found in Appendix A.  Note that some of the requirements in the list
      below have bearing on future plans; these can be deduced from the
      lower-prioritized requirements in that appendix.  In particular,
      several are prerequisites for regional heterogeneous
      interoperability, which fell just below the priority of the items
      listed here.

        (a) REQT: Homogeneous Interoperability (i.e., one language/char
            set/encoding)

            The DCE must be able to support networks in which all clients
            and servers are using the same character set and encoding.

        (b) REQT: Follow Internationalization Standards

            The DCE shall follow all relevant formal standards in providing
            I18N functionality.

        (c) REQT: Provide a Portable Character Set

            OSF should identify, publish, implement, and promote the use of
            a collection of "portable characters" which may be used by any
            DCE application (see B.2 for a definition of a portable
            character set).

        (d) REQT: Support Standard Locales

            The DCE should support standard locales, if/when available,
            from these groups: ISO (highest priority), POSIX, X/OPEN.

        (e) REQT: Support a Universal Character Set and Encoding

            OSF should support a single, universal character set and
            encoding which may be used by any DCE component or application.

        (f) REQT: Support Character Set and Encoding Independence

            The DCE should be able to handle a wide variety of character
            sets and encoding methods, at a very minimum the character sets
            and encodings supported by OSF 1.1.


   5. DISCUSSION OF REQUIREMENTS

      The six internationalization requirements which were given highest
      priority by the SIG form an interdependent set.  Each requirement has
      implications which affect other requirements, as well as affecting
      the DCE in ways not directly related to any of the requirements.
      This section discusses the various implications of each requirement.

   5.1. REQT: Homogeneous Interoperability (i.e., One Language/Char
        Set/Encoding)

      This is really the "master" requirement for DCE 1.1.  Currently most
      computer networks share a common character set and encoding, and all
      users on a network share a common language.  This requirement implies
      that an application developed with the DCE should function equally
      well when running on any such network.  For example, an
      internationalized application developed with the DCE should function
      both on a Japanese network using EUC encoded JIS, and on a French
      network using ISO 8859-1.  (NOTE: "internationalized" implies that
      the application will not need to be relinked, recompiled or rewritten
      to work in these different environments.)

      Beyond the technical implications for the various DCE components,
      this requirement is directly dependent on at least three other DCE
      I18N requirements, which are briefly discussed below.  These
      requirements are also discussed in detail in sections 5.3, 5.4 and
      5.6.

        (a) Provision of a portable character set

            It will be necessary for a DCE application to name and identify
            objects, especially "DCE owned" objects, regardless of the
            particular character set and encoding in use by the application
            and the network.  This implies the existence of a "portable
            character set".

        (b) Support for standard locales

            It will be necessary for a DCE application consistently to
            perform character manipulations, data formatting, and similar
            locale-dependent operations.  This implies the use of the same
            locale throughout a DCE-supporting network.  For those cases
            where standard locales exist, it is anticipated that they will
            be used on such a network.  Therefore they must be supported by
            the DCE.

        (c) Support character set and encoding independence

            This requirement is obvious from the description of
            "homogeneous interoperability" given above.  A single,
            compiled, DCE application should be able to support any
            character set which the underlying system supports.  This
            implies the use of character set/encoding independent
            interfaces such as the proposed XPG4 WPI within DCE components
            (see next item below for more discussion).

   5.2. REQT: Follow Internationalization Standards

      This is a "good citizenship" requirement.  Basically it is requesting
      that all DCE components provide messaging, international character
      data processing, local conventions and collations via standard
      interfaces and using standard data where available.  The purpose of
      this request is twofold: First, it ensures that the DCE components
      are fully internationalized and second, it ensures that an
      application which is developed in conformance with international
      standards can be easily ported to operate under the DCE.

      An additional concern is the availability of standards-based
      interfaces on systems which the DCE will be ported to.  Unfortunately
      not all systems can be guaranteed to support all standards-based
      interfaces.  It may be necessary to adopt a strategy similar to that
      used by the X Consortium for the X Window System.  In this case the
      X source is distributed along with a minimally functional set of
      certain standard interfaces, such as the XPG4 WPI wide character
      interfaces.  This facilitates rapid porting of prototype
      implementations.  In particular, OSF should investigate what
      standards-based routines it already "owns" and how these might be
      bundled with the DCE.

   5.3. REQT: Provide a Portable Character Set

      The DCE currently specifies a "portable character set" (see section
      B.2) without further specifying the use and implementation of this
      set.  This is a serious flaw; further specification is needed from
      OSF.

      Most importantly, the DCE specifications should indicate explicitly
      what purposes this portable character set should be used for.  The
      existing APIs in the DCE components need to be classified as to
      whether they are restricted to use of the portable characters, a
      subset of these characters, or a superset.

      Furthermore, the specifications must be clarified with regard to
      encodings.  It is the opinion of the Working Group that the DCE must
      specify one preferred encoding of these characters (presumably the
      ISO 646 IRV), while also supporting alternative, vendor-proprietary,
      sets on homogeneous networks.  Note that a mandatory encoding of the
      PCS is not being requested here, only a preferred default encoding.

      This requirement will clarify what is required to allow multivendor
      interoperability in a DCE network.  It is not a requirement that the
      DCE support the simultaneous use of different encodings of the
      portable characters within one network.  However, by specifying one
      standard default encoding the DCE can enable applications to
      interoperate, if they choose this encoding.
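
      A minimal sketch of how an interface restricted to the portable
      characters might validate and encode a name follows.  It uses the 95
      characters listed in B.2 and assumes, for illustration only, that
      the preferred default encoding coincides with ASCII.  The function
      names are hypothetical and do not correspond to any DCE API.

```python
import string

# The 95 portable characters of B.2: letters, space, digits, punctuation.
PORTABLE_CHARACTERS = frozenset(
    string.ascii_lowercase + string.ascii_uppercase + " "
    + string.digits + string.punctuation
)


def is_portable(name: str) -> bool:
    """True if every character of the name is a portable character."""
    return all(ch in PORTABLE_CHARACTERS for ch in name)


def encode_default(name: str) -> bytes:
    """Encode a portable name in the assumed default encoding (ASCII)."""
    if not is_portable(name):
        raise ValueError("name contains non-portable characters")
    return name.encode("ascii")
```

      An interface classified as "restricted to the portable characters"
      would reject a name such as one containing an accented letter, while
      a "superset" interface would skip the check.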

   5.4. REQT: Support Standard Locales

      The principal motivation for this requirement is discussed in 5.1
      above.  For a distributed application to behave in a consistent way
      (probably for it to be usable at all) it must have access to the same
      character handling behavior and data formatting information on all
      systems.  This information comes from the locale data.

      As mentioned in Section 3, it is the system administrators'
      responsibility to make sure that this data is consistent.
      Presumably, they will do this by resorting to some set of standard
      locales.  Hopefully X/Open (currently in unofficial cooperation with
      ISO) will have a database of standard locales available before the
      end of this year.  If not, OSF itself will need to supply such
      locales.  The only requirement on the DCE is that it be TESTED with
      these locales to ensure that homogeneous interoperability can
      actually be achieved.

   5.5. REQT: Support a Universal Character Set/Encoding

      This requirement is a special case of the more general requirement
      for character set independence stated in 5.6 below.

      A universal character set/encoding (UCS) is capable of representing
      the writing systems of a large number of languages within one set.
      The most obvious current choice for a UCS is ISO 10646.

      While all of the possible uses of such a set are not apparent, there
      are some obvious applications:

        (a) A UCS can provide a convenient method of encoding characters
            for data interchange.  This could be useful when communicating
            with systems using an unknown character set/encoding, or for
            providing data in a universal format to systems supporting
            various sets.

        (b) A UCS can provide the basis for multilingual applications.

      Some application developers are already considering the use of ISO
      10646 (or the related UNICODE) for various personal computer
      applications such as multimedia mailers.  The DCE should be designed
      to permit such applications to interoperate via the DCE.  It is less
      important that the DCE use a UCS for internal purposes, such as
      object naming, but it should not prevent such use.  The ISO 10646
      standard is likely to affect other standards, such as X.500.  The DCE
      needs to track any such changes.

      More aspects of the implementation and use of such a set are
      discussed at length in Appendix B.
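
      Use (a) above, a UCS as an interchange form, can be sketched as a
      two-step conversion through the universal set.  The sketch assumes
      UTF-8 as the interchange encoding of ISO 10646; this paper does not
      name an encoding, and the function names are hypothetical.

```python
def to_interchange(data: bytes, local_codeset: str) -> bytes:
    """Sender side: local codeset -> universal interchange form."""
    return data.decode(local_codeset).encode("utf-8")


def from_interchange(data: bytes, local_codeset: str) -> bytes:
    """Receiver side: universal interchange form -> local codeset."""
    return data.decode("utf-8").encode(local_codeset)


text = "\u65e5\u672c\u8a9e"   # three Japanese characters
sent = to_interchange(text.encode("shift_jis"), "shift_jis")
received = from_interchange(sent, "euc_jp")
```

      Neither side needs to know the other's local codeset; each only
      converts between its own codeset and the universal one.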

   5.6. REQT: Support Character Set and Encoding Independence

      This requirement is also discussed under item 5.1 above.  The same
      requirement holds for any internationalized software: it must be able
      to support a variety of character sets and encodings with a single
      compiled version.

      This implies that the DCE components must use internationalized
      interfaces, such as those specified in the proposed XPG4 WPI, to do
      all character handling.  Furthermore, the DCE components should be
      designed in an internationalized way, without the use of hard-coded
      character constants and strings.
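
      The difference between byte-oriented and character-oriented handling
      can be shown with a small sketch.  The codeset is passed in at run
      time rather than compiled in; the function name is hypothetical and
      Python's codec names stand in for the system's locale machinery.

```python
def count_characters(data: bytes, codeset: str) -> int:
    """Count characters, not bytes, in text held in the given codeset."""
    return len(data.decode(codeset))


japanese = "\u65e5\u672c\u8a9e".encode("euc_jp")   # 3 characters, 6 bytes
french = "\u00e9t\u00e9".encode("iso8859_1")       # 3 characters, 3 bytes
```

      The same function serves both the Japanese and the French network of
      section 5.1, whereas code that equates one byte with one character
      would miscount the EUC data.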


   6. OTHER CONSIDERATIONS

      During the course of discussing this paper some issues came up which
      are related to DCE 1.1 requirements, but not explicitly tied to them.
      These are mentioned below, with no implied prioritization.

   6.1. Messaging

      The 1.1 requirements above discuss message catalogs, which provide a
      way to bind an application to native language messages on a
      particular system.  This model does not support the type of
      distributed environment implied by the DCE.  In particular, it
      doesn't address the problem of servers needing to provide text
      messages (such as error reports) to clients which may be working in
      different languages.

      There needs to be a way of referencing a message independently of the
      language and character set of the message.  This can be accomplished
      via unique identifiers embedded within localizable messages, or
      perhaps through some more advanced approach.  Note that there are two
      slightly different uses of such an identifier: one is to identify a
      message which has been received but cannot be interpreted, another is
      to communicate a message to a system which is working in an unknown
      locale.  As networks grow larger this serviceability issue will
      become critical.
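
      One possible shape for such an identifier is sketched below: the
      server sends only a language-independent reference, and the client
      renders it from its own catalog, falling back to default text.  The
      MessageId type, the catalog contents, and the output format are all
      hypothetical.

```python
from typing import NamedTuple


class MessageId(NamedTuple):
    """Hypothetical language-independent message reference."""
    component: str
    number: int


# Hypothetical per-language catalogs held on the client.
CATALOGS = {
    "en": {MessageId("rpc", 1): "Server not responding"},
    "fr": {MessageId("rpc", 1): "Serveur indisponible"},
}
DEFAULT_LANGUAGE = "en"


def render(msg: MessageId, language: str) -> str:
    """Render in the client's language, else the default, else the bare id."""
    for lang in (language, DEFAULT_LANGUAGE):
        text = CATALOGS.get(lang, {}).get(msg)
        if text is not None:
            return f"[{msg.component}-{msg.number}] {text}"
    return f"[{msg.component}-{msg.number}]"
```

      Because the identifier always appears in the output, a message can
      still be looked up by service personnel even when its text is in a
      language they cannot read.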

   6.2. Distributed Locales

      At several points in the above discussion, it was necessary to assume
      homogeneous locales across a network.  This is a fairly strict
      requirement, and may not easily be met.  It would be better if the
      DCE could somehow synchronize locales between clients and servers.
      The fundamental technology to implement such a system is currently
      being investigated by X/Open; these efforts should be monitored by
      the DCE developers.


   APPENDIX A. COMPLETE LIST OF DCE 1.1 I18N REQUIREMENTS

      Below is the complete list of DCE 1.1 internationalization
      requirements as voted on by the DCE SIG in November, 1991.  The
      wording of these requirements is as they were voted on.  In certain
      cases (notably portable characters) the exact specifications have
      changed slightly during further discussions.  Refer to the main paper
      for more precise explanations.

        (a) REQT: Homogeneous Interoperability (one language/char
            set/encoding)

            The DCE must be able to support networks in which all clients
            and servers are using the same character set and encoding.  No
            assumptions about this specific character set and encoding
            should be made beyond the assumption of a consistent encoding
            of the Portable Character Set.  In this configuration the user
            should not be able to distinguish between remote and local data
            access.  (NOTE: An implication of this requirement is that
            identical locales must be available on all systems in a DCE
            network, and that all processes communicating via the DCE must
            be running in identical locales, if consistent character and
            data processing is required.)

        (b) REQT: Follow Internationalization Standards

            The DCE shall follow formal standards in providing the
            following functionality: message catalogs, international
            character data processing, local conventions, collation.  The
            recommended OSF prioritization of standards shall be applied in
            cases of conflicting standards.  Refer to the OSF I18N SIG
            requirements list for further explanation of the relevant
            standards.

        (c) REQT: Provide a Portable Character Set

            OSF should identify, publish, implement, and promote the use of
            a collection of "portable characters" which may be used by any
            DCE application, anywhere in the world, regardless of the local
            character set and encoding in use.  (NOTE: This is intended to
            imply that the ASCII name "ABC" is portable to an EBCDIC
            system, which further implies that "portable character" data is
            identifiable in some way so that the necessary conversions can
            be done.  This doesn't necessarily further imply tagged data,
            since "portable characters" could be implemented as a unique
            IDL type if desired.)

        (d) REQT: Support Standard Locales

            The DCE should support standard locales, if available, from
            these groups: ISO (highest priority), POSIX, X/OPEN.  If none
            of these groups has created standard locales, OSF should provide
            its own definitions.

        (e) REQT: Support a Universal Character Set and Encoding

            OSF should support a single, universal character set and
            encoding which may be used by any DCE component or application
            for the transmission of any desired characters.  An example of
            this is the emerging ISO 10646 standard.

        (f) REQT: Support Character Set and Encoding Independence

            The DCE should be able to handle a wide variety of character
            sets and encoding methods, at a very minimum the character sets
            and encodings supported by OSF 1.1.  DCE components will be
            required to support interfaces capable of being called with
            different code sets.  It should be possible to compile DCE
            components into a single object that can support various code
            sets, both within the workstation and on the network.

        (g) REQT: Regional Heterogeneous Interoperability

            The DCE must be able to support networks in which clients and
            servers are working in different codesets, provided that these
            different codesets are substantially or entirely
            interconvertible (e.g., ASCII and U.S. EBCDIC, or SJIS and
            UJIS).  (NOTE: This will depend on either a tagged data type or
            a universal codeset, so it would be inconsistent to prioritize
            it above BOTH of those items.)

        (h) REQT: Mechanism for Identification of Character Data

            OSF should specify a mechanism for allowing DCE components and
            applications to identify character data with (at least) the
            identity of the codeset of the data.  This identification could
            be granular, such as by string, character, node, or filesystem.
            The definition of granularity should be specified per
            component.

        (i) REQT: Character Data Tools

            Tools should be provided for DCE services to use the
            identification mechanism.  (If tools are not available, vendors
            can provide them, but the group prefers that DCE provide this.)

        (j) REQT: Support EBCDIC Encodings

            EBCDIC encoding should be supported (includes single and
            multi-byte).  In the multi-byte case, User-Defined Characters
            (GAIJI) should be supported.

        (k) REQT: World-Wide Heterogeneous Interoperability

            The DCE must be able to support networks in which clients and
            servers work in different codesets which cannot be
            interconverted.  (NOTE: This implies some useful "fallback"
            behavior, not a miracle.  It will probably depend on the
            existence of "portable characters", and so should be
            prioritized accordingly.)

        (l) REQT: Influence Internationalization Standards

            OSF should actively participate in standards groups meetings,
            seeking to influence internationalization standards in ways
            favorable to the success of OSF DCE.  In particular, these
            areas are of concern to the DCE: Data tagging, Large character
            sets, Wide-character processing, Locale naming, Distributed
            locales.

        (m) REQT: Permit User-defined Locales

            In a distributed environment, it must be possible for such
            locales to be available on both a client and its associated
            server.  This may require manual replication on the part of a
            system administrator.  Such user-defined locales must be
            defined on a network basis.
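
      Requirements (c), (g) and (h) can be illustrated together with a
      sketch of tagged character data: each string carries the identity of
      its codeset so that a receiver can convert it.  The TaggedText type
      is hypothetical, and Python's cp037 codec is used here as a stand-in
      for U.S. EBCDIC.

```python
from typing import NamedTuple


class TaggedText(NamedTuple):
    """Character data labelled with the identity of its codeset."""
    codeset: str
    data: bytes


def convert(text: TaggedText, target_codeset: str) -> TaggedText:
    """Re-encode tagged text; raises UnicodeError if a character cannot be converted."""
    if text.codeset == target_codeset:
        return text
    decoded = text.data.decode(text.codeset)
    return TaggedText(target_codeset, decoded.encode(target_codeset))


name = TaggedText("ascii", b"ABC")
on_ebcdic_host = convert(name, "cp037")   # assumed stand-in for U.S. EBCDIC
```

      The tag is what makes the ASCII name "ABC" portable: without it the
      receiving system could not tell that a conversion is needed.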


   APPENDIX B. DATA INTERCHANGE

      The following is an in-depth presentation of various architectures
      for data interchange in an internationalized DCE environment.  It is
      extracted (with minor re-wording) from [SKRS].

   B.1. Data Interchange in an Internationalized DCE

      The Interchange component of the DCE architectural model addresses
      issues associated with the communication of National Language
      sensitive data types between the processes comprising a distributed
      application.  More specifically, it addresses what encodings will be
      used to communicate National Language sensitive information and where
      in the system conversions are performed when the encodings used
      locally by the sending process differ from those used locally by the
      receiving process.  Additionally, the interchange architecture
      specifies how either the sender or the receiver determines that a
      conversion is necessary at all.

      The remainder of this section first introduces a set of useful
      terminology for discussing interchange in the DCE environment and
      then examines a progression of three DCE interchange environments
      ranging from simple to very complex.  These are not intended to
      describe every possible variation of interchange support, but rather
      to define large interesting categories of support and some of the
      issues associated with each.

   B.2. Terminology

      This section introduces a basic set of terminology for discussing the
      interchange architecture for the OSF DCE.  The terms defined here are
      used throughout the next three sections.

        (a) Native DCE implementation

            Refers to the level of functionality provided by the OSF/DCE
            1.0 source offering.  This includes straight ports (with no
            functional enhancements) to platforms other than the reference
            implementations.

        (b) Character Set

            A collection of characters.

        (c) Coded Character Set

            A set of unambiguous rules that establishes a character set and
            the one-to-one relationship between each character of the set
            and its bit representation.

        (d) Codeset

            A coded character set that is used to encode all the characters
            in some locale.

        (e) DCE Process

            A process on a given host which is either running a DCE daemon
            or which is running an application which is using DCE
            facilities.

        (f) DCE Logical Network

            A set of given hosts which are physically connected by some
            communications media and which are configured and administered
            to behave as a single, logical DCE system.

        (g) DCE Portable Character Set

            The set of semantic characters in DCE R1.0 that are guaranteed
            to be supported in names within the CDS (Cell Directory
            Service), the file system, and Security.  The set consists of
            the following 95 characters (note that the space character is
            included, between the letters and the numerals):

                  [a-z][A-Z] [0-9]!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

            This set defines semantic, rather than visual or encoded
            characters.  That means, for example, that a character with the
            semantics of <backslash> must be supported, even if in some
            fonts, the backslash glyph has been replaced with a Yen sign or
            c-cedilla or other glyph.  In addition, it means that just
            because two glyphs look similar doesn't mean they are the same
            semantic character.  For instance, a double-byte 'A' and a Greek
            ALPHA both appear very similar to the semantic <A>.  But these
            are *not* the same semantic character as the <A>.

            DCE guarantees to support these 95 characters in names, but it
            does not prohibit the use of additional characters beyond the
            95.  "Supported" here means that users are advised to restrict
            themselves to these 95 characters (preferably to as small a
            subset as is practical, such as the alphanumerics), but DCE
            does not check that only these 95 characters are used.  No
            encoding is defined.  Use of IDL-chars and C-(unsigned)-chars
            is specified.  Handling of characters other than these 95 is
            unspecified and implementation-specific.

        (h) Encoding

            A single coded character set or a defined methodology that
            allows multiple coded character sets to be combined.  The
            latter case includes well-known rules for determining the
            single coded character set to which a character belongs.
            Examples of such rules are ISO 2022 and Compound Text, which
            include tags or escape sequences to identify the coded
            character set a character or characters belong to.

        (i) Network Interchange Encoding (NIE)

            The encoding used to transfer text strings between DCE
            processes.  The Network Interchange Encoding must be able to
            encode all characters that may exist in a DCE Logical Network.
            The set of characters that need to be encoded is defined by the
            union of all coded character sets used by all DCE Processes in
            a given DCE Logical Network.
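
      As an illustration of item (g), the following Python sketch builds
      the 95-character set and tests whether a name stays within it.  The
      names PORTABLE_CHARACTER_SET and is_portable_name are hypothetical
      and are not part of any DCE interface; the test operates on semantic
      characters (here, Unicode code points) rather than on any encoding.

```python
import string

# The 95 semantic characters of the DCE Portable Character Set:
# 26 lowercase + 26 uppercase letters, space, 10 digits, 32 punctuation.
PORTABLE_CHARACTER_SET = frozenset(
    string.ascii_lowercase
    + string.ascii_uppercase
    + " "
    + string.digits
    + string.punctuation
)


def is_portable_name(name: str) -> bool:
    """Return True if every character of `name` is in the portable set."""
    return all(ch in PORTABLE_CHARACTER_SET for ch in name)
```

      Under this sketch a name built only from the 95 characters is
      accepted, while a name containing a Greek ALPHA or a double-byte 'A'
      is rejected even though those glyphs resemble <A>.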

   B.3. The Homogeneous Codeset Environment

      This environment consists of a single Codeset that is supported by
      all DCE Processes in the DCE Logical Network.  It assumes that a
      single Codeset is used as the Network Interchange Encoding, and that
      the same Codeset is used locally by all DCE Processes in a DCE
      Logical Network (see Figure 1 below).

      This environment does not dictate which Codeset is to be used.
      Therefore different DCE Logical Networks could be configured to
      support different Codesets.  The only restriction is that the
      selected Codeset must be a superset of (contain at least the
      characters in) the DCE Portable Character Set.  (NOTE: This assumes
      that (1) the OSF DCE source base is 8-bit clean and (2) that it
      carries no dependencies on any codeset-specific characteristics
      (e.g., contiguous ranges of characters).)
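
      The two assumptions in the NOTE above can be illustrated with a short
      Python sketch.  It uses Python's standard codecs "latin-1" and
      "cp037" as stand-ins for an ASCII-based and an EBCDIC-based host
      codeset; that choice, and all function names, are assumptions made
      only for illustration.

```python
import string

PCS = (string.ascii_lowercase + string.ascii_uppercase + " "
       + string.digits + string.punctuation)


def missing_portable_characters(codec: str) -> list:
    """List the portable characters that `codec` cannot encode."""
    missing = []
    for ch in PCS:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


def is_lower_by_range(byte: int, codec: str) -> bool:
    """Codeset-dependent test: assumes 'a'..'z' form a contiguous range."""
    lo = "a".encode(codec)[0]
    hi = "z".encode(codec)[0]
    return lo <= byte <= hi


def is_lower_by_lookup(byte: int, codec: str) -> bool:
    """Codeset-independent test: compares against each encoded letter."""
    return byte in {ch.encode(codec)[0] for ch in string.ascii_lowercase}
```

      In "cp037" the letters 'i' and 'j' are encoded as 0x89 and 0x91, so
      the range test accepts byte 0x8A although it is not a letter; the
      lookup test does not.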

      Since all DCE Processes in this scenario recognize a single Codeset,
      all input and output with these nodes must use only the characters
      defined in this Codeset.  Even if the nodes support other characters
      outside of this Codeset for environments beyond DCE, those characters
      would not be supported.  This, and the fact that all processes are

   ------------------------------------------------------------------------

                   +-------- All DCE Processes    ------+
                   |         support THE SAME           |
                   |         character set              |
                   |                                    |
                   V                                    V
            +---------------+                  +---------------+
            |    DCE Process|                  |DCE Process    |
            | +---+         |                  |         +---+ |
            | | A |         |                  |         | A | |
            | | P |   +---+ |NIE=local codeset | +---+   | P | |
   +------+ | | P |   |   | |    used by all   | |   |   | P | | +------+
   |      | | | L |   | R | |    DCE Processes | | R |   | L | | |      |
   | Data |<+>| I |<->| P |<+------------------+>| P |<->| I |<+>| Data |
   |      | | | C | A | C | |                  | | C | A | C | | |      |
   +------+ | | A | | |   | |                  | |   | | | A | | +------+
      A     | | T | | +---+ |                  | +---+ | | T | |     A
      |     | | I | |       |                  |       | | I | |     |
      |     | | O | |       |                  |       | | O | |     |
      |     | | N | |       |                  |       | | N | |     |
      |     | +---+ |       |                  |       | +---+ |     |
      |     |  A    |       |                  |       |   A   |     |
      |-----+--+    |       |                  |       |   +---+-----|
      |     +-------+-------+                  +-------+-------+     |
      |             |                                  |             |

   CS=NIE          NIE                                NIE          CS=NIE

   Figure 1. Homogeneous Codeset Environment
   ------------------------------------------------------------------------

      using the same encodings for the characters they use in common,
      ensures that there is no loss of data when transmitting between the
      DCE Processes because no translation needs to be done by either node
      into a different Codeset.  This avoids the performance penalties
      accompanying the conversions and guarantees the integrity of the
      exchanged character data.  As a result, a string entered in one DCE
      Process and sent to a server in another DCE Process will always
      consist of the same set of characters when retrieved from a third DCE
      Process where the three processes belong to the same DCE Logical
      Network.

      This environment has the important restriction, however, that it
      relies on the identical Codeset being supported natively by all DCE
      Processes in the DCE Logical Network.  (NOTE: Here "native" means
      supported by the underlying host operating system.)  This
      significantly limits its practical application within the
      heterogeneous vendor environments being specifically targeted by the
      DCE.  An interesting example of one such heterogeneous environment is
      one which contains both ASCII and EBCDIC hosts.


   B.4. The Homogeneous Network Codeset Environment

      In the Homogeneous Network Codeset Environment, all DCE processes
      within a given DCE Logical Network support the same Character Set.
      Each DCE process may use a different codeset to encode the character
      set locally, but all DCE processes within the DCE Logical Network use
      the same Network Interchange Encoding.  If the codeset used by a
      given DCE process is different than the Network Interchange Encoding,
      then that process is responsible for converting data between its
      codeset and the Network Interchange Encoding before sending data to
      the network and after receiving data from it.
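
      A minimal sketch of this responsibility, in Python, follows.  The
      codec names stand in for real codesets (an assumption): "latin-1"
      plays the Network Interchange Encoding, while "cp037" and "cp500" are
      two EBCDIC-based local codesets covering the same character set.  The
      function names are hypothetical.

```python
def to_network(data_local: bytes, local_codeset: str, nie: str) -> bytes:
    """Convert locally encoded bytes to the NIE before transmission."""
    if local_codeset == nie:
        return data_local                  # already in the NIE
    return data_local.decode(local_codeset).encode(nie)


def from_network(wire: bytes, nie: str, local_codeset: str) -> bytes:
    """Convert NIE bytes taken from the network to the local codeset."""
    if local_codeset == nie:
        return wire                        # already in the local codeset
    return wire.decode(nie).encode(local_codeset)
```

      A "cp037" sender and a "cp500" receiver encode the brackets in
      "Hello [DCE]" differently, yet both recover the same characters
      because each converts to and from the common NIE.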

   ------------------------------------------------------------------------

                   +-------- All DCE Processes    ------+
                   |         support THE SAME           |
                   |         character set              |
                   |                                    |
                   V                                    V
             +------------+                       +------------+
             |            |                       |            |
             |DCE Process |  NIE=Network Codeset  |DCE Process |
             |            |<--------------------->|            |
   +----+    |            |                       |            |   +----+
   |DATA|<-->|            |                       |            |<->|DATA|
   +----+    +------------+                       +------------+   +----+
     A        A     |   A                           |   A     A        A
     |        |     |   |                           |   |     |        |
     |        |     V   |                           V   |     |        |
     |        |   +-------+                       +-------+   |        |
     |        |   |CS |CS |                       |CS |CS |   |        |
     +--------+   | | | A |                       | | | A |   +--------+
         |        | V | | |                       | V | | |       |
         |        |NIE|NIE|                       |NIE|NIE|       |
         |        +-------+                       +-------+       |
         |            A                               A           |
                      |                               |
    Data stored       +-------------------------------+      Data stored
    and processed                     |                      and processed
    using local                                              using local
    Codeset                 Each DCE Process which uses      Codeset
                            a local Codeset different
                            than the NIE must perform
                            conversions between its local
                            Codeset and the NIE.

   Figure 2.  The Homogeneous Network Codeset Environment
   ------------------------------------------------------------------------

      This environment represents a superset of the interchange support
      provided by the Homogeneous Codeset Environment.  Each DCE Process
      may use a different codeset, which enables ASCII- and EBCDIC-based
      DCE Processes to coexist and exchange character data within a single
      DCE Logical Network.  At the same time, it maintains data integrity
      by requiring all DCE Processes within a given DCE Logical Network to
      support the same set of characters.  Therefore, if the DCE components
      themselves are properly internationalized and do not introduce any
      restrictions beyond the interchange architecture defined here, a
      given DCE Logical Network could be configured to support any
      arbitrary character set which is a superset of the DCE Portable
      Character Set.

      From a performance perspective, this environment can require up to
      two conversions to be performed on a single transmission between DCE
      processes if both processes are using a codeset which is different
      than the Network Interchange Encoding.  Note that this is true even
      in the case where the DCE processes are using the same codeset.
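
      The conversion count described above reduces to simple arithmetic,
      sketched here in Python with a hypothetical helper.  The codec names
      are placeholders for real codesets.

```python
def conversions_needed(sender_cs: str, receiver_cs: str, nie: str) -> int:
    """Count codeset conversions for one transmission through the NIE."""
    # One conversion on each side whose local codeset is not the NIE.
    return int(sender_cs != nie) + int(receiver_cs != nie)
```

      Two "cp037" processes exchanging data over a "latin-1" NIE still
      incur two conversions, although their local codesets are identical.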

      Because all components of the base OSF/DCE assume that all character
      data passed to them via the RPC mechanism is already encoded in the
      codeset used by the DCE process on which the implementation is
      running, this environment cannot be supported by a native DCE
      implementation.  Architecturally, there are two approaches which may
      be taken to enable the support of this environment which are
      distinguished primarily by where the responsibility for performing
      the conversions between the Network Interchange Encoding and the
      codeset lies.

      The first approach, depicted in Figure 3 below, places the
      responsibility on the RPC layer of the system.  To support this
      approach, the RPC mechanism must be modified to be sensitive to the
      codeset and the Network Interchange Encoding and must perform
      conversions between the two when they are different.  (NOTE: When I
      am referring to RPC here I am lumping the processing performed by the
      stubs and the runtime together into a single layer.  This is done to
      simplify the discussion and is accurate to the degree that it is not
      the application programmer who is worrying about the conversions.)

      This approach has several advantages which are derived primarily from
      the fact that it maintains the RPC paradigm for making differences
      between basic data type representations on communicating systems
      transparent to the application programmer.  In particular, it would
      not require modifications to the DCE components themselves.  (NOTE:
      This assumes that the DCE components have already been
      "internationalized".)

      The second approach, depicted in Figure 4 below, places the
      responsibility for conversion on each application.

      This approach has the advantage of providing complete flexibility to
      the application programmer in choosing how and when the conversion is
      to be performed.  However, it comes at the cost of requiring all


   ------------------------------------------------------------------------

                   +-------- All DCE Processes    ------+
                   |         support THE SAME           |
                   |         character set              |
                   |                                    |
                   V                                    V
             +-------------+                     +-------------+
             |  DCE Process|                     |DCE Process  |
             |+---+        |                     |        +---+|
             || A |        |                     |        | A ||
             || P |        |                     |        | P ||
             || P |        |                     |        | P ||
             || L |  +---+ |                     | +---+  | L ||
             || I |  | R | | NIE=Network Codeset | | R |  | I ||
             || C |<>| P |<+---------------------+>| P |<>| C ||
             || A |A | C | |                     | | C | A| A ||
             || T || +---+ |                     | +---+ || T ||
             || I ||   |A  |                     |  |A   || I ||
             || O ||   ||  |                     |  ||   || O ||
   +----+    || N ||   ||  |                     |  ||   || N ||   +----+
   |DATA|<-->|+---+|   ||  |                     |  ||   |+---+|<->|DATA|
   +----+    +-----+---++--+                     +--++---+-----+   +----+
     A        A    |   |+--+                    +---+|   |   A        A
     |        |    |   |   |                    |    |   |   |        |
     |        |    |   V   |                    V    |   |   |        |
     |        |    |  +-------+               +-------+  |   |        |
     |        |    |  |CS |CS |               |CS |CS |  |   |        |
     +--------+    |  | | | A |               | | | A |  |   +--------+
         |         |  | V | | |               | V | | |  |       |
         |         |  |NIE|NIE|               |NIE|NIE|  |       |
         |         |  +-------+               +-------+  |       |
         |         |                                     |       |
         |         |                                     |       |
    Data stored    |   Data passed to applications in    |   Data stored
    and processed  +-- the local Codeset for the DCE   --+   and processed
    using local        Process within which they are         using local
    Codeset            running.                              Codeset

   Figure 3. RPC Based Architecture for Homogeneous Network Codeset Support
   ------------------------------------------------------------------------

      application programmers to deal with the added complexity of
      determining what the Network Interchange Encoding is and explicitly
      performing the appropriate conversions.  On the other hand, this
      approach does not require any changes to the existing RPC mechanism
      itself.
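
      The difference between the two approaches can be sketched in Python.
      The classes below are hypothetical and are not DCE RPC interfaces;
      wire_echo merely stands in for the network, and "latin-1" is an
      assumed Network Interchange Encoding.

```python
NIE = "latin-1"          # assumed Network Interchange Encoding


def wire_echo(payload: bytes) -> bytes:
    """Stand-in for the network: returns the NIE bytes unchanged."""
    return payload


# Approach 1: the RPC layer (stubs plus runtime) performs the conversions.
class ConvertingStub:
    def __init__(self, local_codeset: str):
        self.local_codeset = local_codeset

    def call(self, arg_local: bytes) -> bytes:
        outbound = arg_local.decode(self.local_codeset).encode(NIE)
        inbound = wire_echo(outbound)
        return inbound.decode(NIE).encode(self.local_codeset)


# Approach 2: the RPC layer passes bytes through untouched.
class PassThroughStub:
    def call(self, arg_nie: bytes) -> bytes:
        return wire_echo(arg_nie)


def application_call(stub: PassThroughStub, arg_local: bytes,
                     local_codeset: str) -> bytes:
    """Application code that must convert explicitly on both sides."""
    arg_nie = arg_local.decode(local_codeset).encode(NIE)
    reply_nie = stub.call(arg_nie)
    return reply_nie.decode(NIE).encode(local_codeset)
```

      With ConvertingStub the application only ever sees its local
      codeset; with PassThroughStub every application must know the NIE
      and carry its own conversion code.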

      Both of these approaches will work and both of them require changes
      to some portion of the existing DCE system in order to allow the DCE
      to provide this level of interchange support.  However, the first
      approach is more aligned with the DCE's objectives of providing a


   ------------------------------------------------------------------------

                   +-------- All DCE Processes    ------+
                   |         support THE SAME           |
                   |         character set              |
                   |                                    |
                   V                                    V
             +-------------+                     +-------------+
             |  DCE PROCESS|                     |DCE PROCESS  |
             |+---+        |                     |        +---+|
             || A |        |                     |        | A ||
             || P |        |                     |        | P ||
             || P |        |                     |        | P ||
             || L |  +---+ |                     | +---+  | L ||
             || I |  | R | | NIE=Network Codeset | | R |  | I ||
             || C |<>| P |<+---------------------+>| P |<>| C ||
             || A | A| C | |                     | | C |A | A ||
             || T | |+---+ |                     | +---+| | T ||
             || I | |      |                     |      | | I ||
             || O | |      |                     |      | | O ||
   +----+    || N | +------+---------------------+------+ | N ||   +----+
   |DATA|<-->|+---+        |       |             |        +---+|<->|DATA|
   +----+    +--+A---------+       |             +--------A+---+   +----+
     A        A |+--+              |                  +---+| A        A
     |        | |   |              |                  |    | |        |
     |        | V   |              |                  V    | |        |
     |        |+-------+           |                +-------+|        |
     |        ||CS |CS |           |                |CS |CS ||        |
     +--------+| | | A |           |                | | | A |+--------+
         |     | V | | |           |                | V | | |    |
         |     |NIE|NIE|           |                |NIE|NIE|    |
         |     +-------+           |                +-------+    |
         |                         |                             |
         |                                                       |
    Data stored        Data passed to applications is        Data stored
    and processed      encoded in the Network Codeset.       and processed
    using the local    Applications are responsible for      using the local
    Codeset            converting to the local Codeset       Codeset
                       for processing.

   Figure 4. Application Based Architecture for Homogeneous Network Codeset
             Support
   ------------------------------------------------------------------------

      platform for developing distributed applications in a heterogeneous
      environment where the platform isolates the application programmer
      from as much of the complexity of the environment as possible.


   B.5. The Heterogeneous Environment

      In the Heterogeneous Environment, all DCE Processes support the DCE
      Portable Character Set, but there is no single homogeneous character
      set which is shared across all DCE Processes in the DCE Logical
      Network (see Figure 5).  Each DCE Process may support its own
      character set(s) so long as it is a superset of the DCE Portable
      Character Set and may use its own codeset which may be different than
      the Network Interchange Encoding.  Like the Homogeneous Network
      Codeset Environment, if the codeset used by a DCE Process is
      different than the Network Interchange Encoding then that process is
      responsible for converting its data to the Network Interchange
      Encoding.  The difference, however, is that since different
      character sets may be used by different DCE Processes, there is no
      guarantee that data can be communicated except for characters in
      the intersection of the character sets of the communicating DCE
      Processes.

      This environment is the first that has been examined in this document
      which introduces potentially significant data integrity problems.
      Specifically, data integrity cannot be guaranteed between DCE
      Processes that have incompatible character sets.  Only those
      characters that the two communicating DCE Processes have in common
      at a given time are guaranteed to be correct.

      Therefore, any system interchange architecture which aims to provide
      this level of support must address the case where a piece of data
      arrives at a DCE Process for which the DCE Process has no defined
      conversion to its local representation.  The architecture must define
      which party is responsible for detecting the potential data loss,
      whether and how that loss is communicated to the end user, and in
      which cases the intended operation proceeds in spite of the data
      loss.
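
      One possible way to detect such loss is sketched below in Python.
      The codecs "latin-1" and "iso8859_7" (Latin/Greek) stand in for two
      incompatible character sets, and the policy of substituting '?' and
      returning the list of lost characters is only an assumption; the
      architecture, not this sketch, must define the actual policy.

```python
def convert_or_report(data: bytes, source_cs: str, target_cs: str):
    """Convert `data`; return (converted bytes, characters that were lost).

    Characters with no representation in the target codeset are replaced
    by '?' so that the caller can decide whether to proceed, warn the
    user, or reject the operation.
    """
    text = data.decode(source_cs)
    lost = []
    for ch in text:
        try:
            ch.encode(target_cs)
        except UnicodeEncodeError:
            lost.append(ch)
    return text.encode(target_cs, errors="replace"), lost
```

      Only characters in the intersection of the two character sets pass
      through unchanged; an empty list of lost characters indicates that
      the transfer preserved the data.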

      This environment presents several issues relating to how data is
      passed through the DCE Logical Network and the integrity of the data
      being communicated.  Specifically, there are three approaches to
      building a Heterogeneous Environment:

        (a) Single Network Interchange Encoding (Canonical form)

        (b) Sender makes it right.

        (c) Receiver makes it right.

      (For both (b) and (c), it is assumed that there is some
      implementation method for cooperating DCE Processes to communicate
      the codeset of the information they are exchanging.)


   ------------------------------------------------------------------------

                   +-------- Each DCE Process may ------+
                   |         support a DIFFERENT        |
                   |         character set              |
                   |                                    |
                   V                                    V
             +------------+                       +------------+
             |            |                       |            |
             |DCE Process | NIE = (see text, B.5) |DCE Process |
             |            |<--------------------->|            |
   +----+    |            |                       |            |   +----+
   |DATA|<-->|            |                       |            |<->|DATA|
   +----+    +------------+                       +------------+   +----+
     A        A     |   A                           |   A     A        A
     |        |     |   |                           |   |     |        |
     |        |     V   |                           V   |     |        |
     |        |   +-------+                       +-------+   |        |
     |        |   |CS |CS |                       |CS |CS |   |        |
     +--------+   | | | A |                       | | | A |   +--------+
         |        | V | | |                       | V | | |       |
         |        |NIE|NIE|                       |NIE|NIE|       |
         |        +-------+                       +-------+       |
         |            A                               A           |
                      |                               |
    Data stored       +-------------------------------+      Data stored
    and processed                     |                      and processed
    using local                                              using local
    Codeset                 Each DCE Process which uses      Codeset
                            a codeset that is different
                            than the NIE must perform
                            conversions between its local
                            Codeset and the NIE.

   Figure 5. A Heterogeneous Environment
   ------------------------------------------------------------------------

   B.5.1. The single network interchange encoding

      In this environment, there is a single Network Interchange Encoding
      defined for interchange in the DCE Logical Network.  The Network
      Interchange Encoding must be selected such that it can encode the
      union of the character sets that may be supported by the DCE
      Logical Network.  The Network Interchange Encoding may be defined as
      either a tagging mechanism or as a large character set encoding that
      includes the union of all character sets found in the DCE Logical
      Network.

      Note that if a tagging mechanism is used as the Network Interchange
      Encoding then the initial state of the encoding could be defined to
      correspond to a primary character set(s) of the DCE Logical Network.
      For example, the initial state could be defined to be either the
      Latin-1 (ISO8859-1) or the JIS (JISX0201) character set.  Such a
      strategy would eliminate the need to convert unless the data being
      sent is not contained in the initial state.  This optimizes the
      default case yet allows other DCE Processes whose character set(s)
      are not included in the initial state to exist in the DCE Logical
      Network.
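
      ISO-2022-JP, available in Python as the "iso2022_jp" codec, is a
      concrete example of a tagged encoding with an initial state.  Its
      initial state is ASCII rather than the Latin-1 or JIS X 0201
      suggested above, so the sketch below only illustrates the principle;
      the helper names are hypothetical.

```python
def encode_tagged(text: str) -> bytes:
    """Encode with ISO-2022-JP, whose initial state is ASCII."""
    return text.encode("iso2022_jp")


def needed_tagging(text: str) -> bool:
    """True if the encoded form contains an escape sequence (a tag)."""
    return b"\x1b" in encode_tagged(text)
```

      Text that lies within the initial state is transmitted without any
      escape sequence, so the default case costs nothing; other text is
      bracketed by escape sequences that identify its coded character set.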

      If the Network Interchange Encoding chosen is based on a large
      character set (Unicode or ISO10646), conversion would be required on
      all systems that do not use it as their coded character set.
      Yet, it would allow DCE Processes that support the large character
      set as their codeset to migrate to this environment and bypass the
      conversion on communication.

   B.5.2. Receiver makes it right

      In this environment, there is no single Network Interchange Encoding
      defined for the DCE Logical Network, but rather each DCE Process
      sends data using its own coded character set as its Network
      Interchange Encoding.  The receiving DCE Process will have negotiated
      to accept the codeset of the sending DCE Process.

      In the case where a DCE Process receives data that is encoded using a
      different codeset than it is using locally, it is the receiving DCE
      Process's responsibility to perform the conversion from the sending
      DCE Process's codeset to the codeset being used locally.  This implies
      that each receiving DCE Process needs to provide a converter for each
      codeset that may exist in the DCE Logical Network.  This may lead to
      N converters being available to each DCE Process where N is the
      number of codesets existing in the DCE Logical Network.

      This approach does eliminate the need to convert if both DCE
      Processes are using the same codeset.  Furthermore, the negotiation
      procedure is reduced to the receiver either being able to perform the
      conversion from the sender's Codeset to its Local Codeset or not.
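
      A minimal Python sketch of this model follows.  The Message and
      Receiver types are hypothetical, as is the assumption that the
      sender's codeset name travels with the data; Python codec names
      stand in for real codesets.

```python
from typing import NamedTuple


class Message(NamedTuple):
    codeset: str     # codeset used by the sender (assumed to be tagged)
    payload: bytes


class Receiver:
    def __init__(self, local_codeset: str, known_codesets: set):
        self.local_codeset = local_codeset
        # Codesets this receiver can convert from; up to N entries.
        self.known_codesets = known_codesets | {local_codeset}

    def accepts(self, codeset: str) -> bool:
        """The whole negotiation: can this receiver convert `codeset`?"""
        return codeset in self.known_codesets

    def receive(self, msg: Message) -> bytes:
        if not self.accepts(msg.codeset):
            raise ValueError("no converter for " + msg.codeset)
        if msg.codeset == self.local_codeset:
            return msg.payload               # no conversion at all
        text = msg.payload.decode(msg.codeset)
        return text.encode(self.local_codeset)
```

      The sender never converts; the cost is that each receiver must hold
      a converter for every codeset it may be sent.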

   B.5.3. Sender makes it right

      This environment is a derivative of the previous "Receiver Makes it
      Right" with the exception that any conversion is the responsibility
      of the sender.  Additionally, the negotiation process in this model
      must be more sophisticated as it requires the sender to obtain
      knowledge of the codeset that the receiver would like to receive the
      data in before the data is actually sent.
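
      The additional negotiation step can be sketched in Python as
      follows.  Peer, preferred_codeset, and send_text are hypothetical
      names; how a real sender would learn the receiver's codeset is not
      specified here.

```python
class Peer:
    def __init__(self, local_codeset: str):
        self.local_codeset = local_codeset
        self.inbox = []

    def preferred_codeset(self) -> str:
        """Negotiation step: report the codeset this peer wants."""
        return self.local_codeset

    def deliver(self, payload: bytes) -> None:
        self.inbox.append(payload)       # stored as-is, no conversion


def send_text(sender_cs: str, data: bytes, receiver: Peer) -> None:
    """Convert on the sending side, then deliver."""
    target = receiver.preferred_codeset()    # must precede the send
    if target != sender_cs:
        data = data.decode(sender_cs).encode(target)
    receiver.deliver(data)
```

      Here the receiver stores what arrives without conversion, but the
      sender cannot transmit until it has obtained the receiver's codeset.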


   ------------------------------------------------------------------------

             +------------+NIE1=CS of DCE Process1+------------+
             |            |---------------------->|            |
             |DCE Process |                       |DCE Process |
             |     1      |                       |     2      |
   +----+    |            |NIE2=CS of DCE Process2|            |   +----+
   |DATA|<-->|            |<----------------------|            |<->|DATA|
   +----+    +------------+                       +------------+   +----+

   Figure 6. Receiver Makes it Right
   ------------------------------------------------------------------------

   B.6. Current RPC Facility

      Because of dependencies within the individual DCE components on the
      ordering of code points defined by the ASCII family of codesets, it
      is assumed that Native DCE implementations based on OSF/DCE 1.0 can
      only support the Homogeneous Codeset Environment described in section
      B.3.  However, the existing RPC facility does provide the
      capability to support an interesting subset (or special case) of the
      Heterogeneous Environment (see section B.5) through its definition
      of "ASCII" and "EBCDIC" tagging.  This special case can be defined by
      applying the following restrictions to the Heterogeneous Environment
      definition.

        (a) The number of Network Interchange Encodings which can be used in
            a given DCE Logical Network is restricted to precisely 2.  One
            of these must be "ASCII" based and the other "EBCDIC" based.
            (Heterogeneous Environment)

        (b) Each DCE Process in the DCE Logical Network must use one of the
            Network Interchange Encodings as its codeset.  The character
            set is the same for both Network Interchange Encodings as well
            as the codesets used within the DCE Processes.  (Homogeneous
            Network Codeset Environment)

        (c) Each DCE Process is responsible for converting data received
            from a DCE Process using a different codeset than what is being
            used locally.  (This requires a given DCE Process to maintain
            only one conversion table which provides the mapping from the
            other Network Interchange Encoding to the one being used
            locally.)
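
      The single conversion table mentioned in restriction (c) can be
      sketched in Python.  The tag values are hypothetical, and the codecs
      "latin-1" and "cp037" are assumed stand-ins for the "ASCII" based and
      "EBCDIC" based encodings; they cover the same character set, as
      restriction (b) requires.

```python
# Hypothetical tags; the values actually used on the wire are not shown.
TAG_ASCII, TAG_EBCDIC = "ASCII", "EBCDIC"

_ALL_BYTES = bytes(range(256))

# 256-entry tables mapping the *other* encoding to the local one.
# A given process needs only the table that matches its local encoding.
EBCDIC_TO_ASCII = bytes.maketrans(
    _ALL_BYTES, _ALL_BYTES.decode("cp037").encode("latin-1"))
ASCII_TO_EBCDIC = bytes.maketrans(
    _ALL_BYTES, _ALL_BYTES.decode("latin-1").encode("cp037"))


def receive(tag: str, payload: bytes, local_tag: str) -> bytes:
    """Convert only when the sender's tag differs from the local one."""
    if tag == local_tag:
        return payload
    table = EBCDIC_TO_ASCII if local_tag == TAG_ASCII else ASCII_TO_EBCDIC
    return payload.translate(table)
```

      Because both encodings cover the same characters, each table is a
      one-to-one mapping and the conversion loses no data.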


   REFERENCES

      [Klin]      S. Kline, "OSF I18N SIG Generic Requirements", I18N SIG
                  paper, August 21, 1992.

      [Ogur]      T. Ogura (with S. Martin), "DCE 1.1 I18N Workbook",
                  Preliminary Draft, June 5, 1992.  [This is an OSF
                  internal investigation report, available to DCE licensees
                  only.]

      [SKRS]      S. Snyder, H. Kushki, F. Rojas, E. Stokes,
                  "Internationalization in the OSF DCE, A Framework", DCE
                  SIG working paper, May 22, 1991.


   AUTHOR'S ADDRESS

   Arne Thormodsen                         Internet email: arnet@cup.hp.com
   CSO Internationalization                      Telephone: +1-408-447-4798
   Hewlett-Packard Co.
   19447 Pruneridge Ave.
   Cupertino, CA 95014
   USA

