OSF DCE SIG                                            D. Delgado (OSF)
Request For Comments: 51.0                                November 1993

     DFS INTEROPERABILITY ISSUES FOR 32-BIT AND 64-BIT ARCHITECTURES

1. INTRODUCTION

The current state of DFS with respect to interoperability between
64-bit and 32-bit architectures is that this functionality is only
partially implemented. The wire protocol does support 64-bit
architectures in that file sizes and offsets are declared as 64-bit
entities, and these 64-bit entities are passed between client and
server. However, it is how this information is actually used which
creates interoperability problems between the two architectures.

This document discusses in detail the interoperability problems
between the two architectures with respect to DFS, and proposes
solutions which will allow DFS to behave more reasonably. Note that we
do not intend to discuss DFS 64-bit cleanliness; the work described in
this document is dependent on DFS's being 64-bit clean. (Note,
however, that there is some overlap between 64-bit cleanliness and the
issues presented here.) The issue of 64-bit cleanliness will be dealt
with in a forthcoming RFC.

2. TERMS

We use the term "large file" to refer to a file whose size would go
beyond what a 32-bit file size could hold.

3. PROBLEM STATEMENT

As previously stated, the DFS protocol definition does support 64-bit
quantities for file sizes and offsets for across-the-wire transfer of
data. However, it is the actual use (or lack thereof) of this
information once it reaches its destination which creates the
interoperability problems between the two architectures. We would like
to emphasize here that neither the problem nor the solution is
restricted to the 64-bit platforms; the issues and solutions affect
both architectures.

The most typical example is that file offsets and sizes are declared
to be AFShypers (64 bits), but in reality only the low 32 bits of each
of these entities are actually used by the client or server.
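To make the failure mode concrete, the following is a minimal sketch,
not the actual DFS declarations: the struct layout and helper names
are illustrative stand-ins for a 64-bit quantity carried as two 32-bit
words, showing how using only the low word misreports a large file's
size.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for a wire-format AFShyper: 64 bits carried
 * as two 32-bit words.  The real DFS declarations may differ. */
typedef struct {
    uint32_t high;   /* upper 32 bits of a file size or offset */
    uint32_t low;    /* lower 32 bits */
} afs_hyper_t;

/* The problematic pattern: only the low word is kept, so a 5-GB file
 * (high = 1, low = 0x40000000) is silently treated as 1 GB. */
uint32_t size_low_only(const afs_hyper_t *h)
{
    return h->low;                       /* high word never examined */
}

/* The correct pattern: reassemble the full 64-bit value. */
uint64_t size_full(const afs_hyper_t *h)
{
    return ((uint64_t)h->high << 32) | h->low;
}
```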
Both client and server, when transferring the data which has come
across the wire (this is outside of the IDL stubs) to their internal
data structures, consistently ignore the high 32 bits of the file size
or offset and store only the low 32 bits. In most cases, the high 32
bits are never examined to determine whether they contain any useful
data; it is always assumed that these fields contain nothing of value.

The following is a list of problem areas for 32/64-bit
interoperability:

(a) The SAFS_FetchStatus call will zero out the high 32 bits of the
    file size attribute, which eventually is returned to the client.
    What should really happen here is that the server should pass the
    entire file size to the client, thus permitting the client to
    receive the correct value which represents the file's size. The
    side effect of this is that when the client intends to append data
    to the end of a file, it may not have accurate information
    regarding where the end of the file actually lies. The result is
    that data can be written to the wrong place in the file.

(b) The SAFS_StoreData and SAFS_FetchData calls extract only the low
    32 bits of the file offset and never examine the high 32 bits.
    What should happen here is that these calls should examine the
    entire file offset to determine which file block the client wishes
    to access. Again, the side effect of this is that data may not be
    stored where the client intended.

(c) The cache manager's internal Scache structure has the file length
    declared as a long. While this is permissible for 64-bit
    architectures where long is implemented as 64 bits, it is
    incorrect for 32-bit architectures. In fact, this field should be
    declared as a hyper, so that the client will maintain accurate
    information regarding the file's size.

(d) Readdir is a problem for the 32-bit client.
    We need to return an error if the directory is too large, once we
    pass the 4-Gig mark, or alternatively disallow access to such
    large directories. The SAFS_Readdir function also ignores the high
    32 bits of the 64-bit directory offset which is passed to it.

(e) There are interoperability problems with token management. These
    problems relate to the notion of the value used to indicate that
    the caller is requesting tokens for the entire file. The current
    code uses the values:

        token.endRange    = 0x7fffffff
        token.endRangeExt = 0

    The 64-bit world uses the values:

        token.endRange    = 0xffffffff
        token.endRangeExt = 0x7fffffff

    This particular problem will be discussed in more detail in the
    section "Token Management".

(f) There are interoperability problems with locking. This has the
    same issue as token management: the value used to indicate that
    the caller is requesting a lock on the entire file (discussed
    later).

Each of these problems has several solutions, and the "correct"
solution will depend on how much interoperability we want to provide.
When determining what is best, we must also keep in mind how much
backward compatibility we intend to provide for 1.0.3-based DFS
clients and servers.

4. SERVER-SIDE SOLUTIONS

For the calls SAFS_Readdir, SAFS_FetchData, and SAFS_StoreData, the
problems on the server side are similar, and so, too, are their
solutions:

(a) 64-bit server -- The 64-bit clean changes will resolve this, since
    the effect will be to transfer both the high and low words of file
    offsets and sizes.

(b) 32-bit server -- The server code should be modified to examine the
    high 32 bits and return EFBIG in cases where the specified file
    offset is larger than what the server supports.

Similarly, the SAFS_FetchStatus issue will also be resolved on the
server side by the 64-bit clean changes.
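As a sketch of the 32-bit server check in (b) -- the function name,
the hyper layout, and the limit constant are illustrative assumptions,
not the actual DFS code -- the server would validate the full 64-bit
offset before using it:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative 64-bit quantity carried as two 32-bit words. */
typedef struct { uint32_t high, low; } afs_hyper_t;

/* Largest offset this hypothetical 32-bit server supports. */
#define SERVER_MAX_OFFSET 0x7fffffffu

/* Validate an incoming 64-bit offset: return 0 and the usable 32-bit
 * offset, or EFBIG when the request lies beyond what the server
 * supports -- instead of silently dropping the high word. */
int server_check_offset(const afs_hyper_t *pos, uint32_t *out)
{
    if (pos->high != 0 || pos->low > SERVER_MAX_OFFSET)
        return EFBIG;
    *out = pos->low;
    return 0;
}
```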
However, it is worth noting that for this particular case, the 64-bit
clean changes will not solve the problem on the 32-bit client side,
because of the definition of the file length field in the cache
manager's internal data structures.

In addition to these changes, we must also decide what client and
server will do in the case where the file's size is larger than what
each supports.

5. 32-BIT CLIENT, 64-BIT SERVER

What should the DFS client do when presented with a file whose size is
larger than the size which the DFS client can handle? How should the
size be represented to the client's OS, for example, in a stat
operation?

There are several solutions to this:

(a) Don't allow 32-bit clients to access large files. This could be
    enforced on either the server or the client side.

    Enforcing this on the server side would imply that the server has
    knowledge about the maximum file size of its DFS clients. The
    current DFS protocol does not support the exchange of such
    information between client and server. However, we can implement a
    means to exchange such information; this is discussed in more
    detail in the section "Backward Compatibility". Enforcing this on
    the server side would require all SAFS functions, except those
    which do not operate on files or directories, to add a macro which
    performs checks and returns EFBIG if the requested operation is on
    a large file and the client is a 32-bit client.

    To enforce this on the client side, the cache manager would be
    modified so that each of the cm vnode functions performs a check
    on the size of the file, and returns EFBIG if the file is too
    large.

    With both methods, one must still deal with the issue of whether
    to show the client large files in readdir operations. The simplest
    solution would be one which allows such files to be displayed
    during readdir but not accessed. Filtering out directory entries
    during readdir is more complicated.
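The client-side enforcement described above might be sketched as a
single shared check, called by each cm vnode function; the names, the
stand-in scache structure, and the 32-bit limit are illustrative
assumptions:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-platform maximum supported file size. */
#define OSI_MAX_FILE_SIZE 0x7fffffffULL

/* Minimal stand-in for the cache manager's scache entry. */
struct scache { uint64_t length; };

/* Returns EFBIG when the file is too large for this client, else 0;
 * each cm vnode operation would call this before proceeding. */
int cm_large_file_check(const struct scache *scp)
{
    return (scp->length > OSI_MAX_FILE_SIZE) ? EFBIG : 0;
}
```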
    We'll leave the investigation of the latter for later, if it is
    decided that we are to proceed in that direction.

    Either solution can also become complicated in the case where our
    32-bit client is accessing a file which is not too large and then
    that file suddenly grows (e.g., because of a data write by another
    client) so that it is now considered too large. Our client would
    then have to flush all modifications back to the server (most of
    this would have been done during token revocation anyway),
    invalidate all the cached data for that file, and then disallow
    access to that file. Of course, this assumes that the client
    always requests tokens on the entire file, which it currently
    does.

    When backward compatibility with 1.0.3 clients is considered, the
    only choice is to implement the checks on the server, since 1.0.3
    clients will always attempt to perform operations on files
    regardless of their size.

(b) Allow clients to see even the large files, but allow them to
    modify or access data only up to the limit which the client
    supports. Under this solution, the read, write, lseek, and locking
    calls would all return EFBIG once the requested operation goes
    beyond the file size supported by the client.

    First we would require each platform to define its maximum
    supported file size with a constant OSI_MAX_FILE_SIZE. The key
    here is to be sure that we always return an error code to the
    client when it crosses the OSI_MAX_FILE_SIZE boundary, which would
    indicate to an application that there is more data available but
    this data is not accessible (EFBIG comes to mind). We are cautious
    regarding this because we do not want programs such as cp (copy)
    to return success when, in fact, we are not able to copy the
    entire file.

    The modifications to effect all of this would be made on the cache
    manager side. The idea is to prevent reads and writes beyond the
    client's OSI_MAX_FILE_SIZE even if we are not yet at EOF.
    The following paragraphs briefly describe the necessary
    modifications to cm_read and cm_write to implement this model.
    (There are changes in other areas of the cache manager as well;
    these are enumerated in the Appendix.)

    (i) cm_read()

        We'd read until the file position is >= the file length or
        until the file position is >= OSI_MAX_FILE_SIZE. If we hit the
        latter case, then return EFBIG. We will also need to adjust
        the amount of data to actually read, to accommodate the case
        where the OSI_MAX_FILE_SIZE offset does not fall on a cache
        manager chunk boundary.

    (ii) cm_write()

        This performs a special check for ioflag & IO_APPEND. For this
        case, don't blindly set the file position to be the length of
        the file; return EFBIG if the file's m.Length is >=
        OSI_MAX_FILE_SIZE. In the case where OSI_MAX_FILE_SIZE does
        not fall on a chunk boundary, we'll need to adjust the amount
        of data we'll allow the user to write so that the file's size
        does not grow beyond OSI_MAX_FILE_SIZE. This is very similar
        to what the cache manager already does with the ulimit value
        for file sizes.

5.1. getattr, lseek Operation

The cm will return -1 for the file size for large files for the get
attributes (stat) operation, as specified by the POSIX 1003.8 TFA
draft. lseek is also supposed to return EFBIG; however, the DFS VFS
interface does not provide a seek operation. For systems with
interfaces which support this, the os2vfs routines which provide the
mapping from the native OS VFS to the DFS VFS can implement this
functionality.

5.2. Section Summary

The simplest solution is the one which allows clients to know of the
existence of large files but does not permit them to read or write
such files. This has the disadvantage, of course, of the 32-bit client
not being able to access data for such files.
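The read-path clamping described in (i) above might be sketched as
follows. All names here are illustrative assumptions; the real cm_read
also has to deal with chunking, caching, and tokens:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-platform maximum supported file size. */
#define OSI_MAX_FILE_SIZE 0x7fffffffULL

/* Decide how many of 'want' bytes may be read at position 'pos' in a
 * file of 'len' bytes.  Plain EOF yields 0 bytes; reaching the
 * OSI_MAX_FILE_SIZE boundary with data still beyond it yields EFBIG,
 * so applications cannot mistake the truncation point for EOF. */
int clamp_read(uint64_t pos, uint64_t len, uint64_t want,
               uint64_t *nbytes)
{
    if (pos >= len) {            /* genuine end of file */
        *nbytes = 0;
        return 0;
    }
    if (pos >= OSI_MAX_FILE_SIZE)
        return EFBIG;            /* more data exists, but unreachable */
    /* Clamp so the transfer crosses neither EOF nor the limit. */
    uint64_t limit = (len < OSI_MAX_FILE_SIZE) ? len
                                               : OSI_MAX_FILE_SIZE;
    *nbytes = (want < limit - pos) ? want : limit - pos;
    return 0;
}
```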
Note that the option to permit reads and writes up to the maximum file
size allows a larger degree of interoperability, and it also conforms
to the behavior specified in the POSIX 1003.8 TFA draft.

Although newer versions of DFS would presumably implement one of the
aforementioned schemes, there are still outstanding issues with old
32-bit clients, which are still using 32-bit file sizes. As previously
described, such clients do not have an accurate view of the file's
true length when a large file is involved.

After having considered all the options, the best solution (i.e., one
which provides maximum interoperability with backward compatibility)
would be one in which a 64-bit server could determine that a
particular client is an "old 32-bit client" and thus prevent that
client from accessing large files, while allowing clients which have
the 32/64-bit interoperability code to access large files up to the
client's OSI_MAX_FILE_SIZE. (The server could possibly make such a
determination by using a modified AFS_SetParams call.)

6. 32-BIT SERVER AND 64-BIT CLIENT

What should the 32-bit server do when a 64-bit client extends a file
such that it becomes too large for that 32-bit server? With the
current implementation of DFS, the 32-bit server will ignore the high
32 bits of the file offset, using only the low 32 bits as the offset
at which to write the data. The end result is that the data is not
written where the application intended.

At a minimum, the server code should be modified to check the full
64-bit file offset and return EFBIG if the client is requesting an
operation which the server cannot support. However, because of the DFS
asynchronous store-back mechanism, the client application may not
receive the EFBIG error until it has written a substantial amount of
data. The client may receive the error on a subsequent write call,
after the actual write call which generated the error. Thus, the
client cannot be certain how much data was written back to the server.
We could avoid this situation if the DFS cache manager could prevent
applications from writing files whose sizes are beyond the limits of
the DFS server on which they are actually stored. There are some
options here:

(a) Add functionality so that the client and server communicate
    information about maximum file size during first contact, much
    like what is done for the TSR parameters in AFS_SetParams. There
    seems to be spare space in the afsConnParams structure which is
    passed in the AFS_SetParams call. We could probably use these
    spare fields to hold the server's maximum file size specification.
    (The DFS interface declares the afsConnParams structure as in/out,
    so we can use it to return some information to the client.) This
    is the current definition of the afsConnParams structure from
    common_defs.h:

        typedef struct afsConnParams {
            unsigned32 Mask;
            unsigned32 Values[20];
        } afsConnParams;

        #define AFS_CONN_PARAM_HOSTLIFE   (0)
        #define AFS_CONN_PARAM_HOSTRPC    (1)
        #define AFS_CONN_PARAM_DEADSERVER (2)

    We could use slots 3 and 4 to effectively function as a hyper
    which would hold the server's maximum file size:

        #define AFS_CONN_PARAM_MAXFILE_HIGH (3)
        #define AFS_CONN_PARAM_MAXFILE_LOW  (4)

    We would need to modify the SAFS_SetParams call to handle a
    request from the client for the maximum file size information. On
    the cache manager side, cm_write would require modification to
    prevent the client from writing beyond the server's
    OSI_MAX_FILE_SIZE. We would also need to add code in the cm to
    initiate an AFS_SetParams call to retrieve this information from
    the server.

(b) Force the first write to file locations beyond 4-Gig to occur
    synchronously. If EFBIG is returned, then disallow subsequent
    reads or writes to the above-4-Gig regions of the file. The
    disadvantage is that we are required to make an RPC to the server
    for each file whenever we are performing a write beyond 4-Gig for
    the first time.
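Option (a)'s use of the two spare slots could be sketched like this.
The structure and slot indices follow the definitions above; the Mask
handling and the helper names are assumptions:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t unsigned32;            /* DCE-style typedef */

typedef struct afsConnParams {
    unsigned32 Mask;
    unsigned32 Values[20];
} afsConnParams;

#define AFS_CONN_PARAM_MAXFILE_HIGH (3)
#define AFS_CONN_PARAM_MAXFILE_LOW  (4)

/* Server side: advertise the maximum file size in slots 3 and 4. */
void conn_set_maxfile(afsConnParams *p, uint64_t maxsize)
{
    p->Values[AFS_CONN_PARAM_MAXFILE_HIGH] = (unsigned32)(maxsize >> 32);
    p->Values[AFS_CONN_PARAM_MAXFILE_LOW]  = (unsigned32)maxsize;
    p->Mask |= (1u << AFS_CONN_PARAM_MAXFILE_HIGH) |
               (1u << AFS_CONN_PARAM_MAXFILE_LOW);
}

/* Client side: recover the value after AFS_SetParams returns. */
uint64_t conn_get_maxfile(const afsConnParams *p)
{
    return ((uint64_t)p->Values[AFS_CONN_PARAM_MAXFILE_HIGH] << 32) |
           p->Values[AFS_CONN_PARAM_MAXFILE_LOW];
}
```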
    What should be done in the case of a timeout for the first write
    beyond the 4-Gig mark? We can't take advantage of the background
    retry mechanism already present in the DFS cache manager, since
    this would mean that the client application might have the
    opportunity to write even more data before it discovers that this
    data cannot be stored on the server. This may prove to be more
    work than it first appears.

(c) Do nothing and live with the uncertainty. One problem with this
    approach is that the client application must perform extra steps
    after receiving the EFBIG error in order to attempt to recover the
    lost data. In such instances applications can never be certain
    exactly how much data was written to the server before the error
    occurred.

We believe that option (c) should not be considered, and that we
should select either (a) or (b). Solution (c) will have grave
consequences for large files with old 32-bit servers, since these
servers will continue to disregard the high 32 bits of the offset when
storing data. Consequently, client data intended to be written at
locations above the 4-Gig mark will be stored unexpectedly in data
blocks below the 4-Gig mark. Options (a) and (b) have the benefit of
protecting clients from old servers, as well as avoiding such
scenarios, which may cause clients to lose data. Of the two remaining
candidates, solution (a) is easier to implement.

7. TOKEN MANAGEMENT

As previously mentioned, the notion of the value which represents "the
entire byte range of the file" differs between the two architectures.
One solution would be for everyone's notion of the entire file to
change to match the 64-bit version. This implies that routines such as
cm_GetTokens and cm_GetDSLock would require modifications to reflect
this. Any solution assumes that DFS has 64-bit clean token management,
which it currently does not. There is also the question here of
backward compatibility.
An old 32-bit client will ask for only up to 0x7fffffff. This is
probably acceptable for any server which has the new notion of the
entire-file value. However, the old 32-bit server will have a problem
with any client using the new format, because it examines only the
token.endRange field, which in the new world is 0xffffffff (or -1).
Preliminary experimentation showed that this caused the 32-bit server
to hang. It is worth noting that we could use a minor protocol
revision (although I know that people are generally opposed to such
things) to prevent new clients from interoperating with old servers,
thus averting possibly grave consequences.

The alternative to having both architectures retain the same "entire
file byte range" notion would be to have a flexible notion of "entire
file byte range" according to the server's OSI_MAX_FILE_SIZE. For
32-bit servers, this would essentially be the same as what is
currently done for DFS: the token endRange field is set to 0x7fffffff.
For 64-bit servers, the notion of "entire file byte range" would
extend to the 64-bit representation of the largest file offset. This
particular solution would require cm_GetTokens, which requests tokens
for the entire byte range of the file, to use 0x7fffffff as the value
for the endRange field for 32-bit servers, and to use 0xffffffff and
0x7fffffff for endRange and endRangeExt for 64-bit servers. Any
callers of cm_GetTokensRange would have to set the token end range
appropriately as well. Naturally, this method assumes that the client
has a means of determining in advance the server's OSI_MAX_FILE_SIZE.
Refer to the section "BACKWARD COMPATIBILITY" for more information.

Again, the "best" solution must also take into account the ability to
accommodate any old 32-bit servers which pre-date the 32/64-bit
interoperability code.
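The flexible "entire file byte range" scheme could be sketched as
follows, using the endRange/endRangeExt values quoted earlier. The
structure and helper names are illustrative; the real token structure
differs:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative token byte-range fields (the real token structure in
 * DFS differs; only endRange/endRangeExt matter for this sketch). */
struct tok_range {
    uint32_t endRange;
    uint32_t endRangeExt;
};

/* Pick the "entire file" end range from the server's maximum file
 * size, learned in advance (e.g., via a modified AFS_SetParams). */
void whole_file_range(uint64_t server_max_file_size,
                      struct tok_range *t)
{
    if (server_max_file_size <= 0x7fffffffULL) {
        /* Old-style 32-bit server: the value it expects today. */
        t->endRange    = 0x7fffffffu;
        t->endRangeExt = 0;
    } else {
        /* 64-bit server: largest representable 64-bit offset. */
        t->endRange    = 0xffffffffu;
        t->endRangeExt = 0x7fffffffu;
    }
}
```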
Ideally, we would like all DFS clients and servers to share the same
notion of "the entire byte range of a file"; however, the consequences
with respect to old 32-bit servers seem rather grave. The solution
which employs the variable notion of the byte range of the file is not
as clean as the solution which promotes a single view of the universe,
but it will allow new 64-bit clients to interoperate with old 32-bit
servers.

8. FILE AND RECORD LOCKS

Like the token management code, the locking functionality also has
interoperability problems with the notion of the "entire byte range"
of a file being fixed at the 32-bit value 0x7fffffff. Locking, unlike
token management, has the problem that the application is aware of
whether or not it has locks on the entire file. Again, we'd like to
change the notion of the entire byte range to reflect whether the
server is a 32-bit or a 64-bit server.

There is a second issue with file locking, namely what to do when the
32-bit client requests a lock on the entire file and the file's size
is greater than the client's OSI_MAX_FILE_SIZE. Again, there are
multiple possibilities:

(a) When the caller requests to lock the entire file and that file's
    length exceeds OSI_MAX_FILE_SIZE, we'll return EFBIG.

(b) The cm would allow the client to lock the entire file, but
    disallow reads or writes to the portions of the file beyond 4-Gig.
    This solution should be used if we are allowing the 32-bit clients
    to access data up to 4-Gig.

Both options require changes to cm_lockctl (cm_lockf.c). (Note: the
cm's internal cm_lockf startPos and endPos fields are declared as
longs; the proposed 64-bit clean changes will change these to
effectively function as hypers.)

9.
BACKWARD COMPATIBILITY

Probably the most compelling reason to attempt to implement a scheme
which allows old 32-bit servers to interact reasonably with DFS
clients containing the new 32/64-bit interoperability code is the
potential for data corruption. Most of these problems can be avoided
if the client or server knows ahead of time what type of server or
client it is dealing with. This could be effected by a modified
AFS_SetParams call in which new clients and servers could specify a
flag to indicate what type of 32/64-bit compatibility they support.

    New Client                     Old Server
    ----------                     ----------
    AFS_SetParams;          ==>    Old servers don't return anything
    client sets 32/64              in AFS_SetParams, although the
    interop flag.                  structure is declared in/out; they
                                   do not return an error even if
                                   they don't recognize the flags
    Server did not return   <==    being passed to them.
    a 32/64 compat flag;
    client assumes this to
    be an old server.  Use
    old-style byte ranges
    and 0x7fffffff as the
    server's OSI_MAX_FILE_SIZE.

In the case where a new client is contacting a new server, the new
server will return its OSI_MAX_FILE_SIZE to the client. The default
behavior for new servers is to assume that the client does not support
32/64-bit interoperability. The effect of the new client performing
the modified AFS_SetParams call is to change this default assumption
of the new server.

10. SUMMARY

The minimalist solution requires that new 64-bit servers and clients
protect themselves from old 32-bit clients and servers. This solution
consists of:

(a) Exchange of client/server options via a modified AFS_SetParams.
    Note that the alternative to this would be a minor protocol
    revision (to protect servers from old clients).

(b) Servers check the entire 64 bits of file offsets and sizes, and
    return EFBIG where appropriate.

(c) Denial of access to large files for old 32-bit clients; this
    assumes that we are using the modified AFS_SetParams calls rather
    than a minor protocol revision.
(d) Denial of access to large files for all 32-bit clients. (This can
    be exchanged for another solution if more interoperability is
    desired.)

(e) Variable notion of the entire byte range for token management and
    locking.

(f) Clients prevent writes beyond the server's OSI_MAX_FILE_SIZE.
    (This protects them against old servers.)

The minimalist solution is more of a protection model, i.e.,
protecting ourselves against danger, rather than a model of true
interoperability. To provide better interoperability with reasonable
semantics for DFS, while attempting to accommodate existing 32-bit
servers, we propose a solution which combines the following:

(a) Include items (a-c, e-f) of the minimalist solution.

(b) The partial access method between new clients and new servers.
    This is recommended for POSIX conformance.

11. OTHER ISSUES

Thus far, we have only discussed issues relevant to the basic
filesystem operations such as reading and writing files. There are
similar issues regarding the DFS advanced fileset features and large
files as well.

11.1. Replication

Replication should fail gracefully for filesets with large files.

11.2. Fileset Move

Certainly there are issues about moving filesets with large files to
32-bit systems. At a minimum, we would want the move to fail
gracefully if not all of the files can be moved to their target host.
This particular item requires verification to determine whether this
is already the case.

APPENDIX A. Cache Manager Changes

This Appendix describes the areas in the cache manager which would
require modification to implement partial access for large files on
the 32-bit client. These changes assume that all clients will store
the full 64 bits in the scache Length field.

(a) cm_scache.c:

    (i) cm_ScanStatus()

        This function fills in the status information to be stored
        back to the server.
        In the case where the file's length has been modified, the cm
        should be using both the high and low words for the length.
        This routine should use a macro to move the entire hyper.

(b) cm_tknimp.c:

    (i) Revoke_DataToken()

        Here the token revocation code needs to store back all
        modified chunks until EOF is reached. We should probably
        change the check for EOF from:

            if (filePos >= scp->m.Length)

        to something which will also check for the OSI_MAX_FILE_SIZE
        boundary:

            if (hlencmp(filePos, scp->m.Length) ||
                (filePos >= OSI_MAX_FILE_SIZE))

        where hlencmp is a platform-dependent macro which can perform
        the correct comparison of filePos against the file's length.

    (ii) TKN_GetCE()

        This is a debugging RPC call -- we have no idea what its
        purpose is, and it probably does not matter, since this is for
        debug use only.

(c) cm_dcache.c:

    (i) cm_ComputeOffsetInfo()

        cm_ComputeOffsetInfo attempts to compute the offset into a
        particular chunk given a byte-offset file position. It will
        also compute the amount of data remaining in that chunk,
        taking into account the file's size so that we don't read
        beyond EOF. This computation should be modified to take into
        account the client's OSI_MAX_FILE_SIZE, and lenp should be
        adjusted accordingly.

    (ii) cm_FetchDCache()

        This tries to determine how much of the chunk represents
        actual data. If the chunk's base offset is >= maxGoodLength,
        this represents a position beyond EOF (we are probably
        appending), and we can return a chunk of 0's. Here, we want to
        perform a check for fetching beyond the client's
        OSI_MAX_FILE_SIZE. So we should perform a comparison of:

            if (cm_chunktobase(chunk) >= OSI_MAX_FILE_SIZE) {
                do-any-cleanup();
                return EFBIG;
            }

        before checking whether we are beyond EOF.

    (iii) cm_TruncateAllSegments()

        cm_TruncateAllSegments determines whether a change in the
        file's length necessitates truncation. This function should
        examine all 64 bits of the file's length to determine whether
        or not to truncate.
    (iv) cm_StoreDCache()

        cm_StoreDCache makes a reference to the scache m.Length field:

            tlen = scp->m.Length;

        "tlen" is not used for anything; we should probably remove
        this line of code.

    (v) cm_UpdateDCacheOnLine()

        cm_UpdateDCacheOnLine uses the following comparison when
        determining what to do with tokens:

            if (otknp->endRange > scp->m.Length)

        Note that this should be a macro comparison to compare hypers.

(d) cm_vnodeops.c:

    (i) cm_ustrategy()

        This has the line:

            len = MIN(len, scp->m.Length - dbtob(bufp->b_blkno));

        This is platform-dependent code in cm_ustrategy; the MIN macro
        will probably require some adjustment.

    (ii) cm_getlength()

        cm_getlength is a VFS interface routine. This should probably
        be modified to return something appropriate for large files
        (either -1 for the length or just the error code EFBIG). Its
        counterpart, cm_setlength, requires the use of a macro to set
        the scache Length field appropriately.

    (iii) cm_MergeStatus()

        cm_MergeStatus stores only the low 32 bits of the file's size
        in the local scache:

            scp->m.Length = fetchStatusp->length.low;

        Use a macro here to store the full 64 bits on the client.

    (iv) cmattr_to_vattr()

        cmattr_to_vattr returns file attributes, using only 32 bits of
        the file's length:

            attrsp->va_size = scp->m.Length;

        This should be modified to use -1 if scp->m.Length is greater
        than OSI_MAX_FILE_SIZE.

REFERENCES

[POSIX1] IEEE Std 1003.8 Draft 6, "Transparent File Access Amendment
         to Portable Operating Systems Interface", January 1993.

[STRANG] S. Strange, "DCE/DFS Source Code Cleanup to Support Both
         32-bit and 64-bit Architectures", October 1993.

AUTHOR'S ADDRESS

Diane M. Delgado                    Internet email: delgado@osf.org
Open Software Foundation            Telephone: +1-617-621-7283
11 Cambridge Center
Cambridge, MA 02142
USA