10/26/04 ICSC ITWG Meeting Minutes

Taking minutes: HP (FW)

Y  HP      Jim Hamrick (JH) (after 15 minutes)
Y  HP      Jay Rosser (JR)
Y  HP      Fred Worley (FW)
Y  IBM     Fredy Neeser (FN)
N  NetApp  Arkady Kanevsky (AK)
N  Sun     Matt Pearson (MP)

cascading ascii art attendance diagram.
(if you have more than 1 minus visible, you are not eligible to vote.)

       hp    ibm    netapp  sun
       ----  -----  ------  ---
m-3    +     +      -       -
m-2    +     +      -       -
m-1    +     +      -       -
m-0    +     +      -       -

Next Meeting: Tuesday 11/2/04

Minutes:

0) Additional agenda items
   - RNIC PI version number issue
   - Memory management
   - Section on Remote Invalidation can be reviewed

1) Approval of previous minutes
   Minutes to approve:
   - Email from Fred Worley, subject "ICSC ITWG draft minutes, 9/28/04",
     sent 10/21/04 3:57PM PT
   - Email from Jay Rosser, subject "draft ICSC ITWG minutes for 10/5/04",
     sent 10/5/04 7:10PM PT
   Minutes approved

2) Action item review

2a) AI - FN - Emit new requirement using above syntax that captures
    notion of an API where applications could be requested to return
    memory to the O/S (or invalidate STags). Should be a phase 3
    requirement.
    - Sent out to reflector today by FN
    - Problem that OS has no way to revoke existing memory registration
    - Agreed to delay addressing this issue to Phase 3
    CLOSED

2b) AI - FN - request Global Behavior section from editor and update as
    above ("above" to be found in AI review from 10/5/04 minutes)
    - Global behavior man page being edited
    CLOSED

2c) AI - MM group to create text describing the potential hazard of
    using stale STags with the fast re-registration use model.
    - Added text to it_lmr_link() man page
    CLOSED

2d) AI - JH to determine if this flag is truly necessary or can be
    handled with existing flags (refers to "REMOTE ACCESS FLAG" -
    MM-D1.1.2)
    See email from Jim Hamrick, subject "My AI to determine if existing
    IT-API memory access control flags suffice for iWARP", sent 10/21/04
    6:23PM PT
    - JR made argument for why existing flags suffice
      - with existing flags, can't disable ability to bind to a MR
        - JR doesn't see great value in disabling ability to bind to MR
      - API does not currently provide ability to create STAGs with
        write only access
        - not sure why one would want to do this
      - can't create STAGS without any access at all allowed through
        them with existing ITAPI
        - prohibited to use the IT_PRIV_NONE flag in the man page
        - ability to create LMR that no-one can access; not sure how
          this would be used, but if someone comes up with a use model,
          then this may need to be addressed
      - no control over fast register vying for remote access are
        disabled for a given STAG
        - with verbs, can disable fast register operations
        - no flag in ITAPI to enable or disable fast register (no
          concept for fast register in current API)
        - simplifying assumptions may be possible based on the
          privileged level required to use fast register operations
        - still under investigation by MM group
    - FN: no compelling reason for exposing ability to disable remote
      access flag
    - JH: recalls that this issue was related to conserving hardware
      resources for certain RNIC implementations
    - Summary: Some things we can't do, but not clear that any of these
      things would ever need to be done
      - therefore, no need for change
    FN: Will post to reflector that we have voted not to expose the
    remote access flag of an LMR and allow community to provide feedback
    - provided by MR in iWARP
    - was exposed as LMR attribute in draft requirements
    - propose always enabling this attribute and not exposing it to
      iWARP consumer
    Closed, pending feedback

2e) AI - FN will add text stating why
    the restriction is necessary (refers to use of single scatter
    element from RDMA read restriction)
    - FN added sentence to requirement MM-10.4.D3.6.2
    - Discussion:
      - Why is the limitation necessary (that when posting an RDMA Read
        only a single data sink STag can be used)
        - Do not want the UP to complete partially
        - If RMR can be invalid or a subset of sinks can be invalid,
          then a partial completion is possible
        - Partial completion would be a Bad Idea
    CLOSED

2f) AI - FN/JR - Can an RMR or LMR be linked or unlinked via an Endpoint
    in the unconnected state on InfiniBand? Current IT API pages deny
    this - see if InfiniBand verbs allow it.
    - still pending
    OPEN

2g) AI - MM group - determine if QP can be destroyed or reset if MWs
    associated with it (for IB, IB VE, and iWARP).
    - still pending
    OPEN

2h) AI - JR - emit proposal on how to deal with turning off IOH for DAT
    compatibility.
    - still pending
    OPEN

3) iWARP CM issues
3a) Email thread started by Jay Rosser, subject "[Fwd: First draft of CM
    man pages]", sent 9/28/04 3:10PM PT
3b) Email thread started by Fredy Neeser, subject "Re: First draft of CM
    man pages", sent 9/29/04 8:08AM PT
3c) Email thread started by Jay Rosser, subject "[Fwd: CM review call
    with Fredy 9/29/04", sent 10/4/04 9:03AM PT
3d) Email from Jay Rosser, subject "Second draft of CM man pages", sent
    10/14/04 11:34AM PT
    - Discussion postponed to allow more time for review

4) Ladder diagram discussion:
   - FN: Having transport independent ladder diagrams may be helpful to
     a consumer that is new to the protocol
     - could be included with man pages for it_ep_connect, it_ep_accept,
       it_
   - JH: Could address in the introductory section (section 1.3)
   - FN: Could talk about TII and TDI models in this section as well
   - Agreement - Any objections to including the transport independent
     ladder diagrams?
     - could see TII going in ep_states man page and TDI ...
   - JH: if we go this route, then the MM man pages will have more data
     on linked and unlinked LMRs, how you get from one state to another,
     etc?
   - General question: Should introduction be enlarged a bit? It is
     quite brief compared to the detailed man pages
     - no strong objection to adding this detail, but some concern about
       consistency of the document; if one section is enlarged and
       others are not similarly enlarged, then the document will be
       somewhat out of balance; this is not a correctness issue,
       however.
     - compromise: add additional ladder diagrams to the man page rather
       than the introduction
     - could also show ULPs (e.g. SDP, iSER) for example in TDI
       documentation
     - somewhere have to explain the difference between the TDI and the
       TII
       - connections established in a different way, but both provide an
         endpoint that is used in the same way
   - Concern about going into it_socket_convert in the introduction
     - this delves into the transport *dependent* use model, which is
       inconsistent with the overall goal of the API which is to provide
       transport independence
     - propose subsection in introduction on transport dependent aspects
       - mention connection establishment feature that is only
         applicable to iWARP, any other things
       - give reference to the man pages without providing other details
       - allows explaining TII and TDI in the first chapter, with clear
         ID of the TDI as TD and minimal additional text (the bulk of
         the explanation provided through links to the man pages)

   2 sets of ladder diagrams:
   - implementation perspective
     - will add to imp. guide
   - consumers perspective
     - will add to man pages

   Should consumers perspective ladder diagram be included, and if so
   where?
   - propose subsection in introduction on transport dependent aspects
     - mention connection establishment feature that is only applicable
       to iWARP, any other things
     - give reference to the man pages without providing other details
     - allows explaining TII and TDI in the first chapter, with clear ID
       of the TDI as TD and minimal additional text (the bulk of the
       explanation provided through links to the man pages)

5) RNIC PI version number discussion
   - See mail from FN, Subject: Interoperability and RDMAP/DDP version
     number problem; Date: Wed, 27 Oct 2004 18:58:53 +0200
   - Question of compatibility between RDMAC and IETF devices
   - RNIC PI WG appears to not understand the ITWG's intent w.r.t.
     interoperability; should clarify
     - would like to enable interoperability between new devices (e.g.
       IETF compliant devices) with existing devices (e.g. RDMAC
       compliant devices)

   Issue:
   - RDMAC uses version number 0; IETF uses version number 1
   - protocol version number appears to be the only (or most
     significant) difference between the two versions
   - question is what the device is required to do when there is a
     version mismatch
     - Specs specify that a mismatch must be reported as an error

   - Proposed solution:
     - Assume for RDMAC devices, the RDMAP and DDP version numbers will
       be 0 and can not be changed
     - From discussion in RNIC PI WG, assumption that the version
       numbers can be set globally, such that device could act like
       either RDMAC or IETF device
     - FN: would prefer to have behavior set on a per-endpoint, not a
       per IA / RNIC, basis
       - could be done by modify-qp-to-rts verb
       - not sure if this should be mandatory or optional feature

[Proposal by FN, in blue text and Courier font; discussion notes in
black text and Arial font]:

> As can be seen in the RNIC-PI Minutes, 10/21/04, some IHVs do
> not show a strong interest in RDMAC/IETF interoperability.
> However, for OSVs and Standards bodies, interoperability is
> typically an important consideration, particularly if it is easy
> to achieve.
>
> So here's a proposal to resolve the problem:
>
> - Assumption: The RDMAP/DDP version is 00b on an RDMAC-based
>   RNIC and cannot be changed.
>
> - An RNIC for which the RDMAP/DDP version could be set
>   to 00b globally isn't of much help for RDMAC/IETF
>   interoperability, so we will not consider this further.
>
> RNIC-PI Requirements
> --------------------
> - RNIC-PI provides an RDMAC compatibility mode that can be
>   enabled per QP through the Modify-QP-to-RTS verb.
>   If this mode is enabled, RDMAP/DDP generate and expect
>   RDMAP/DDP version number 00b.
>   [Should this be a mandatory or optional RNIC-PI feature?
>   If it is optional, then a boolean RNIC attribute is required.]

- JR: Only difference is the version number; why are they different?
- FN: for RDMAPI, should be parameter for the QP
  - for ITAPI, should be possible to select compatibility mode (e.g.
    with it_ep_connect)

> IT-API Requirements
> -------------------
> - If MPA Startup is suppressed (for TDI only - socket conversion),
>   RDMAC compatibility mode must be enabled.
>   [Prevents use of MPA startup suppression for purposes other
>   than IETF/RDMAC interoperability]
>   - If this rule is violated, an immediate error is generated.
>
> - If MPA Startup is not suppressed and an RDMAC compatibility
>   mode is supported by the RNIC, then the IT-API Consumer
>   must be allowed to enable the RDMAC compatibility mode,
>   both for the TII (it_ep_connect, it_ep_accept) and
>   for the TDI (it_socket_convert).
>   - For it_ep_connect, use the transport-specific connection
>     attributes conn_attr to enable the RDMAC compatibility mode.
>   - For it_ep_accept ???

Does this go in accept or listen?
- JH: could avoid having in accept by handling in listen; add a new
  listen flag
- FN: don't know in advance if you should enable the RDMAC
  compatibility; may want to try without compatibility mode and then it
  fails; you would then try again enabling it
  - if you did with listen, how would you do this?
  - on active side, it is obvious; on passive side, how does it work?
- Agreed that retry mechanism for passive side is not clear

thinking aloud:
- side that initiated will receive terminate message
- What about TII: on passive side, how does passive side find out that
  the active side didn't like its version?
  - argument would be that active side is programmable device and not a
    storage target and could therefore be upgraded
- who discovers a version number mismatch?
  - active side will discover? Will get terminate message when first DDP
    frame with bad version number is sent
  - is there any exchange of version number prior to the first data
    frame?
    - don't believe so - MPA does not exchange a version number
  - if only discoverable when data transfer is initiated, this is not
    very recoverable; best to avoid the problem by ensuring that
    mismatched connections don't happen
- Example: suppose you know remote is RDMAC device
  - could set app to always call for version 0
  - would be undesirable to call it_ep_connect; would try version 1, see
    that it doesn't work, then fall back to version 0
  - opinion posed that it is best to select by application, not under
    the covers by the implementation

Does this need to be exposed to app level?
- recovery mechanism if exposed to app is burdensome on the app
- if we don't want consumer to do recovery, then it should be up to
  sysadm, etc, to set version number
- Is it system issue?

Why did IETF change version number?
- use of 3 bits was not clearly specified in version 0; now clearly
  specified in version 1
  - in terminate header
  - p27 of RDMAC version of RDMAP specification
  - concerns the HDRCT bits
    - figure 8 - terminate control field of the terminate header
    - 3 bits in HDRCT block of header
    - position of the 3 bits has not been specified in RDMAC document
- Would IHV community be willing to change the device per QP?
Agreed that this is a thorny problem

Not clear that devices would be able to provide a user-specified version
number, particularly on a per-EP basis

>   - For it_socket_convert, use the flags argument to enable
>     the RDMAC compatibility mode.
>   - If a Consumer attempts to enable the RDMAC compatibility
>     mode but the feature is not supported, an immediate error
>     is generated.
>
> Note that an RDMAC compatibility mode is a transport dependency,
> but adding it seems to be the only way to make RDMAC-based
> devices IETF compliant (well, IETF compliant except for the
> RDMAP/DDP version number).
>
> How could we enable RDMAC compatibility mode for it_ep_accept?

Additional discussion:

If version number is hard coded (in the RDMAPI)
- Quite unlikely to be able to fix a deployed device so that it will
  emit version 1 instead of version 0 for RDMAP
  - can emit terminate control field in the version 1 format
  - may not be able to change the version emitted by or checked for
    RDMAP and DDP headers
- Options:
  - ask IHVs that have RDMAC devices to fix devices that are in the
    field
    - note that they are broken already
    - however, unlikely that IHV would fix

What happens if bit order in the terminate header is wrong (does not
match)?
- two RDMACs with different interpretations of those bits will not
  interoperate
- makes the information in the terminate header useless
  - fields indicate what data is in terminate header
  - application will not know that the terminate header fields are wrong
- speculating on theoretical worst case: getting a bad terminate header
  could cause the card's state machine to get into an
  unknown/unrecoverable state
- only an error in the failure case
  - does not create trouble from a functional perspective in the API

Can it_ep_connect() still work?
- yes, because connect portion of the API is not where the problem would
  be discovered
  - version number mismatch would only be discovered on first data
    packet (DDP header)
  - would like to catch the problem sooner, as mismatch after data
    transfer begins results in loss of connection

Expect that IHVs will fix their implementations to use the order
specified in version 1
- may or may not be able to change the version number to match the
  specified order
- therefore, may find that bit order is a non-issue (all IHVs comply
  with "correct" ordering per IETF/RDMAC errata)

Is there a need to attempt to determine at connect time (before RDMA
data exchange) if there is a mismatch?
- could only do this for it_ep_connect, where we already define a ULP
- JH: If we want to resolve this problem generally, we should do so
  outside our ULP header

Assume that there is a table that allows device on per EP basis to
determine v1 or v0
- how would this be programmed?
- sysadm?

Is it possible to constrain this at connection establishment time?
- ignoring RTR state, one side will not be in connected state until the
  first send is done
  - therefore, could manifest the failure as a connection establishment
    failure on receipt of first message
  - this doesn't work for initiator/active side - it must be in
    connected state to send the first RDMA message

Very first IETF DDP and RDMAP draft specs (from 2002) specified a
version number of 1
- from discussion by initial participants in the IETF effort,
  expectation was that there would be a gap between implementations
- currently only difference is the version field (with the caveats
  discussed above)
- not clear that new version number is necessary; need to validate for
  both RDMAP and DDP
  - Jay will investigate differences between DDP v0 and v1

ACTIONS:
AI: JR - Look at RDMAP to determine if there are broader
    interoperability issues than the header bits
AI: All - Review proposal by FN, provide feedback, particularly w.r.t.
    providing a solution that could be accepted by IHV, OSV communities.

6) MM issues
6a) Email thread started by Fredy Neeser, subject "MM specs:
    Difficulties if 0 can be a valid virtual address", sent 10/21/04
    8:00AM PT
6b) Email thread started by Fredy Neeser, subject "PBL types", sent
    10/19/04 10:09AM PT
    - not reached in agenda

7) Next steps
   Focus on man page generation, occupy spare time in telecons with
   errata review, next round of detailed requirements to be prioritized.

8) Any other business
   None

Meeting adjourned, 12:07