Resent-From: icsc-nativewg@opengroup.org
From: Fredy Neeser
Date: August 25, 2004 5:13:53 AM EDT
Resent-To: icsc-nativewg@opengroup.org
To: icsc-nativewg@opengroup.org
Subject: draft meeting minutes 24/08/2004

24/08/04 ICSC ITWG Meeting Minutes

Taking minutes: IBM (FN)
Time: 10:00 am - 12:00 pm PDT
Participant code: 356608
US Dialin: 1 866 874 0872
Intl Dialin: +44 1452 562 905
UK Dialin: 0845 146 2019

Meeting attendance diagram (if you have more than 1 minus within the last four ITWG meetings, you are not eligible to vote):

mtg  date   hp  ibm  netapp  sun
---  -----  --  ---  ------  ---
m-3
m-2         +   +    -       +    <- enforcement started
m-1         +   +    -       +
m-0  24/08  +   +    -       +

Present:
jim hamrick (jh) [HP]
fredy neeser (fn) [IBM]
matt pearson (mp) [Sun]
jay rosser (jr) [HP]

Minutes to approve
Email from Matt Pearson, subject "draft meeting minutes 2004/08/17", sent 8/17/04.
Minutes approved.

Action item review
fn  i had an AI to post to the reflector and gather input on exposing the local port number. i just sent out a note before the meeting capturing two issues regarding the transport-independent interface (TII):
    - inability to bind to a specific local port on the active side
    - non-availability of a bind-like concept in IT-API and its impact on the passive side (as recently pointed out by Jay).
(discussion on support of multihomed systems)
jr  supporting a multihomed system is a burden on the implementation. a socket listen potentially needs to listen on every interface.
fn  i believe a single socket listen should do, if a "listen socket" was bound to all local IP addresses of an IA or IA spigot. the server binds the "listen socket" to a local port and to all local ip addresses (wildcard parameter for the local ip address).
(minuter's comment: not sure if i caught the entire discussion. the issue seems to be whether it is possible to bind to all local ip addresses of an IA, rather than to all local ip addresses of the host.)
jh  local ports are associated with a spigot.
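To make fn's wildcard-listen point concrete, here is a minimal C sketch, assuming a listen point is scoped to one spigot and bound to a local port plus either one address or a wildcard. A single wildcard listen then matches connection requests for every local ip address of that spigot, but not addresses owned by other spigots of the host. The types spigot_t and listen_point_t, the constant LISTEN_ADDR_ANY and the function listen_matches are hypothetical illustrations, not IT-API definitions; addresses are IPv4 values in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model (not IT-API): a spigot owns a set of local IPv4
 * addresses, stored in host byte order. */
typedef struct {
    const uint32_t *addrs;
    size_t naddrs;
} spigot_t;

/* Hypothetical wildcard meaning "all local addresses of this spigot". */
#define LISTEN_ADDR_ANY 0u

typedef struct {
    const spigot_t *spigot; /* listen point is scoped to one spigot */
    uint16_t port;          /* local (server) port */
    uint32_t addr;          /* LISTEN_ADDR_ANY or one specific address */
} listen_point_t;

static bool spigot_owns(const spigot_t *s, uint32_t addr)
{
    for (size_t i = 0; i < s->naddrs; i++)
        if (s->addrs[i] == addr)
            return true;
    return false;
}

/* Returns true if a connection request addressed to (dst_addr, dst_port)
 * should be delivered to this listen point. */
bool listen_matches(const listen_point_t *lp, uint32_t dst_addr,
                    uint16_t dst_port)
{
    assert(lp != NULL && lp->spigot != NULL);
    if (dst_port != lp->port)
        return false;
    /* The wildcard covers this spigot's addresses only, not every
     * address of the host. */
    if (!spigot_owns(lp->spigot, dst_addr))
        return false;
    return lp->addr == LISTEN_ADDR_ANY || lp->addr == dst_addr;
}
```

Under this assumed scoping, one wildcard listen per spigot is enough; whether an IA can actually restrict a wildcard bind to its own addresses is the open question in the minuter's comment.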
jh  itapi should not preclude listening on multiple ip addresses of a spigot.
jh  the concern is non-rdma traffic coming in on a listen point. it might not be possible to distinguish rdma traffic from non-rdma traffic by server port.
(minuter's comment: this concern seems to be related to the possibility that an rnic handling an rdma service could suddenly run out of resources, in which case it would be desirable to offer the same service through a software implementation of iwarp on the host; see also jr's iscsi example below for the transport-dependent interface. please comment on whether my interpretation is correct.)
jh  we already have an ip address aggregation mechanism: multiple ip addresses per port.
jh  if ip addresses are statically determined, this should not be difficult for the consumer. the consumer would probably use the TDI (transport-dependent interface) in this case.
jh  dynamic address aggregation complicates the consumer's life: it is harder to figure out which local ip addresses he wants to work with.
jh  one possibility is to use one spigot to represent all local ip addresses and, in addition, separate spigots that each represent one ip address.
fn  or a subset of ip addresses per spigot.
jh  spigots are essentially opaque objects that IA vendors are free to use in different ways.
jh  vendors may say that separate spigots represent separate failure domains.
jr  does a spigot correspond to a single physical interface?
jh  yes.
jr  a spigot most naturally corresponds to one physical port on your adapter.
(discussion on ip-address binding resumes)
jh  typically, transport stacks allow you to bind to one local ip address or to all local ip addresses.
jh  listening on a subset of ip addresses (not one ip address, not all ip addresses) requires the itapi implementation to use multiple listens and multiple listen sockets under the covers.
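jh's last point can be illustrated with a minimal C sketch, assuming the underlying stack offers only "bind to one address" or "bind to all addresses". Listening on all addresses then needs one wildcard listen socket, while a subset needs one listen socket per requested address. ADDR_ANY and plan_listens are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wildcard value meaning "all local addresses". */
#define ADDR_ANY 0u

/* Hypothetical helper: given the local addresses a consumer wants to
 * listen on (NULL means "all"), fill `out` with the addresses the
 * implementation would bind its underlying listen sockets to.
 * Returns the number of listen sockets needed, or 0 if `out` is too
 * small (or nothing was requested). */
size_t plan_listens(const uint32_t *requested, size_t nreq,
                    uint32_t *out, size_t outcap)
{
    assert(out != NULL);
    if (requested == NULL) {
        /* All addresses: a single wildcard listen suffices. */
        if (outcap < 1)
            return 0;
        out[0] = ADDR_ANY;
        return 1;
    }
    /* One address or a subset: the stack can bind to one address or to
     * all of them, so a subset needs one listen socket per address. */
    if (nreq > outcap)
        return 0;
    for (size_t i = 0; i < nreq; i++)
        out[i] = requested[i];
    return nreq;
}
```

The cost of the subset case grows with the number of addresses, which is one reason a vendor might prefer to expose per-address or per-subset spigots instead.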
(we seemed to be uncertain whether binding to all ip addresses indeed affects all interface adapters)
(discussion on separation of rdma and non-rdma traffic continues)
jr  example of iscsi traffic: if an rnic has no resources left, we may have both rdma traffic and non-rdma traffic on the same local ip address. when the rnic is out of resources, we might resort to a software implementation of iwarp on the host.

iWARP CM issues
jr  connection management: there is a lot of feedback right now that still needs to be digested. not quite ready to put on the reflector.
jr  some issues are being discussed. one is control of markers.
jr  discusses three classes of devices proposed by fn:
    - rdmac legacy device
    - ietf device with marker/crc restrictions
    - ietf compliant device
(minuter's comment: the proposed device definitions will be put on the reflector)
(discussion on marker control; notes are spotty)
jr  an ietf compliant device
    - may expose an mpa rx marker preference
    - may or may not allow controlling mpa rx markers
jr  use model brought up by jh: the scenario is an itapi implementation communicating with a non-itapi implementation (different wire protocol).
jr  another possible use model was brought up by jh. some ulp might be defined where the conversion initiator sends the last ulp message "going into rdma mode" and immediately thereafter sends the mpa request. this would not be supported by our current conversion interface. two possibilities to solve this:
    - expose the mpa request/reply to the consumer (not so nice)
    - a flag to allow switching of the Conversion Initiator/Responder roles.
    not sure if we need to support this.
jr  TII model: ird/ord negotiation is required to achieve transport independence with itapi (ird/ord exposed to consumers).
fn  TII model: we might ask the ietf to add the missing ird/ord to the mpa request/reply and to increase the mpa version number. avoiding the itapi-defined ird/ord header would improve interworking with non-itapi peers.
fn  there are a few remaining dissimilarities between TII and TDI.
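The minutes do not spell out the TII ird/ord negotiation rule. As an assumption only, a common rule in RDMA connection setup is that an endpoint's effective ORD (RDMA Reads it may have outstanding) must not exceed the peer's IRD (RDMA Reads it can accept), and there is no point in reserving more IRD than the peer's ORD. A minimal C sketch under that assumption; rdma_depths_t and negotiate_depths are hypothetical names, and the values stand in for whatever an ird/ord header or an extended mpa request/reply would carry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of values each peer advertises during setup. */
typedef struct {
    uint32_t ird; /* inbound RDMA Read depth: reads this side accepts */
    uint32_t ord; /* outbound RDMA Read depth: reads this side issues */
} rdma_depths_t;

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Assumed rule: clamp each side's ORD to the peer's IRD and each side's
 * IRD to the peer's ORD. Inputs are copied first so the outputs may
 * alias the inputs. */
void negotiate_depths(const rdma_depths_t *active, const rdma_depths_t *passive,
                      rdma_depths_t *active_out, rdma_depths_t *passive_out)
{
    assert(active && passive && active_out && passive_out);
    rdma_depths_t a = *active, p = *passive;
    active_out->ord  = min_u32(a.ord, p.ird);
    active_out->ird  = min_u32(a.ird, p.ord);
    passive_out->ord = min_u32(p.ord, a.ird);
    passive_out->ird = min_u32(p.ird, a.ord);
}
```

With this rule the active side's negotiated ORD always equals the passive side's negotiated IRD (and vice versa), whatever values the two consumers asked for.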
    we do have symmetry regarding the roles of active and passive endpoint states. moreover, the transitions that are allowed out of each state are (mostly) independent of the TII/TDI model. two differences remain:
    - TII has timeout transitions, TDI doesn't.
    - generation/parsing of the mpa req/rep depends on the presence or absence of the itapi ird/ord header.
jr/fn  no timeouts in the TDI state diagrams, to reduce the burden on the implementation. moreover, event dispatchers support timeouts for waiting on events. timeouts in the TII state diagrams cannot be avoided because they are part of InfiniBand.
jr,jh,fn  agreed that the implementation can be expected to keep track of whether TII or TDI is used to move an endpoint to IT_EP_STATE_CONNECTED.
jr  mismatch between marker settings: an ietf compliant device asks a remote ietf device with marker/crc limitations to turn off markers. the remote device cannot follow this request, so its itapi implementation must send an mpa reply with the reject bit set.
jh  in case of a socket convert, there could be a mismatch with marker settings. if you ask to turn off mpa rx markers but your rnic doesn't support that, you should get an immediate error.
jr  terminology: mpa tx markers are inserted into the transmit stream. mpa rx markers are used by the mpa receiver.
fn  mpa tx markers can be on or off, as requested by the peer. mpa rx markers can be on or off, as requested by the local rnic (possibly overridden by the itapi consumer).
fn  two flavors of marker control for an ietf compliant device. Modify-QP-to-RTS may support either:
    - marker control for both mpa tx and rx markers, plus the ability to query the mpa rx marker preference, or
    - marker control for mpa tx markers only, while mpa rx markers are fixed.

MM requirements
jr  should we continue with MM requirements?
mp  continuing with CM is fine for this meeting. would like to use the next meeting for MM.
fn  i was late sending feedback on the MM detailed requirements to matt; he only received it today. agree it's better to use the next meeting for MM.
Next steps
Continue review of detailed requirements as available; occupy spare time in telecons with errata review.
jr  will generate new CE detailed requirements.
    state of ladder diagrams and state diagrams?
fn  the state diagrams are pretty much up to date. there is one more transport dependency that we need to resolve, namely the iwarp requirement that the rdma initiator post the first rdma send and, correspondingly, that the rdma responder first post a receive dto.
jh  In the IB CM protocol, the RTU may get lost. A three-way handshake (REQ, REP, RTU) carries out-of-band control data. Unreliable datagrams are used, so the RTU may get lost. On the passive side, receiving either the RTU or a first RDMA Send message is sufficient to move the QP to the RTS state. Therefore, it is considered good practice to post the first RDMA Send DTO on the active side and to post an RDMA Receive DTO on the passive side prior to calling it_ep_accept. Applications adhering to this good practice already satisfy the additional iWARP requirements.
fn  states for tracking this would not be present for IB.
jh  having extra endpoint states to track this might introduce source compatibility problems. we should describe this transport dependency in the manpages.
jr  AI: capture the handling of the iwarp transport dependency regarding the first rdma send message and the matching rdma receive dto in a requirement / input for the manpages.

Any other business
Meeting adjourned on time.