Resent-From: icsc-nativewg@opengroup.org
From: Matthew Pearson
Date: July 20, 2004 4:51:28 PM EDT
Resent-To: icsc-nativewg@opengroup.org
To: icsc-nativewg@opengroup.org
Subject: ICSC ITWG minutes 2004/07/14 DRAFT

present: Jim Hamrick jh (hp), Matt Pearson mp (sun), Jay Rosser jr (hp), Dick Treumann dt

• Agenda bashing, approve minutes
  ◦ Minutes to approve:
    ▪ Email from Matt Pearson, subject "minutes 6/23 DRAFT", sent 6/25/04 12:52PM PT
    ▪ Email from Fred Worley, subject "ICSC ITWG second draft minutes, 6/30/04", sent 7/6/04 5:36PM PT
  so approved.
• Action item review
  ◦ JR - follow up with Caitlin
    still open.
  ◦ MP - Send ballot to vote on adoption of new version of MM-12.0 by Monday, July 5th; have call for vote by Tuesday, 7/13/04
    mp totally dropped the ball on this; i'll add 7 to the dates and do it right this time.
  ◦ JR - issue proposal for possible high level requirements on IETF vs RDMAC transport-dependent connection API characteristics
    jr in progress.
• SRQ man pages (discussed last week)
  jr plans to release a new draft?
  jh not unless i receive additional feedback.
  jr from a release management perspective, should we try to get closure on this? i realize our original plan was to vote on it as a whole, but it makes little sense to have this sit on the shelf for a while.
  jh it would be nice to have people look at it intently right now, but probably only an impending deadline will get people to look it over. i think all i need to do right now is drop my lock on the files in question; when we get all the features of the man pages lined up we can rev up a review.
  jh so, what does it mean to drop a lock, who has the most recent man pages, etc.
  jr we should have an editor. anyone want to be an editor?
  [ silence ]
  jr i will be the editor. when you drop the lock i will take ownership of the files.
  ◦ Email from Jim Hamrick, subject "First official draft of the S-RQ man pages, and lock notification", sent 6/30/04 3:34PM PT
  ◦ Email from Jim Hamrick, subject "Review feedback on it_srq_create()", sent 7/16/04 11:31AM PT
  jr we can discuss this if people have had time to prepare?
  mp i have not read this.
  jr table it for now.
• iWARP CM detailed requirements
  ◦ Email thread started by Fredy Neeser, subject "iWARP CM: Unifying the two conversion schemes", sent 6/30/04 9:24AM PT
  ◦ Email thread started by Jay Rosser, subject "draft CM state diagrams", sent 6/30/04 5:56PM PT
  ◦ Email thread started by Jay Rosser, subject "draft high-level votables (RDMAC/IETF capabilities)", sent 7/12/04 6:04PM PT
  jr there are a number of threads on this topic, and i have not had enough time to pay proper attention to them. we can discuss the high level requirements (last message), though, so we can solidify what we need to do. three topics. first, 7.0 says we shall support TI access to both iWARP types of device, with sub-bullets regarding markers.
  jr the email discussion on this has the unspoken requirement that we will use the MPA req/rep framing and IRD/ORD ULP that we call the ITC EP. perhaps that needs to be spelled out explicitly.
  jr this should make it clear there is a single API regardless of the rnic under the covers.
  jr next is 7.1, which describes the TD reqs for RDMAC devices and indicates whether we shall support the sub-bullets. 7.1.1 captures whether or not the TD api will allow a consumer to use an RDMAC rnic without causing req/rep frames across the wire (raw mode). under 7.1.1.1 the api implementation would also offer to create that framing if the user requested it - "fake" these frames in streaming mode before switching to rdma mode. the belief is that this is how ietf would work - mostly in sw; hw would not likely generate these frames.
  jr next is IETF support in 7.2. not much to say here because req/rep frames are assumed.
  jr next is 7.3: exposing markers, and forbidding defeating markers on an rdmac device.
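To make the req/rep framing and marker negotiation under 7.0-7.3 concrete, here is a minimal C sketch of building an MPA request or reply frame, the kind of frame 7.1.1.1 would have the api "fake" in streaming mode. It assumes the layout later published in RFC 5044 (16-byte key, an M/C/R flags byte, a revision byte, a 16-bit private data length); the 2004 drafts under discussion may differ, and mpa_build_frame() is a hypothetical helper, not part of the IT API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout as later published in RFC 5044; the 2004 drafts may differ. */
#define MPA_KEY_LEN      16u
#define MPA_HDR_LEN      (MPA_KEY_LEN + 4u)
#define MPA_FLAG_MARKERS 0x80u /* M: frame sender requires markers on data it receives */
#define MPA_FLAG_CRC     0x40u /* C: frame sender's preferred CRC usage */
#define MPA_FLAG_REJECT  0x20u /* R: set in a reply to reject the connection */
#define MPA_REV          1u

/* Hypothetical helper: writes an MPA request (is_reply == 0) or reply frame
 * header followed by pd_len bytes of private data into buf.
 * Returns the frame length, or 0 if buf is too small or arguments are bad. */
size_t mpa_build_frame(uint8_t *buf, size_t buflen, int is_reply,
                       unsigned flags, const uint8_t *pdata, uint16_t pd_len)
{
    static const char req_key[] = "MPA ID Req Frame";
    static const char rep_key[] = "MPA ID Rep Frame";
    size_t total = MPA_HDR_LEN + (size_t)pd_len;

    if (buf == NULL || buflen < total || (pd_len > 0 && pdata == NULL))
        return 0;
    /* 16-byte key, copied without its terminating NUL */
    memcpy(buf, is_reply ? rep_key : req_key, MPA_KEY_LEN);
    /* M/C/R bits; the low reserved bits are sent as zero */
    buf[16] = (uint8_t)(flags & (MPA_FLAG_MARKERS | MPA_FLAG_CRC | MPA_FLAG_REJECT));
    buf[17] = (uint8_t)MPA_REV;
    /* PD_Length in network (big-endian) byte order */
    buf[18] = (uint8_t)(pd_len >> 8);
    buf[19] = (uint8_t)(pd_len & 0xffu);
    if (pd_len > 0)
        memcpy(buf + MPA_HDR_LEN, pdata, pd_len);
    return total;
}
```

A responder declining the connection would send a reply with MPA_FLAG_REJECT set; in the raw mode of 7.1.1, no such frames cross the wire at all.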
  jr fredy talks about various flavors of these; i am concerned that he is talking about an earlier version of mpa that does not correspond to the release version we will use. he refers to the difference between ietf mpa and iwarp mpa, and i think the latter refers to an early draft from the rdma consortium that does not allow framing or markers to be changed. need to respond to fredy on this.
  jr state diagram emails. fredy sent out another edition. not sure what version of iwarp we are discussing; i believe he is using the plain rdmac version but need to investigate further. not prepared to walk through this, but could walk through the cm state diagrams that i sent out.
  [ these notes are spotty because your editor does not fully understand the issues... ]
  jr the first state diagram covers the rdmac active side. the initial state, QUIESCED, refers to the state of the socket that is input to this. these diagrams are only for the TD api, which converts an existing tcp/ip socket into an ep. the requirement for entry is that the socket be quiesced with no traffic; the means by which this is done is outside the api. only one event occurs here - the socket conversion api takes the socket and ep and moves the ep into the ACTIVE_PENDING state. it moves the qp into RTS state and, as a side effect, sends the last streaming message as part of the consumer ULP.
  jr the first FPDU received moves the ep to the connected state. the rest of the diagram concerns error recovery, closing and cleanup.
  jr rdmac passive side - convert moves the passive side into RTS, does not send a streaming message, then generates a connected endpoint.
  mp question about timing; jr answers that the race condition is closed because both sides have negotiated with the ulp when to begin bootstrapping to rdma mode.
  jr ietf active side again starts with a quiesced socket. the only action is convert(), which brings it to the ACTIVE_PENDING1 state. note this is confusing - there is no corresponding ACTIVE_PENDING1 in the existing api.
  the ACTIVE_PENDING1 state sends an mpa req frame as part of the transition. when the rep is received the underlying ep is in the connected state, but we call it ACTIVE_PENDING2. init state in both of these. when the first message is sent, we first move the ep to RTS - no streaming mode message - and go to the connected state.
  jr ietf passive side - again start with a quiesced socket. call listen() to move to the MPA_LISTEN state. when a req is received, we generate the req event and move to the MPA_ACCEPT state (this is a new event type). the normal thing to do is call convert() to accept the event and move to passive pending, which as a side effect sends the last streaming mode message. when the first message is received, move to connected and generate an event.
• MM detailed requirements
  mp goes over requirements
  ◦ Email from Matt Pearson, subject "MM requirements draft 2", sent 7/2/04 2:53PM PT
  thanks to Jay Rosser for taking these notes while I was talking:
  ◦ Overview
    ▪ Had agreed to release MM in two phases in phase 2
    ▪ Narrow window stuff in first draft
    ▪ Section 9 covers binding narrow memory windows
    ▪ Save nitty gritty stuff for second draft of phase 2
    ▪ Includes privileged stuff
  ◦ MM wants to cover the stuff required for the first draft
    ▪ Fredy and Matt have been marking up a Word doc and figure they'll be ready for review in a couple of weeks
• Matt asks folks to look at MM requirements draft 2
  ◦ Matt discusses the concept of "linked" and "unlinked" LMRs and RMRs
    ▪ Link LMR is validate
    ▪ Unlink RMR is equivalent to it_rmr_unbind
  ◦ Matt discusses the proposed new API "it_post_rdma_read_to_rmr"
    ▪ Handles RMR as a local sink for RDMA read
  ◦ Matt discusses the new "it_lmr_link()" call
    ▪ Intended for fast registration
  ◦ Matt discusses "it_rmr_link"
    ▪ Some of the functions will be expanded with new signatures for new capabilities
    ▪ Old calls (e.g. it_rmr_bind) will be retained with IT API 1.0 signatures to preserve backward compatibility
  ◦ For the first draft, functions will have signatures that allow support of all possible uses
  ◦ Remote invalidations appear to be tricky
    ▪ Too much work to get into the first draft of the IT API phase 2
    ▪ Matt asks for any objections - none heard
  ◦ Matt skips over MM-4 and MM-5 since these will not make it into the first draft
  ◦ MM-6.0
    ▪ Matt covers the new requirements to support zero-based addressing
    ▪ Matt suggests that transport-independent programmers would use it_lmr_create2() with IT_ADDRBASE_VADDR
    ▪ JR asks whether such a Consumer would instead use it_lmr_create() unchanged
    ▪ MP suggests the rationale is to keep the programming model similar to the use of RMRs
    ▪ Must use it_rmr_create2() to get transport-independent windows (i.e. specify a don't-care kind of window (IT_RMR_TYPE_DEFAULT) and the API will do the right thing; it_rmr_create() will not)
  ◦ MM-9.0
    ▪ Big tweak made to add the it_rmr_create2() call
    ▪ MM-9.3.D2
      ▪ MP notes that this is a subtle requirement imposed by the verbs
      ▪ Permissions may be on a region that the window can exceed
        ▪ Okay
      ▪ Permissions may also be on an EP (e.g. RDMA Rd Enable) that are more restrictive than those on a Window bound to the EP
        ▪ The first time a write is done on such a Window, it will fail due to the EP permissions
      ▪ MP notes that these issues are under local control and should not cause problems.
  ◦ MM-12.0
    ▪ MP notes that we haven't voted on the high level requirement but was willing to bet that we would vote with him on this!
    ▪ This is the source of the new it_post_rdma_read_to_rmr() API.
  ◦ MM-13.0
    ▪ New local fence for local invalidation
    ▪ New fence flag, IT_LOCAL_FENCE_FLAG
    ▪ Very tall fence - if the WR has the fence flag set, everything preceding it finishes before the fence, and nothing submitted after it will start
• Next steps
  ◦ Continue review of detailed requirements as available; occupy spare time in telecons with errata review.
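Going back to the iWARP CM item, the four state diagrams jr walked through can be summarized as a single transition function. The C sketch below covers only the normal path described in these minutes (error recovery, closing and cleanup are omitted); every state, event and function name is hypothetical and not taken from the draft diagrams or the IT API.

```c
/* Hypothetical names throughout; normal-path transitions only. */
typedef enum {
    CM_QUIESCED,         /* input socket, no traffic */
    CM_ACTIVE_PENDING,   /* RDMAC active: waiting for first FPDU */
    CM_ACTIVE_PENDING1,  /* IETF active: MPA request sent */
    CM_ACTIVE_PENDING2,  /* IETF active: MPA reply received */
    CM_MPA_LISTEN,       /* IETF passive: waiting for MPA request */
    CM_MPA_ACCEPT,       /* IETF passive: request event delivered */
    CM_PASSIVE_PENDING,  /* IETF passive: waiting for first message */
    CM_CONNECTED,
    CM_INVALID           /* event not allowed in this state */
} cm_state;

typedef enum {
    EV_CONVERT,       /* consumer calls convert() */
    EV_LISTEN,        /* consumer calls listen() */
    EV_MPA_REQ_RCVD,  /* MPA request frame arrives */
    EV_MPA_REP_RCVD,  /* MPA reply frame arrives */
    EV_FIRST_SEND,    /* consumer sends first message */
    EV_FIRST_RCVD     /* first FPDU/message arrives */
} cm_event;

typedef enum { RDMAC_ACTIVE, RDMAC_PASSIVE, IETF_ACTIVE, IETF_PASSIVE } cm_side;

cm_state cm_next(cm_side side, cm_state s, cm_event ev)
{
    switch (side) {
    case RDMAC_ACTIVE:
        /* convert(): qp to RTS, last streaming message sent */
        if (s == CM_QUIESCED && ev == EV_CONVERT) return CM_ACTIVE_PENDING;
        if (s == CM_ACTIVE_PENDING && ev == EV_FIRST_RCVD) return CM_CONNECTED;
        break;
    case RDMAC_PASSIVE:
        /* convert(): qp to RTS, no streaming message, connected at once */
        if (s == CM_QUIESCED && ev == EV_CONVERT) return CM_CONNECTED;
        break;
    case IETF_ACTIVE:
        /* convert(): MPA request frame sent */
        if (s == CM_QUIESCED && ev == EV_CONVERT) return CM_ACTIVE_PENDING1;
        if (s == CM_ACTIVE_PENDING1 && ev == EV_MPA_REP_RCVD) return CM_ACTIVE_PENDING2;
        /* first send: move to RTS first, no streaming message */
        if (s == CM_ACTIVE_PENDING2 && ev == EV_FIRST_SEND) return CM_CONNECTED;
        break;
    case IETF_PASSIVE:
        if (s == CM_QUIESCED && ev == EV_LISTEN) return CM_MPA_LISTEN;
        /* request event (a new event type) delivered to the consumer */
        if (s == CM_MPA_LISTEN && ev == EV_MPA_REQ_RCVD) return CM_MPA_ACCEPT;
        /* convert() accepts; last streaming mode message sent */
        if (s == CM_MPA_ACCEPT && ev == EV_CONVERT) return CM_PASSIVE_PENDING;
        /* connected event generated */
        if (s == CM_PASSIVE_PENDING && ev == EV_FIRST_RCVD) return CM_CONNECTED;
        break;
    }
    return CM_INVALID;
}
```

Keeping all four sides in one function makes it easy to see that only the IETF sides exchange MPA frames; a fuller sketch would attach the side effects (qp state changes, frames sent, events generated) to each transition.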
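For MM-6.0, the difference between virtual-address-based and zero-based addressing is where a remote target address is measured from. The sketch below assumes a region described by its starting virtual address and length; ADDR_BASE_VA_HYP plays the role of IT_ADDRBASE_VADDR, while ADDR_BASE_ZERO_HYP, region_hyp and region_offset_hyp() are invented for illustration and do not reflect the actual it_lmr_create2() signature.

```c
#include <stdint.h>

/* ADDR_BASE_VA_HYP stands in for IT_ADDRBASE_VADDR; both names are hypothetical. */
typedef enum { ADDR_BASE_VA_HYP, ADDR_BASE_ZERO_HYP } addr_base_hyp;

typedef struct {
    uint64_t start_va;   /* virtual address of the first byte of the region */
    uint64_t length;     /* region length in bytes */
    addr_base_hyp base;  /* how remote target addresses are interpreted */
} region_hyp;

/* Translate a remote target address for an access of len bytes into an offset
 * within the region. Returns 0 and stores the offset, or -1 if out of bounds. */
int region_offset_hyp(const region_hyp *r, uint64_t target, uint64_t len,
                      uint64_t *offset)
{
    uint64_t off;

    if (r->base == ADDR_BASE_VA_HYP) {
        if (target < r->start_va)
            return -1;
        off = target - r->start_va;  /* target is a virtual address */
    } else {
        off = target;                /* target is already an offset from 0 */
    }
    if (off > r->length || len > r->length - off)
        return -1;
    *offset = off;
    return 0;
}
```

With zero-based addressing the region's local virtual address never needs to be advertised, so the same remote offsets remain valid wherever the region sits in the local address space.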
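The EP case under MM-9.3.D2 can be read as an intersection of rights: a remote operation through a window succeeds only if both the window and the EP it is bound to allow it, so a window that grants write on an EP that does not will see its first write fail. A minimal sketch, assuming bitmask permissions; the PERM_* names and access_allowed_hyp() are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical permission bits. */
enum {
    PERM_RDMA_READ_HYP  = 1u << 0,
    PERM_RDMA_WRITE_HYP = 1u << 1
};

/* A remote access through a window bound to an EP succeeds only if every
 * requested right is granted by both the window and the EP. */
bool access_allowed_hyp(unsigned window_perms, unsigned ep_perms,
                        unsigned requested)
{
    return requested != 0 && (requested & window_perms & ep_perms) == requested;
}
```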
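One possible reading of the MM-13.0 local fence, sketched in C: a WR carrying the fence flag may not start until every earlier WR has completed, and no later WR may start while an earlier fenced WR is still outstanding. wr_hyp, wr_may_start_hyp() and this completion model are assumptions; the draft's exact semantics for IT_LOCAL_FENCE_FLAG may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work request model: only the fields the fence rule needs. */
typedef struct {
    bool local_fence;  /* stands in for IT_LOCAL_FENCE_FLAG */
    bool completed;
} wr_hyp;

/* May the WR at index i of a send queue start? A fenced WR waits for every
 * earlier WR to complete; any WR behind an incomplete fenced WR waits too. */
bool wr_may_start_hyp(const wr_hyp *q, size_t i)
{
    for (size_t j = 0; j < i; j++) {
        if (q[j].completed)
            continue;
        if (q[i].local_fence || q[j].local_fence)
            return false;
    }
    return true;
}
```

Unfenced WRs that are not behind a fence are unaffected, which is why the fence is described as "tall" only for the fenced WR and what follows it.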
• Errata review
  left off on number 12.
  12/16/SUN4 agree with suggested fix. mm group to fix.
  13/21/SUN9 propose adding global behavior text, with weasel words similar to those describing what happens when a bad/invalid/null params address is passed to a query function. jr to fix.
  14/22/SUN10
  jh the original intent was to indicate that you attempted a conversion that would not work no matter what source address was supplied. but what it says is wider - it could mean "in this particular case this address did not have a translation." do we want to indicate a difference? it's analogous to a DNS lookup where you can't contact a server vs. one where you have a bad name...
  jh was this a perceived problem or a real problem?
  mp it was me wearing my "what if" hat.
  jh this is wrong philosophically, but should we at this point in time be making this kind of change - should we make a change in the code?
  mp we could just change the text to indicate that two kinds of errors can generate this error code: one, you can never do that conversion; two, this particular conversion didn't pan out. that wouldn't break things, and since users haven't pressed for a fix here we don't need to run the risk of breaking backward compatibility.
  jr okay with that.
  jh to fix.
  15/43/OG23
  jh the concern is with source compat., not bin. compat., so no objection to using a macro, really. but what would someone do with this information if they had it? i am willing to make a change that doesn't change anything.
  jr the hazards of macros are incumbent on the implementation. sometimes those hazards could be exposed to users.
  jh the only visible issue is taking pointers to functions? can't do that with macros.
  mp not sure if that's a common use model...
  jh it's an issue of philosophical purity: efficiency v. portability. it would annoy users who assumed function pointers on one system and found macros on another - their code wouldn't work.
  mp not sure i want to work hard for a consumer doing something so tricky.
  jh besides, they could encapsulate the macro in a function if they really wanted to do this.
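To illustrate jh's point, here is a minimal C sketch of the two ways a consumer can still get a function pointer when an implementation uses macros. All names are hypothetical, not IT API calls. The second case follows the rule C99 section 7.1.4 applies to the standard library: a function may additionally be provided as a function-like macro, and a name that is not followed by '(' (or is wrapped in parentheses) refers to the real function.

```c
#include <stdint.h>

/* Case 1: an implementation supplies only a macro (hypothetical name). */
#define HYP_ADDR_ADD(base, off) ((uint64_t)(base) + (uint64_t)(off))

/* &HYP_ADDR_ADD would not compile (no function has that name),
 * so a consumer who needs a pointer encapsulates the macro in a function. */
uint64_t hyp_addr_add_fn(uint64_t base, uint64_t off)
{
    return HYP_ADDR_ADD(base, off);
}

/* Case 2: a real function plus a same-named macro fast path. */
uint64_t hyp_addr_sub(uint64_t base, uint64_t off);
#define hyp_addr_sub(base, off) ((uint64_t)(base) - (uint64_t)(off))

/* Parenthesizing the name suppresses the macro, so this defines the function. */
uint64_t (hyp_addr_sub)(uint64_t base, uint64_t off)
{
    return base - off;
}

typedef uint64_t (*addr_op_hyp)(uint64_t, uint64_t);

/* Calls through a pointer: the use model a macro-only implementation breaks. */
uint64_t apply_op_hyp(addr_op_hyp op, uint64_t base, uint64_t off)
{
    return op(base, off);
}
```

Case 2 is the kind of arrangement global behavior text could permit without breaking consumers that take function addresses, while case 1 is the fallback jh describes.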
  jh impl's are free to implement these as macros or functions, and we do not legislate what a lib call is.
  jr in general, if you look at the api, it looks unlikely you'd use any of them as a macro?
  jh sync routines possibly.
  jr get/set context? everything else seems rich.
  jh i think the potential for breaking source compatibility (consumer assumes functions) is low. don't want to see make_rdma_addr() forced to be a function call...
  jr could identify possible macro calls and mark the specific pages.
  mp or mention it once in global behavior.
  jr that means if you're going to do something exotic with these calls you've got to be careful. or, more likely, you won't do anything exotic.
  jh global behavior indicates calls might be macros, and possibly advises implementors that it's ok to use macros. this is potentially consumer visible, so we should document it outside the implementation guide; consumers should not have to read that to use the api.
  jr to fix.
  16/46/SUN12 real nit. doesn't say what happens to the associated rmr context. change "On return, the Handle lmr_handle may no longer be used" to "On return, the lmr_handle and associated rmr_context, if any, may no longer be used".
  jr see line number 3710.
  jh this is not right.
  mp change "may" to "will" for both.
  17/47/SUN13 accept change with minor edits.
  18/48/SUN14 skip so mp can do some research.
  19/50/HP10
  jh there is a conflict between the two sentences here.
  mp i think the second sentence is confusing; it implies that on some impls this works and on others it doesn't, but it's really an issue of scheduling...
  20/52/HP12, 21/53/HP13 bad handles
  jr the language makes too strong a guarantee here. volunteer to fix both.
  ai mp to review SUN14 and SUN16. HP10 tabled.
• Any other business
  adjourned at 5:58 pm EDT.