ICSC ITWG Meeting Minutes, 9/28/04
Taking minutes: HP (FW)
Y HP Jim Hamrick (JH)
Y HP Jay Rosser (JR)
Y HP Fred Worley (FW)
Y IBM Fredy Neeser (FN)
n NetApp Arkady Kanevsky (AK)
n Sun Matt Pearson (MP)
cascading ascii art attendance diagram. (if you have more than 1 minus
visible, you are not eligible to vote.)
      hp    ibm    netapp  sun
      ----  -----  ------  ---
m-3   +     +      -       +
m-2   +     +      -       +
m-1   +     +      +       -
m-0   +     +      -       -
Next Meeting: Tuesday 10/5/04
Minutes:
1) Approval of minutes
Minutes to approve:
a) ICSC ITWG draft minutes, 8/31/04, sent by Fred Worley
Clarification on need for votable on naming conventions for link vs bind
- FN understood this issue to be resolved
- Proposal: "link" replaces "bind"
- it_rmr_bind can be used if the appropriate compile time flag is set
- it_rmr_bind maps to it_rmr_link if address type is set to virtual
- allows backward compatibility
- no objections to this proposal
Minutes Approved
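Ed. Note: a minimal sketch of how the link/bind proposal above could look in a header. It is an illustration only, not the agreed text. All demo_* names, the DEMO_ENABLE_BIND_COMPAT flag and the parameter lists are hypothetical; the minutes only state that a compile time flag exists and that the legacy bind name maps to link with the address type set to virtual.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real IT-API types and signatures differ. */
typedef enum { DEMO_ADDR_VIRTUAL, DEMO_ADDR_ZERO_BASED } demo_addr_type_t;

typedef struct {
    demo_addr_type_t addr_type; /* address type used by the last link */
    int linked;                 /* 1 once the RMR has been linked */
} demo_rmr_t;

/* New name: "link" replaces "bind" and takes an explicit address type. */
static int demo_rmr_link(demo_rmr_t *rmr, demo_addr_type_t addr_type)
{
    assert(rmr != NULL);
    rmr->addr_type = addr_type;
    rmr->linked = 1;
    return 0;
}

/* Hypothetical flag name; the minutes only say "the appropriate compile
 * time flag". Set it to 0 to hide the legacy name. */
#define DEMO_ENABLE_BIND_COMPAT 1

#if DEMO_ENABLE_BIND_COMPAT
/* Legacy name: maps to link with the address type fixed to virtual. */
#define demo_rmr_bind(rmr) demo_rmr_link((rmr), DEMO_ADDR_VIRTUAL)
#endif
```

With the flag enabled, existing callers of the bind name compile unchanged and always get virtual addressing.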
b) ICSC draft meeting minutes 2004/09/14, sent by Matt Pearson
Minutes Approved
2) Action item review:
2.1) FN - create additional text for Global Behaviors section to clarify what an asynchronous call is in the IT-API.
Pending
2.2) FN to revise proposal so that version 2.0 Consumers must explicitly enable the 2.0 behavior and version 1.0 Consumers need do nothing.
Proposal:
- new compile time flag: ITAPI_ENABLE_V2_BINDINGS
- default setting would be v1 behavior (flag not set)
- consumers who want to use v2 function bindings (call parameters) must #define this flag before including itapi.h
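Ed. Note: a sketch of how the flag could select between numbered bindings, assuming the header maps an unnumbered name onto a "1" or "2" variant. Only the flag name ITAPI_ENABLE_V2_BINDINGS comes from the proposal. The demo_* functions, their parameter lists and DEMO_BINDINGS_VERSION are hypothetical stand-ins for the relevant part of itapi.h.

```c
#include <assert.h>

/* Consumer side: define the flag before the header to opt in to v2.
 * Without this line the default (v1) bindings are selected. */
#define ITAPI_ENABLE_V2_BINDINGS

/* ---- hypothetical stand-in for part of itapi.h ---- */
typedef enum {
    DEMO_RMR_TYPE_DEFAULT,
    DEMO_RMR_TYPE_NARROW,
    DEMO_RMR_TYPE_WIDE
} demo_rmr_type_t;

/* v1.0 parameter list: no type argument, a Wide RMR is assumed. */
static demo_rmr_type_t demo_rmr_create1(void)
{
    return DEMO_RMR_TYPE_WIDE;
}

/* v2.0 parameter list: the caller passes the RMR type explicitly. */
static demo_rmr_type_t demo_rmr_create2(demo_rmr_type_t type)
{
    return type;
}

#ifdef ITAPI_ENABLE_V2_BINDINGS
#define demo_rmr_create demo_rmr_create2
#define DEMO_BINDINGS_VERSION 2
#else
#define demo_rmr_create demo_rmr_create1
#define DEMO_BINDINGS_VERSION 1
#endif
```

A v1 consumer that never defines the flag keeps calling the unnumbered name with the v1 parameter list, so it needs no source changes.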
Documentation of memory management calls
- Discussed the version of each call that should be documented in the v2 documentation
- agreed that v2 documentation should document v2 call bindings (new parameter lists)
- documentation can mention how to invoke v1 functionality
- documentation will mention how to ensure v2 functionality (matching documentation)
- Discussion of logistics / implementation for documentation:
- discussed creation of global v1/v2 behavior section that describes v2 flag / enablement in detail
- further description in implementation guide
- open question of whether/how to address backward compatibility on man page (or in global section, e.g.) - see suggestions below
- open question of how to reference / link / provide v1 docs
- discussed not maintaining v1 documentation as part of v2 documentation
- discussed including links to v1 document as part of v2 document (but maintaining the two as separate documents, not one superseding document)
Suggestion for how to annotate man pages, from FN:
BACKWARDS COMPATIBILITY
The IT-API v1.0 call it_rmr_create1 assumes a Wide RMR and does not work on iWARP transports, which only support Narrow RMRs. it_rmr_create (v2.0) called with IT_RMR_TYPE_DEFAULT behaves differently than it_rmr_create1.
Suggest "see also , "
- agreement
Suggestion to document as if the ITAPI_ENABLE_V2_BINDINGS compile flag is set
- agreement
Does the man page ever say "it_rmr_create2" or always "it_rmr_create"?
- FN: shouldn't clutter docs with the numbered versions
- JR: for any man page that has both a v1 and a v2 component, the v2 spec man page should state that you must #define the ITAPI_ENABLE_V2_BINDINGS flag for the call to function as described
Suggestion that document should refer to v1 and v2 versions as, e.g., "it_rmr_create (v1.0)" and "it_rmr_create (v2.0)"
- agreed
ACTION: Create examples for review at the next meeting
OWNER: FN
2.3) JR to send revised version in PDF to reflector by Wednesday for
ITWG to comment upon with intent to generate version to share with RNIC-PI WG ASAP.
Reviewed internally by HP and IBM
Sent to RNICPI group by FN
2.4) MM group to create text describing potential hazard of use of stale
STags with fast re-registration use model.
Pending
2.5) JR - find out if PBL as defined in IB Verb Extensions / iWARP has
differing element sizes and if we need or want to support such.
See also discussion on PBL type below
Verbs Extensions Annex, released (publicly - OK for these minutes) 18 June 2004 by SWG of IBTA
- description of a physical buffer list in "http://www.infinibandta.org/specs/register/publicspec/Verbs_Ext_Annex.pdf", p93, l21-22
[Ed. Note: The URL above will only work if your company is registered for downloads from the IBTA web page. If you are unable to access this link, start here: http://www.infinibandta.org/specs/register/. IBTA members can also access this document through the members area]
- text is somewhat confusing, but phys buffer list will either contain the phys addr of each buffer or, in case of different page sizes per page, will contain a page size
- IB - page sizes must be same for block type, can be different for page type (and HCA supports multiple page sizes per registration)
- iWARP - all page sizes must be the same whether they are block or page
- FN: believes we already have the tools to provide support for this
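Ed. Note: the page size rules summarized above can be written as a small predicate. This is a sketch of the rule as minuted, not text from either verbs specification. All demo_* names are hypothetical, and the IB page-list case assumes an HCA that supports multiple page sizes per registration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEMO_TRANSPORT_IB, DEMO_TRANSPORT_IWARP } demo_transport_t;
typedef enum { DEMO_LIST_PAGE, DEMO_LIST_BLOCK } demo_list_kind_t;

/* Returns 1 if every element of the list has the same size. */
static int demo_all_equal(const size_t *sizes, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (sizes[i] != sizes[0])
            return 0;
    }
    return 1;
}

/* Returns 1 if the element sizes are allowed for this transport and
 * list kind, 0 otherwise. An empty list is rejected. */
static int demo_sizes_allowed(demo_transport_t transport,
                              demo_list_kind_t kind,
                              const size_t *sizes, size_t n)
{
    assert(sizes != NULL);
    if (n == 0)
        return 0;
    /* IB page lists may mix page sizes (assuming HCA support). */
    if (transport == DEMO_TRANSPORT_IB && kind == DEMO_LIST_PAGE)
        return 1;
    /* IB block lists and all iWARP lists need one uniform size. */
    return demo_all_equal(sizes, n);
}
```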
2.6) FN - check if MM-4.0.D4.6 (now MM-4.0.D4.4) is accurate. Currently
it_pbl_t has no pbl_type member.
See also discussion on PBL type below
Closed - added type member
ACTION: Resolve what checking (pbl type checking) needs to be done depending on the supported PBL types and update MM-4.0.D4.4 accordingly. Specifically, there are some valid cases where the consumer can specify a different behavior than the current IA setting that would not constitute an error (IA in block mode, consumer in page mode; block mode is a strict superset of page mode).
OWNER: FN
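Ed. Note: one possible outcome of this action item, sketched as a compatibility check. It encodes only the example given in the action text (a page list offered to an IA in block mode is not an error). The final rule is still to be resolved, and the demo_* names are hypothetical.

```c
#include <assert.h>

typedef enum { DEMO_MODE_PAGE, DEMO_MODE_BLOCK } demo_pbl_mode_t;

/* Returns 1 if a PBL of type `consumer` can be accepted by an IA that
 * is currently in mode `ia`, 0 if the mismatch should be an error. */
static int demo_pbl_mode_compatible(demo_pbl_mode_t ia,
                                    demo_pbl_mode_t consumer)
{
    assert(ia == DEMO_MODE_PAGE || ia == DEMO_MODE_BLOCK);
    if (ia == consumer)
        return 1;
    /* Block mode is a strict superset of page mode, so a page list
     * passed to a block-mode IA is valid; the reverse is not. */
    return ia == DEMO_MODE_BLOCK && consumer == DEMO_MODE_PAGE;
}
```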
2.7) FN - eliminate MM-4.0.D4.3. Consumer must do their own checks for invalid virtual addresses
Zero can be a valid virtual address on some architectures: There are currently 41 references to NULL in ITAPI 1 (rmr_context pointer etc.). How do we address these?
Should be resolvable for the new MM calls
Could be more serious problem for the legacy calls, however
- 40 references to NULL in ITAPI v1 spec
- need to review on a case-by-case basis
- many are pointers for output; using NULL to indicate that we can not use this output
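Ed. Note: to illustrate why zero cannot double as "no address", the sketch below carries validity in a separate field instead of comparing against NULL. This is one generic technique, not something the group decided on, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t addr; /* virtual address; 0 is a legal value */
    int valid;     /* 1 if addr carries a value, 0 if absent */
} demo_opt_vaddr_t;

/* Wrap an address, including address 0, as "present". */
static demo_opt_vaddr_t demo_vaddr_some(uint64_t addr)
{
    demo_opt_vaddr_t v = { addr, 1 };
    return v;
}

/* Represent "no address" without reserving any address value. */
static demo_opt_vaddr_t demo_vaddr_none(void)
{
    demo_opt_vaddr_t v = { 0, 0 };
    return v;
}

/* Read the address; the caller must have checked `valid` first. */
static uint64_t demo_vaddr_get(demo_opt_vaddr_t v)
{
    assert(v.valid);
    return v.addr;
}
```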
3) MM Detailed Requirements (v095a sent to Reflector on 09/24)
3.1) LMR Link - two ways to register a memory region
new requirement to describe how memory region can be registered
2 ways:
- 4.0.d1.2.1
- wording is OK (does not prohibit 0-based addresses)
- IT_ADDRBASE_VADDR flag
- 4.0.d3 it_rmr_link()
3.2) LMR Create - four ways to register a memory region
6.2.D1.2 - create and register a memory region in 4 ways:
6.2.D1.2.1 and 2.2 are the same as work request it_lmr_link
6.2.D1.2.3 and 2.4 are the same as previous 4.0.d1
Discussion:
The two new cases:
6.2.D1.2.1: valid address, virtual addressing
- ITAPI v1 semantics
6.2.D1.2.2: valid address, can still request 0-based addressing
Q: would you expect user-space consumers to use zero-based addressing?
- Yes; then you don't have to tell the peer the virtual address
Q: Why should it not be possible to create an LMR directly?
it_lmr_link allows you to register a memory region by providing a PBL; why shouldn't one be able to do the same when you just create an LMR? You can now pass a PBL at create time and achieve the same effect
Q: would this be done under the covers as a fast register?
- No: normal register non-shared memory region verb (has a PBL as one of its arguments)
- need an endpoint to do fast register; don't need an endpoint to use a PBL
What to document as the use model for it_lmr_create
- providing these 4 use models will simplify the use of this call
2nd set of use models (6.2.D1.2.2) provides for case with valid address and zero based addressing
- underscores the fact that you can not see from the address_base argument whether or not you expect the address to be valid
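Ed. Note: a sketch of why zero-based addressing spares the consumer from telling the peer the virtual address. With virtual addressing the peer targets base address plus offset; with zero-based addressing it targets the offset alone. The demo_* names and the structure layout are hypothetical and do not reflect the IT-API definitions.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { DEMO_ADDRBASE_VADDR, DEMO_ADDRBASE_ZERO } demo_addrbase_t;

typedef struct {
    demo_addrbase_t base; /* how remote accesses address the region */
    uint64_t vaddr;       /* local virtual address of the region start */
    uint64_t length;      /* region length in bytes */
} demo_region_t;

/* Returns the target address a peer must use to reach byte `offset`
 * of the region. */
static uint64_t demo_remote_target(const demo_region_t *region,
                                   uint64_t offset)
{
    assert(offset < region->length);
    if (region->base == DEMO_ADDRBASE_VADDR)
        return region->vaddr + offset; /* peer must know vaddr */
    return offset;                     /* peer needs only the offset */
}
```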
3.3) LMR Create Unlinked
3.4) PBL types (See MM-4.0.D2 and MM-8.3)
Provides PBL type and element length, but does not currently allow multiple page sizes per page list (as supported by IB v2, see above)
- could potentially be done by overloading PBL types
- e.g. provide a union structure discriminated by the PBL type
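Ed. Note: a sketch of the "union discriminated by the PBL type" idea, assuming one variant with a single element length and one with a length per element. The layout, the variant names and every demo_* identifier are hypothetical; the actual it_pbl_t definition is in the MM requirements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DEMO_PBL_UNIFORM, DEMO_PBL_VARIABLE } demo_pbl_kind_t;

/* Element that carries its own size (multiple page sizes per list). */
typedef struct {
    uint64_t addr;
    uint64_t len;
} demo_sized_elem_t;

typedef struct {
    demo_pbl_kind_t kind; /* discriminator: selects the union member */
    size_t count;         /* number of elements in the list */
    union {
        struct {
            uint64_t elem_len;     /* one length shared by all elements */
            const uint64_t *addrs; /* physical address of each element */
        } uniform;
        struct {
            const demo_sized_elem_t *elems; /* address + length pairs */
        } variable;
    } u;
} demo_pbl_t;

/* Total number of bytes described by the list. */
static uint64_t demo_pbl_total_len(const demo_pbl_t *pbl)
{
    uint64_t total = 0;
    assert(pbl != NULL);
    switch (pbl->kind) {
    case DEMO_PBL_UNIFORM:
        total = pbl->u.uniform.elem_len * pbl->count;
        break;
    case DEMO_PBL_VARIABLE:
        for (size_t i = 0; i < pbl->count; i++)
            total += pbl->u.variable.elems[i].len;
        break;
    }
    return total;
}
```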
Additional discussion beyond page list
- iWARP does not provide concept of PBL type for fast register or non-shared memory (no verbs support for PBL type there)
- does support PBL mode (page list or block list mode)
- assumption is if you need block lists, you would enable that mode on the IA
- Issues:
- if consumer has only page lists, could have lower performance or less robustness if IA is in block list mode
- with verbs, can not guarantee that everyone can open his own IA in the mode he likes
- consumer that opens the RNIC sets the mode
- Discussion: in theory, one could open the RNIC in several modes as logical RNICs
- disagreement - verbs specify that once an RNIC is open, can not be opened again until closed
- discussion - there is an error that an implementation can return, but is the implementation *required* by the verbs to return this error? Could the implementation simply return a different virtual handle in response to subsequent open request?
- Quote from verbs extensions: "HCA can be opened in either block or page mode, but not both modes concurrently"
- this refers to the physical HCA only (specs all refer only to physical HCA - no concept of logical HCA)
- for RDMAC verbs, two interpretations may be possible
- Note: closing the RNIC resets the RNIC and reallocates the resources
- section 5.1.4, p29 line 37, draft-hilland-iwarp-verbs-v1.0
- same text in IB v1.1 spec
- Appears that these sections regarding resets clear up ambiguity elsewhere in the document about the ability to open an adapter more than once
- expect RNIC vendor community to produce cards that can not function in both modes simultaneously
- FN: would like to allow for the possibility of an implementation that does not have this limitation
- Discussion: why specify PBL type:
- in page list mode, addresses have to be page aligned (by definition)
- there is an error for misalignment of address lists
- in block list mode, check is not performed
- if you have an RNIC that can be used in both modes and you open it in block mode, then the alignment check will never be performed
- less robustness for those consumers that can use page list mode
- block list is a superset of page lists
- perfectly legal to be in block list mode and pass a page list to the fast register verb
- however, alignment check will not be done in block list mode
- additional application: by including PBL type in PBL data structure, could enable vendors to do alignment check underneath
- potential problem: assume consumer is expected to use page lists; app is developed on a platform that supports / turns on block lists; app is then taken to a system that supports page lists; app breaks
- additional flexibility allows detection of page boundary crossings (block that crosses a page boundary)
- is this a debugging tool only?
- if consumer knows that he will only use page lists, then makes sense to let verbs provider know about this to make registration easier
- could leave this as an ITAPI configuration issue - may not be necessary to address through API
- what is the use model?
- storage consumer would use block mode
- page list consumer?
- specs are in place so you don't have to "switch hit" in your hardware
- if your hardware does "switch hit" (page and block at the same time), could, e.g., turn off page lists if that gives better performance
- FN: believes that not all consumers on a host would want to use the same PBL type
- some lack of clarity remains in the block list specification in the RDMAC
- note that block list mode is optional
- JH: suspect that IHVs will provide block lists as this may be straightforward to implement
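Ed. Note: the alignment rule discussed above, sketched in software purely to show what is and is not caught in each mode. The verbs expect the hardware to perform this check, so this is an illustration (for example, for debugging), not a proposed implementation. The demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DEMO_IA_PAGE_MODE, DEMO_IA_BLOCK_MODE } demo_ia_mode_t;

/* Returns the index of the first entry that violates the page
 * alignment rule, or -1 if no violation is reported. In block mode
 * the check is skipped, so a misaligned list goes undetected. */
static long demo_first_misaligned(demo_ia_mode_t mode,
                                  const uint64_t *addrs, size_t n,
                                  uint64_t page_size)
{
    assert(page_size > 0);
    if (mode == DEMO_IA_BLOCK_MODE)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (addrs[i] % page_size != 0)
            return (long)i;
    }
    return -1;
}
```

The same misaligned list is flagged in page mode and passes silently in block mode, which is the portability hazard raised in the discussion.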
Potential issues for supporting page and block lists in ITAPI implementation:
- ex: IPC client and iSER client on the same RNIC
- concern: if a page-list specific check is performed like "address must start on page boundary", then if we were forced to put the RNIC into block mode, then the hardware would not be able to do this check for us; would not want to force the software implementation to do this check in software on hardware that does not support it
- FN: agree - there is already an alignment error defined for page lists
- JH: Clarification: verbs specification; completion error, not immediate; expected to be done by HW; difficult to do in SW
- p213 of RDMAC verbs, top left: Invalid physical buffer list entry - for page mode, entry must start on page sized boundaries
- therefore completion error / checking will be done in page list mode but not in block list mode
- 4.0.D3.7 documents completion error for this
If underlying device is in block list mode and you pass in a page list that is not page aligned, it would be difficult for the software to provide this check
- JH: unless you provide weaseling such as "if the implementation can detect this error, this is the error it will return"
- JR: mandatory to run in page list mode; block list mode is optional
- therefore could make argument that for debugging you could put the device in page mode
- trying to create space in the API to support potential future IHV devices that can support both page and block lists
- would you really want to mix both page and block list modes on the same IA?
- see requirement 8.3.D1
MM-8.3.D1 it_pbl_types_t shall be defined as a set of flags as follows:
                                            Support in
Case  pbl_types_t                           iWARP      IB
----------------------------------------------------------------------
1     0                                     No         Only for IB 1.1
2     IT_PBL_PAGE_LIST                      Yes        IBVE
3     IT_PBL_BLOCK_LIST                     Option     Option(IBVE:BMM)
4     IT_PBL_PAGE_LIST|IT_PBL_BLOCK_LIST    Propr.     Propr.
                                            Extension  Extension
For IB, if you have verb extensions, you can have page lists (case 2 and 3)
can have no page list mode for IB if you don't have verb extensions (case 1)
case 4 is mix-and-match mode; if both flags are set, RNIC would have capability to distinguish
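Ed. Note: a sketch of the four cases in the table above as a set of flags. The DEMO_PBL_* names mirror IT_PBL_PAGE_LIST and IT_PBL_BLOCK_LIST, but the numeric values, the typedef and the helper function are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical flag values; MM-8.3.D1 names the flags but the minutes
 * do not give their encoding. */
typedef unsigned int demo_pbl_types_t;
#define DEMO_PBL_PAGE_LIST  0x1u
#define DEMO_PBL_BLOCK_LIST 0x2u

/* Maps a flag set to the case number (1..4) used in the table. */
static int demo_pbl_case(demo_pbl_types_t types)
{
    /* Reject bits that are not defined in this sketch. */
    assert((types & ~(DEMO_PBL_PAGE_LIST | DEMO_PBL_BLOCK_LIST)) == 0);
    switch (types) {
    case 0:
        return 1; /* no PBL types */
    case DEMO_PBL_PAGE_LIST:
        return 2; /* page lists only */
    case DEMO_PBL_BLOCK_LIST:
        return 3; /* block lists only */
    default:
        return 4; /* both flags set: mix-and-match mode */
    }
}
```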
Q: you want to use your RNIC for storage and IPC
- from app perspective, IPC only works in virtual addresses; has no concept of PBLs; ITAPI imp. handles this for consumer
- block storage on same piece of hardware
- if iSER driver put it in block mode, will we deny cluster communications applications from using the RNIC?
- implementation can do the mappings from block lists, but doesn't have the alignment checks that the device would provide in page mode
Agreed, however:
- do you want to have support for both page and block lists on the same IA?
- if you want to have all your completions on the same EVD, you could not do this with that model
- single application opening a single IA that wants pages (virtual addressing) and block lists at the same time is a conceivable use model
What is the underlying RNIC?
- if underlying RNIC does not allow you to open the RNIC more than once as a logical RNIC, then you can not do this (using multiple IAs on a single physical RNIC)
JH: In PBL itself, specify page lists or block lists
- also, can we instead encode information in the IA associated with the PBL (either fast register operation or lmr_create)
- specify directly for lmr_create and indirectly through the endpoint for fast-register
- could specify for a given IA that it uses page lists or block lists but not both
- problem: completions for fast register operations done for block operations and page operations could not go to a common EVD (unless IA can support both modes)
Accepted that the PBL having a type makes sense
- FN: might enable future extensions where we can support variable sized page list
- would be just another PBL type, added to the PBL types set of flags
Variable page size support:
- not defined for RDMAC verbs
- IB VE do define variable page size support
- harmless, although it busies the interface for iWARP devices (don't have to use it if you don't need it)
- does it make sense to allow variable size page lists?
- agreement that it makes sense to include variable sized page lists
- concern expressed that proposal of an additional type may not provide sufficient information about the address list
ACTION: Create and distribute proposal for support of multiple page sizes in address list
OWNER: FN
Purpose of discussion is to provide motivation for inclusion of the PBL type
Conceded that future IHV devices might support both block and page mode
3.5) Handling of Privileged and Non-Privileged QPs
Security concerns
Reasons for a Privileged IT-API Consumer to selectively
enable/disable Privileged Mode on Endpoints (or their
underlying QPs)?
Think about for next meeting
3.6) LMR Unlink (Local Invalidation of Memory Resources)
4) Man page generation
Feedback on draft CM man pages
Discuss any difficulties
Proposed incorporation of JH comments into the man pages draft
Question on implementors guide:
- agreement to move certain things to implementors guide
- what about it_socket_convert? (if all iWARP related implementation specifics are in one chapter that might be most intuitive for consumer)
- JH: just want internal stuff separated from consumer visible stuff
5) Next steps
Focus on man page generation, occupy spare time in telecons
with errata review, next round of detailed requirements to be
prioritized.
Any other business
Meeting adjourned 12:10pm PDT