This document is a first attempt at describing how to write a NAL
for the Portals 3 library.  It also defines the library architecture
and the abstraction of protection domains.


First, an overview of the architecture:

    Application

----|----+--------
         |
   API  === NAL        (User space)
         |   
---------+---|-----
         |    
   LIB  === NAL        (Library space)
         |
---------+---|-----
          
    Physical wire      (NIC space)
          

Application
    API
API-side NAL
------------
LIB-side NAL
    LIB
LIB-side NAL
   wire

Communication is through the indicated paths via well defined
interfaces.  The API and LIB portions are written to be portable
across platforms and do not depend on the network interface.

Communcation between the application and the API code is
defined in the Portals 3 API specification.  This is the
user-visible portion of the interface and should be the most
stable.



API-side NAL:
------------

The user space NAL needs to implement only a few functions
that are stored in a nal_t data structure and called by the
API-side library:

	int forward( nal_t *nal,
		int	index,
		void	*args,
		size_t	arg_len,
		void	*ret,
		size_t	ret_len
	);

Most of the data structures in the portals library are held in
the LIB section of the code, so it is necessary to forward API
calls across the protection domain to the library.  This is
handled by the NAL's forward method.  Once the argument and return
blocks are on the remote side the NAL should call lib_dispatch()
to invoke the appropriate API function.

	int validate( nal_t *nal,
		void	*base,
		size_t	extent,
		void	**trans_base,
		void	**trans_data
	);

The validate method provides a means for the NAL to prevalidate
and possibly pretranslate user addresses into a form suitable
for fast use by the network card or kernel module.  The trans_base
pointer will be used by the library everytime it needs to
refer to the block of memory.  The trans_data result is a
cookie that will be handed to the NAL along with the trans_base.

The library never performs calculations on the trans_base value;
it only computes offsets that are then handed to the NAL.


	int shutdown( nal_t *nal, int interface );

Brings down the network interface.  The remote NAL side should
call lib_fini() to bring down the library side of the network.

	void yield( nal_t *nal );

This allows the user application to gracefully give up the processor
while busy waiting.  Performance critical applications may not
want to take the time to call this function, so it should be an
option to the PtlEQWait call.  Right now it is not implemented as such.

Lastly, the NAL must implement a function named PTL_IFACE_*, where
* is the name of the NAL such as PTL_IFACE_IP or PTL_IFACE_MYR.
This initialization function is to set up communication with the
library-side NAL, which should call lib_init() to bring up the
network interface.



LIB-side NAL:
------------

On the library-side, the NAL has much more responsibility.  It
is responsible for calling lib_dispatch() on behalf of the user,
it is also responsible for bringing packets off the wire and
pushing bits out.  As on the user side, the methods are stored
in a nal_cb_t structure that is defined on a per network
interface basis.

The calls to lib_dispatch() need to be examined.  The prototype:

	void	lib_dispatch(
			nal_cb_t		*nal,
			void			*private,
			int			index,
			void			*arg_block,
			void			*ret_block
	);

has two complications.  The private field is a NAL-specific
value that will be passed to any callbacks produced as a result
of this API call.  Kernel module implementations may use this
for task structures, or perhaps network card data.  It is ignored
by the library.

Secondly, the arg_block and ret_block must be in the same protection
domain as the library.  The NAL's two halves must communicate the
sizes and perform the copies.  After the call, the buffer pointed
to by ret_block will be filled in and should be copied back to
the user space.  How this is to be done is NAL specific.

	int lib_parse(
			nal_cb_t		*nal,
			ptl_hdr_t		*hdr,
			void			*private
	);

This is the only other entry point into the library from the NAL.
When the NAL detects an incoming message on the wire it should read
sizeof(ptl_hdr_t) bytes and pass a pointer to the header to
lib_parse().  It may set private to be anything that it needs to
tie the incoming message to callbacks that are made as a result
of this event.

The method calls are:

	int	(*send)(
			nal_cb_t		*nal,
			void			*private,
			lib_msg_t		*cookie,
			ptl_hdr_t		*hdr,
			int			nid,
			int			pid,
			int			gid,
			int			rid,
			user_ptr		trans_base,
			user_ptr		trans_data,
			size_t			offset,
			size_t			len
	);

This is a tricky function -- it must support async output
of messages as well as properly syncronized event log writing.
The private field is the same that was passed into lib_dispatch()
or lib_parse() and may be used to tie this call to the event
that initiated the entry to the library.

The cookie is a pointer to a library private value that must
be passed to lib_finalize() once the message has been completely
sent.  It should not be examined by the NAL for any meaning.

The four ID fields are passed in, although some implementations
may not use all of them.

The single base pointer has been replaced with the translated
address that the API NAL generated in the api_nal->validate()
call.  The trans_data is unchanged and the offset is in bytes.


	int	(*recv)(
			nal_cb_t		*nal,
			void			*private,
			lib_msg_t		*cookie,
			user_ptr		trans_base,
			user_ptr		trans_data,
			size_t			offset,
			size_t			mlen,
			size_t			rlen
	);

This callback will only be called in response to lib_parse().
The cookie, trans_addr and trans_data  are as discussed in send().
The NAL should read mlen bytes from the wire, deposit them into
trans_base + offset and then discard (rlen - mlen) bytes.
Once the entire message has been received the NAL should call
lib_finalize() with the lib_msg_t *cookie.

The special arguments of base=NULL, data=NULL, offset=0, mlen=0, rlen=0
is used to indicate that the NAL should clean up the wire.  This could
be implemented as a blocking call, although having it return as quickly
as possible is desirable.

	int	(*write)(
			nal_cb_t		*nal,
			void			*private,
			user_ptr		trans_addr,
			user_ptr		trans_data,
			size_t			offset,

			void			*src_addr,
			size_t			len
	);

This is essentially a cross-protection domain memcpy().  The user address
has been pretranslated by the api_nal->translate() call.

	void	*(*malloc)(
			nal_cb_t		*nal,
			size_t			len
	);

	void	(*free)(
			nal_cb_t		*nal,
			void			*buf
	);

Since the NAL may be in a non-standard hosted environment it can
not call malloc().  This allows the library side NAL to implement
the system specific malloc().  In the current reference implementation
the libary only calls nal->malloc() when the network interface is
initialized and then calls free when it is brought down.  The library
maintains its own pool of objects for allocation so only one call to
malloc is made per object type.

	void	(*invalidate)(
			nal_cb_t		*nal,
			user_ptr		trans_base,
			user_ptr		trans_data,
			size_t			extent
	);

User addresses are validated/translated at the user-level API NAL
method, which is likely to push them to this level.  Meanwhile,
the library NAL will be notified when the library no longer
needs the buffer.  Overlapped buffers are not detected by the
library, so the NAL should ref count each page involved.

Unfortunately we have a few bugs when the invalidate method is
called.  It is still in progress...

	void	(*printf)(
			nal_cb_t		*nal,
			const char		*fmt,
			...
	);

As with malloc(), the library does not have any way to do printf
or printk.  It is not necessary for the NAL to implement the this
call, although it will make debugging difficult.

	void	(*cli)(
			nal_cb_t		*nal,
			unsigned long		*flags
	);

	void	(*sti)(
			nal_cb_t		*nal,
			unsigned long		*flags
	);

These are used by the library to mark critical sections.

	int	(*gidrid2nidpid)(
			nal_cb_t		*nal,
			ptl_id_t		gid,
			ptl_id_t		rid,
			ptl_id_t		*nid,
			ptl_id_t		*pid
	);


	int	(*nidpid2gidrid)(
			nal_cb_t		*nal,
			ptl_id_t		nid,
			ptl_id_t		pid,
			ptl_id_t		*gid,
			ptl_id_t		*rid
	);

Rolf added these.  I haven't looked at how they have to work yet.
