<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:37:22 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3840] llapi_layout API design discussion</title>
                <link>https://jira.whamcloud.com/browse/LU-3840</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Extensions to liblustreapi have been posted for review here. &lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/5302&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5302&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The goal of the API extensions is to provide a user-friendly interface for interacting with the layout of files in Lustre filesystems, and to hide the wire-protocol details behind an opaque data type.&lt;/p&gt;

&lt;p&gt;This issue will provide a forum for potential users and developers to discuss and critique the design of the proposed API extensions.&lt;/p&gt;
</description>
                <environment></environment>
        <key id="20631">LU-3840</key>
            <summary>llapi_layout API design discussion</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="6" iconUrl="https://jira.whamcloud.com/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="nedbass">Ned Bass</assignee>
                                    <reporter username="nedbass">Ned Bass</reporter>
                        <labels>
                    </labels>
                <created>Tue, 27 Aug 2013 17:07:37 +0000</created>
                <updated>Tue, 3 Feb 2015 19:14:46 +0000</updated>
                            <resolved>Tue, 3 Feb 2015 19:14:46 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>11</watches>
                                                                            <comments>
                            <comment id="65185" author="afn" created="Tue, 27 Aug 2013 18:32:47 +0000"  >
&lt;p&gt;Hello folks,&lt;/p&gt;

&lt;p&gt;I am Andy Nelson. I am a user of the current Lustre API and expect&lt;br/&gt;
to be a user of the proposed new API in the future. Probably more&lt;br/&gt;
than anyone else, my complaints about the previous version are the&lt;br/&gt;
reason it is getting attention at all :) In that context, I want to&lt;br/&gt;
make sure it is done right, so as to meet my needs.&lt;/p&gt;

&lt;p&gt;As a bit more background, I maintain the IO capabilities in a large&lt;br/&gt;
code project at Los Alamos National Lab. This code has to run portably&lt;br/&gt;
(and for the most part, does so) across many very large systems, on&lt;br/&gt;
top of Lustre, Panasas, GPFS and anything else we happen to get handed&lt;br/&gt;
when a new machine is procured or we go someplace else. As an indicator&lt;br/&gt;
of scale, we have various tracking and profiling in the codes we manage, which&lt;br/&gt;
indicate that our users typically consume around 500-1500 CPU centuries per&lt;br/&gt;
week and do IO at a sustained rate (i.e. averaged over all jobs during a&lt;br/&gt;
week or some such) of 500MB-a few GB/s, all the time. Specific jobs get&lt;br/&gt;
IO rates up to 32GB/s or so, to single files on Lustre 1.8, where we&lt;br/&gt;
are rate limited because 1.8 supports striping across only 160 OSTs.&lt;br/&gt;
I&apos;m working with the LLNL folk to implement the new API on our codes&lt;br/&gt;
as they get ported to sequoia, and I&apos;m hoping we can get much higher&lt;br/&gt;
rates there. Early on, Chris Morrone did a bit of profiling using&lt;br/&gt;
a synthetic, and as I recall guessed that we could get up to perhaps&lt;br/&gt;
200GB/s to single files there, once various bugs/limitations/etc were&lt;br/&gt;
worked out. We&apos;ll see.&lt;/p&gt;

&lt;p&gt;Anyway, before the new API gets implemented, I want to voice my input,&lt;br/&gt;
so that it can be made to work for me in its deployed form. I had been&lt;br/&gt;
providing input to LLNL bug tracking tickets, but I think the stuff&lt;br/&gt;
below would benefit a broader audience and so am pushing it here as well.&lt;br/&gt;
As new API draft versions come out, I&apos;m guessing I&apos;ll have more comments&lt;br/&gt;
and inputs, but what you see below is what I&apos;m able to do for this revision.&lt;br/&gt;
The punchline is: &quot;Not ready yet, needs more work&quot;&lt;/p&gt;

&lt;p&gt;Andy Nelson&lt;/p&gt;



&lt;p&gt;On the API itself&lt;br/&gt;
=================&lt;/p&gt;

&lt;p&gt;I have seen the traffic on various linux lists about the merging&lt;br/&gt;
of Lustre into linux. Given this fact, the Lustre API will become&lt;br/&gt;
a part of the Linux API as well, which has its own benefits/constraints.&lt;br/&gt;
Right now, as a filesystem specific interface, the fact that&lt;br/&gt;
it is filesystem specific may not be particularly problematic.&lt;br/&gt;
As time passes though, there will probably be various thoughts along the&lt;br/&gt;
lines of providing a generic interface to the same/similar functionality&lt;br/&gt;
that exists in other fs&apos;s, that either exists now or may grow&lt;br/&gt;
in the future. Various bits of such things as these have historically&lt;br/&gt;
been provided by ioctls, which we have seen on sequoia get rather&lt;br/&gt;
problematic, not to mention fs specific and riddled with inconsistencies&lt;br/&gt;
and hacks to get someone/something going (not talking about Lustre&lt;br/&gt;
only here...I think it&apos;s pretty ubiquitous).&lt;/p&gt;

&lt;p&gt;Seems to me that it would be a good idea to float the Lustre interface&lt;br/&gt;
to some list like linux-fsdevel, for feedback. If they aren&apos;t interested&lt;br/&gt;
now, you at least want to be well positioned to say &quot;I told you so&quot; and&lt;br/&gt;
&quot;You&apos;re late to the party, the wheel has already been invented&quot; 5-10 years&lt;br/&gt;
down the line when they do start having that interest. No matter when they&lt;br/&gt;
express interest, it&apos;s a guarantee that they will complain about something&lt;br/&gt;
though. Better to be early and prepared than late and left out. Then at&lt;br/&gt;
least you can be justified when you insult them back for being behind the&lt;br/&gt;
times and not thinking of things that are in your bread and butter land,&lt;br/&gt;
after they insult you for being clueless idiots who can&apos;t write code and&lt;br/&gt;
such (you &lt;b&gt;do&lt;/b&gt; know that is exactly what they will do don&apos;t you???).&lt;/p&gt;


&lt;p&gt;On llapi_layout_t&lt;br/&gt;
=================&lt;/p&gt;

&lt;p&gt;The idea of an opaque object to hold all of the various filesystem&lt;br/&gt;
specific information for a given &apos;layout&apos; is a good idea and the right way&lt;br/&gt;
to go. I have a suspicion that this implementation is not going to fly&lt;br/&gt;
very well though.&lt;/p&gt;

&lt;p&gt;The problem I see is that you are handing the user a pointer that points to&lt;br/&gt;
some opaque blob of memory. I don&apos;t know whether this is kernel memory or&lt;br/&gt;
user memory but either way it is not a good option. If it is kernel memory,&lt;br/&gt;
it is problematic on all sorts of levels. It gives malicious users not just&lt;br/&gt;
a hole, but an actual, wide open door to stroll on through to get access&lt;br/&gt;
to/change OS data. While I am certainly not qualified to comment wisely on&lt;br/&gt;
the sundry details, it seems dangerous to me. If the layout data are in the&lt;br/&gt;
user&apos;s own space, it still seems like a more dangerous thing than it needs to&lt;br/&gt;
be. In userspace, a code can accidentally scribble on the layout data, so&lt;br/&gt;
you&apos;d have weird bugs cropping up due to invalid layout data being pointed&lt;br/&gt;
at by a valid layout pointer.&lt;/p&gt;

&lt;p&gt;Given all that, it seems to me that a better solution is to provide the user&lt;br/&gt;
with some sort of thing that looks and behaves like a file descriptor.&lt;br/&gt;
Basically an int or something that system code can use in its own way&lt;br/&gt;
to point to its own opaque data structure for what it needs, but which the&lt;br/&gt;
user never ever sees directly. For the sake of having a name, call it a&lt;br/&gt;
layout_fd for the moment. It would be valid after being created and can be&lt;br/&gt;
applied to any future file creations on the filesystem for as long as the&lt;br/&gt;
job runs, or it is destroyed by some &apos;layout_fd free&apos; function. Of course,&lt;br/&gt;
that means a bit of cleanup infrastructure to handle cases where the job&lt;br/&gt;
either aborts or forgets to free the layout_fd itself.&lt;/p&gt;
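
&lt;p&gt;A minimal sketch of the descriptor idea, assuming a fixed-size table owned by the library. Every name here (layout_fd_alloc, layout_fd_valid, layout_fd_free, LAYOUT_FD_MAX) is hypothetical and does not exist in liblustreapi; the stripe_count field only stands in for the real layout data.&lt;/p&gt;

```c
/* Hypothetical sketch: none of these names exist in liblustreapi. */
#define LAYOUT_FD_MAX 16

struct layout_slot {
    int in_use;        /* 1 while the descriptor is valid */
    int stripe_count;  /* stand-in for the real opaque layout data */
};

static struct layout_slot layout_table[LAYOUT_FD_MAX];

/* Return a new descriptor, or -1 if the table is full. */
int layout_fd_alloc(void)
{
    int fd;

    for (fd = 0; fd != LAYOUT_FD_MAX; fd++) {
        if (!layout_table[fd].in_use) {
            layout_table[fd].in_use = 1;
            layout_table[fd].stripe_count = 0;
            return fd;
        }
    }
    return -1;
}

/* Return 1 if fd names a live layout, else 0. */
int layout_fd_valid(int fd)
{
    /* The unsigned cast also rejects negative descriptors. */
    if ((unsigned)fd >= LAYOUT_FD_MAX)
        return 0;
    return layout_table[fd].in_use;
}

/* Release a descriptor; return 0 on success, -1 if fd is not valid. */
int layout_fd_free(int fd)
{
    if (!layout_fd_valid(fd))
        return -1;
    layout_table[fd].in_use = 0;
    return 0;
}
```

&lt;p&gt;With this shape, user code holds only an int, and an invalid or stale descriptor is rejected by a table lookup instead of being dereferenced.&lt;/p&gt;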

&lt;p&gt;On the llapi_layout man page&lt;br/&gt;
============================&lt;/p&gt;

&lt;p&gt;While having a &apos;master&apos; man page that holds all of the api functions&lt;br/&gt;
is good, it seems to me to be a much better thing to have individual&lt;br/&gt;
man pages for each of the different functions, with only a summary&lt;br/&gt;
man page in place of the current llapi_layout page, that includes a&lt;br/&gt;
&quot;see also&quot; section at the bottom for the rest, like many other system&lt;br/&gt;
functions have. Seems a better idea to have individual pages for each&lt;br/&gt;
function, with the get/set variants of each function in the same page.&lt;br/&gt;
That way the user doesn&apos;t have to pore through lots of useless-in-the-moment&lt;br/&gt;
stuff, just to find the one function that is needed in that moment.&lt;/p&gt;


&lt;p&gt;On some of the llapi functions&lt;br/&gt;
==============================&lt;/p&gt;

&lt;p&gt;There appears to be much more functionality available via lfs than by&lt;br/&gt;
llapi. Much of this functionality would be useful to various people from&lt;br/&gt;
llapi as well. In fact, some of the lfs produced information appears&lt;br/&gt;
pretty much required to be able to use some of the current llapi functions.&lt;br/&gt;
What plans do you have to extend llapi to include this stuff?&lt;/p&gt;

&lt;p&gt;I see no function for getting the number of OSTs in the filesystem.&lt;br/&gt;
This is an absolutely essential function to have: How can I set valid&lt;br/&gt;
&apos;stripe count&apos; values if I don&apos;t know how many stripes I can have?&lt;/p&gt;

&lt;p&gt;Near as I can tell, the llapi_layout_by_fid takes as its argument&lt;br/&gt;
some sort of lustre internal-ish name (the &apos;fid&apos;)...but there is no&lt;br/&gt;
way to get this from any llapi function. Need to have that, if&lt;br/&gt;
llapi_layout_by_fid is to be useful to anyone.&lt;/p&gt;


&lt;p&gt;llapi_layout_by_path: &quot;likely, but not guaranteed to fail&quot; on non-Lustre&lt;br/&gt;
filesystems???? It is unacceptable to provide guarantees that aren&apos;t&lt;br/&gt;
guarantees like this. Make it fail all the time on non-Lustre filesystems.&lt;/p&gt;

&lt;p&gt;llapi_search_fsname is mentioned in the description to llapi_layout_by_path&lt;br/&gt;
but nowhere else. Describe it.&lt;/p&gt;

&lt;p&gt;llapi_layout_stripe_size needs to have its get/set value be a 64bit integer.&lt;br/&gt;
As far as I understand, even the current API permits stripe sizes&lt;br/&gt;
up to 4GB and these would overflow a 32bit integer. As filesystems get&lt;br/&gt;
bigger, I can imagine someone somewhere wanting bigger stripes than&lt;br/&gt;
this too.&lt;/p&gt;

&lt;p&gt;llapi_layout_pool_name[_set]: I am not sure whether or not pools can be&lt;br/&gt;
defined from userspace. Is it correct that they can be?  Also, it would&lt;br/&gt;
be of some benefit to have some understanding of what a pool actually&lt;br/&gt;
is, why I might want one, and how to apply a pool to my IO. At least somewhere&lt;br/&gt;
if not in the man pages. I&apos;m a bit confused on the concept right now.&lt;/p&gt;

&lt;p&gt;On the error handling infrastructure of the API functions&lt;br/&gt;
=========================================================&lt;/p&gt;

&lt;p&gt;These comments are reproductions of some comments I made in a ticket&lt;br/&gt;
to the LLNL JIRA system for dealing with Sequoia Blue Gene issues,&lt;br/&gt;
reproduced here for the broader audience.&lt;/p&gt;


&lt;p&gt;The current error handling interface is not well put together. Get&lt;br/&gt;
functions return the value gotten, except when they don&apos;t, and when&lt;br/&gt;
they don&apos;t their return value is negative, except that some valid&lt;br/&gt;
values are negative so it isn&apos;t always true that a negative value&lt;br/&gt;
is an error, and so go check the errno to be sure. That is just a&lt;br/&gt;
broken by design way of doing an interface.&lt;/p&gt;

&lt;p&gt;Ned mentioned in correspondence to me, the possibility of having&lt;br/&gt;
the get functions never fail. This is not an acceptable alternative&lt;br/&gt;
because the get call requires an argument from user code, typically&lt;br/&gt;
the pointer to the layout thing. Since this comes from the user,&lt;br/&gt;
it may be invalid: users and applications coders are inevitably&lt;br/&gt;
more inventive than you can ever hope to predict at design time (i.e. now).&lt;br/&gt;
Without an error/fail path, the code will end up in some sort&lt;br/&gt;
of unconditional &quot;Splat!&quot; in the lustre library code itself, which&lt;br/&gt;
is a very poor result.&lt;/p&gt;

&lt;p&gt;Please, just go back to the unix philosophy that has existed for&lt;br/&gt;
decades and have one thing do one thing ONLY, but do it well, rather&lt;br/&gt;
than overload those return values with error characteristics in some&lt;br/&gt;
cases, but not even consistently for that.&lt;/p&gt;

&lt;p&gt;On the asymmetries between the get/set functions&lt;br/&gt;
================================================&lt;/p&gt;

&lt;p&gt;argument lists&lt;br/&gt;
==============&lt;/p&gt;

&lt;p&gt;In current form, the &apos;get&apos; functions return whatever it is they get&lt;br/&gt;
as a return value rather than as a function argument. The &apos;set&apos;&lt;br/&gt;
functions take an argument of whatever it is they are setting. This&lt;br/&gt;
is asymmetric, and not a good thing. The way it is now, the get&lt;br/&gt;
functions have their return value be the thing that was gotten,&lt;br/&gt;
and the set functions have it as an argument. In other words, it is&lt;br/&gt;
intrinsically a messier interface, exactly as noted above with the&lt;br/&gt;
error handling business.&lt;/p&gt;

&lt;p&gt;Instead, they should both have arguments that define what it is they&lt;br/&gt;
are get/setting.&lt;/p&gt;
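
&lt;p&gt;A sketch of that symmetric shape, using a hypothetical demo_layout struct and demo_ prefixed functions as stand-ins (the real llapi_layout_t is opaque, and these are not the proposed signatures). Both functions return only a status, so a stripe count of -1 can be stored and read back without being mistaken for an error.&lt;/p&gt;

```c
/* Hypothetical stand-in for the opaque layout type. */
struct demo_layout {
    int stripe_count;
};

/* Get: the value comes back through *count; the return value is status only. */
int demo_layout_stripe_count_get(const struct demo_layout *layout, int *count)
{
    if (!layout || !count)
        return -1;      /* invalid argument from the caller */
    *count = layout->stripe_count;
    return 0;
}

/* Set: same argument shape, same status-only return value. */
int demo_layout_stripe_count_set(struct demo_layout *layout, int count)
{
    if (!layout)
        return -1;      /* invalid argument from the caller */
    layout->stripe_count = count;
    return 0;
}
```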

&lt;p&gt;There is other precedent for gotten values being in the argument&lt;br/&gt;
list of the function itself. Namely stat/statfs, which each take a&lt;br/&gt;
struct argument that aggregates getting a bunch of stuff into one&lt;br/&gt;
call. It would be a good thing to follow this same model, though&lt;br/&gt;
obviously with the variation that here getting a non-opaque struct&lt;br/&gt;
of lots of stuff together is replaced by getting a non-opaque single&lt;br/&gt;
value. That leaves the possibility for extensibility a lot more open,&lt;br/&gt;
since you no longer have to worry about new/old versions of some struct.&lt;/p&gt;

&lt;p&gt;Speaking of versions and extensibility: there does not seem to be any&lt;br/&gt;
query function for getting the API version. This would be a good thing&lt;br/&gt;
to have.&lt;/p&gt;


&lt;p&gt;naming&lt;br/&gt;
======&lt;/p&gt;

&lt;p&gt;The get and set functions in the api are paired together according&lt;br/&gt;
to purpose. One half of the pair gets, the other half sets.  Here&lt;br/&gt;
is one example:&lt;/p&gt;

&lt;p&gt;     int llapi_layout_stripe_count(const llapi_layout_t *layout);&lt;/p&gt;

&lt;p&gt;     int llapi_layout_stripe_count_set(llapi_layout_t *layout,&lt;br/&gt;
                                       int stripe_count);&lt;/p&gt;

&lt;p&gt;But there is naming asymmetry. The &apos;set&apos; variant includes the &apos;set&apos; string&lt;br/&gt;
in its name, the &apos;get&apos; variant does not. This is not good.&lt;/p&gt;

&lt;p&gt;On an aesthetic level, when I read the &apos;llapi_layout_stripe_count_set&apos;,&lt;br/&gt;
I immediately know what it does...it sets stripe count. The asymmetry&lt;br/&gt;
of not having the get variant say &apos;get&apos; in its name, means that I don&apos;t&lt;br/&gt;
have that same self-documentation, which is jarring somehow to me. It&lt;br/&gt;
leaves me with some sort of a `what does it do?&apos; hole in my expectations&lt;br/&gt;
or something.&lt;/p&gt;

&lt;p&gt;Ned made an argument to me for omitting &apos;get&apos;, based on the idea&lt;br/&gt;
that the get functions are or should be most easily usable as &apos;inline&apos;&lt;br/&gt;
things that can appear in various mathematical expressions and such.&lt;br/&gt;
In some sense, I see this as an implication that the functions are to&lt;br/&gt;
be read as actually being the thing they are getting rather than&lt;br/&gt;
being somehow separate from the thing they are getting. This is certainly&lt;br/&gt;
inconsistent with the way I already use various get/set functionality for&lt;br/&gt;
lustre and other filesystems, and what I&apos;d expect others in my&lt;br/&gt;
position to do as well.&lt;/p&gt;

&lt;p&gt;Here&apos;s why:&lt;/p&gt;

&lt;p&gt;Anyone using these api functions is with little doubt fairly sophisticated,&lt;br/&gt;
in terms of wanting to get the best performance etc. That means that they&lt;br/&gt;
also have a fairly large codebase to maintain and, most importantly, support&lt;br/&gt;
on lots of different platforms, of which Lustre is only one. Panasas is&lt;br/&gt;
another for which it is possible to do raid/striping/fs parameter optimization&lt;br/&gt;
and setting from inside a job. I think there are probably others as well.&lt;br/&gt;
GPFS in my experience has a bit, but not much usable by the likes of me (mostly&lt;br/&gt;
for sysadmins and such). XFS? Haven&apos;t used it except on my linux box&lt;br/&gt;
at home, but I know it runs on large machines in various places and has an&lt;br/&gt;
extensive API interface by way of something called &apos;xfsctl&apos;. Etc.&lt;/p&gt;

&lt;p&gt;The point that I&apos;m getting to here is that every one of these interfaces is&lt;br/&gt;
different and requires different code and functions to get the various&lt;br/&gt;
parameters that are needed from the filesystem and to set them. Why is that&lt;br/&gt;
important? It is important because it means that the &quot;right&quot; way&lt;br/&gt;
for an application to code to that sort of portability requirement&lt;br/&gt;
is to have comparatively thin &apos;shim&apos; interfaces to each filesystem, with&lt;br/&gt;
the bulk of the code in some common, platform neutral form that can&lt;br/&gt;
be used on all or most of the different platforms without change.&lt;/p&gt;

&lt;p&gt;The thin shim interface does only the get/set stuff. All of the actual&lt;br/&gt;
manipulation gets done in the application&apos;s platform independent code layer&lt;br/&gt;
because once in that form, the application can do all of its own&lt;br/&gt;
manipulation in one place on its own data structures, rather than having&lt;br/&gt;
many duplications of the same thing, one for each platform. So in that model,&lt;br/&gt;
the get functions aren&apos;t doing anything except the actual get itself,&lt;br/&gt;
and so won&apos;t be used inline in expressions, as you were assuming.&lt;/p&gt;
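
&lt;p&gt;A rough sketch of that layering, with all names (fs_shim, generic_shim, app_apply_stripe_count) invented for illustration. Each filesystem would supply its own table of get/set functions; the platform-independent layer sees only the table.&lt;/p&gt;

```c
/* Hypothetical shim table: one instance per filesystem backend. */
struct fs_shim {
    const char *name;
    int (*get_stripe_count)(const char *path, int *count);
    int (*set_stripe_count)(const char *path, int count);
};

/* Fallback backend for a filesystem with no tunable striping. */
static int generic_get_stripe_count(const char *path, int *count)
{
    (void)path;
    *count = 1;
    return 0;
}

static int generic_set_stripe_count(const char *path, int count)
{
    (void)path;
    return count == 1 ? 0 : -1;   /* only a single stripe is accepted */
}

static const struct fs_shim generic_shim = {
    "generic", generic_get_stripe_count, generic_set_stripe_count
};

/*
 * Platform-independent layer: request a stripe count, then report what
 * the backend actually uses. Only the shim knows which filesystem it is.
 */
int app_apply_stripe_count(const struct fs_shim *fs, const char *path,
                           int wanted, int *actual)
{
    if (!fs || !actual)
        return -1;
    (void)fs->set_stripe_count(path, wanted);   /* best effort */
    return fs->get_stripe_count(path, actual);
}
```

&lt;p&gt;In this arrangement the shim functions only get or set; any arithmetic on the values happens in the common layer, on the application&apos;s own data.&lt;/p&gt;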

&lt;p&gt;That makes having a &apos;get&apos; in the name a useful thing, since the function&lt;br/&gt;
is getting something other than itself, rather than actually being the&lt;br/&gt;
thing gotten as is assumed in Ned&apos;s model.&lt;/p&gt;
</comment>
                            <comment id="65186" author="afn" created="Tue, 27 Aug 2013 18:36:51 +0000"  >&lt;p&gt;BTW: I should probably mention that my comments just above are based on what I&lt;br/&gt;
believe is patch version 15, as enumerated in the review.whamcloud site.&lt;/p&gt;</comment>
                            <comment id="65187" author="afn" created="Tue, 27 Aug 2013 18:39:58 +0000"  >&lt;p&gt;Doh. Just discovered a typo (not particularly important though). It&lt;br/&gt;
should be: 5-15 CPU centuries per week, not 500-1500CPU centuries per week.&lt;br/&gt;
The latter is rather a lot :)&lt;/p&gt;</comment>
                            <comment id="65218" author="nedbass" created="Tue, 27 Aug 2013 23:24:59 +0000"  >&lt;p&gt;Andy, thanks for your detailed input.  Some of your comments go beyond the scope of the current effort, so I&apos;m going to limit my response to the points we can move forward on.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The problem I see is that you are handing the user a pointer that points to some opaque blob of memory.&lt;br/&gt;
...&lt;br/&gt;
Given all that, it seems to me that a better solution is to provide the user with some sort of thing that looks and behaves like a file descriptor.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The llapi_layout_t * points to user-space memory, and we&apos;re dealing with C here, so a user can scribble over the data no matter how many layers of abstraction are added. A file descriptor-like object would add complexity and overhead for no real additional protection. However, to help detect corruption we may want to add an internal magic value to the layout.&lt;/p&gt;

&lt;p&gt;The interface between the library and the kernel consists of well-established input-sanitizing system calls, namely getxattr and setxattr, so no new attack vector is introduced.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;While having a &apos;master&apos; man page that holds all of the api functions is good, it seems to me to be a much better thing to have individual man pages for each of the different functions, with only a summary man page in place of the current llapi_layout page, that includes a &quot;see also&quot; section at the bottom for the rest, like many other system&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Agreed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;There appears to be much more functionality available via lfs than by llapi. Much of this functionality would be useful to various people from llapi as well. In fact, some of the lfs produced information appears pretty much required to be able to use some of the current llapi functions.  What plans do you have to extend llapi to include this stuff?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Actually, lfs uses llapi to implement its functionality.  The problem is that llapi is largely undocumented, and if you think the proposed extensions are bad you would be horrified by the existing API.  What we are trying to do here is implement a sane, portable, and well-documented set of functions to let users read and configure striping attributes.  Overhauling and documenting the rest of llapi is sorely needed but well beyond the current scope.&lt;/p&gt;

&lt;p&gt;That said, there are existing ways to do the things you mention, but some of them rely on ioctl() so won&apos;t work on BGQ systems.  I&apos;ll briefly mention the relevant llapi functions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I see no function for getting the number of OSTs in the filesystem.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;This is the API function.  It uses an ioctl() that IBM claims to have implemented function-shipping support for on BGQ.  However, it didn&apos;t work in my tests, so we need to continue working with them to fix it.&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;/* @mnt    - contains a path within the Lustre filesystem.
 * @count  - value returned here
 * @is_mdt - get number of OSTs &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; 0, &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; get number of MDTs
 */
&lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; llapi_get_obd_count(&lt;span class=&quot;code-object&quot;&gt;char&lt;/span&gt; *mnt, &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; *count, &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; is_mdt);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Near as I can tell, the llapi_layout_by_fid takes as its argument some sort of lustre internal-ish name (the &apos;fid&apos;)...but there is no way to get this from any llapi function. Need to have that, if llapi_layout_by_fid is to be useful to anyone.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;A function exists:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;&lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; llapi_path2fid(&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;char&lt;/span&gt; *path, lustre_fid *fid);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;However, it relies on ioctl() so isn&apos;t portable.  Typical users need not be concerned with lustre FIDs. They are primarily used by developers, filesystem monitoring software, or Hierarchical Storage Management (HSM) systems.  It is exactly the type of internal implementation detail that a proper API should hide from users.  I did not include this function in my original design, but an Intel reviewer requested it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;llapi_layout_by_path: &quot;likely, but not guaranteed to fail&quot; on non-Lustre filesystems???? It is unacceptable to provide guarantees that aren&apos;t guarantees like this. Make it fail all the time on non-Lustre filesystems.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;It is explicitly not a guarantee.  In other words, don&apos;t rely on this, and instead use llapi_search_fsname() (which is admittedly undocumented, but we can provide an example).  This lets the application check once on the directory, rather than the library blindly checking on every single file lookup, which could get rather expensive.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;llapi_layout_stripe_size needs to have its get/set value be a 64bit integer.  As far as I understand, even the current API permits stripe sizes up to 4GB and these would overflow a 32bit integer. As filesystems get bigger, I can imagine someone somewhere wanting bigger stripes than this too.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Lustre stripe size is a 32-bit type with a maximum valid value of 2GB.  See the definition of &lt;tt&gt;lov_user_md&lt;/tt&gt; in &lt;tt&gt;/usr/include/lustre/lustre_user.h&lt;/tt&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;llapi_layout_pool_name[_set]: I am not sure whether or not pools can be defined from userspace. Is it correct that they can be?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;No, assuming that by &quot;from userspace&quot; you mean &quot;by an unprivileged user&quot;.  Currently only system administrators can manage pools in Lustre.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Also, it would be of some benefit to have some understanding of what a pool actually is, why I might want one, and how to apply a pool to my IO. At least somewhere if not in the man pages. I&apos;m a bit confused on the concept right now.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Quoting the Lustre manual:  &quot;&lt;tt&gt;OST pools allows the administrator to associate a name with an arbitrary subset of OSTs in a Lustre cluster. A group of OSTs can be combined into a named pool with unique access permissions and stripe characteristics.&lt;/tt&gt;&quot;&lt;/p&gt;

&lt;p&gt;LLNL does not use pools, and AFAIK it is not a commonly used feature in general.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The current error handling interface is not well put together. Get functions return the value gotten, except when they don&apos;t, and when they don&apos;t their return value is negative, except that some valid values are negative so it isn&apos;t always true that a negative value is an error, and so go check the errno to be sure. That is just a broken by design way of doing an interface.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;This is the biggest flaw in the current design from my perspective.  However, I think it can be addressed without abandoning the convention of getting values via the function return value and while still gracefully handling invalid user inputs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;There is other precedent for gotten values being in the argument list of the function itself. Namely stat/statfs, which each take a struct argument that aggregates getting a bunch of stuff into one call. It would be a good thing to follow this same model, though&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;In my view, the only good case for returning data via a pointer argument is to get a compound data type like a struct stat.  In our case we are dealing with scalar integer types, and I don&apos;t foresee the need to change them to compound types in the future.  After all, the point of the API was to abstract away the compound type &lt;tt&gt;struct lov_user_md&lt;/tt&gt; behind simple accessor functions.  Adding a pointer to the argument list increases the potential for user error and complicates the API specification without any real added utility.&lt;/p&gt;

&lt;p&gt;The real source of messiness is not that the return value can be overloaded with error information if something goes wrong (a well-established convention).  Rather, the problem is that the API exposes the literal value (-1) of the exceptional return codes, rather than abstracting them behind macros.  In error conditions the API can return the macro LLAPI_ERROR.  Similarly, macros LLAPI_USE_ALL_OSTS and LLAPI_MDT_CHOOSES_OFFSET can be used in place of -1 to indicate those special cases.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;But there is naming asymmetry. The &apos;set&apos; variant includes the &apos;set&apos; string in its name, the &apos;get&apos; variant does not. This is not good.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I am willing to make the names symmetric as you suggest.&lt;/p&gt;</comment>
                            <comment id="65230" author="afn" created="Wed, 28 Aug 2013 05:39:19 +0000"  >
&lt;p&gt;More responses from me later, but one quick one now:&lt;/p&gt;

&lt;p&gt;I wrote:&lt;/p&gt;

&lt;p&gt;    llapi_layout_stripe_size needs to have its get/set value be a 64bit integer. As far as I understand, even the current API permits stripe sizes up to 4GB and these would overflow a 32bit integer. As filesystems get bigger, I can imagine someone somewhere wanting bigger stripes than this too.&lt;/p&gt;

&lt;p&gt;You responded:&lt;/p&gt;

&lt;p&gt;Lustre stripe size is a 32-bit type with a maximum valid value of 2GB. See the definition of lov_user_md in /usr/include/lustre/lustre_user.h.&lt;/p&gt;

&lt;p&gt;and now I respond to this with:&lt;/p&gt;

&lt;p&gt;The current documentation on the old API has this to say in section 32.3.2:&lt;/p&gt;


&lt;p&gt;stripe_size    &quot;Specifies stripe size (in bytes). Should be multiple of 64KB, not exceeding 4GB.&quot;&lt;/p&gt;

&lt;p&gt;and this to say in section 18.2.1&lt;/p&gt;

&lt;p&gt;&quot;The maximum stripe size is 4 GB&quot;&lt;/p&gt;


&lt;p&gt;So is this documentation wrong, or is the new Lustre in versions &amp;gt;2.4 and its llapi reduced in functionality from the Lustre (the manual I got this from claims to be for version &quot;2.x&quot;) and the old API?&lt;/p&gt;
</comment>
                            <comment id="65282" author="nedbass" created="Wed, 28 Aug 2013 17:31:36 +0000"  >&lt;p&gt;I was mistaken, the maximum stripe size is 4g - 64k, but the type is 32-bit and always has been, AFAIK.  This is not a new limitation in Lustre 2.4.  The documentation is inaccurate.&lt;/p&gt;</comment>
                            <comment id="65294" author="afn" created="Wed, 28 Aug 2013 18:47:54 +0000"  >
&lt;p&gt;How is (4G-64k) possible with a signed integer type? &lt;/p&gt;
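&lt;p&gt;A minimal sketch of the arithmetic behind this question, in C. The helper names are hypothetical and are not part of llapi; the snippet only checks whether 4 GiB minus 64 KiB fits in a signed or an unsigned 32-bit type.&lt;/p&gt;

&lt;pre&gt;
```c
#include "stdint.h"  /* INT32_MAX, UINT32_MAX, uint64_t */

/* Hypothetical helper: 4 GiB - 64 KiB, the maximum quoted in this thread. */
uint64_t demo_max_stripe_size(void)
{
        return 4ULL * 1024 * 1024 * 1024 - 64ULL * 1024;
}

/* Hypothetical helper: returns 1 if value fits in a signed 32-bit int. */
int demo_fits_int32(uint64_t value)
{
        return value > (uint64_t)INT32_MAX ? 0 : 1;
}

/* Hypothetical helper: returns 1 if value fits in an unsigned 32-bit int. */
int demo_fits_uint32(uint64_t value)
{
        return value > (uint64_t)UINT32_MAX ? 0 : 1;
}
```
&lt;/pre&gt;

&lt;p&gt;With these definitions, demo_fits_int32(demo_max_stripe_size()) is 0 while demo_fits_uint32(demo_max_stripe_size()) is 1, so a getter that returns a signed 32-bit int cannot represent that maximum.&lt;/p&gt;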
</comment>
                            <comment id="65301" author="nedbass" created="Wed, 28 Aug 2013 19:30:33 +0000"  >&lt;p&gt;Good point.  It should be type size_t.&lt;/p&gt;</comment>
                            <comment id="65480" author="adilger" created="Fri, 30 Aug 2013 22:53:53 +0000"  >&lt;p&gt;Andy, thanks for your input.  We often don&apos;t get enough feedback from end users, so I thank you for the time spent to write down this information.&lt;/p&gt;

&lt;p&gt;Some comments:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Separate man pages for each command: by all means yes.  This has been listed on the &lt;a href=&quot;https://wiki.hpdd.intel.com/display/PUB/Project+Ideas&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.hpdd.intel.com/display/PUB/Project+Ideas&lt;/a&gt; for a couple of years.&lt;/li&gt;
&lt;/ul&gt;


&lt;ul&gt;
	&lt;li&gt;Function return codes: I&apos;ve got mixed feelings about this.  I think as long as fetching the layout from the kernel works without errors, then extracting the actual values from the structure shouldn&apos;t be able to fail?&lt;/li&gt;
&lt;/ul&gt;


&lt;ul&gt;
	&lt;li&gt;4GB stripe_size limit: I agree this is a limit of the current kernel data structure, but if we are working on a new API it makes sense to use an explicit 64-bit value (not size_t) for this API.  If someone specifies a value that is too large for the current data structures it can always return -EOVERFLOW or similar.  I don&apos;t think the API needs to be too tied/limited to the current implementation, especially since we will be adding new types of layouts in the near future (mirrors, data on MDT, etc).&lt;/li&gt;
&lt;/ul&gt;


&lt;ul&gt;
	&lt;li&gt;API for OST count: I suspect that it might be possible to implement a non-ioctl method to find this information by digging around in /proc, but this would itself need to map the supplied path to a specific Lustre filesystem by comparing the supplied path to the mountpoint in /etc/fstab, which is neither robust nor efficient.&lt;/li&gt;
&lt;/ul&gt;


&lt;ul&gt;
	&lt;li&gt;Symmetry between get/set: I&apos;m also a fan of symmetry between function names in APIs (get/set, get/put, add/del, etc), so I&apos;m also in favour of making these match.&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="65482" author="afn" created="Fri, 30 Aug 2013 23:11:05 +0000"  >
&lt;p&gt;Andreas,&lt;/p&gt;

&lt;p&gt;I&apos;ve got more comments in draft form, but as far as &quot;fetching the layout from the kernel&lt;br/&gt;
works without errors&quot; goes, I have one question:&lt;/p&gt;

&lt;p&gt;Each of the get calls takes, as an argument from the user, the layout pointer. How are&lt;br/&gt;
you going to handle the case when the user passes in a null/invalid pointer? This could easily&lt;br/&gt;
happen because someone has code where the layout &apos;setup&apos; is simply bypassed or forgotten&lt;br/&gt;
or whatever. &lt;/p&gt;

&lt;p&gt;It is a very bad answer to say &quot;the code will go &apos;splat!&apos; somewhere in the lustre library function&quot;, rather than simply handling the user footgun case and returning an error code that&lt;br/&gt;
tells them they screwed up.&lt;/p&gt;</comment>
                            <comment id="65636" author="jhammond" created="Tue, 3 Sep 2013 17:39:03 +0000"  >&lt;p&gt;Andy,&lt;/p&gt;

&lt;p&gt;Thanks again for your input. Can you give some more detail about which high level operations you would like? Especially any you see missing here. I expect that you want:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;Functions to determine if a file descriptor or path (FDoP) belongs to a Lustre FS.&lt;/li&gt;
	&lt;li&gt;Functions to get the stripe size and stripe count for a FDoP.&lt;/li&gt;
	&lt;li&gt;Functions to determine the max and min stripe size and count for files on a given Lustre FS.&lt;/li&gt;
	&lt;li&gt;Functions to create files with given stripe size and count.&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;What would you add to this list?&lt;/p&gt;

&lt;p&gt;I am not familiar with the function shipping implementation used on BG. (But I believe that I can imagine how it works.) Could you provide a short description? Based on Ned&apos;s patch I assume that functions based on extended attributes are much better suited to function shipping than those based on ioctls(). (Again I can imagine why this might be.) Is that so? Are there any unusual limitations of function shipping that we should account for here?&lt;/p&gt;</comment>
                            <comment id="65637" author="morrone" created="Tue, 3 Sep 2013 18:06:14 +0000"  >&lt;blockquote&gt;&lt;p&gt;I am not familiar with the function shipping implementation used on BG. (But I believe that I can imagine how it works.) Could you provide a short description?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The compute nodes (CN) of BG/Q run a light-weight OS.  It provides many, but not all, of the common library and system calls found on Linux systems.  The calls are basically converted into RPCs.  The parameters are sent over the network to a node (called the I/O Node (ION)) that really runs Linux.  The function/system call is made there on behalf of the user.  When the function returns, the return code, error code, and associated buffers are copied back to the CN that made the call.&lt;/p&gt;

&lt;p&gt;The extended attribute functions have the pointer to the associated buffer and the buffer size clearly listed in the function parameter list.  That makes it easy for the function shipping system to copy the complete buffer back and forth between the nodes.&lt;/p&gt;

&lt;p&gt;ioctl(), on the other hand, has special behavior for every ioctl request number.  There is no single standard, which means that there is no generic function shipping code that can be written to handle all ioctl() calls.  Special purpose code would need to be written for many of the ioctls.  Therefore most ioctls, including all of the Lustre ioctls, are unsupported by the function shipping system.&lt;/p&gt;</comment>
                            <comment id="65642" author="afn" created="Tue, 3 Sep 2013 19:24:10 +0000"  >
&lt;p&gt;John,&lt;/p&gt;

&lt;p&gt;Chris filled you in on the function shipping stuff better than I could. I&apos;ll just note that this type of thing is not at all specific to BG machines. Cray also has such for example, and I would guess that there are others. Basically any machine that runs some sort of reduced OS kernel is susceptible...someone has to make a choice somewhere along the lines of &quot;what is in/what is out&quot; and there is always something that someone wants that ends up on the wrong side of the line.&lt;/p&gt;

&lt;p&gt;The rest of your email I&apos;ll get to shortly.&lt;/p&gt;</comment>
                            <comment id="65648" author="afn" created="Tue, 3 Sep 2013 20:25:00 +0000"  >&lt;p&gt;All,&lt;/p&gt;

&lt;p&gt;Rather than respond to Ned&apos;s (and others) comments in one big response where things get lost,&lt;br/&gt;
I&apos;ll cut things into one topic at a time chunks. Here is a first chunk, re the layout_t thing:&lt;/p&gt;

&lt;p&gt;&quot;The llapi_layout_t * points to user-space memory, and we&apos;re dealing with&lt;br/&gt;
C here, so a user can scribble over the data no matter how many layers&lt;br/&gt;
of abstraction are added. A file descriptor-like object would add&lt;br/&gt;
complexity and overhead for no real additional protection. However, to&lt;br/&gt;
help detect corruption we may want to add an internal magic value to the&lt;br/&gt;
layout.&quot;&lt;/p&gt;

&lt;p&gt;Re scribbling: that is true no matter which language if it&apos;s in user&lt;br/&gt;
space, but the point I wanted to make is that it is very directly&lt;br/&gt;
available for scribbling if the pointer is not obfuscated in some&lt;br/&gt;
manner. Just write to what it points to and you&apos;ve done it: even though&lt;br/&gt;
the pointer is still valid, the data it points to no longer are. I don&apos;t&lt;br/&gt;
know what this magic is that you are talking about, but if it adds to&lt;br/&gt;
the difficulty of scribbling all to the good. Anyway, I am willing to&lt;br/&gt;
live with it the way it is, but brought it up as a point of danger for&lt;br/&gt;
the robustness of a public interface.&lt;/p&gt;
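&lt;p&gt;A minimal sketch of what such a magic-value check might look like, in C. The struct, its fields, the magic constant, and the function name are all hypothetical; nothing here is taken from the posted patch.&lt;/p&gt;

&lt;pre&gt;
```c
#include "errno.h"   /* errno, EINVAL */
#include "stddef.h"  /* NULL */
#include "stdint.h"  /* uint32_t, uint64_t */

/* Hypothetical magic number; the thread does not specify a value. */
#define DEMO_LAYOUT_MAGIC 0x4C4C4159u

/* Hypothetical stand-in for the opaque layout type. */
struct demo_layout {
        uint32_t magic;         /* set to DEMO_LAYOUT_MAGIC on allocation */
        uint64_t stripe_size;
        int      stripe_count;
};

/*
 * Returns 0 if the layout looks valid. Returns -1 and sets errno to
 * EINVAL for a NULL pointer or a layout whose magic was overwritten.
 */
int demo_layout_check(const struct demo_layout *layout)
{
        if (layout == NULL || layout->magic != DEMO_LAYOUT_MAGIC) {
                errno = EINVAL;
                return -1;
        }
        return 0;
}
```
&lt;/pre&gt;

&lt;p&gt;A check like this catches a NULL pointer and some kinds of scribbling, but it cannot make an arbitrary dangling pointer safe to dereference.&lt;/p&gt;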

</comment>
                            <comment id="65649" author="afn" created="Tue, 3 Sep 2013 20:27:42 +0000"  >
&lt;p&gt;Re: ioctls:&lt;/p&gt;

&lt;p&gt;It appears that at least some of the &quot;new&quot; API still will require ioctls (e.g. the&lt;br/&gt;
obd count function Ned notes)?&lt;/p&gt;

&lt;p&gt;I thought that the whole point of the new API was to get rid of them altogether, due&lt;br/&gt;
to the implementation difficulties on various arches...&lt;/p&gt;
</comment>
                            <comment id="65651" author="afn" created="Tue, 3 Sep 2013 20:32:11 +0000"  >&lt;p&gt;&quot;llapi_layout_by_path: &quot;likely, but not guaranteed to fail&quot; on non-Lustre&lt;br/&gt;
filesystems???? It is unacceptable to provide guarantees that aren&apos;t&lt;br/&gt;
guarantees like this. Make it fail all the time on non-Lustre&lt;br/&gt;
filesystems.&lt;/p&gt;

&lt;p&gt;It is explicitly not a guarantee. In other words, don&apos;t rely on this, and&lt;br/&gt;
instead use llapi_search_fsname() (which is admittedly undocumented, but&lt;br/&gt;
we can provide an example). This lets the application check once on the&lt;br/&gt;
directory, rather than the library blindly checking on every single file&lt;br/&gt;
lookup, which could get rather expensive.&quot;&lt;/p&gt;


&lt;p&gt;Actually it is a guarantee: It is a guarantee that the result of the&lt;br/&gt;
function cannot be relied upon to be correct. It is not acceptable for&lt;br/&gt;
any code, particularly a library, to generate unreliable output.&lt;br/&gt;
Guaranteed correct and slow beats fast but unguaranteed and possibly&lt;br/&gt;
invalid, every time. The function MUST fail when given non-Lustre&lt;br/&gt;
information...what does a layout even mean for a non-Lustre path?&lt;/p&gt;


</comment>
                            <comment id="65652" author="afn" created="Tue, 3 Sep 2013 20:32:58 +0000"  >


&lt;p&gt;&quot;llapi_layout_stripe_size needs to have its get/set value be a 64bit&quot;&lt;/p&gt;

&lt;p&gt;I&apos;m glad you and Andreas now see this as a problem and will change to a 64bit type.&lt;/p&gt;</comment>
                            <comment id="65653" author="afn" created="Tue, 3 Sep 2013 20:38:45 +0000"  >
&lt;p&gt;Re: lfs/llapi and such&lt;/p&gt;

&lt;p&gt;&quot;Actually, lfs uses llapi to implement its functionality. The problem is&lt;br/&gt;
that llapi is largely undocumented, and if you think the proposed&lt;br/&gt;
extensions are bad you would be horrified by the existing API.&quot;&lt;/p&gt;

&lt;p&gt;...well I guess you&apos;ve got your work cut out for you then &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;


&lt;p&gt;&quot;What we are trying to do here is implement a sane, portable, and&lt;br/&gt;
well-documented set of functions to let users read and configure&lt;br/&gt;
striping attributes. Overhauling and documenting the rest of llapi is&lt;br/&gt;
sorely needed but well beyond the current scope.&quot;&lt;/p&gt;


&lt;p&gt;ok, then my comments should be taken as an indicator that some&lt;br/&gt;
additional thought needs to be put into which parts to document and&lt;br/&gt;
which parts to omit. The point behind my comment was basically that some&lt;br/&gt;
of the llapi functions in the man page require information that is not&lt;br/&gt;
available from any of the other functions that are listed in the man&lt;br/&gt;
page, and other information that is omitted, but is needed to make the&lt;br/&gt;
functions that are there actually useful.&lt;/p&gt;

&lt;p&gt;It seems to me that John Hammond (comment above) was also pushing in this&lt;br/&gt;
same direction--&quot;what should the API look like?&quot; I&apos;ve got several different&lt;br/&gt;
responses to this, which will follow.&lt;/p&gt;



</comment>
                            <comment id="65655" author="afn" created="Tue, 3 Sep 2013 20:44:40 +0000"  >

&lt;p&gt;&quot;I see no function for getting the number of OSTs in the filesystem.&lt;/p&gt;

&lt;p&gt;This is the API function. It uses an ioctl() that IBM claims to have&lt;br/&gt;
implemented function-shipping support for on BGQ. However, it didn&apos;t&lt;br/&gt;
work in my tests, so we need to continue working with them to fix it.&lt;/p&gt;

&lt;p&gt;/* @mnt    - contains a path within the Lustre filesystem.&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;@count  - value returned here&lt;/li&gt;
	&lt;li&gt;@is_mdt - get number of OSTs if 0, else get number of MDTs&lt;br/&gt;
 */&lt;br/&gt;
int llapi_get_obd_count(char *mnt, int *count, int is_mdt);&quot;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Some function that tells me how many OSTs (or whatever other things) I have&lt;br/&gt;
available in the filesystem I&apos;m running on is essential to being able to set&lt;br/&gt;
that number in the functions I use to set the stripe count. &lt;/p&gt;

&lt;p&gt;The only other way to get that information (a messy hack) from where I work&lt;br/&gt;
would be to create a temporary file with &quot;-1&quot; stripes (aka let the fs stripe &lt;br/&gt;
it over everything) and then query for how many stripes it got striped over.&lt;/p&gt;
</comment>
                            <comment id="65656" author="afn" created="Tue, 3 Sep 2013 20:56:58 +0000"  >&lt;p&gt;&quot;Near as I can tell, the llapi_layout_by_fid takes as its argument&lt;br/&gt;
some sort of lustre internal-ish name (the &apos;fid&apos;)...but there is no way&lt;br/&gt;
to get this from any llapi function. Need to have that, if&lt;br/&gt;
llapi_layout_by_fid is to be useful to anyone.&lt;/p&gt;

&lt;p&gt;A function exists:&lt;/p&gt;

&lt;p&gt;int llapi_path2fid(const char *path, lustre_fid *fid);&lt;br/&gt;
However, it relies on ioctl() so isn&apos;t portable. Typical users need not&lt;br/&gt;
be concerned with lustre FIDs. They are primarily used by developers,&lt;br/&gt;
filesystem monitoring software, or Hierarchical Storage Management (HSM)&lt;br/&gt;
systems. It is exactly the type of internal implementation detail&lt;br/&gt;
that a proper API should hide from users. I did not include this&lt;br/&gt;
function in my original design, but an Intel reviewer requested it.&quot;&lt;/p&gt;



&lt;p&gt;Hmmm...that indicates that there are multiple customers of this API,&lt;br/&gt;
with very different needs. Would appear to be important to get a&lt;br/&gt;
list of requirements from that other community as well as the one&lt;br/&gt;
I am trying to represent then...exactly the sort of things that John&lt;br/&gt;
Hammond was just querying me for, for that other community.&lt;/p&gt;

&lt;p&gt;For the kinds of stuff I do and for what I need, the FID stuff is&lt;br/&gt;
not useful, and is something to remove entirely. But if that other&lt;br/&gt;
community needs it, then I&apos;d claim that there needs to be more in&lt;br/&gt;
the API than there currently is published, in order to make it at all&lt;br/&gt;
useful.&lt;/p&gt;


</comment>
                            <comment id="65657" author="afn" created="Tue, 3 Sep 2013 20:59:43 +0000"  >

&lt;p&gt;&quot;llapi_layout_pool_name&lt;span class=&quot;error&quot;&gt;&amp;#91;_set&amp;#93;&lt;/span&gt;: I am not sure whether or not pools can be&lt;br/&gt;
defined from userspace. Is it correct that they can be?&lt;/p&gt;

&lt;p&gt;No, assuming that by &quot;from userspace&quot; you mean &quot;by an unprivileged&lt;br/&gt;
user&quot;. Currently only system administrators can manage pools in Lustre.&lt;/p&gt;

&lt;p&gt;Also, it would be of some benefit to have some understanding of what a&lt;br/&gt;
pool actually is, why I might want one, and how to apply a pool to my&lt;br/&gt;
IO. At least somewhere if not in the man pages. I&apos;m a bit confused on&lt;br/&gt;
the concept right now.&lt;/p&gt;

&lt;p&gt;Quoting the Lustre manual: &quot;OST pools allows the administrator to&lt;br/&gt;
associate a name with an arbitrary subset of OSTs in a Lustre cluster. A&lt;br/&gt;
group of OSTs can be combined into a named pool with unique access&lt;br/&gt;
permissions and stripe characteristics.&quot;&lt;/p&gt;

&lt;p&gt;LLNL does not use pools, and AFAIK it is not a commonly used feature in&lt;br/&gt;
general.&quot;&lt;/p&gt;



&lt;p&gt;This pool stuff is exactly the sort of stuff I&apos;d leave out of the API, if it were only&lt;br/&gt;
for me. But if it is useful to other communities, then again, I&apos;d say it needs to be&lt;br/&gt;
expanded to some extent, in order to expose a self consistent set of functionality.&lt;/p&gt;


</comment>
                            <comment id="65659" author="afn" created="Tue, 3 Sep 2013 21:25:49 +0000"  >
&lt;p&gt;Below is a list of the current API functions as exposed in the llapi_layout&lt;br/&gt;
man page on the LLNL machine &quot;vulcan&quot;. I have put in comments on each. The&lt;br/&gt;
other half of this comment is a &quot;what&apos;s missing&quot; statement. I&apos;ll put together &lt;br/&gt;
something like that asap.&lt;/p&gt;



&lt;hr /&gt;

&lt;p&gt;llapi_layout_t *llapi_layout_by_path(const char *path);&lt;/p&gt;

&lt;p&gt;llapi_layout_t *llapi_layout_by_fd(int fd);&lt;/p&gt;


&lt;p&gt;These two functions are each requirements to have in an API that I use.&lt;/p&gt;



&lt;p&gt;llapi_layout_t *llapi_layout_by_fid(const char *lustre_dir,&lt;br/&gt;
                                    const char *fidstr);&lt;/p&gt;


&lt;p&gt;This function requires a fidstr pointer, which cannot be gotten from anything&lt;br/&gt;
in the current API. It is therefore useless to include it in the API until&lt;br/&gt;
that defect is fixed. Further, I do not expect it would ever be a useful&lt;br/&gt;
thing to me, though Ned suggests that it might be to various fs monitoring&lt;br/&gt;
tools. You&apos;ll have to get that information from that user base though.&lt;/p&gt;


&lt;hr /&gt;

&lt;p&gt;llapi_layout_t *llapi_layout_alloc();&lt;/p&gt;

&lt;p&gt;void llapi_layout_free(llapi_layout_t *layout);&lt;/p&gt;


&lt;p&gt;These are useful/required functions under some conditions. There is a problem&lt;br/&gt;
however, in that it is not clear to me whether the opaque blob of layout data&lt;br/&gt;
is already populated with something, or whether it is something I have to set&lt;br/&gt;
by various explicit llapi calls. If the latter is the case, then how do I&lt;br/&gt;
know what calls are required? And what happens if I don&apos;t set everything I&lt;br/&gt;
have to (and how could I ever ensure that I set everything anyway, with&lt;br/&gt;
an opaque blob?)? In other words, what happens when I do this:&lt;/p&gt;


&lt;p&gt;my_layout = llapi_layout_alloc();&lt;/p&gt;

&lt;p&gt;rc = llapi_layout_stripe_count_get(my_layout,count);&lt;br/&gt;
...&lt;/p&gt;

&lt;p&gt;(pointer-ify as needed to get the syntax correct-my C is terrible):&lt;/p&gt;




&lt;hr /&gt;


&lt;p&gt;int llapi_layout_stripe_count(const llapi_layout_t *layout);&lt;/p&gt;

&lt;p&gt;int llapi_layout_stripe_count_set(llapi_layout_t *layout,&lt;br/&gt;
                                  int stripe_count);&lt;/p&gt;

&lt;p&gt;int llapi_layout_stripe_size(const llapi_layout_t *layout);&lt;/p&gt;

&lt;p&gt;int llapi_layout_stripe_size_set(llapi_layout_t *layout,&lt;br/&gt;
                                 int stripe_size);&lt;/p&gt;

&lt;p&gt;int llapi_layout_pattern(const llapi_layout_t *layout);&lt;/p&gt;

&lt;p&gt;int llapi_layout_pattern_set(llapi_layout_t *layout, int pattern);&lt;/p&gt;


&lt;p&gt;Must have all three pairs of these functions. The layout pattern function&lt;br/&gt;
is currently sort of meaningless though since lustre only has raid0, but&lt;br/&gt;
perhaps someday it&apos;ll grow some other patterns.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;int llapi_layout_ost_index(const llapi_layout_t *layout,&lt;br/&gt;
                           int stripe_number);&lt;/p&gt;

&lt;p&gt;int llapi_layout_ost_index_set(llapi_layout_t *layout,&lt;br/&gt;
                               int stripe_number, int ost_index);&lt;/p&gt;

&lt;p&gt;const char *llapi_layout_pool_name(const llapi_layout_t *layout);&lt;/p&gt;

&lt;p&gt;int llapi_layout_pool_name_set(llapi_layout_t *layout,&lt;br/&gt;
                               const char *pool_name);&lt;/p&gt;


&lt;p&gt;I cannot imagine any use for these functions and would remove them if it&lt;br/&gt;
were my API to design.&lt;/p&gt;


&lt;hr /&gt;



&lt;p&gt;int llapi_layout_file_create(const llapi_layout_t *layout,&lt;br/&gt;
                             char *path, int flags, int mode);&lt;/p&gt;


&lt;p&gt;Something like this is a requirement...have to be able to create files according&lt;br/&gt;
to the stripe characteristics I want to set.&lt;/p&gt;

&lt;p&gt;I&apos;d change the order of the arguments though, to match the order of open(2) with&lt;br/&gt;
the layout thing tacked onto the end.&lt;/p&gt;
</comment>
                            <comment id="65663" author="afn" created="Tue, 3 Sep 2013 22:35:43 +0000"  >
&lt;p&gt;&quot;The current error handling interface is not well put together. Get&lt;/p&gt;

&lt;p&gt;This is the biggest flaw in the current design from my perspective.&lt;br/&gt;
...&lt;/p&gt;

&lt;p&gt;In my view, the only good case for returning data via a pointer argument&lt;br/&gt;
is to get a compound data type like a struct stat. In our case we are&lt;br/&gt;
dealing with scalar integer types, and I don&apos;t foresee the need to change&lt;br/&gt;
them to compound types in the future. After all, the point of the API&lt;br/&gt;
was to abstract away the compound type struct lov_user_md behind simple&lt;br/&gt;
accessor functions. Adding a pointer to the argument list increases the&lt;br/&gt;
potential for user error and complicates the API specification without&lt;br/&gt;
any real added utility.&quot;&lt;/p&gt;



&lt;p&gt;As I understand your arguments, there are basically two reasons for&lt;br/&gt;
having the function return as its return value (i.e. not as an argument),&lt;br/&gt;
the thing it is getting:&lt;/p&gt;


&lt;p&gt;1) &quot;Adding a pointer to the argument list increases the potential for user&lt;br/&gt;
    error and complicates the API specification without any real added utility.&quot;&lt;/p&gt;


&lt;p&gt;I would rebut this by saying that in my experience this is exactly the opposite&lt;br/&gt;
of what would happen because the return&apos;d value is not overloaded with&lt;br/&gt;
two different meanings for valid/invalid result. The return&apos;d value has only&lt;br/&gt;
success/fail information, while the argument has only the gotten datum.&lt;br/&gt;
There is never any ambiguity of which is which. This is particularly true in the&lt;br/&gt;
use case you advocate: expressions--see point 2 below.&lt;/p&gt;

&lt;p&gt;As far as complicating the API, it actually simplifies it by making the get/set&lt;br/&gt;
functions be symmetric.&lt;/p&gt;



&lt;p&gt;2) People might want to use the return&apos;d value in an expression as you wrote in&lt;br/&gt;
   other email to me.&lt;/p&gt;

&lt;p&gt;In other words, you want to enable this kind of thing:&lt;/p&gt;


&lt;p&gt;bytes_in_a_full_stripe = llapi_get_stripe_count(layout)*llapi_get_stripe_size(layout);&lt;/p&gt;

&lt;p&gt;But what happens when you give the &apos;get&apos; functions an invalid layout?&lt;br/&gt;
Suddenly, the user code has this expression that uses an error return code&lt;br/&gt;
rather than a valid value to calculate something. Then they end up with some derived&lt;br/&gt;
variable with a nonsense value...say the return code is &apos;-1&apos;...then the&lt;br/&gt;
value of &apos;bytes_in_a_full_stripe&apos; possibly ends up negative, which ain&apos;t good. The&lt;br/&gt;
user code ends up going on with its life, only to train wreck at some completely&lt;br/&gt;
other place when that invalid data is used for something down the line. Check&lt;br/&gt;
the error code before going on you say? Can&apos;t do that in an expression! Only&lt;br/&gt;
in an out of line context, which completely nullifies your starting argument.&lt;br/&gt;
Better to design in the guidance to generate an application code error on the&lt;br/&gt;
spot, rather than later.&lt;/p&gt;

&lt;p&gt;And what happens when you have two or more of these functions in the same&lt;br/&gt;
expression? Even worse, if someone uses &apos;layout1&apos; in one of the functions&lt;br/&gt;
and &apos;layout2&apos; in the other and only one of them is invalid? Presumably errno&lt;br/&gt;
gets set, but to what? The error it should have for the first function in&lt;br/&gt;
the expression or for the second? (why would someone do this? Helifino, but&lt;br/&gt;
I can speculate about trying to build optimal IO parameters or something).&lt;/p&gt;

&lt;p&gt;Better to design out this use case.&lt;/p&gt;
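&lt;p&gt;The hazard described above can be sketched in C as follows. The struct and function names are hypothetical stand-ins, not the proposed llapi functions, and the getters assume the convention under discussion in which -1 is returned on error.&lt;/p&gt;

&lt;pre&gt;
```c
#include "stddef.h"  /* NULL */

/* Hypothetical stand-in for the opaque layout type. */
struct demo_striping {
        int stripe_count;
        int stripe_size;
};

/* Return-value style getters: -1 doubles as the error code. */
int demo_stripe_count(const struct demo_striping *layout)
{
        return layout == NULL ? -1 : layout->stripe_count;
}

int demo_stripe_size(const struct demo_striping *layout)
{
        return layout == NULL ? -1 : layout->stripe_size;
}

/* The kind of unchecked expression discussed above. */
long demo_bytes_in_full_stripe(const struct demo_striping *a,
                               const struct demo_striping *b)
{
        return (long)demo_stripe_count(a) * (long)demo_stripe_size(b);
}
```
&lt;/pre&gt;

&lt;p&gt;With one valid layout holding 4 stripes of 1048576 bytes, the expression yields 4194304. With a NULL first layout it yields -1048576, and with both layouts NULL the two error codes multiply to 1, a value that looks plausible and carries no sign of the failure.&lt;/p&gt;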


&lt;p&gt;Also, I reiterate here my argument about how I would use the gotten data:&lt;br/&gt;
immediately load it all into my own filesystem independent structure so that&lt;br/&gt;
I can deal with all different filesystems in a portable way. This makes&lt;br/&gt;
a return value version superfluous: all the calculations are done in the&lt;br/&gt;
fs independent parts of the code, abstracted completely away from the llapi&lt;br/&gt;
functions you presume will be in those expressions.&lt;/p&gt;



&lt;p&gt;&quot;The real source of messiness is not that the return value can be&lt;br/&gt;
overloaded with error information if something goes wrong (a&lt;br/&gt;
well-established convention).&quot;&lt;/p&gt;

&lt;p&gt;Perhaps in some contexts, but I&apos;d claim your argument about &apos;using&lt;br/&gt;
the return&apos;d data in an expression&apos; points to the fact that the&lt;br/&gt;
convention is not universally valid...&lt;/p&gt;



&lt;p&gt;&quot;Rather, the problem is that the API&lt;br/&gt;
exposes the literal value (-1) of the exceptional return codes, rather&lt;br/&gt;
than abstracting them behind macros. In error conditions the API can&lt;br/&gt;
return the macro LLAPI_ERROR. Similarly, macros LLAPI_USE_ALL_OSTS and&lt;br/&gt;
LLAPI_MDT_CHOOSES_OFFSET can be used in place of -1 to indicate those&lt;br/&gt;
special cases.&quot;&lt;/p&gt;


&lt;p&gt;I don&apos;t really understand what this means. So what if the literal value&lt;br/&gt;
is provided? So what if it&apos;s a macro? I don&apos;t get what implications&lt;br/&gt;
you are getting at here.&lt;/p&gt;


</comment>
                            <comment id="65886" author="nedbass" created="Thu, 5 Sep 2013 21:12:31 +0000"  >&lt;p&gt;John and Andreas, do you have a preference as to getting data via return value versus pointer argument?  Valid arguments can be made for and against either style, so I&apos;m happy to go with the consensus as long as we&apos;re consistent throughout the API.&lt;/p&gt;</comment>
                            <comment id="65895" author="jhammond" created="Thu, 5 Sep 2013 23:48:32 +0000"  >&lt;p&gt;Ned,&lt;/p&gt;

&lt;p&gt;Either is fine by me as long as we&apos;re consistent and unambiguous about the error cases. I&apos;d still like to see a list of the high-level operations that are needed from this simplified API. Much of what exists in llapi is there to support the administration and testing of Lustre and is unlikely to go away or be reformed at this time. We can however develop a simplified API (in its own header or just highlighted in some way) tailored to application developers. To do that however, we need to enumerate the high-level use case operations (e.g. get stripe count of file, get stripe size for file, get max stripe count for FS, ...). Then the questions about ioctl vs xattr, return method, ... will be easier to answer.&lt;/p&gt;

&lt;p&gt;Note also that depending on the operations and the desire for backwards compatibility, we are not necessarily restricted to using the existing ioctls and xattrs. We could instead define an ioctl with no buffer but that just returns a single quantity (stripe size, count, ...) for the file or FS. This approach would bring fewer issues about function shipping, byte swapping, .... I assume that no real scientific applications use the lmm_objects[] part of lov_mds_md_vx and so if we can make things easier by not handling it then all the better.&lt;/p&gt;

&lt;p&gt;Operations that would seem to require ioctls may not. They can be replaced with virtual xattrs and probably other low-level implementations.&lt;/p&gt;</comment>
                            <comment id="65896" author="morrone" created="Fri, 6 Sep 2013 00:50:16 +0000"  >&lt;blockquote&gt;&lt;p&gt;We can however develop a simplified API (in its own header or just highlighted in some way) tailored to application developers.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I think this is a worthwhile discussion, but too broad in scope for this ticket.  For this ticket, we need to remain focused on the final details of the llapi_layout functions so we can get this finished by the end of the month.&lt;/p&gt;</comment>
                            <comment id="66000" author="afn" created="Sat, 7 Sep 2013 04:23:22 +0000"  >
&lt;p&gt;John,&lt;/p&gt;

&lt;p&gt;Here are some direct responses to your queries.&lt;/p&gt;


&lt;p&gt;	&#8226; Functions to determine if a file descriptor or path (FDoP) belongs to a Lustre FS.&lt;/p&gt;

&lt;p&gt;These are already available by way of fstatfs/statfs&#8230;which has an entry for filesystem &quot;super magic&quot;. Of course,&lt;br/&gt;
I wouldn&apos;t argue against a more standard/user friendly/universally implemented thing than that &lt;br/&gt;
super magic stuff (which is rather a mess if you go and google for definitive ways to query&lt;br/&gt;
filesystem type--just dive into a function called &apos;human_fstype&apos; which lives deep in the bowels&lt;br/&gt;
of the gnu core utils package if you have any doubts), but I think what exists is sufficient for my needs. &lt;/p&gt;
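&lt;p&gt;A minimal sketch of the statfs approach mentioned above, in C for Linux. The function name is hypothetical, and the magic constant is an assumption: it is believed to match LL_SUPER_MAGIC in lustre_user.h, but it should be verified against the installed headers before use.&lt;/p&gt;

&lt;pre&gt;
```c
#include "sys/vfs.h"  /* statfs(), struct statfs (Linux) */

/* Assumed value of LL_SUPER_MAGIC from lustre_user.h; verify locally. */
#define DEMO_LL_SUPER_MAGIC 0x0BD00BD0

/*
 * Returns 1 if path is on a filesystem whose f_type matches the assumed
 * Lustre magic, 0 if it is on some other filesystem, and -1 if statfs()
 * fails (errno is left as set by statfs()).
 */
int demo_path_is_lustre(const char *path)
{
        struct statfs buf[1];  /* one-element array decays to a pointer */

        if (statfs(path, buf) != 0)
                return -1;
        return buf->f_type == DEMO_LL_SUPER_MAGIC ? 1 : 0;
}
```
&lt;/pre&gt;

&lt;p&gt;An application could call this once on the target directory and skip all layout queries when the result is not 1.&lt;/p&gt;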

&lt;p&gt;	&#8226; Functions to get the stripe size and stripe count for a FDoP.&lt;/p&gt;

&lt;p&gt;Yep, and any other fs parameters, such as raid type (though currently lustre can only do raid0 of course,&lt;br/&gt;
it might grow something else in the future).&lt;/p&gt;

&lt;p&gt;	&#8226; Functions to determine the max and min stripe size and count for files on a given Lustre FS.&lt;/p&gt;

&lt;p&gt;Yep. Presumably the max for count is the filesystem max&#8230;basically the number of OSTs or some such?&lt;br/&gt;
This one is certainly a requirement, as I wrote above. Others are similarly important, in terms of being&lt;br/&gt;
able to determine things like &quot;what constraints do you have on values of stripe size or similar quantities,&lt;br/&gt;
that I need to be aware of when I choose my file/raid/IOsize parameters?&quot;.  Some versions of statfs contain &lt;br/&gt;
an entry for &apos;optimal transfer size&apos;, which has rather ambiguous meaning in my experience. It would be of &lt;br/&gt;
some benefit to have something of that sort that was actually useful. Typically though, it is not&lt;br/&gt;
reliable enough in terms of its meaning on different fs&apos;s to use by itself.&lt;/p&gt;


&lt;p&gt;	&#8226; Functions to create files with given stripe size and count.&lt;/p&gt;

&lt;p&gt;Yep. Also, assuming lustre ever grows something beyond raid0, something to be able to set/get&lt;br/&gt;
raid type (this exists in Ned&apos;s current draft).&lt;/p&gt;

&lt;p&gt;What would you add to this list?&lt;/p&gt;


&lt;p&gt;I&apos;d say you covered what I need pretty well. There may also be a few other sorts of functions which&lt;br/&gt;
would be useful to return things like the fs block size and similar information. These would be for&lt;br/&gt;
those trying to do O_DIRECT, which has constraints on IO sizes and alignments and such. In the&lt;br/&gt;
little bit that I&apos;ve tried O_DIRECT on lustre (mounted through some network interface as it typically&lt;br/&gt;
is), it doesn&apos;t seem to be very effective for me though, in spite of the fact that I do all of my own&lt;br/&gt;
buffering and the lack of a copy through kernel space &quot;should&quot; be superfluous. Have never followed&lt;br/&gt;
up on that, but anyway, I mention the need because someone else may have different experience.&lt;/p&gt;

&lt;p&gt;The other thing that might be of some future importance (not now), would be if there suddenly&lt;br/&gt;
got to be some other sort of storage medium, like SSDs or some such. Assuming such possibilities,&lt;br/&gt;
they are likely to have some other odd constraints that would be useful to know about in an application.&lt;br/&gt;
What I would want at that point is obviously unclear, but the point is to make sure that the api is&lt;br/&gt;
extensible in a sensible way to make it possible to get/set such things as are relevant then.&lt;/p&gt;

&lt;p&gt;Things I specifically &lt;b&gt;don&apos;t&lt;/b&gt; need or want are things that have to do with &quot;is this OST full or not&quot; and&lt;br/&gt;
&quot;start the file on OST number # and round robin from there&quot; etc. All that stuff to do with pools too.&lt;/p&gt;

&lt;p&gt;Cheers,&lt;/p&gt;

&lt;p&gt;Andy&lt;/p&gt;</comment>
                            <comment id="70550" author="nedbass" created="Sat, 2 Nov 2013 01:27:45 +0000"  >&lt;p&gt;I&apos;ve attached manual pages for the revised API based on the discussion above. I&apos;ve combined functions in a single man page where logical, i.e. free and alloc, _get and _set pairs, etc.  Please review the new specification and provide feedback.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15471/15471_llapi_layout.txt&quot; title=&quot;llapi_layout.txt attached to LU-3840&quot;&gt;llapi_layout.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15472/15472_llapi_layout_alloc.txt&quot; title=&quot;llapi_layout_alloc.txt attached to LU-3840&quot;&gt;llapi_layout_alloc.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;^llapi_layout_by_fd.txt&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15473/15473_llapi_layout_file_create.txt&quot; title=&quot;llapi_layout_file_create.txt attached to LU-3840&quot;&gt;llapi_layout_file_create.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15475/15475_llapi_layout_ost_index_get.txt&quot; title=&quot;llapi_layout_ost_index_get.txt attached to LU-3840&quot;&gt;llapi_layout_ost_index_get.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15476/15476_llapi_layout_pattern_get.txt&quot; title=&quot;llapi_layout_pattern_get.txt attached to LU-3840&quot;&gt;llapi_layout_pattern_get.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15477/15477_llapi_layout_pool_name_get.txt&quot; title=&quot;llapi_layout_pool_name_get.txt attached to LU-3840&quot;&gt;llapi_layout_pool_name_get.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15478/15478_llapi_layout_stripe_count_get.txt&quot; title=&quot;llapi_layout_stripe_count_get.txt attached to LU-3840&quot;&gt;llapi_layout_stripe_count_get.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15479/15479_llapi_layout_stripe_size_get.txt&quot; title=&quot;llapi_layout_stripe_size_get.txt attached to LU-3840&quot;&gt;llapi_layout_stripe_size_get.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;</comment>
                            <comment id="70552" author="nedbass" created="Sat, 2 Nov 2013 01:39:19 +0000"  >&lt;p&gt;To highlight some of the changes from the previous version:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;_get and _set naming symmetry.&lt;/li&gt;
&lt;/ul&gt;


&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Attribute values are returned via pointer arguments, not function return&lt;br/&gt;
  values.&lt;/li&gt;
&lt;/ul&gt;


&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Added a llapi_layout_file_open() counterpart to&lt;br/&gt;
  llapi_layout_file_create().  _create() is just a wrapper to _open()&lt;br/&gt;
  with open flags O_EXCL|O_CREAT forced on.&lt;/li&gt;
&lt;/ul&gt;


&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;llapi_layout_pool_name_get now copies the pool name into a&lt;br/&gt;
  user-supplied buffer, rather than returning a pointer into the&lt;br/&gt;
  layout.&lt;/li&gt;
&lt;/ul&gt;


&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Separate man pages.&lt;/li&gt;
&lt;/ul&gt;


&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Use 64-bit types for all integer attributes.&lt;/li&gt;
&lt;/ul&gt;


&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Added macros LLAPI_USE_FS_DEFAULT and LLAPI_USE_ALL_OSTS to use in&lt;br/&gt;
  place of &quot;magic&quot; meanings of 0 and -1.  The library will internally&lt;br/&gt;
  translate these as appropriate.&lt;/li&gt;
&lt;/ul&gt;


&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Explicitly verify in llapi_layout_file_open() that &lt;tt&gt;path&lt;/tt&gt; is a Lustre&lt;br/&gt;
  file.  Same for llapi_layout_by_path().&lt;/li&gt;
&lt;/ul&gt;


&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;In llapi_layout_file_open(), if the layout uses an OST pool, verify that&lt;br/&gt;
  the pool exists and is not empty.&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="70675" author="afn" created="Tue, 5 Nov 2013 00:40:00 +0000"  >&lt;p&gt;Hi Ned et al.,&lt;/p&gt;

&lt;p&gt;Here are some comments on the new API draft. Overall, this version looks &lt;br/&gt;
very, very much better than the previous version that I saw a couple of &lt;br/&gt;
months ago. Good work there, and thanks for that.&lt;/p&gt;

&lt;p&gt;Cheers,&lt;/p&gt;

&lt;p&gt;Andy&lt;/p&gt;



&lt;p&gt;1) llapi_file_create/open&lt;/p&gt;

&lt;p&gt;   I thought there was discussion to the effect that the layout argument would&lt;br/&gt;
   be moved to be the last argument in the call rather than the first. This&lt;br/&gt;
   ordering would suit me a lot better in terms of its symmetry with open(2).&lt;br/&gt;
   Then it would be exactly the same ordering, except for one more argument &lt;br/&gt;
   on the end.&lt;/p&gt;

&lt;p&gt;2) It is implied by the code shown in the example in the llapi_layout manpage, but &lt;br/&gt;
   it should also appear and be noted explicitly with some verbiage in the &lt;br/&gt;
   llapi_file_create/open man page too, that you can close a file created/opened &lt;br/&gt;
   with the llapi functions with a close(2) call.&lt;/p&gt;

&lt;p&gt;3) Point out in a few more relevant places that llapi_layout_t points to an opaque &lt;br/&gt;
   entity. For example, I found that comment in the overall llapi_layout page, but &lt;br/&gt;
   not the llapi_layout_alloc page. In this particular case, being redundant is a&lt;br/&gt;
   good thing. It short circuits the question people will ask about &quot;what is in that &lt;br/&gt;
   entity?&quot;, and the inevitable fruitless searches for that information.&lt;/p&gt;

&lt;p&gt;4) In the llapi_layout_pool_name_get/set page, it says that pools can be specified&lt;br/&gt;
   by way of lctl if you are a sysadmin. It wasn&apos;t clear whether or not pools can&lt;br/&gt;
   also be specified by this get/set pair, and if not, what the purpose of these &lt;br/&gt;
   functions is otherwise. Also, iiuc, pools can be manipulated to have arbitrary&lt;br/&gt;
   ost inventory. It is not at all clear to me how to set that, given the interface&lt;br/&gt;
   that exists...it looks like I (as a sysadmin) could only specify a pool name&lt;br/&gt;
   and that it had n OSTs in it, by way of previously calling llapi_layout_stripe_count_set&lt;br/&gt;
   on some layout that I later give to the pool function. So I don&apos;t really see&lt;br/&gt;
   how to fully use these yet. Of course, I can&apos;t see a reason for me (app guy)&lt;br/&gt;
   to use them at all, but I&apos;m commenting on behalf of some anonymous sysadmin&lt;br/&gt;
   down the road who might want to.&lt;/p&gt;

&lt;p&gt;5) I still see no function for getting the filesystem dimensions, in terms of&lt;br/&gt;
   number of OSTs etc. If I may predict your answer (that it needs ioctls and&lt;br/&gt;
   those are hard to make work everywhere), I will respond &quot;yes, I understand, but&lt;br/&gt;
   I still want this functionality when you can arrange it&quot;. For now, I suspect&lt;br/&gt;
   there is a workaround, though clumsy. Namely, to create a test file with&lt;br/&gt;
   a layout such that the number of stripes is set to LLAPI_USE_ALL_OSTS and then&lt;br/&gt;
   query the file to see how wide it is.&lt;/p&gt;
</comment>
                            <comment id="70679" author="nedbass" created="Tue, 5 Nov 2013 01:42:35 +0000"  >&lt;p&gt;Thanks for the feedback Andy.                                                   &lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;I thought there was discussion to the effect that the layout argument would be moved to be the last argument in the call rather than the first.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;You did make that suggestion, but I&apos;m not convinced it&apos;s the right way to go. Consistency within the API should take precedence over consistency with external library functions.  The &lt;tt&gt;llapi_layout_t&lt;/tt&gt; handle is the unifying element across the API, so in my view it is natural for it to be the first argument for all functions that require it.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;&amp;#91;state&amp;#93; in the llapi_file_create/open man page that you can close a file created/opened with the llapi functions with a close(2) call.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;OK&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;Point out in a few more relevant places that llapi_layout_t points to an opaque entity.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;OK&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;... OST pools ...&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The current wording seems to be generating a lot of confusion around OST pools.  I&apos;ll try to make it clearer, but to avoid cluttering the man page I&apos;ll defer to the Lustre Operations Manual to provide the detailed background material.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;I still see no function for getting the filesystem dimensions,&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I&apos;ll tackle this problem in a separate patch. &lt;/p&gt;</comment>
                            <comment id="70686" author="nedbass" created="Tue, 5 Nov 2013 02:40:48 +0000"  >&lt;p&gt;Man pages revised based on above feedback, and other minor cleanup.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15471/15471_llapi_layout.txt&quot; title=&quot;llapi_layout.txt attached to LU-3840&quot;&gt;llapi_layout.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15472/15472_llapi_layout_alloc.txt&quot; title=&quot;llapi_layout_alloc.txt attached to LU-3840&quot;&gt;llapi_layout_alloc.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;^llapi_layout_by_fd.txt&amp;#93;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15473/15473_llapi_layout_file_create.txt&quot; title=&quot;llapi_layout_file_create.txt attached to LU-3840&quot;&gt;llapi_layout_file_create.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15475/15475_llapi_layout_ost_index_get.txt&quot; title=&quot;llapi_layout_ost_index_get.txt attached to LU-3840&quot;&gt;llapi_layout_ost_index_get.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15476/15476_llapi_layout_pattern_get.txt&quot; title=&quot;llapi_layout_pattern_get.txt attached to LU-3840&quot;&gt;llapi_layout_pattern_get.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15477/15477_llapi_layout_pool_name_get.txt&quot; title=&quot;llapi_layout_pool_name_get.txt attached to LU-3840&quot;&gt;llapi_layout_pool_name_get.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15478/15478_llapi_layout_stripe_count_get.txt&quot; title=&quot;llapi_layout_stripe_count_get.txt attached to LU-3840&quot;&gt;llapi_layout_stripe_count_get.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;
&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15479/15479_llapi_layout_stripe_size_get.txt&quot; title=&quot;llapi_layout_stripe_size_get.txt attached to LU-3840&quot;&gt;llapi_layout_stripe_size_get.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;</comment>
                            <comment id="70759" author="afn" created="Tue, 5 Nov 2013 18:34:26 +0000"  >
&lt;p&gt;Hi Ned,&lt;/p&gt;

&lt;p&gt;Here is one response to our remaining point of discussion,&lt;br/&gt;
about the llapi_file_create/open argument ordering. In terms of &lt;br/&gt;
a global picture, the issue is rather small I think, so things&lt;br/&gt;
are looking pretty good there. In practice, I expect I will be &lt;br/&gt;
able to live with either of the orderings, but want to make my &lt;br/&gt;
thoughts known now, for consideration re why I still think it&apos;s better &lt;br/&gt;
my way.&lt;/p&gt;

&lt;p&gt;Andy&lt;/p&gt;


&lt;p&gt;You see the ordering as better the way it is now, with the &lt;br/&gt;
layout argument first, based on API consistency and such.&lt;/p&gt;

&lt;p&gt;My argument otherwise, is basically twofold: &lt;/p&gt;

&lt;p&gt;1) There are actually two APIs that have to be considered here. One&lt;br/&gt;
   is the LLAPI semantics (layout first) of which llapi_file_create/open&lt;br/&gt;
   are directly a part, and the other is the system API, which &lt;br/&gt;
   you see as an external library.&lt;/p&gt;

&lt;p&gt;   More than any other functions, the open/create functions &lt;br/&gt;
   can be thought of as a bridge between those two APIs--you can&apos;t use&lt;br/&gt;
   any of the other system API calls (read/write/seek/close/etc) without&lt;br/&gt;
   the llapi_file_open/create calls and, in that sense, they play a&lt;br/&gt;
   substitute role for the open(2) system call. That makes the question &lt;br/&gt;
   of their ordering a bit less clear than &quot;I&apos;m being consistent with the &lt;br/&gt;
   API&quot;. The question has to be a bit broader in the sense of &quot;which API &lt;br/&gt;
   should I retain most consistency with?&quot; The system API of which the&lt;br/&gt;
   open call is a part and which the llapi overrides with its own&lt;br/&gt;
   routines, or the rest of the llapi functions.&lt;/p&gt;

&lt;p&gt;   My vote is still with the system API, largely to strengthen the &lt;br/&gt;
   bridge and for this additional reason:&lt;/p&gt;


&lt;p&gt;2) In my mind, the layout argument plays a very similar conceptual role&lt;br/&gt;
   to the &apos;mode&apos; argument. The latter sets various file permission&lt;br/&gt;
   characteristics. The former sets various filesystem characteristics.&lt;br/&gt;
   In words and thinking as I&apos;m reading the function itself, the argument &lt;br/&gt;
   ordering then becomes&lt;/p&gt;

&lt;p&gt;      1)open &apos;filename&apos;&lt;br/&gt;
      2)with conditions on my access (read/write/whatever) given by &apos;flags&apos;,&lt;br/&gt;
      3)and with permissions for my and others&apos; future access given by &apos;mode&apos;&lt;br/&gt;
      4)and with filesystem characteristics given by &apos;layout&apos;&lt;/p&gt;

&lt;p&gt;   Basically, the &apos;open&apos; and the &apos;filename&apos; strings in the code,&lt;br/&gt;
   appearing adjacent to each other gives me a very big readability&lt;br/&gt;
   lift as I go along, about what I&apos;m doing. In terms of pseudo-English &lt;br/&gt;
   grammar, it looks like this to me&lt;/p&gt;

&lt;p&gt;      do (what) to &amp;lt;this&amp;gt; with &amp;lt;conditions&amp;gt;&lt;/p&gt;


&lt;p&gt;   Separating &apos;do what&apos; from &amp;lt;this&amp;gt;, breaks that grammar and readability &lt;br/&gt;
   connection for me. More specifically, I read the layout first &lt;br/&gt;
   ordering that you prefer, as &lt;/p&gt;

&lt;p&gt;      do (what) with &amp;lt;conditions&amp;gt; to &amp;lt;this&amp;gt; with &amp;lt;more conditions&amp;gt;&lt;/p&gt;


&lt;p&gt;   That isn&apos;t how I&apos;d say it in words--it&apos;s grammatically more complex.&lt;/p&gt;

&lt;p&gt;   The get/set functions have a rather different sentence for me,&lt;br/&gt;
   that does feel right with the layout argument first in mind. &lt;br/&gt;
   Basically this sentence:&lt;/p&gt;

&lt;p&gt;   For &amp;lt;this&amp;gt; tell me about &amp;lt;that&amp;gt; characteristic&lt;br/&gt;
   or&lt;br/&gt;
   For &amp;lt;this&amp;gt; specify &amp;lt;that&amp;gt; characteristic&lt;/p&gt;
</comment>
                            <comment id="70768" author="nedbass" created="Tue, 5 Nov 2013 19:47:03 +0000"  >&lt;p&gt;Andy, I can see your point so I&apos;ve made the change:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15473/15473_llapi_layout_file_create.txt&quot; title=&quot;llapi_layout_file_create.txt attached to LU-3840&quot;&gt;llapi_layout_file_create.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;</comment>
                            <comment id="70771" author="afn" created="Tue, 5 Nov 2013 20:02:21 +0000"  >
&lt;p&gt;With that change, and in the sense of the lkml patch submission process,&lt;br/&gt;
I will say for the API as I&apos;ve seen it in these man page patches:&lt;/p&gt;


&lt;p&gt;Reviewed-By: Andy Nelson &amp;lt;andy.nelson@lanl.gov&amp;gt;&lt;/p&gt;

&lt;p&gt;When can I expect to see this implemented on sequoia? &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;

&lt;p&gt;Andy&lt;/p&gt;</comment>
                            <comment id="70801" author="nedbass" created="Wed, 6 Nov 2013 00:17:02 +0000"  >&lt;p&gt;Andy, a compute node build is now installed on vulcan, rzuseq, and sequoia, under /usr/local/tools/liblustre.  We haven&apos;t updated the packages installed on the LAC nodes yet, so you&apos;ll have to refer to the man pages attached to this issue for now.&lt;/p&gt;

&lt;p&gt;On vulcan, I was able to build and run the example program from the llapi_layout man page like this:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ mpicc -Wl,-rpath=/usr/local/tools/liblustre/lib -L/usr/local/tools/liblustre/lib -I/usr/local/tools/liblustre/include -llustreapi -dynamic -o liblustre_test liblustre_test.c
$ srun -p pdebug -N 1 ./liblustre_test /p/lscratchv/`whoami`/`mktemp -u  XXXX`
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let&apos;s take further discussion unrelated to API design to an email thread, if needed.&lt;/p&gt;</comment>
                            <comment id="71970" author="nedbass" created="Wed, 20 Nov 2013 17:37:56 +0000"  >&lt;p&gt;Andy, you may want to weigh in here.  Andreas made this comment in the patch review system about &lt;tt&gt;llapi_layout_file_create()&lt;/tt&gt; semantics.  Do you have a preference regarding the current behavior versus what Andreas proposes?&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;/**
 * Create a file with the specified \a layout with the name \a path
 * using permissions in \a mode and open() \a flags.  Return an open
 * file descriptor for the new file.
 */
int llapi_layout_file_create(const char *path, int flags, int mode,
                             const llapi_layout_t *layout)

This is inconsistent with the llapi_file_create() call, which just creates the
file but does not actually return the open file handle.

I don&apos;t think it makes sense to have this extra call just to avoid the
application needing to pass O_CREAT|O_EXCL to llapi_layout_file_open(),
but it is useful to have a version that does not return the open file handle.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="71976" author="afn" created="Wed, 20 Nov 2013 18:14:52 +0000"  >
&lt;p&gt;Hi Ned,&lt;/p&gt;

&lt;p&gt;I&apos;ve gotten sidetracked on other things, so haven&apos;t fully implemented the new&lt;br/&gt;
api in my code yet, so that&apos;s why I went dark all of a sudden. &lt;/p&gt;

&lt;p&gt;As to Andreas&apos;s comments:&lt;/p&gt;

&lt;p&gt;I cannot figure out any reason why I might want to &quot;have a version that does not return the open&lt;br/&gt;
file handle&quot; as Andreas writes. What use would that be, since I can&apos;t do anything with it, afaict?&lt;br/&gt;
I never even understood whether or not it even created the file I asked for, from what I could&lt;br/&gt;
make of the documentation etc in the old API. &lt;/p&gt;

&lt;p&gt;Panasas has a similar thing in their API, which I also never understood. It is just needless&lt;br/&gt;
obfuscation and hoop jumping from where I sit, but I suppose I could be missing a part of the picture.&lt;/p&gt;

&lt;p&gt;As to having both llapi_layout_file_create/open, I like having both. That way makes it more nearly &lt;br/&gt;
symmetric with the open/creat system calls. If there is an unstoppable effort to delete&lt;br/&gt;
one or the other (llapi_layout_file_create/llapi_layout_file_open), I&apos;d actually go exactly&lt;br/&gt;
the other way, and zap the &apos;open&apos; version. The reason is that it makes no sense to pass layout&lt;br/&gt;
stuff to an already existing file, which is the thing I would use the open version for. In fact&lt;br/&gt;
those arguments are documented to be ignored except for create, if I remember correctly. And &lt;br/&gt;
without the creation characteristics, why wouldn&apos;t I simply use a plain old &apos;open&apos; call anyway?&lt;/p&gt;

&lt;p&gt;As to the open(2)/creat(2) calls, I just noticed another small skew between those calls and the lustre&lt;br/&gt;
ones...the llapi_layout_file_create call adds O_CREAT|O_EXCL to the flags and then goes on&lt;br/&gt;
to pass those in to the llapi open function, while the creat(2) call adds O_CREAT|O_WRONLY|O_TRUNC, &lt;br/&gt;
as documented in the open(2) man page I&apos;m looking at on a linux box here. Why the difference? &lt;br/&gt;
Just looking at that, I can&apos;t see a specific reason why that skew makes sense, but I could &lt;br/&gt;
easily be missing something. Something to fix? No? hmmm...&lt;/p&gt;
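
&lt;p&gt;To make the skew concrete, here is a small sketch of how the two flag sets behave when the target file already exists. It uses only plain POSIX open flags through the Python os module and makes no Lustre calls. The names CREAT_STYLE, EXCL_STYLE and open_existing_with are invented for this illustration, and adding O_RDWR to the second set is an assumption of the sketch, not something the man pages state.&lt;/p&gt;

```python
import os
import tempfile

# Flags implied by creat(2), per the open(2) man page.
CREAT_STYLE = os.O_CREAT | os.O_WRONLY | os.O_TRUNC
# Flags forced on by the layout create call; O_RDWR is added here only
# so the sketch has an access mode (an assumption, not from the man page).
EXCL_STYLE = os.O_CREAT | os.O_EXCL | os.O_RDWR


def open_existing_with(flags):
    """Open a pre-existing 4-byte file with flags.

    Returns (outcome, size afterwards) so the effect on the old
    contents is visible.
    """
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "data.bin")
        with open(path, "wb") as f:
            f.write(b"abcd")
        try:
            fd = os.open(path, flags, 0o644)
        except FileExistsError:
            # O_EXCL: the existing file is left untouched.
            return ("refused", os.path.getsize(path))
        os.close(fd)
        return ("opened", os.path.getsize(path))
```

&lt;p&gt;With CREAT_STYLE the existing file is opened and truncated to zero bytes, while with EXCL_STYLE the open is refused and the original four bytes survive. That difference is what separates silently reusing a file that has some earlier layout from failing loudly.&lt;/p&gt;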

&lt;p&gt;All that said, again I&apos;ll say that it makes more sense, and I prefer from a &apos;fullup API&apos; &lt;br/&gt;
standpoint, to keep both. &lt;/p&gt;

&lt;p&gt;As for looking the same as the old lustre api...the more different the better. I&apos;d claim having two &lt;br/&gt;
APIs with almost the same appearance, but which are still quite different in actuality, is a very bad &lt;br/&gt;
thing. It leads to confusion. It leads to bother and it leads to maintenance headaches for me downstream. &lt;br/&gt;
The sooner you get rid of the old API, after instituting the new one, the better it will be for me.&lt;br/&gt;
I won&apos;t have to maintain two different code interfaces to it, to account for &apos;this machine has the&lt;br/&gt;
new thing, but that machine is still on the old&apos;, for nearly as long as if you keep them both&lt;br/&gt;
around for a long time.&lt;/p&gt;
</comment>
                            <comment id="71978" author="nedbass" created="Wed, 20 Nov 2013 18:36:15 +0000"  >&lt;p&gt;Thanks Andy.  We have some discrepancies between what you think is useful versus what the patch reviewers at Intel think.  I provided both open() and create() interfaces to try to appease both camps.  I figured what you wanted was a guarantee that your file was created with the requested layout on a successful function return.  That&apos;s why I chose those particular open() flags, as compared to creat().  O_EXCL because, as you say, the function doesn&apos;t make sense for existing files.  Not O_WRONLY because I figured you might want the flexibility to do reads, but I&apos;m happy to add it if you think consistency with creat() is more important.  Not O_TRUNC because it doesn&apos;t make sense to combine with O_EXCL.&lt;/p&gt;

&lt;p&gt;On the other hand, some patch reviewers took exception to forcing those flags on in &lt;tt&gt;llapi_layout_file_open()&lt;/tt&gt;, preferring to leave it up to the application.  So, I settled on providing both versions, which seemed like a reasonable middle ground.&lt;/p&gt;</comment>
                            <comment id="71988" author="afn" created="Wed, 20 Nov 2013 20:09:53 +0000"  >
&lt;p&gt;Seems right to me. I expect the create function to create the file for me with the layout I specified.&lt;br/&gt;
If it can&apos;t do that, it should return an error. One such error would be if the file already exists&lt;br/&gt;
because it will have some preexisting layout which may or may not be the one I wanted to specify. In&lt;br/&gt;
that case, it is entirely correct, in my mind, to return a &quot;Can&apos;t do that&quot; error. Perhaps this is&lt;br/&gt;
a point to clarify in the man page, re the create behavior...that it will fail if the file exists, because&lt;br/&gt;
the layout stuff can&apos;t be guaranteed. It is there implicitly because you note the O_EXCL flag, but&lt;br/&gt;
that doesn&apos;t hit me over the head quite so hard as is needed compared to an explicit, English text &lt;br/&gt;
statement of the same thing too.&lt;/p&gt;

&lt;p&gt;As to the O_WRONLY and O_TRUNC etc, I have no strong opinions about that. Just noticed it when I was&lt;br/&gt;
reading the man pages very closely as I put together my response above. You might perhaps want to&lt;br/&gt;
put in a small note in the man page about the fact that the flags added and passed from the llapi &lt;br/&gt;
create to the llapi open calls are slightly different than what are passed from the creat(2) to &lt;br/&gt;
the open(2) call, just so that the skew is explicitly noted, but even that may be more than is &lt;br/&gt;
really needed or important to have in the man page.&lt;/p&gt;</comment>
                            <comment id="71993" author="nedbass" created="Wed, 20 Nov 2013 20:32:50 +0000"  >&lt;p&gt;Andy, also please be aware the names of the macros LLAPI_USE_FS_DEFAULT and LLAPI_USE_ALL_OSTS may change to LLAPI_LAYOUT_USE_FS_DEFAULT and LLAPI_LAYOUT_USE_ALL_OSTS.  I left out LAYOUT to keep the names reasonably short, but Andreas requested the longer names for better consistency.&lt;/p&gt;

&lt;p&gt;Andreas, are you willing to keep the llapi_layout_file_create() interface as-is, in light of Andy&apos;s comments?  What use case do you have in mind for a version that doesn&apos;t return an open file handle?&lt;/p&gt;</comment>
                            <comment id="89175" author="adilger" created="Wed, 16 Jul 2014 00:41:20 +0000"  >&lt;p&gt;A concern I have about the current patch that James thought would be better discussed in this ticket is the amount of layout validity checking that is part of the API.  While some basic sanity checking is desirable in any API, it seems to me that there is now an excessive amount of sanity checking in userspace, which all needs to be replicated in the kernel anyway (because the kernel cannot depend on userspace to be correct and only populated with non-malicious users).&lt;/p&gt;

&lt;p&gt;A number of the sanity checks are &quot;expensive&quot; in that they open external /proc files and iterate over e.g. all mounted filesystems to check if the pathname belongs to a Lustre filesystem, or all OSTs to verify the OST pool specification. This can impact performance when creating millions of files, or when creating files concurrently from hundreds of threads, on a system with thousands of OSTs.&lt;/p&gt;

&lt;p&gt;My proposal is to have a separate function like &lt;tt&gt;llapi_layout_verify()&lt;/tt&gt; or similar that can be called once to verify a layout after it has been created, and then the layout can be used repeatedly to create files.  It could (optionally) store a flag in the opaque layout structure to indicate if the layout has been verified and this could be done automatically on the first use of the layout, and/or set by the implicit verification done by the kernel returning success.&lt;/p&gt;
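
&lt;p&gt;A minimal sketch of the verify-once idea, assuming a single filesystem and a flag cached in the layout. Everything named &lt;tt&gt;demo_*&lt;/tt&gt; is hypothetical and invented for illustration; the field names do not match the real opaque &lt;tt&gt;struct llapi_layout&lt;/tt&gt;, and the expensive check is only a placeholder.&lt;/p&gt;

```c
#include "errno.h"
#include "stdbool.h"
#include "stddef.h"

/* Hypothetical stand-in for the opaque layout; the field names are
 * invented for this sketch and do not match struct llapi_layout. */
struct demo_layout {
        unsigned long stripe_count;
        unsigned long stripe_size;
        bool verified;          /* set once the expensive check passed */
};

/* Counts how often the expensive path runs, for demonstration only. */
static int demo_expensive_checks;

/* Placeholder for the costly work (scanning mounts, pools, OST lists). */
static int demo_expensive_check(const struct demo_layout *layout)
{
        demo_expensive_checks++;
        if (layout->stripe_count == 0 || layout->stripe_size == 0)
                return -EINVAL;
        return 0;
}

/* Verify once; later calls on the same layout return immediately. */
int demo_layout_verify(struct demo_layout *layout)
{
        int rc;

        if (layout == NULL)
                return -EINVAL;
        if (layout->verified)
                return 0;
        rc = demo_expensive_check(layout);
        if (rc == 0)
                layout->verified = true;
        return rc;
}

/* File creation path: verifies lazily, on first use of the layout only. */
int demo_file_create(struct demo_layout *layout, const char *path)
{
        if (path == NULL)
                return -EINVAL;
        return demo_layout_verify(layout);
}
```

&lt;p&gt;With this shape, creating many files from one layout pays for the expensive check once, and a layout that fails the check is never marked as verified.&lt;/p&gt;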

&lt;p&gt;I think this handles both the desire for applications to get detailed error feedback if necessary, while not adding permanent runtime overhead to a commonly-used function.  Applications which only care about success/failure can omit all checking and just get an error back from the kernel.&lt;/p&gt;</comment>
                            <comment id="89176" author="nedbass" created="Wed, 16 Jul 2014 01:21:32 +0000"  >&lt;p&gt;I think this is a reasonable approach. Regarding a &quot;verified&quot; flag, we&apos;d also need to store which filesystem(s) the layout was verified against. I don&apos;t see enough utility in saving the verification status to justify the added complexity, however.  I say let the application track verification status.&lt;/p&gt;

&lt;p&gt;I also considered using a caching strategy to mitigate the validity checking overhead.  The library could remember Lustre mount points, pool membership, OST lists, etc.  But I don&apos;t care for the complexity of that approach either.&lt;/p&gt;</comment>
                            <comment id="89202" author="simmonsja" created="Wed, 16 Jul 2014 14:18:20 +0000"  >&lt;p&gt;Should it be &lt;/p&gt;

&lt;p&gt;int llapi_layout_verify(const struct llapi_layout *layout, char *fsname)&lt;/p&gt;

&lt;p&gt;or&lt;/p&gt;

&lt;p&gt;int llapi_layout_verify(const struct llapi_layout *layout, char *path)&lt;/p&gt;

&lt;p&gt;I personally tend to favor the fsname approach. Once verified against a file system, that layout can be used for any path.&lt;/p&gt;

&lt;p&gt;The other option is to change llapi_layout_create to take fsname as an option at creation. Of course the drawback is that some layouts are created using only the fd. Also, we want to expose the validation function.&lt;/p&gt;</comment>
                            <comment id="89214" author="nedbass" created="Wed, 16 Jul 2014 14:50:57 +0000"  >&lt;p&gt;It should take a path argument. Users should not have to bother figuring out the fsname from a path.  They will just want to know if the layout is valid for the directory they&apos;re writing to.&lt;/p&gt;</comment>
                            <comment id="89240" author="afn" created="Wed, 16 Jul 2014 16:28:41 +0000"  >
&lt;p&gt;I agree with Ned. I seem to recall times in my past when the name of the filesystem was obscured in some odd way, so that I could&lt;br/&gt;
never figure out what the actual, definitive name for the filesystem was. Things like this:&lt;/p&gt;

&lt;p&gt;/scratch8/afn/blah&lt;/p&gt;

&lt;p&gt;being an alias for something like this:&lt;/p&gt;

&lt;p&gt;/lustre/scratch8/afn/blah&lt;/p&gt;

&lt;p&gt;and so forth, done by the systems folk, always confuse me.  They will confuse others too, I am sure.&lt;/p&gt;

&lt;p&gt;As far as the create function taking an fsname argument, I don&apos;t like that for exactly the same reason. Add to that,&lt;br/&gt;
also the fact that the call loses its symmetry with the normal system file create call. Search above for my comment&lt;br/&gt;
of 6:34 Nov 5 2013, for what I mean about symmetry stuff.&lt;/p&gt;

&lt;p&gt;All that said, putting a &quot;verified&quot; flag into the opaque layout structure seems a good idea with one big flaw. Namely,&lt;br/&gt;
that that structure, opaque or not, is still in the hands of a user, who may be malicious, or worse, inept. The latter&lt;br/&gt;
adjective applies to me, quite often. There are frequently times when I pass around some pointer and end up scribbling on&lt;br/&gt;
its data because of some call/callee error in dereferencing or some such.&lt;/p&gt;

&lt;p&gt;In consequence, the layout structure can get corrupted in whole or in part, and it&apos;s the &apos;in part&apos; portion that is&lt;br/&gt;
the biggest problem. What happens when you have a partially corrupted layout, which still has the &apos;verified&apos;&lt;br/&gt;
portion of it, uncorrupted? Then llapi goes ahead with the incorrect idea that the layout is valid, while it isn&apos;t.&lt;/p&gt;

&lt;p&gt;Splat.&lt;/p&gt;

&lt;p&gt;Or it has to go do all the verification over again, which is what the whole &quot;verified&quot; flag is supposed to short circuit.&lt;/p&gt;

&lt;p&gt;Given that, is it a better option to have the layout thingy (whatever its form), be some sort of entity that is more&lt;br/&gt;
like a file descriptor, that indexes into a list of layouts that are kept somewhere in kernel/library space?  I know I &lt;br/&gt;
commented about this in previous iterations of this discussion (search above somewhere), but things went a &lt;br/&gt;
different way at the time. &lt;/p&gt;

&lt;p&gt;Perhaps a solution would be to make the &quot;verified&quot; flag some sort of checksum of the rest of the struct and if&lt;br/&gt;
the checksum is invalid when given to the open/create/whatever/other/call, then revalidate with all the costs&lt;br/&gt;
involved?&lt;/p&gt;

&lt;p&gt;Another issue with using a path as the argument is probably the reason James likes the fsname alternative. Namely,&lt;br/&gt;
the question of what the &quot;validity&quot; applies to. Just that path alone? Everything under it in the directory tree? For example,&lt;br/&gt;
the path is /lustre/afn/blah, and it applies to /lustre/afn/blah/all/the/directories/and/files/under/it? Just to /lustre/afn/blah? &lt;br/&gt;
The option of &quot;just that path&quot; has simplicity, because of the complex possibility of one layout at one level of the tree, and another&lt;br/&gt;
layout specified for some subset of directories under it. But then what about &quot;just that path&quot; applying to a file? there is&lt;br/&gt;
no under, and when you have such a case, perhaps the thing you want to be doing to it is to take that layout and&lt;br/&gt;
replicate it on some other file you create somewhere, where it would not apply due to invalid verification status.&lt;/p&gt;</comment>
                            <comment id="89246" author="simmonsja" created="Wed, 16 Jul 2014 17:33:22 +0000"  >&lt;p&gt;Okay my local patch uses a path argument :&lt;/p&gt;

&lt;p&gt; int llapi_layout_verify(const struct llapi_layout *layout, const char *path)&lt;/p&gt;

&lt;p&gt;For llapi_file_open what I&apos;m doing currently is &lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;{
        /* If the user verified the layout we still need to ensure that the
         * requested file is located on the same lustre file system as the
         * layout. */
        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (strlen(layout-&amp;gt;llot_fsname) != 0) {
                &lt;span class=&quot;code-comment&quot;&gt;/* Verify that the file path belongs to a lustre filesystem. */&lt;/span&gt;
                rc = llapi_search_fsname(path, fsname);
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rc &amp;lt; 0 || (strlen(fsname) == 0)) {
                        errno = ENOTTY;
                        &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; -1;
                }

                /* Verify the file system of the user supplied path is the 
                 * same one the layout was created &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt;. */
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (strcmp(fsname, layout-&amp;gt;llot_fsname)) {
                        errno = ENOTTY;
                        &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; -1;
                }
        } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (llapi_layout_verify(layout, path) &amp;lt; 0)
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; -1;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So it forces a verify if it doesn&apos;t see llot_fsname. If verified we still need to make sure that the file is located on the same file system. A subdirectory could be another file system mount. So I handle my fsname paranoia.&lt;/p&gt;

&lt;p&gt;As for corruption, well that is harder to deal with. For a checksum we would have to write a checksum algorithm, since userland libcfs is going away. Also, with that small amount of data there is a greater chance of checksum value collisions. Maybe we could get away with a hashtable. Again, with libcfs going away that might be more challenging. Have to see if we can use just the header. Currently struct llapi_layout is opaque but I guess nothing stops the user from casting and scribbling. &lt;/p&gt;</comment>
                            <comment id="89249" author="jhammond" created="Wed, 16 Jul 2014 17:55:38 +0000"  >&lt;p&gt;What is the use case of llapi_layout_verify()?&lt;/p&gt;</comment>
                            <comment id="89253" author="nedbass" created="Wed, 16 Jul 2014 18:03:27 +0000"  >&lt;p&gt;At some point I added a &lt;tt&gt;llot_magic&lt;/tt&gt; canary field to use as a minimal sanity check.  But I somehow lost that change in all the refreshes.  I think we should add a magic field, but I&apos;m opposed to further measures such as checksums to detect or prevent corruption.  The application is ultimately responsible for not trashing its memory.&lt;/p&gt;</comment>
                            <comment id="89256" author="simmonsja" created="Wed, 16 Jul 2014 18:10:25 +0000"  >&lt;p&gt;The llot_magic is still there. Users will always find a way to break things :-/&lt;/p&gt;</comment>
                            <comment id="89261" author="nedbass" created="Wed, 16 Jul 2014 18:53:50 +0000"  >&lt;p&gt;I have to voice a concern that lack of a formal design process is threatening to derail this API.  There is too much implementation going on with too little understanding of requirements, as John hinted at. There should be one architect/implementer to preserve unity of design and conceptual integrity, with others providing feedback, requirements, and code review.  New public interfaces such as &lt;tt&gt;llapi_layout_verify()&lt;/tt&gt; should be fully documented in man page format and reviewed here before an implementation is submitted for review in gerrit.&lt;/p&gt;

&lt;p&gt;So, let&apos;s nail down exactly what our validity checking requirements are before charging ahead with implementation.  We need concrete answers to at least:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Is the return status of fsetxattr() a sufficient indicator of validity?&lt;/li&gt;
	&lt;li&gt;If supplemental validity checking is needed (beyond fsetxattr()), how much control over that checking needs to be exposed through the API?&lt;/li&gt;
	&lt;li&gt;What specific error conditions need to be communicated?&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="89285" author="jhammond" created="Wed, 16 Jul 2014 20:17:17 +0000"  >&lt;p&gt;Let me rephrase. What are the use cases of validity checking in user space? I&apos;m in favor of good interfaces that answer specific questions like: &quot;How many stripes can I have?&quot; or &quot;What are the minimum and maximum stripe sizes?&quot; These are easy to answer.&lt;/p&gt;

&lt;p&gt;On the other hand, offering an API to answer &quot;Is this striping valid?&quot; makes me a bit uncomfortable. It&apos;s a bit like being asked &quot;Where do babies come from?&quot; by someone else&apos;s kid. There are too many details to ensure that a striping that passes this function will always be accepted and created by Lustre.&lt;/p&gt;

&lt;p&gt;Setting implementation aside, how do I use such a function? Do I create various hypothetical stripings and pass them to verify? This seems like a parameter search to answer the questions from the first paragraph.&lt;/p&gt;</comment>
                            <comment id="89305" author="nedbass" created="Wed, 16 Jul 2014 22:04:13 +0000"  >&lt;p&gt;Yes.  John&apos;s comments cut to the heart of the matter and are in line with what Andy has been asking for.  The ultimate test of a layout&apos;s validity is whether Lustre accepts and creates it. But the API should provide interfaces that allow the user to determine valid layout values.&lt;/p&gt;

&lt;p&gt;At this point I think &lt;a href=&quot;http://review.whamcloud.com/#/c/5302/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/5302/&lt;/a&gt; should be abandoned and replaced with a new change ID based on patch set 23.  That was the last stable point in the revision series (aside from errno stomping bugs caught by Frank), and there has been too much churn since then for me to review the changes with any confidence.  Future revisions should be based strictly on design changes agreed to here, and avoid unnecessary refactoring of code that has already been reviewed, tested, and debugged.&lt;/p&gt;

&lt;p&gt;For now, I respectfully request to be the only person to push changes to the review. Others are of course welcome to contribute dependent patches as separate reviews. I&apos;m grateful that James took the initiative to move this forward, but I think it&apos;s important for consistency&apos;s sake to just have one chef in the kitchen.  Please communicate any new requirements for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4665&quot; title=&quot;utils: lfs setstripe to specify OSTs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4665&quot;&gt;&lt;del&gt;LU-4665&lt;/del&gt;&lt;/a&gt; integration here so that work can move forward.&lt;/p&gt;

&lt;p&gt;To summarize, I propose that we do the following, in order.&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Abandon change 5302&lt;/li&gt;
	&lt;li&gt;Submit a new gerrit review based on 5302 patch set 23&lt;/li&gt;
	&lt;li&gt;Identify the questions we need the API to answer as suggested by John&lt;/li&gt;
	&lt;li&gt;Post draft man pages for new interfaces for review here, revise as needed&lt;/li&gt;
	&lt;li&gt;Refresh the patch with implementation of new interfaces, along with complete test cases and documentation&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Are others on board with this plan?&lt;/p&gt;</comment>
                            <comment id="89308" author="simmonsja" created="Wed, 16 Jul 2014 22:42:50 +0000"  >&lt;p&gt;Okay. I will take all the changes I did in 5302 and place them in the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4665&quot; title=&quot;utils: lfs setstripe to specify OSTs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4665&quot;&gt;&lt;del&gt;LU-4665&lt;/del&gt;&lt;/a&gt; patch. Andreas, we will need to submit a different patch on top of Ned&apos;s new patch with your idea of a llapi_layout_verify. That is assuming people will accept your idea. Ned&apos;s new base patch will be fine since it will not be the back end of anything. Further testing of the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4665&quot; title=&quot;utils: lfs setstripe to specify OSTs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4665&quot;&gt;&lt;del&gt;LU-4665&lt;/del&gt;&lt;/a&gt; patch, which will use the layout api with lfs getstripe and setstripe, on my part will expose any problems with Ned&apos;s design. Plus the new base patch will not handle the case of DNE directories so that work will have to be developed as well. Much work to be done.&lt;/p&gt;

&lt;p&gt;I promise to break the work I do in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4665&quot; title=&quot;utils: lfs setstripe to specify OSTs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4665&quot;&gt;&lt;del&gt;LU-4665&lt;/del&gt;&lt;/a&gt; into small incremental patches so people can properly review them.&lt;/p&gt;

&lt;p&gt;Perhaps Ned you could consider breaking your patch up into smaller pieces. It is a lot of work.&lt;/p&gt;</comment>
                            <comment id="89312" author="nedbass" created="Wed, 16 Jul 2014 23:29:41 +0000"  >&lt;p&gt;That is one reason I want to start over from patch set 23.  It was pretty thoroughly reviewed at that stage so I&apos;d like to see it land with only minor changes and do follow-up work in separate patches.  I don&apos;t see much benefit to breaking up what&apos;s already been reviewed though.  The main obstacle to landing it is the expensive validity checks that Andreas objects to. But those checks can simply be removed for now, and replaced with improved interfaces as discussed above in subsequent patches.  How would that sit with you, Andreas?&lt;/p&gt;</comment>
                            <comment id="89331" author="adilger" created="Thu, 17 Jul 2014 08:23:58 +0000"  >&lt;p&gt;I don&apos;t see any benefit to abandoning 5302 and creating a new change over just pushing a new patch which drops the changes you don&apos;t want. Making a new change just means more places to look for information about this change. &lt;/p&gt;

&lt;p&gt;As for checks, I agree with John. Simple checks against hard (constant) limits are fine, but I&apos;d prefer to do any complex checks (e.g. opening /proc files and iterating) in a separate function.&lt;/p&gt;

&lt;p&gt;The kernel has to do all of these checks itself anyway. The main drawback is that the xattr interface that is needed for BG/L is clumsy because it might &quot;succeed&quot; on any filesystem with xattr support, but not actually create the striped file as expected. The ioctl() interface was less troublesome in this regard since it is unlikely that any filesystem would handle the Lustre ioctl command.  It is worthwhile to verify if setxattr(&quot;lustre.lov&quot;) will work on other filesystems or if they will refuse the &quot;lustre.lov&quot; xattr because it is not in one of the normal namespaces (&quot;user&quot;, &quot;system&quot;, &quot;trusted&quot;, or &quot;security&quot;) that are handled by the kernel. &lt;/p&gt;</comment>
                            <comment id="89343" author="nedbass" created="Thu, 17 Jul 2014 15:02:51 +0000"  >&lt;blockquote&gt;&lt;p&gt;I don&apos;t see any benefit to abandoning 5302&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;We can keep working there.  I just find it becomes hard to navigate when the comment and revision history gets too long.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;the xattr interface ... might &quot;succeed&quot; on any filesystem with xattr support, but not actually create the striped file as expected.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I have not found that to be the case in practice (ext4, zfs, tmpfs, and nfs).  I believe (but haven&apos;t verified in the code) that a filesystem must explicitly register support for an xattr namespace beyond the standard ones, otherwise the kernel will return EOPNOTSUPP.&lt;/p&gt;</comment>
                            <comment id="90732" author="nedbass" created="Mon, 4 Aug 2014 18:20:07 +0000"  >&lt;p&gt;Updated attached man pages to reflect recent API changes.  In particular&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;removed &lt;tt&gt;llapi_layout_expected()&lt;/tt&gt;&lt;/li&gt;
	&lt;li&gt;&lt;tt&gt;llapi_layout_by_{fd,fid,path}()&lt;/tt&gt; renamed to &lt;tt&gt;llapi_layout_get_by_{fd,fid,path}()&lt;/tt&gt;&lt;/li&gt;
	&lt;li&gt;Added &lt;tt&gt;flags&lt;/tt&gt; parameter to &lt;tt&gt;llapi_layout_get_by_{fd,fid,path}()&lt;/tt&gt;&lt;/li&gt;
	&lt;li&gt;Implemented flag &lt;tt&gt;LAYOUT_GET_EXPECTED&lt;/tt&gt; which is accepted by &lt;tt&gt;llapi_layout_get_by_path()&lt;/tt&gt; to implement functionality formerly provided by &lt;tt&gt;llapi_layout_expected()&lt;/tt&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Please review the documentation of the new &lt;tt&gt;flags&lt;/tt&gt; parameter in &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/15480/15480_llapi_layout_get_by_fd.txt&quot; title=&quot;llapi_layout_get_by_fd.txt attached to LU-3840&quot;&gt;llapi_layout_get_by_fd.txt&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;.&lt;/p&gt;</comment>
                            <comment id="92663" author="adilger" created="Wed, 27 Aug 2014 22:11:29 +0000"  >&lt;p&gt;One thing that snuck past my review in these patches was that the new llapi_layout_*() functions are all returning &quot;-1&quot; to the caller and returning the error codes via &quot;errno&quot; instead of returning the negative error numbers directly to the callers.  IMHO, &quot;errno&quot; is the domain of the kernel and libc and should not be used by application libraries.  This is a global variable that could be touched by many parts of the process, and there is the danger that errno gets clobbered by other parts of the code, leading to ugliness like:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (lum == NULL) {
                tmp = errno;
                close(fd);
                errno = tmp;
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; -1;
        }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;and every piece of code that is returning an error having to do it twice:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (path == NULL ||
            (layout != NULL &amp;amp;&amp;amp; layout-&amp;gt;llot_magic != LLAPI_LAYOUT_MAGIC)) {
                errno = EINVAL;
                &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; -1;
        }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;My preference would be to fix the new llapi_layout_*() functions to return the negative error number directly and avoid errno entirely.&lt;/p&gt;</comment>
                            <comment id="92664" author="nedbass" created="Wed, 27 Aug 2014 22:49:14 +0000"  >&lt;p&gt;I&apos;m on board with this.  I just had a hallway discussion about the various approaches for returning errors, and negative errno return values were generally agreed to be the least evil of the following possibilities:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;&lt;b&gt;Use &lt;tt&gt;errno&lt;/tt&gt;.&lt;/b&gt;&lt;br/&gt;
This is the current approach.  It&apos;s evil for the reasons given above by Andreas.&lt;/li&gt;
	&lt;li&gt;&lt;b&gt;Use a library-specific version of &lt;tt&gt;errno&lt;/tt&gt;.&lt;/b&gt;&lt;br/&gt;
 e.g. &lt;tt&gt;llapi_errno&lt;/tt&gt;. It wouldn&apos;t get stomped on, but we&apos;d have to handle thread safety.  Ick.&lt;/li&gt;
	&lt;li&gt;&lt;b&gt;Implement our own class of error codes.&lt;/b&gt;&lt;br/&gt;
This might be cleaner and more flexible, but with a higher implementation and maintenance cost, plus UNIX programmers will already be familiar with &lt;tt&gt;errno&lt;/tt&gt; values.&lt;/li&gt;
	&lt;li&gt;&lt;b&gt;Return negated &lt;tt&gt;errno&lt;/tt&gt; values.&lt;/b&gt;&lt;br/&gt;
This is the proposed approach. The only real downside is there&apos;s little precedent outside the kernel and &lt;tt&gt;llapi&lt;/tt&gt;.  But it&apos;s thread safe and cleaner than the current approach.&lt;/li&gt;
&lt;/ul&gt;
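
&lt;p&gt;To make the trade-off concrete, here is a minimal sketch of the first and last options side by side. The &lt;tt&gt;demo_*&lt;/tt&gt; functions are hypothetical stand-ins invented for illustration, not part of &lt;tt&gt;llapi&lt;/tt&gt;; each only rejects a NULL argument so that the two reporting styles can be compared.&lt;/p&gt;

```c
#include "errno.h"
#include "stddef.h"
#include "string.h"

/* Hypothetical stand-ins, not real llapi functions: both reject a NULL
 * argument, but report the failure in two different ways. */

/* Current approach: return -1 and store the error code in errno. */
int demo_check_errno_style(const char *path)
{
        if (path == NULL) {
                errno = EINVAL;
                return -1;
        }
        return 0;
}

/* Proposed approach: return the negated error code, leave errno alone. */
int demo_check_negated_style(const char *path)
{
        if (path == NULL)
                return -EINVAL;
        return 0;
}

/* Caller-side helper: turn a result from either style into a message.
 * Pass uses_errno != 0 for the errno style, 0 for the negated style. */
const char *demo_error_text(int rc, int uses_errno)
{
        if (rc == 0)
                return "success";
        return strerror(uses_errno ? errno : -rc);
}
```

&lt;p&gt;In the negated style the error code travels in the return value, so nothing between the failure and the caller can clobber it. In the errno style the caller has to read &lt;tt&gt;errno&lt;/tt&gt; before making any other call that might overwrite it.&lt;/p&gt;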
</comment>
                            <comment id="92671" author="morrone" created="Thu, 28 Aug 2014 01:01:59 +0000"  >&lt;blockquote&gt;&lt;p&gt;IMHO, &quot;errno&quot; is the domain of the kernel and libc and should not be used by application libraries.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I think that the lustre library should be considered a &lt;em&gt;system&lt;/em&gt; library, not an &quot;application library&quot;.  From a normal application&apos;s perspective, the lustre library is as system as they come: you are using a library that interacts on your behalf directly with the kernel to influence a service offered by the kernel.  In that respect I would argue that &lt;tt&gt;errno&lt;/tt&gt; is entirely reasonable to use.&lt;/p&gt;

&lt;p&gt;I would argue that this kind of error handling is exactly what user-space C developers have come to expect from system level libraries.  After all, if it isn&apos;t OK for us to use errno, is it really OK for us to reuse all of the standard error codes (EIO, EINVAL, etc.)?  Shouldn&apos;t we have to invent our own error names and values if those things are only the purview of the kernel and Lib C?&lt;/p&gt;

&lt;p&gt;Granted, using a temporary variable to implement the use of errno is mildly annoying.  But is that really enough justification to violate the principle of least surprise for the user-space developers who will be consuming our library functions?&lt;/p&gt;

&lt;p&gt;I think that the &quot;Return negated errno values&quot; approach is probably the least desirable of those proposed.  This is a kernel-ism; the result of a clever hack that recognized that those memory values would never be valid so hey why not throw the error value in there.  In user space, the programmers are going to think we have lost our minds if we force them to check for a negative version of an error code that is always positive everywhere else.  At the very least, we would need to create macros or functions that the users would need to use to check the return code and another to translate the error code into the correct value.  We would be shifting the annoying error code shuffling from the library writer to all of the library consumers.&lt;/p&gt;

&lt;p&gt;&lt;tt&gt;errno&lt;/tt&gt; is defined by the C language standard.  Most, if not all, of the values of &lt;tt&gt;errno&lt;/tt&gt; that we use (EACCES, EAGAIN, EIO, EISDIR, etc.) are defined by POSIX.1-2001 or C99, not inventions of the Linux kernel.&lt;/p&gt;

&lt;p&gt;If we use the same values as errno, users are going to want to use standard functions like perror() that assume the use of errno.  Granted, strerror() also exists, but it is more difficult to use than perror().&lt;/p&gt;

&lt;p&gt;It is already difficult to get users to check error codes, so I think it is important to keep things simple when reasonable to do so.&lt;/p&gt;</comment>
                            <comment id="92690" author="adilger" created="Thu, 28 Aug 2014 07:16:12 +0000"  >&lt;p&gt;If there is an insistence on setting errno to return errors, it would still be possible to also return the negative errno instead of &quot;-1&quot; all the time.  I&apos;m not suggesting to return PTR_ERR() instead of NULL in case of memory allocation failures, as I agree that this is not very common for userspace programs.&lt;/p&gt;</comment>
                            <comment id="92719" author="afn" created="Thu, 28 Aug 2014 16:13:49 +0000"  >&lt;p&gt;FWIW, I have already implemented the version of the api that uses errno stuff, and put strerror calls in various&lt;br/&gt;
error paths. For example:&lt;/p&gt;

&lt;p&gt;rc = llapi_layout_stripe_count_get(layout,&amp;amp;num_comps  ); if(rc!=0){*ierr=-1;goto writeerrout;};&lt;/p&gt;

&lt;p&gt;...&lt;/p&gt;

&lt;p&gt;writeerrout:&lt;br/&gt;
      sprintf(cerr,&quot;Lustre layout definition error: %s\n&quot;,strerror(errno));&lt;/p&gt;


&lt;p&gt;This api is consistent with how I do things in other parts of the code as well, e.g. with stat calls and such. &lt;/p&gt;

&lt;p&gt;As an implementer of userland code that uses the llapi functionality, I strongly prefer the errno approach.&lt;/p&gt;
</comment>
                            <comment id="92738" author="nedbass" created="Thu, 28 Aug 2014 19:13:19 +0000"  >&lt;blockquote&gt;&lt;p&gt;If there is an insistence on setting errno to return errors, it would still be possible to also return the negative errno instead of &quot;-1&quot; all the time.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;We could, but what would be the benefit? It would be awkward and potentially confusing for the API to specify more than one authoritative source of error codes.  And as Andy points out, applications using both standard library and  &lt;tt&gt;llapi_layout&lt;/tt&gt; calls would prefer to use common error handling constructs.&lt;/p&gt;</comment>
                            <comment id="92758" author="adilger" created="Thu, 28 Aug 2014 23:11:26 +0000"  >&lt;p&gt;Because it would keep the same API that the rest of liblustreapi has today, and there is no drawback to doing so.  Virtually every application I&apos;ve seen checks &lt;tt&gt;if (rc &amp;lt; 0)&lt;/tt&gt; instead of &lt;tt&gt;if (rc == -1)&lt;/tt&gt;, so as it is clearly documented in the man pages that the functions return a negative value on error instead of &quot;-1&quot; there shouldn&apos;t be any problem.  That allows applications to choose which behaviour they want to use for programming, and I don&apos;t see it introducing any significant complexity into the library - at worst it would mean &lt;tt&gt;return -errno&lt;/tt&gt; instead of &lt;tt&gt;return -1&lt;/tt&gt; in some places.&lt;/p&gt;</comment>
                            <comment id="92765" author="nedbass" created="Thu, 28 Aug 2014 23:36:22 +0000"  >&lt;blockquote&gt;&lt;p&gt;so as it is clearly documented in the man pages that the functions return a negative value on error instead of &quot;-1&quot; there shouldn&apos;t be any problem.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;If the man page merely specifies a negative value on error, then a properly written application shouldn&apos;t rely on the specific value returned.  Therefore, in order for this behavior to be useful, it must become a formal part of the API and be documented as such. Simplicity is arguably the most important aspect of a well-designed API, and having two alternative means of returning error codes would add complexity for little gain. I&apos;m not persuaded we should do it just because the legacy &lt;tt&gt;llapi&lt;/tt&gt; functions do it.  We should see this as an opportunity to get things right this time, not to carry over all the old baggage.&lt;/p&gt;</comment>
                            <comment id="92803" author="afn" created="Fri, 29 Aug 2014 15:14:57 +0000"  >
&lt;p&gt;I am all for abandoning the old API as soon as possible. It is a maintenance nightmare for my application code to handle. There is ugly and otherwise unneeded ifdef goo for different machines all over the place. Please dump that API. What Ned says is exactly right: there is a chance to do it right/better this time around.&lt;/p&gt;

&lt;p&gt;As far as &quot;negative value returned&quot; vs &quot;specific negative value returned&quot; (i.e. rc=-ETHE_ERROR_CODE) goes, again, this leads to confusion and complexity that is just not needed. What happens when someone makes a commit sometime and gets them out of sync, such that the return code and errno are not the same any more, for example? Yes, that&apos;s a bug that&lt;br/&gt;
someone introduced, but that is the whole point. You can remove that bug years ahead of time by making it impossible to do that by design.&lt;/p&gt;

</comment>
                            <comment id="92810" author="adilger" created="Fri, 29 Aug 2014 15:57:26 +0000"  >&lt;p&gt;Andy, while I&apos;m all for updating applications to the new API, yours is not the only application in the world that uses it, so we can&apos;t remove the old API very quickly.  Also, this new API has only just been landed into a development branch and would need to be backported into the maintenance releases before it even has a chance to be used by regular users. &lt;/p&gt;

&lt;p&gt;I think the best that can be done is to include this into all of the maintenance releases, update the old APIs to use this new code (James has started on that) and then it can be deprecated in a few years after it is available in those releases.  It is also possible to mark those functions as deprecated in the headers so that application developers learn this before the API is removed.  It would also make sense to update the user manual once this API is available in the maintenance releases. &lt;/p&gt;</comment>
                            <comment id="92838" author="simmonsja" created="Fri, 29 Aug 2014 18:09:36 +0000"  >&lt;p&gt;I will hold off on my patches until this is resolved. I sent out an email to some people in our applications department to see what they say. I will report on that feedback.&lt;/p&gt;</comment>
                            <comment id="92857" author="afn" created="Fri, 29 Aug 2014 20:34:35 +0000"  >
&lt;p&gt;I fully understand the constraints of backwards compatibility and &quot;can&apos;t get rid of that yet&quot; cruft. That was very much the point of my previous comment: I&apos;ve got to carry around backwards compatibility stuff for the different lustre api&apos;s until such time as the new one is available on the oldest machine I have to compile code on. I&apos;m expecting that I can remove the &quot;old API&quot; functionality from my own code in something like 5 years or so, if the &quot;new API&quot; stuff lands in a distributed version right now. And that isn&apos;t even the case yet, so the date keeps getting pulled further and further out.&lt;/p&gt;

&lt;p&gt;An example of that pain is the lustre_idl.h file, which I have to hack by hand and carry around with my code since it doesn&apos;t compile in user space and I need stuff out of there. The example code in the Lustre manual includes it too.  This is all the more complicated because that file changes in odd ways across different versions, and I can&apos;t just use a 1.8x lustre_idl.h file on a Lustre 2.x installation and expect it to work...&lt;/p&gt;

&lt;p&gt;So this is why my emphasis is on &quot;start now&quot; to get the clock ticking. As far as the new API format (errno and such discussions) goes, we&apos;ve gone over that, but here is a reiteration:  My position as an applications developer is to have &lt;b&gt;one&lt;/b&gt; way to access the error information, and for that way to be the errno setting stuff.&lt;/p&gt;


</comment>
                            <comment id="93254" author="morrone" created="Thu, 4 Sep 2014 23:39:56 +0000"  >&lt;blockquote&gt;&lt;p&gt;Because it would keep the same API that the rest of liblustreapi has today, and there is no drawback to doing so.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The old API is seriously deficient in its design. Yes, we need to keep it around for some time, but I don&apos;t see a great deal of value in maintaining compatibility with poor design. This is our chance to make a clean break and make something that applications can really use.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt; Virtually every application I&apos;ve seen checks if (rc &amp;lt; 0) instead of if (rc == -1), so as it is clearly documented in the man pages that the functions return a negative value on error instead of &quot;-1&quot; there shouldn&apos;t be any problem.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Granted, that is not uncommon.  But the man page usually says:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;On error, -1 is returned, and errno is set appropriately.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;So it is perhaps not unreasonable to guess there are also folks out there that are used to explicitly checking for -1. Lustre returning only -1 could be considered the least likely to trip anyone up because it will work for checks of either &lt;tt&gt;rc == -1&lt;/tt&gt; or &lt;tt&gt;rc &amp;lt; 0&lt;/tt&gt;.&lt;/p&gt;</comment>
                            <comment id="93799" author="simmonsja" created="Thu, 11 Sep 2014 17:17:06 +0000"  >&lt;p&gt;So I got feedback from our users and middleware developers. Our users only care about lfs setstripe and getstripe working. They will most likely never program the striping API themselves. For the ones that do program, the most common case is that they test for success, which means testing for zero, and otherwise bomb out of the app. They normally couldn&apos;t care less about returned error values. What is most important to them is that a zero is returned for all functions on success. Our middleware guys also want zero to be returned for all cases. On error reporting their opinions varied a bit. What I did get is that they don&apos;t like negative errno values. They also were not fans of having to read errno itself after a function call. So they asked why not just return a positive errno instead, since 0 is success and something else is failure. I also found out in the discussion that returning 0 on success and a positive errno on failure is defined in the POSIX.1c standard.&lt;/p&gt;</comment>
                            <comment id="93830" author="adilger" created="Fri, 12 Sep 2014 03:43:16 +0000"  >&lt;p&gt;I would much rather stick with returning -1 and errors in errno than having positive error numbers returned from the functions. Many programs I&apos;ve seen check &quot;&lt;tt&gt;if (rc &amp;lt; 0)&lt;/tt&gt;&quot; for errors instead of &quot;&lt;tt&gt;if (rc == -1)&lt;/tt&gt;&quot;, so I don&apos;t think it makes sense to break these.&lt;/p&gt;</comment>
                            <comment id="105573" author="jlevi" created="Tue, 3 Feb 2015 19:14:46 +0000"  >&lt;p&gt;Duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2182&quot; title=&quot;Add llapi_file_get_layout() function in liblustreapi&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2182&quot;&gt;&lt;del&gt;LU-2182&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="16365">LU-2182</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="19462">LU-3480</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="23278">LU-4665</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="15471" name="llapi_layout.txt" size="3678" author="nedbass" created="Mon, 4 Aug 2014 18:22:23 +0000"/>
                            <attachment id="15472" name="llapi_layout_alloc.txt" size="1733" author="nedbass" created="Mon, 4 Aug 2014 18:22:23 +0000"/>
                            <attachment id="15473" name="llapi_layout_file_create.txt" size="2495" author="nedbass" created="Mon, 4 Aug 2014 18:22:23 +0000"/>
                            <attachment id="15480" name="llapi_layout_get_by_fd.txt" size="4455" author="nedbass" created="Mon, 4 Aug 2014 18:24:31 +0000"/>
                            <attachment id="15475" name="llapi_layout_ost_index_get.txt" size="1843" author="nedbass" created="Mon, 4 Aug 2014 18:22:23 +0000"/>
                            <attachment id="15476" name="llapi_layout_pattern_get.txt" size="1535" author="nedbass" created="Mon, 4 Aug 2014 18:22:23 +0000"/>
                            <attachment id="15477" name="llapi_layout_pool_name_get.txt" size="2228" author="nedbass" created="Mon, 4 Aug 2014 18:22:23 +0000"/>
                            <attachment id="15478" name="llapi_layout_stripe_count_get.txt" size="1448" author="nedbass" created="Mon, 4 Aug 2014 18:22:23 +0000"/>
                            <attachment id="15479" name="llapi_layout_stripe_size_get.txt" size="1318" author="nedbass" created="Mon, 4 Aug 2014 18:22:23 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvzcv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9942</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>