[LU-8314] user interface of lfs getdirstripe needs reworking Created: 21/Jun/16  Updated: 10/Aug/17  Resolved: 03/Jan/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Critical
Reporter: Christopher Morrone Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: llnl

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

lfs's subcommand "getdirstripe" has a very strange, and user unfriendly interface. For whatever reason, getdirstripe always recurses one level down.

Here is an example directory tree on Lustre 2.8.0:

$ find level1
level1
level1/level2_c
level1/level2_b
level1/level2_a
level1/level2_a/level3_c
level1/level2_a/level3_b
level1/level2_a/level3_a

If I just want to get the directory striping information for the directory named "level1", I might assume that the command is "lfs getdirstripe level1". But this is what I get instead:

$ lfs getdirstripe level1
level1
lmv_stripe_count: 0 lmv_stripe_offset: 0
level1/level2_c
lmv_stripe_count: 0 lmv_stripe_offset: 0
level1/level2_b
lmv_stripe_count: 0 lmv_stripe_offset: 0
level1/level2_a
lmv_stripe_count: 0 lmv_stripe_offset: 0

Not only did it give me the information for the directory that I actually asked for, it also decided to recurse on level down the tree and give me the directory information for all subdirectories.

The man page more-or-less acknowledges this behaviour, but it buries the detail in the -r/--recursive option. So I can only assume that this was intentional rather than an accidental off-by-one default recursion count in the code.

-r, --recursive
The default behavior when a directory is specified is to list the striping information
for all sub-directories within the specified directory. This can be expanded with
--recursive, which will recurse into all subdirectories.

The information there isn't exactly accurate either, because it actually lists the information for the specified directory and all of its sub-directories, not just the sub-directories of the specified directory.

But the documentation is not really the problem. The problem is that this is a bad user interface.

When a user types "lfs getdirstripe somedirectory", by default it should print the information for directory "somedirectory" and nothing else. Preferably it will not print "somedirectory" again in response before showing somedirectory's directory striping information. Just print the dir striping information and nothing else.

Just to make it super clear, here is an example of something more like I would expect to see from this command:

$ lfs getdirstripe level1
lmv_stripe_count: 0 lmv_stripe_offset: 0

That is it. Nothing else.

This would make the command consistent with the -c and -i options as well. This is how those currently work:

$ lfs getdirstripe -c level1
0
$ lfs getdirstripe -i level1
0

For those commands it didn't decide to also show the requested information for all of the subdirectories as well.

And while we are on the topic of weird interfaces, what the heck is this supposed to mean:

-q, --quiet
Only show the details of the sub-striped directory FID information.

I have absolutely no idea. Apparently it turns of all output:

$ lfs getdirstripe -q level1

That seems overly quiet. But then, I don't understand what "details of the sub-striped directory FID information", so I can't tell if printing nothing makes any sense.

And next we have the verbose option with which supposedly "Additional information will be shown". But no additional information is shown:

$ lfs getdirstripe -v level1
level1
lmv_stripe_count: 0 lmv_stripe_offset: 0
level1/level2_c
lmv_stripe_count: 0 lmv_stripe_offset: 0
level1/level2_b
lmv_stripe_count: 0 lmv_stripe_offset: 0
level1/level2_a
lmv_stripe_count: 0 lmv_stripe_offset: 0

Maybe there is some other situation under which it would print more information?

Overall, the lfs getdirstripe command needs work before we can turn it over to our users.



 Comments   
Comment by Li Xi (Inactive) [ 21/Jun/16 ]

I agreed.

Most of the Lustre tools are unfriendly to users especially to people who are just starting to learn Lustre.
And the outdated and insufficient document/manual is a problem too.

Comment by Christopher Morrone [ 22/Jun/16 ]

While we are at it, "lmv_stripe_count" and "lmv_stripe_offset" are really silly names for these. Why the "lmv_" prefix? It doesn't even match the variable names in the header file (those start with lum_). But even if they did match, it does not make much sense to expose C programming details in the command line interface.

Comment by Andreas Dilger [ 22/Jun/16 ]

Comparing the behaviour of "lfs getdirstripe" on 2.5.3 vs. 2.8.0 I see that the output has changed from printing only the layout of the requested directory (including the FID of the directory object) to printing only the stripe count and offset of all first-level subdirectories.

client-2.5.3$ lfs getdirstripe /myth/tmp
/myth/tmp
lmv_stripe_count: 1
lmv_stripe_offset: 0
        mdtidx          FID[seq:oid:ver]
                0          [0x2489f5:0x451cf6bf:0x0]
client-2.8.0$ lfs getstripe /myth/tmp
/myth/tmp
lmv_stripe_count: 0 lmv_stripe_offset: 0
/myth/tmp/test2
lmv_stripe_count: 0 lmv_stripe_offset: 0
/myth/tmp/ost2
lmv_stripe_count: 0 lmv_stripe_offset: 0
:
:

I agree that this isn't as useful as one would expect. It is possible to disable this behavior by using "lfs getdirstripe -d /path/to/dir" to print only the layout of the requested directory (like "ls -d <directory>"), which is the same way that "lfs getstripe /path/to/dir" works. That said, I think the default recursion isn't the most obvious behavior and I'd be OK to change it.

The reason the filename is printed first is to separate the output from multiple directories, given the current behaviour.

The "lmv_" prefix is because this is the directory layout, and not the file layout which is shown with an "lmm_" prefix.

Comment by Christopher Morrone [ 12/Jul/16 ]

The "lmv_" prefix is because this is the directory layout, and not the file layout which is shown with an "lmm_" prefix.

I understand that directory and file stripes are named differently. But "lmv_" is not the correct name for these variables. Look here:

struct lmv_user_md_v1 {
        __u32   lum_magic;       /* must be the first field */
        __u32   lum_stripe_count;  /* dirstripe count */
        __u32   lum_stripe_offset; /* MDT idx for default dirstripe */
        __u32   lum_hash_type;     /* Dir stripe policy */
        __u32   lum_type;         /* LMV type: default or normal */
        __u32   lum_padding1;
        __u32   lum_padding2;
        __u32   lum_padding3;
        char    lum_pool_name[LOV_MAXPOOLNAME + 1];
        struct  lmv_user_mds_data  lum_objects[0];
} __attribute__((packed));

Sure, the structure starts with "lmv_", but the variables inside the structure all start with "lum_". getdirstripe gets this wrong. getstripe does not. For lov stripes, the struct name starts with "lov_" and the variables in side the struct start with "lmm_".

Comment by Andreas Dilger [ 19/Jul/16 ]

I would argue in this case that "lum_" is not the best prefix for this data structure's fields, and "lmv_" would be better, since "lum_" is easily confused with "lov_user_md", while "lmv_" is reflecting "Lustre Metadata Volume", LMV_USER_MAGIC, etc. Renaming the fields internally also avoids problems with changing the user-visible interface that has existed since 2.4.

I do agree that we should revert the "lfs getdirstripe <dir>" to the non-recursive mode that existed since 2.5. I'm not 100% sure whether that was new in 2.8, or if it existed since 2.7, but it is confusing enough that I don't think anyone depends on that behaviour.

Comment by Christopher Morrone [ 19/Jul/16 ]

I would argue in this case that "lum_" is not the best prefix for this data structure's fields, and "lmv_" would be better, since "lum_" is easily confused with "lov_user_md", while "lmv_" is reflecting "Lustre Metadata Volume", LMV_USER_MAGIC, etc. Renaming the fields internally also avoids problems with changing the user-visible interface that has existed since 2.4.

Which is an API change, with all that implies. I'm not objecting, just pointing it out.

That would get the command and the C API to match, but I still think it would be better still to not embed C variable names in the output of commands designed for human consumption. They should use independent, descriptive terms rather than specific code implementation details.

I do agree that we should revert the "lfs getdirstripe <dir>" to the non-recursive mode that existed since 2.5. I'm not 100% sure whether that was new in 2.8, or if it existed since 2.7, but it is confusing enough that I don't think anyone depends on that behaviour.

Great!

Comment by Christopher Morrone [ 27/Jul/16 ]

First patch under way is 21516.

Comment by Gerrit Updater [ 15/Aug/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21516/
Subject: LU-8314 utils: revert lfs_getdirstripe to non-recursive mode
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 957df6f030e8c89b02b168776d237dbb6879073b

Comment by Peter Jones [ 15/Aug/16 ]

Landed for 2.9. Other work in this area is best tracked under a follow on ticket

Comment by Christopher Morrone [ 15/Aug/16 ]

Did you open that ticket? 95% of what was listed in the Description is unaddressed, so I'm not clear why another one is needed, but go ahead and open a new one if that is your process.

Comment by Peter Jones [ 15/Aug/16 ]

In that case I guess I will reopen and remove the fixversion

Comment by Christopher Morrone [ 06/Oct/16 ]

When can we expect to see progress on this?

Comment by Gerrit Updater [ 13/Dec/16 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/24319
Subject: LU-8314 lfs: improve getdirstripe interface
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c7a0cb437281c4a1fadaba910b2df4477f3fa04d

Comment by Gerrit Updater [ 01/Jan/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24319/
Subject: LU-8314 lfs: improve getdirstripe interface
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 64c1297230288fda8d8f38d9e41b8e55c1373619

Comment by Peter Jones [ 03/Jan/17 ]

Is there still more work to come being tracked under this ticket or can it now be marked as resolved?

Comment by Andreas Dilger [ 03/Jan/17 ]

The recursion issue was fixed in 21516, and issues with -q and -v options were addressed in the 24319 patch.

Comment by Peter Jones [ 03/Jan/17 ]

Landed for 2.10

Generated at Sat Feb 10 02:16:28 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.