[LU-8314] user interface of lfs getdirstripe needs reworking Created: 21/Jun/16 Updated: 10/Aug/17 Resolved: 03/Jan/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Christopher Morrone | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
lfs's subcommand "getdirstripe" has a very strange, and user unfriendly interface. For whatever reason, getdirstripe always recurses one level down. Here is an example directory tree on Lustre 2.8.0: $ find level1 level1 level1/level2_c level1/level2_b level1/level2_a level1/level2_a/level3_c level1/level2_a/level3_b level1/level2_a/level3_a If I just want to get the directory striping information for the directory named "level1", I might assume that the command is "lfs getdirstripe level1". But this is what I get instead: $ lfs getdirstripe level1 level1 lmv_stripe_count: 0 lmv_stripe_offset: 0 level1/level2_c lmv_stripe_count: 0 lmv_stripe_offset: 0 level1/level2_b lmv_stripe_count: 0 lmv_stripe_offset: 0 level1/level2_a lmv_stripe_count: 0 lmv_stripe_offset: 0 Not only did it give me the information for the directory that I actually asked for, it also decided to recurse on level down the tree and give me the directory information for all subdirectories. The man page more-or-less acknowledges this behaviour, but it buries the detail in the -r/--recursive option. So I can only assume that this was intentional rather than an accidental off-by-one default recursion count in the code.
The information there isn't exactly accurate either, because it actually lists the information for the specified directory and all of its sub-directories, not just the sub-directories of the specified directory. But the documentation is not really the problem. The problem is that this is a bad user interface. When a user types "lfs getdirstripe somedirectory", by default it should print the information for directory "somedirectory" and nothing else. Preferably it will not print "somedirectory" again in response before showing somedirectory's directory striping information. Just print the dir striping information and nothing else. Just to make it super clear, here is an example of something more like I would expect to see from this command: $ lfs getdirstripe level1 lmv_stripe_count: 0 lmv_stripe_offset: 0 That is it. Nothing else. This would make the command consistent with the -c and -i options as well. This is how those currently work: $ lfs getdirstripe -c level1 0 $ lfs getdirstripe -i level1 0 For those commands it didn't decide to also show the requested information for all of the subdirectories as well. And while we are on the topic of weird interfaces, what the heck is this supposed to mean:
I have absolutely no idea. Apparently it turns of all output: $ lfs getdirstripe -q level1 That seems overly quiet. And next we have the verbose option with which supposedly "Additional information will be shown". But no additional information is shown: $ lfs getdirstripe -v level1 level1 lmv_stripe_count: 0 lmv_stripe_offset: 0 level1/level2_c lmv_stripe_count: 0 lmv_stripe_offset: 0 level1/level2_b lmv_stripe_count: 0 lmv_stripe_offset: 0 level1/level2_a lmv_stripe_count: 0 lmv_stripe_offset: 0 Maybe there is some other situation under which it would print more information? Overall, the lfs getdirstripe command needs work before we can turn it over to our users. |
| Comments |
| Comment by Li Xi (Inactive) [ 21/Jun/16 ] |
|
I agreed. Most of the Lustre tools are unfriendly to users especially to people who are just starting to learn Lustre. |
| Comment by Christopher Morrone [ 22/Jun/16 ] |
|
While we are at it, "lmv_stripe_count" and "lmv_stripe_offset" are really silly names for these. Why the "lmv_" prefix? It doesn't even match the variable names in the header file (those start with lum_). But even if they did match, it does not make much sense to expose C programming details in the command line interface. |
| Comment by Andreas Dilger [ 22/Jun/16 ] |
|
Comparing the behaviour of "lfs getdirstripe" on 2.5.3 vs. 2.8.0 I see that the output has changed from printing only the layout of the requested directory (including the FID of the directory object) to printing only the stripe count and offset of all first-level subdirectories. client-2.5.3$ lfs getdirstripe /myth/tmp
/myth/tmp
lmv_stripe_count: 1
lmv_stripe_offset: 0
mdtidx FID[seq:oid:ver]
0 [0x2489f5:0x451cf6bf:0x0]
client-2.8.0$ lfs getstripe /myth/tmp /myth/tmp lmv_stripe_count: 0 lmv_stripe_offset: 0 /myth/tmp/test2 lmv_stripe_count: 0 lmv_stripe_offset: 0 /myth/tmp/ost2 lmv_stripe_count: 0 lmv_stripe_offset: 0 : : I agree that this isn't as useful as one would expect. It is possible to disable this behavior by using "lfs getdirstripe -d /path/to/dir" to print only the layout of the requested directory (like "ls -d <directory>"), which is the same way that "lfs getstripe /path/to/dir" works. That said, I think the default recursion isn't the most obvious behavior and I'd be OK to change it. The reason the filename is printed first is to separate the output from multiple directories, given the current behaviour. The "lmv_" prefix is because this is the directory layout, and not the file layout which is shown with an "lmm_" prefix. |
| Comment by Christopher Morrone [ 12/Jul/16 ] |
I understand that directory and file stripes are named differently. But "lmv_" is not the correct name for these variables. Look here: struct lmv_user_md_v1 {
__u32 lum_magic; /* must be the first field */
__u32 lum_stripe_count; /* dirstripe count */
__u32 lum_stripe_offset; /* MDT idx for default dirstripe */
__u32 lum_hash_type; /* Dir stripe policy */
__u32 lum_type; /* LMV type: default or normal */
__u32 lum_padding1;
__u32 lum_padding2;
__u32 lum_padding3;
char lum_pool_name[LOV_MAXPOOLNAME + 1];
struct lmv_user_mds_data lum_objects[0];
} __attribute__((packed));
Sure, the structure starts with "lmv_", but the variables inside the structure all start with "lum_". getdirstripe gets this wrong. getstripe does not. For lov stripes, the struct name starts with "lov_" and the variables in side the struct start with "lmm_". |
| Comment by Andreas Dilger [ 19/Jul/16 ] |
|
I would argue in this case that "lum_" is not the best prefix for this data structure's fields, and "lmv_" would be better, since "lum_" is easily confused with "lov_user_md", while "lmv_" is reflecting "Lustre Metadata Volume", LMV_USER_MAGIC, etc. Renaming the fields internally also avoids problems with changing the user-visible interface that has existed since 2.4. I do agree that we should revert the "lfs getdirstripe <dir>" to the non-recursive mode that existed since 2.5. I'm not 100% sure whether that was new in 2.8, or if it existed since 2.7, but it is confusing enough that I don't think anyone depends on that behaviour. |
| Comment by Christopher Morrone [ 19/Jul/16 ] |
Which is an API change, with all that implies. I'm not objecting, just pointing it out. That would get the command and the C API to match, but I still think it would be better still to not embed C variable names in the output of commands designed for human consumption. They should use independent, descriptive terms rather than specific code implementation details.
Great! |
| Comment by Christopher Morrone [ 27/Jul/16 ] |
|
First patch under way is 21516. |
| Comment by Gerrit Updater [ 15/Aug/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21516/ |
| Comment by Peter Jones [ 15/Aug/16 ] |
|
Landed for 2.9. Other work in this area is best tracked under a follow on ticket |
| Comment by Christopher Morrone [ 15/Aug/16 ] |
|
Did you open that ticket? 95% of what was listed in the Description is unaddressed, so I'm not clear why another one is needed, but go ahead and open a new one if that is your process. |
| Comment by Peter Jones [ 15/Aug/16 ] |
|
In that case I guess I will reopen and remove the fixversion |
| Comment by Christopher Morrone [ 06/Oct/16 ] |
|
When can we expect to see progress on this? |
| Comment by Gerrit Updater [ 13/Dec/16 ] |
|
Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/24319 |
| Comment by Gerrit Updater [ 01/Jan/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24319/ |
| Comment by Peter Jones [ 03/Jan/17 ] |
|
Is there still more work to come being tracked under this ticket or can it now be marked as resolved? |
| Comment by Andreas Dilger [ 03/Jan/17 ] |
|
The recursion issue was fixed in 21516, and issues with -q and -v options were addressed in the 24319 patch. |
| Comment by Peter Jones [ 03/Jan/17 ] |
|
Landed for 2.10 |