[LU-6211] getxattr() for {lustre,trusted}.lov on HSM released file returns stripe count 0 Created: 05/Feb/15  Updated: 30/Jan/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: hsm, lov, xattr

Issue Links:
Related
is related to LU-6212 listxattr() on HSM released file does... Open
Severity: 3
Rank (Obsolete): 17381

 Description   

When getxattr() is called to retrieve the lustre.lov or trusted.lov xattr of a regular file, the returned value is synthesized from the LSM attached to that file. When the file is released, that LSM has a stripe count of 0.

# lfs setstripe -c2 f0
# lfs getstripe f0
f0
lmm_stripe_count:   2
lmm_stripe_size:    1048576
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  0
        obdidx          objid           objid           group
        0               3               0x3             0
        1               3               0x3             0

# sys_getxattr --raw f0 lustre.lov | hexdump -C
00000000  d0 0b d1 0b 01 00 00 00  04 00 00 00 00 00 00 00  |................|
00000010  00 04 00 00 02 00 00 00  00 00 10 00 02 00 00 00  |................|
00000020  03 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  03 00 00 00 00 00 00 00  |................|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  |................|
00000050
# sys_getxattr --raw f0 trusted.lov | hexdump -C
00000000  d0 0b d1 0b 01 00 00 00  04 00 00 00 00 00 00 00  |................|
00000010  00 04 00 00 02 00 00 00  00 00 10 00 02 00 00 00  |................|
00000020  03 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  03 00 00 00 00 00 00 00  |................|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 01 00 00 00  |................|
00000050
# lfs hsm_archive f0
# lfs hsm_release f0
# lfs hsm_state f0
f0: (0x0000000d) released exists archived, archive_id:1
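
The pre-release xattr value above can be cross-checked against the lfs getstripe output by decoding it by hand. The following is a minimal sketch, assuming the value is a little-endian v1 layout: a 32-byte header followed by one 24-byte entry per stripe (object id, object seq/group, generation, OST index). The names HEADER, OBJECT and decode_objects are illustrative and are not part of any Lustre API.

```python
import struct

# Pre-release lustre.lov value, copied from the hexdump above (80 bytes).
RAW = bytes.fromhex(
    "d00bd10b01000000" "0400000000000000"
    "0004000002000000" "0000100002000000"
    "0300000000000000" "0000000000000000"
    "0000000000000000" "0300000000000000"
    "0000000000000000" "0000000001000000"
)

# Assumed layout: magic, pattern, oi id, oi seq, stripe size,
# stripe count, layout gen (all little-endian, no padding).
HEADER = struct.Struct("<IIQQIHH")
# Assumed per-stripe entry: object id, object seq (group), generation, OST index.
OBJECT = struct.Struct("<QQII")


def decode_objects(raw):
    """Return (ost_index, object_id, group) for each stripe entry."""
    stripe_count = HEADER.unpack_from(raw, 0)[5]
    entries = []
    for i in range(stripe_count):
        obj_id, seq, _gen, idx = OBJECT.unpack_from(raw, HEADER.size + i * OBJECT.size)
        entries.append((idx, obj_id, seq))
    return entries


for idx, obj_id, group in decode_objects(RAW):
    print(f"obdidx={idx} objid={obj_id} ({obj_id:#x}) group={group}")
```

Under these assumptions the two decoded entries match the obdidx/objid/group rows that lfs getstripe printed before the file was archived.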

lfs getstripe returns the correct answer because it does not use the LSM. Instead it issues an IOC_MDC_GETFILESTRIPE ioctl on the parent directory, which calls ll_lov_getstripe_ea_info() and sends an MDS_GETATTR_NAME request.

# lfs getstripe f0
f0
lmm_stripe_count:   2
lmm_stripe_size:    1048576
lmm_pattern:        80000001
lmm_layout_gen:     1
lmm_stripe_offset:  0

getxattr() returns the wrong answer because it uses the attached LSM:

q:lustre# sys_getxattr -r f0 lustre.lov | hexdump -C
00000000  d0 0b d1 0b 01 00 00 80  04 00 00 00 00 00 00 00  |................|
00000010  00 04 00 00 02 00 00 00  00 00 10 00 00 00 00 00  |................|
00000020
q:lustre# sys_getxattr -r f0 trusted.lov | hexdump -C
00000000  d0 0b d1 0b 01 00 00 80  04 00 00 00 00 00 00 00  |................|
00000010  00 04 00 00 02 00 00 00  00 00 10 00 00 00 00 00  |................|
00000020

          magic------ pattern----  oi---------------------
00000000  d0 0b d1 0b 01 00 00 80  04 00 00 00 00 00 00 00  |................|
          -----------------------  size------- count gen--
00000010  00 04 00 00 02 00 00 00  00 00 10 00 00 00 00 00  |................|
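
The annotated header above can be decoded mechanically to confirm the reading. This is a minimal sketch, assuming the same little-endian field order as the annotation (magic, pattern, 16-byte oi, stripe size, stripe count, gen). The constant names LOV_MAGIC_V1 and LOV_PATTERN_F_RELEASED are my labels for the magic bytes and for the high bit seen in lmm_pattern 80000001, and decode_header is a hypothetical helper, not a Lustre function.

```python
import struct

LOV_MAGIC_V1 = 0x0BD10BD0            # bytes d0 0b d1 0b read as little-endian
LOV_PATTERN_F_RELEASED = 0x80000000  # assumed meaning of the high pattern bit

# magic, pattern, oi id, oi seq, stripe size, stripe count, layout gen
HEADER = struct.Struct("<IIQQIHH")

# lustre.lov value returned by getxattr() after hsm_release (32 bytes).
RELEASED = bytes.fromhex(
    "d00bd10b01000080" "0400000000000000"
    "0004000002000000" "0000100000000000"
)


def decode_header(raw):
    """Decode the fixed 32-byte header of a v1 layout xattr value."""
    magic, pattern, oi_id, oi_seq, size, count, gen = HEADER.unpack_from(raw)
    if magic != LOV_MAGIC_V1:
        raise ValueError(f"unexpected magic {magic:#x}")
    return {
        "pattern": pattern,
        "released": bool(pattern & LOV_PATTERN_F_RELEASED),
        "oi_id": oi_id,
        "oi_seq": oi_seq,
        "stripe_size": size,
        "stripe_count": count,
        "layout_gen": gen,
    }


hdr = decode_header(RELEASED)
print(f"pattern={hdr['pattern']:#x} released={hdr['released']} "
      f"size={hdr['stripe_size']} count={hdr['stripe_count']}")
```

A caller that depends on the stripe count could use the released flag as a hint that a count of 0 in this value should not be trusted, and fall back to the lfs getstripe path instead.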

The LL_IOC_LOV_GETSTRIPE ioctl, when issued on a released regular file, also returns a stripe count of 0. However, neither llapi nor lfs uses LL_IOC_LOV_GETSTRIPE on a regular file.

Note that this code was modified recently as part of CLIO simplification (see LU-5823 and http://review.whamcloud.com/12452) but the same defect existed before that commit.



 Comments   
Comment by Andreas Dilger [ 05/Feb/15 ]

I agree that this is inconsistent behaviour and should be fixed, though I couldn't say which result is more correct. Does this cause any specific problems that would warrant it being fixed more urgently, or is this just a quirk that can be fixed at some later time?

Comment by Colin Faber [X] (Inactive) [ 07/Jun/19 ]

This is an oldie but a goodie. I would say this probably should be a little more urgent. We suspect we recently hit this behavior, which caused problems with a site's archive solution that attempts to store the stripe data with the archive so that, on restore, the file can be recreated with the same striping configuration. It's not a show stopper, but it is definitely a problem, and others are likely experiencing it though they may not realize it.

Comment by Andreas Dilger [ 07/Jun/19 ]

Wouldn't it make sense for the HSM to archive the layout xattr before the file is released? That allows it to be re-used in restore, and also helps recovery if the file is lost/deleted.

Comment by Colin Faber [X] (Inactive) [ 07/Jun/19 ]

Yes, this does make great sense, however we're also chasing an issue where, apparently, in some cases the xattr read fails silently before the file is released =)

Comment by Andreas Dilger [ 07/Jun/19 ]

On the flip side, something like the "lfs_migrate -A" code to auto-select the layout of the file based on the file size is pretty reasonable. PFL and such are good when you don't know the file size, but if you know the size in advance (e.g. restoring from HSM) then it is possible to make a better choice of the stripe count than when the file was first created.

Not saying this bug shouldn't be fixed, just giving some options.

Comment by Colin Faber [X] (Inactive) [ 07/Jun/19 ]

No arguments here
