[LU-3128] filter_fid on OST not updated during layout swap Created: 08/Apr/13  Updated: 27/Nov/17  Resolved: 27/Nov/17

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: John Hammond
Assignee: Mikhail Pershin
Resolution: Duplicate
Votes: 0
Labels: mdd

Issue Links:
Related
is related to LU-2677 Adding LMA to OST object Resolved
is related to LU-10248 Need to update PFID of OST objects af... Resolved
Severity: 3
Rank (Obsolete): 7599

 Description   

LMA on OST is not updated during layout swap.

# llmount.sh 
...
# cd /mnt/lustre
# lfs setstripe -c2 f0
# dd if=/dev/zero bs=1M count=2 | tr '\0' 'X' > f0
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.0201865 s, 104 MB/s
# lfs getstripe f0
f0
lmm_stripe_count:   2
lmm_stripe_size:    1048576
lmm_layout_gen:     0
lmm_stripe_offset:  0
	obdidx		 objid		 objid		 group
	     0	             1	          0x1	             0
	     1	             1	          0x1	             0

# touch f1
# ls -lh
total 2.0M
-rw-r--r-- 1 root root 2.0M Apr  8 13:54 f0
-rw-r--r-- 1 root root    0 Apr  8 13:54 f1
# lfs path2fid f0
[0x200000400:0x1:0x0]
# lfs path2fid f1
[0x200000400:0x3:0x0]
# umount /mnt/ost1
# mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
# ls -lh /mnt/ost1/O/0/d1/1
-rw-rw-rw- 1 root root 1.0M Apr  8 13:54 /mnt/ost1/O/0/d1/1
# ll_decode_filter_fid /mnt/ost1/O/0/d1/1
/mnt/ost1/O/0/d1/1: objid=4294967296 seq=1 parent=[0x200000400:0x1:0x0]
# umount /mnt/ost1
# mount /tmp/lustre-ost1  /mnt/ost1 -t lustre -o loop
# lfs swap_layouts f0 f1
# ls -lh
total 2.0M
-rw-r--r-- 1 root root    0 Apr  8 13:57 f0
-rw-r--r-- 1 root root 2.0M Apr  8 13:57 f1
# umount /mnt/ost1
# mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
# ls -lh /mnt/ost1/O/0/d1/1
-rw-rw-rw- 1 root root 1.0M Apr  8 13:57 /mnt/ost1/O/0/d1/1
# ll_decode_filter_fid /mnt/ost1/O/0/d1/1
/mnt/ost1/O/0/d1/1: objid=4294967296 seq=1 parent=[0x200000400:0x1:0x0]


 Comments   
Comment by Peter Jones [ 09/Apr/13 ]

Bob will look into this

Comment by Bob Glossman (Inactive) [ 09/Apr/13 ]

From the discussion in LU-2677, it isn't clear whether anything more needs to be done beyond the fix for LU-2677 at http://review.whamcloud.com/#change,5838

Comment by John Hammond [ 10/Apr/13 ]

Hi Bob. After the layout swap ll_decode_filter_fid should return the FID of the new parent.

# lfs setstripe -c2 f0
# touch f1
# dd if=/dev/zero of=f0 bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.02666 s, 78.7 MB/s
# lfs path2fid f0 
[0x200000400:0x5:0x0]
# lfs path2fid f1
[0x200000400:0x6:0x0]
# lfs getstripe f0
f0
lmm_stripe_count:   2
lmm_stripe_size:    1048576
lmm_layout_gen:     0
lmm_stripe_offset:  1
	obdidx		 objid		 objid		 group
	     1	             5	          0x5	             0
	     0	             5	          0x5	             0

# umount /mnt/ost1
# mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
# ls -l /mnt/ost1/O/0/d5/5
-rw-rw-rw- 1 root root 1048576 Apr 10 10:20 /mnt/ost1/O/0/d5/5
# ll_decode_filter_fid /mnt/ost1/O/0/d5/5
/mnt/ost1/O/0/d5/5: parent=[0x200000400:0x5:0x0] stripe=1 ### Correct.
# umount /mnt/ost1
# mount /tmp/lustre-ost1 /mnt/ost1 -t lustre -o loop
# lfs swap_layouts f0 f1
# ls -lh 
total 2.0M
-rw-r--r-- 1 root root    0 Apr 10 10:22 f0
-rw-r--r-- 1 root root 2.0M Apr 10 10:22 f1
# umount /mnt/ost1
# mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
# ll_decode_filter_fid /mnt/ost1/O/0/d5/5
/mnt/ost1/O/0/d5/5: parent=[0x200000400:0x5:0x0] stripe=1 ### Incorrect; parent should be [0x200000400:0x6:0x0].
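
The check described in this comment (the parent printed by ll_decode_filter_fid should match the FID that lfs path2fid reports for the file now owning the object) can be scripted. The following is only a minimal sketch: the helper names parent_of, seq_oid and parent_matches are hypothetical, the input line is copied from the transcript above, and comparing only the first two FID fields is an assumption based on the outputs in this ticket, where the third field sometimes carries the stripe index.

```sh
#!/bin/sh
# Extract the parent FID from one line of ll_decode_filter_fid output, e.g.
#   /mnt/ost1/O/0/d5/5: parent=[0x200000400:0x5:0x0] stripe=1
parent_of() {
    printf '%s\n' "$1" | sed -n 's/.*parent=\(\[[^]]*]\).*/\1/p'
}

# Keep only "seq:oid" of a FID such as [0x200000400:0x5:0x0].
# Assumption: the third field may hold a stripe index, so it is ignored.
seq_oid() {
    printf '%s\n' "$1" | sed 's/^\[\(0x[0-9a-fA-F]*:0x[0-9a-fA-F]*\):.*$/\1/'
}

# Succeed when the parent recorded on the OST object matches the file FID.
#   $1: one line of ll_decode_filter_fid output
#   $2: FID printed by "lfs path2fid" for the expected parent file
parent_matches() {
    [ "$(seq_oid "$(parent_of "$1")")" = "$(seq_oid "$2")" ]
}

# Values taken from the transcript above: after the swap the object
# should belong to f1, whose FID is [0x200000400:0x6:0x0].
line='/mnt/ost1/O/0/d5/5: parent=[0x200000400:0x5:0x0] stripe=1'
if parent_matches "$line" '[0x200000400:0x6:0x0]'; then
    echo "parent FID is up to date"
else
    echo "parent FID is stale: $(parent_of "$line")"
fi
# prints: parent FID is stale: [0x200000400:0x5:0x0]
```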

Comment by Bob Glossman (Inactive) [ 11/Apr/13 ]

It does look like the problem still reproduces with the commit from LU-2677 applied. My results correspond exactly to John's comment:

[root@centos1 lustre]# lfs setstripe -c2 f0
[root@centos1 lustre]# touch f1
[root@centos1 lustre]# dd if=/dev/zero of=f0 bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.00989071 s, 212 MB/s
[root@centos1 lustre]# lfs path2fid f0
[0x200000400:0x1:0x0]
[root@centos1 lustre]# lfs path2fid f1
[0x200000400:0x2:0x0]
[root@centos1 lustre]# lfs getstripe f0
f0
lmm_stripe_count:   2
lmm_stripe_size:    1048576
lmm_layout_gen:     0
lmm_stripe_offset:  1
	obdidx		 objid		 objid		 group
	     1	             2	          0x2	             0
	     0	             2	          0x2	             0

[root@centos1 lustre]# umount /mnt/ost1
[root@centos1 lustre]# mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
[root@centos1 lustre]# ll_decode_filter_fid /mnt/ost1/O/0/d2/2
/mnt/ost1/O/0/d2/2: parent=[0x200000400:0x1:0x0] stripe=1
[root@centos1 lustre]# umount /mnt/ost1
[root@centos1 lustre]# mount /tmp/lustre-ost1 /mnt/ost1 -t lustre -o loop
[root@centos1 lustre]# lfs swap_layouts f0 f1
[root@centos1 lustre]# ls -lh
total 2.0M
-rw-r--r-- 1 root root    0 Apr 11 11:33 f0
-rw-r--r-- 1 root root 2.0M Apr 11 11:33 f1
[root@centos1 lustre]# umount /mnt/ost1
[root@centos1 lustre]# 
[root@centos1 lustre]# mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
[root@centos1 lustre]# ll_decode_filter_fid /mnt/ost1/O/0/d2/2
/mnt/ost1/O/0/d2/2: parent=[0x200000400:0x1:0x0] stripe=1

Comment by Mikhail Pershin [ 18/Apr/13 ]

John, are you sure this occurred after LU-2677? I am asking because parent_fid is not in the LMA but in the filter_fid EA, and that EA is still there. So if it worked before, then it should still work. Strictly speaking, the bug description is not correct: the LMA shouldn't be changed, but filter_fid should.

Comment by John Hammond [ 18/Apr/13 ]

Mike, you're right about the description and the EAs. Sorry. I sort of copied that from the summary of LU-2677 and realized my mistake later. In any case the filter_fid is not updated, even after LU-2677 landed.

# git describe
2.3.64-7-gc4f7a77
# llmount.sh
...
# cd /mnt/lustre
# lfs setstripe -c2 f0
# dd if=/dev/zero of=f0 bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.0128964 s, 163 MB/s
# lfs path2fid f0
[0x200000400:0x1:0x0]
# lfs getstripe f0
f0
lmm_stripe_count:   2
lmm_stripe_size:    1048576
lmm_layout_gen:     0
lmm_stripe_offset:  1
        obdidx           objid           objid           group
             1               2            0x2                0
             0               2            0x2                0

# touch f1
# lfs path2fid f1
[0x200000400:0x3:0x0]
# lfs getstripe f1
f1
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_layout_gen:     0
lmm_stripe_offset:  1
        obdidx           objid           objid           group
             1               3            0x3                0

# ls -lh
total 2.0M
-rw-r--r-- 1 root root 2.0M Apr 18 08:59 f0
-rw-r--r-- 1 root root    0 Apr 18 08:59 f1
#
# umount /mnt/ost1
# mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
#
# ls -lh /mnt/ost1/O/0/d2/2
-rw-rw-rw- 1 root root 1.0M Apr 18 08:59 /mnt/ost1/O/0/d2/2
# sys_listxattr /mnt/ost1/O/0/d2/2
'trusted.lma' '000000000000000000000000010000000200000000000000'
'trusted.fid' '00040000020000000100000001000000'
# ll_decode_filter_fid /mnt/ost1/O/0/d2/2
/mnt/ost1/O/0/d2/2: objid=4294967296 seq=1 parent=[0x200000400:0x1:0x1]
#
# umount /mnt/ost1
# mount /tmp/lustre-ost1 /mnt/ost1 -t lustre -o loop
#
# lfs swap_layouts f0 f1
# ls -lh
total 2.0M
-rw-r--r-- 1 root root    0 Apr 18 08:59 f0
-rw-r--r-- 1 root root 2.0M Apr 18 08:59 f1
#
# umount /mnt/ost1
# mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
#
# ls -lh /mnt/ost1/O/0/d2/2
-rw-rw-rw- 1 root root 1.0M Apr 18 08:59 /mnt/ost1/O/0/d2/2
# sys_listxattr /mnt/ost1/O/0/d2/2
'trusted.lma' '000000000000000000000000010000000200000000000000'
'trusted.fid' '00040000020000000100000001000000'
# ll_decode_filter_fid /mnt/ost1/O/0/d2/2
/mnt/ost1/O/0/d2/2: objid=4294967296 seq=1 parent=[0x200000400:0x1:0x1]
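
The two xattr values dumped above can be decoded by hand, which shows that the parent FID sits in trusted.fid and not in trusted.lma. The following is a rough sketch rather than a replacement for ll_decode_filter_fid: swap_bytes and fid_at are hypothetical helpers, and the layouts are assumptions (trusted.fid starting with a little-endian lu_fid of 64-bit seq, 32-bit oid, 32-bit ver; trusted.lma holding two 32-bit flag words followed by the object's own lu_fid).

```sh
#!/bin/sh
# Reverse the byte order of a hex string (little-endian to big-endian).
swap_bytes() (
    s=$1
    out=
    while [ -n "$s" ]; do
        out="$(printf '%s' "$s" | cut -c1-2)$out"
        s=$(printf '%s' "$s" | cut -c3-)
    done
    printf '%s\n' "$out"
)

# Print the FID stored at 1-based character offset $2 of hex string $1.
# Assumed layout: 64-bit seq, 32-bit oid, 32-bit ver, all little-endian.
fid_at() (
    seq=$(swap_bytes "$(printf '%s' "$1" | cut -c"$2"-"$(($2 + 15))")")
    oid=$(swap_bytes "$(printf '%s' "$1" | cut -c"$(($2 + 16))"-"$(($2 + 23))")")
    ver=$(swap_bytes "$(printf '%s' "$1" | cut -c"$(($2 + 24))"-"$(($2 + 31))")")
    printf '[0x%x:0x%x:0x%x]\n' "0x$seq" "0x$oid" "0x$ver"
)

# Values copied from the sys_listxattr output above.
fid_xattr=00040000020000000100000001000000
lma_xattr=000000000000000000000000010000000200000000000000

# trusted.fid: the parent FID starts at the first byte.
fid_at "$fid_xattr" 1     # prints [0x200000400:0x1:0x1]

# trusted.lma: skip the two 32-bit flag words (16 hex characters).
fid_at "$lma_xattr" 17    # prints [0x100000000:0x2:0x0]
```

The first result matches the parent reported by ll_decode_filter_fid for this object both before and after the swap. The second appears to be the object's own identity (object id 2), which is consistent with Mikhail's point that the LMA should not change during a layout swap.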

Comment by Mikhail Pershin [ 19/Apr/13 ]

John, am I right that it was not updated before LU-2677 either? So this is not a side effect of LU-2677 but just an issue in master?

Comment by Alex Zhuravlev [ 19/Apr/13 ]

Is this really a blocker? I would hope it's not. I think LOD, when replacing the layout, should take care of filter_fid: call ->do_xattr_set(XATTR_NAME_FID, ..), then OSP should llog the change and set the appropriate OST_SET_ATTR?
Ideally, this should be done by means of OUT, but unfortunately it's not quite ready to run on the OST.

Comment by Andreas Dilger [ 22/Apr/13 ]

Alex, you are completely right - this should be handled by layout swap code, but this isn't done yet. The LFSCK Phase 2 code will correct the parent FID on the OST objects, but I'd rather that it be done correctly during swap under normal conditions.

Comment by Mikhail Pershin [ 11/Apr/14 ]

I need to check whether the bug still exists in master, and then either fix it or close the bug. Lowering the priority to Major.

Comment by Jinshan Xiong (Inactive) [ 27/Nov/17 ]

This problem will be fixed in LU-10248.

Generated at Sat Feb 10 01:31:13 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.