Lustre / LU-3128

filter_fid on OST not updated during layout swap

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.4.0
    • 3
    • 7599

    Description

      LMA on OST is not updated during layout swap.

      # llmount.sh 
      ...
      # cd /mnt/lustre
      # lfs setstripe -c2 f0
      # dd if=/dev/zero bs=1M count=2 | tr '\0' 'X' > f0
      2+0 records in
      2+0 records out
      2097152 bytes (2.1 MB) copied, 0.0201865 s, 104 MB/s
      # lfs getstripe f0
      f0
      lmm_stripe_count:   2
      lmm_stripe_size:    1048576
      lmm_layout_gen:     0
      lmm_stripe_offset:  0
      	obdidx		 objid		 objid		 group
      	     0	             1	          0x1	             0
      	     1	             1	          0x1	             0
      
      # touch f1
      # ls -lh
      total 2.0M
      -rw-r--r-- 1 root root 2.0M Apr  8 13:54 f0
      -rw-r--r-- 1 root root    0 Apr  8 13:54 f1
      # lfs path2fid f0
      [0x200000400:0x1:0x0]
      # lfs path2fid f1
      [0x200000400:0x3:0x0]
      # umount /mnt/ost1
      # mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
      # ls -lh /mnt/ost1/O/0/d1/1
      -rw-rw-rw- 1 root root 1.0M Apr  8 13:54 /mnt/ost1/O/0/d1/1
      # ll_decode_filter_fid /mnt/ost1/O/0/d1/1
      /mnt/ost1/O/0/d1/1: objid=4294967296 seq=1 parent=[0x200000400:0x1:0x0]
      # umount /mnt/ost1
      # mount /tmp/lustre-ost1  /mnt/ost1 -t lustre -o loop
      # lfs swap_layouts f0 f1
      # ls -lh
      total 2.0M
      -rw-r--r-- 1 root root    0 Apr  8 13:57 f0
      -rw-r--r-- 1 root root 2.0M Apr  8 13:57 f1
      # umount /mnt/ost1
      # mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
      # ls -lh /mnt/ost1/O/0/d1/1
      -rw-rw-rw- 1 root root 1.0M Apr  8 13:57 /mnt/ost1/O/0/d1/1
      # ll_decode_filter_fid /mnt/ost1/O/0/d1/1
      /mnt/ost1/O/0/d1/1: objid=4294967296 seq=1 parent=[0x200000400:0x1:0x0]
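The repro reaches each OST object through paths such as /mnt/ost1/O/0/d1/1 (and d2/2, d5/5 later in this ticket). Below is a minimal sketch of that mapping. It assumes the ldiskfs OST keeps objects under O/<group>/d<objid % 32>/<objid>, with 32 hash subdirectories. The subdirectory count and the helper name ost_object_path are assumptions for illustration and are not part of any Lustre tool.

```python
import os

# Assumption: ldiskfs OSTs spread objects over 32 "d" subdirectories
# (d0..d31) chosen by objid modulo 32, under O/<group>/.
SUBDIR_COUNT = 32


def ost_object_path(mountpoint: str, group: int, objid: int) -> str:
    """Hypothetical helper: ldiskfs path of the OST object (group, objid)."""
    subdir = "d%d" % (objid % SUBDIR_COUNT)
    return os.path.join(mountpoint, "O", str(group), subdir, str(objid))


if __name__ == "__main__":
    # objid values taken from the "lfs getstripe" output in this ticket.
    for objid in (1, 2, 5):
        print(ost_object_path("/mnt/ost1", 0, objid))
```

This only locates the object. Reading its parent FID still requires ll_decode_filter_fid or the trusted.fid xattr.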
      

          Activity

            jay Jinshan Xiong (Inactive) added a comment:

            This problem will be fixed in LU-10248.

            tappro Mikhail Pershin added a comment:

            I need to check whether the bug still exists in master, then either fix it or close the bug. Lowering the priority to Major.

            adilger Andreas Dilger added a comment:

            Alex, you are completely right - this should be handled by layout swap code, but this isn't done yet. The LFSCK Phase 2 code will correct the parent FID on the OST objects, but I'd rather that it be done correctly during swap under normal conditions.

            bzzz Alex Zhuravlev added a comment:

            Is this really a blocker? I would hope not. I think LOD, when replacing the layout, should take care of filter_fid: call ->do_xattr_set(XATTR_NAME_FID, ..), then OSP should llog the change and send the appropriate OST_SET_ATTR?
            Ideally this should be done by means of OUT, but unfortunately it's not quite ready to run on the OST.
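Alex's suggestion can be illustrated with a toy in-memory model. This is Python, all names are hypothetical, and it is not Lustre code. Swapping layouts exchanges the stripe lists, and the proposed fix additionally rewrites each stripe object's filter_fid parent to the FID of the file that now owns it. The FIDs and object IDs come from John Hammond's repro later in this ticket. Storing the stripe index in the version field follows the parent=[0x200000400:0x1:0x1] value he shows for stripe 1, and is an assumption about the on-disk convention.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Toy model only: every name here is hypothetical and does not mirror
# Lustre's C structures or the LOD/OSP code paths.
Fid = Tuple[int, int, int]  # (seq, oid, ver)


@dataclass
class OstObject:
    ost_idx: int
    objid: int
    parent: Fid  # filter_fid parent FID; ver carries the stripe index (assumed)


@dataclass
class File:
    fid: Fid
    layout: List[OstObject] = field(default_factory=list)


def expected_parent(owner: File, stripe_idx: int) -> Fid:
    """Parent FID a stripe object should record for its owning file."""
    seq, oid, _ = owner.fid
    return (seq, oid, stripe_idx)


def swap_layouts(a: File, b: File, update_filter_fid: bool) -> None:
    """Swap two layouts; optionally rewrite each stripe object's parent FID."""
    a.layout, b.layout = b.layout, a.layout
    if not update_filter_fid:
        return  # behaviour reported here: objects keep their old parent
    for owner in (a, b):
        for stripe_idx, obj in enumerate(owner.layout):
            # Stands in for the proposed ->do_xattr_set(XATTR_NAME_FID, ...)
            # per stripe, which OSP would then propagate to the OST.
            obj.parent = expected_parent(owner, stripe_idx)


def stale_objects(f: File) -> List[OstObject]:
    """Stripe objects whose parent FID does not point back at file f."""
    return [o for i, o in enumerate(f.layout) if o.parent != expected_parent(f, i)]


def make_files() -> Tuple[File, File]:
    """f0 (two stripes) and f1 (one stripe), as in John Hammond's repro."""
    f0 = File((0x200000400, 0x1, 0), [OstObject(1, 2, (0x200000400, 0x1, 0)),
                                      OstObject(0, 2, (0x200000400, 0x1, 1))])
    f1 = File((0x200000400, 0x3, 0), [OstObject(1, 3, (0x200000400, 0x3, 0))])
    return f0, f1


if __name__ == "__main__":
    f0, f1 = make_files()
    swap_layouts(f0, f1, update_filter_fid=False)
    print(len(stale_objects(f0)), len(stale_objects(f1)))  # 1 2
    f0, f1 = make_files()
    swap_layouts(f0, f1, update_filter_fid=True)
    print(len(stale_objects(f0)), len(stale_objects(f1)))  # 0 0
```

Without the rewrite step, every swapped object still points at its previous file. In the model, this is the stale state that LFSCK would otherwise have to repair after the fact.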

            tappro Mikhail Pershin added a comment:

            John, am I right that it was not updated before LU-2677 either? So this is not a side effect of LU-2677 but just an issue in master?
            jhammond John Hammond added a comment:

            Mike, you're right about the description and the EAs. Sorry, I sort of copied that from the summary of LU-2677 and realized my mistake later. In any case, the filter_fid is not updated, even after LU-2677 landed.

            # git describe
            2.3.64-7-gc4f7a77
            # llmount.sh
            ...
            # cd /mnt/lustre
            # lfs setstripe -c2 f0
            # dd if=/dev/zero of=f0 bs=1M count=2
            2+0 records in
            2+0 records out
            2097152 bytes (2.1 MB) copied, 0.0128964 s, 163 MB/s
            # lfs path2fid f0
            [0x200000400:0x1:0x0]
            # lfs getstripe f0
            f0
            lmm_stripe_count:   2
            lmm_stripe_size:    1048576
            lmm_layout_gen:     0
            lmm_stripe_offset:  1
                    obdidx           objid           objid           group
                         1               2            0x2                0
                         0               2            0x2                0
            
            # touch f1
            # lfs path2fid f1
            [0x200000400:0x3:0x0]
            # lfs getstripe f1
            f1
            lmm_stripe_count:   1
            lmm_stripe_size:    1048576
            lmm_layout_gen:     0
            lmm_stripe_offset:  1
                    obdidx           objid           objid           group
                         1               3            0x3                0
            
            # ls -lh
            total 2.0M
            -rw-r--r-- 1 root root 2.0M Apr 18 08:59 f0
            -rw-r--r-- 1 root root    0 Apr 18 08:59 f1
            #
            # umount /mnt/ost1
            # mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
            #
            # ls -lh /mnt/ost1/O/0/d2/2
            -rw-rw-rw- 1 root root 1.0M Apr 18 08:59 /mnt/ost1/O/0/d2/2
            # sys_listxattr /mnt/ost1/O/0/d2/2
            'trusted.lma' '000000000000000000000000010000000200000000000000'
            'trusted.fid' '00040000020000000100000001000000'
            # ll_decode_filter_fid /mnt/ost1/O/0/d2/2
            /mnt/ost1/O/0/d2/2: objid=4294967296 seq=1 parent=[0x200000400:0x1:0x1]
            #
            # umount /mnt/ost1
            # mount /tmp/lustre-ost1 /mnt/ost1 -t lustre -o loop
            #
            # lfs swap_layouts f0 f1
            # ls -lh
            total 2.0M
            -rw-r--r-- 1 root root    0 Apr 18 08:59 f0
            -rw-r--r-- 1 root root 2.0M Apr 18 08:59 f1
            #
            # umount /mnt/ost1
            # mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
            #
            # ls -lh /mnt/ost1/O/0/d2/2
            -rw-rw-rw- 1 root root 1.0M Apr 18 08:59 /mnt/ost1/O/0/d2/2
            # sys_listxattr /mnt/ost1/O/0/d2/2
            'trusted.lma' '000000000000000000000000010000000200000000000000'
            'trusted.fid' '00040000020000000100000001000000'
            # ll_decode_filter_fid /mnt/ost1/O/0/d2/2
            /mnt/ost1/O/0/d2/2: objid=4294967296 seq=1 parent=[0x200000400:0x1:0x1]
            

            tappro Mikhail Pershin added a comment:

            John, are you sure this occurred after LU-2677? I am asking because parent_fid is not in the LMA but in the filter_fid EA, and it is still there. So if it worked before, then it should still work. Strictly speaking, the bug description is not correct: the LMA shouldn't be changed, but filter_fid should.
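Mikhail's distinction can be checked against the raw xattrs in John's sys_listxattr output. Below is a minimal sketch that assumes both values hold a little-endian struct lu_fid (u64 sequence, u32 object ID, u32 version). trusted.fid is taken as the 16-byte parent FID. trusted.lma is taken as two u32 flag words (compat, incompat) followed by the object's own FID. These layouts are recalled from the 2.4-era headers and should be treated as assumptions, and the helper names are hypothetical.

```python
import struct
from typing import Tuple

Fid = Tuple[int, int, int]  # (seq, oid, ver)


def decode_fid(raw: bytes) -> Fid:
    """Unpack a little-endian lu_fid: u64 seq, u32 oid, u32 ver (16 bytes)."""
    seq, oid, ver = struct.unpack("<QII", raw[:16])
    return seq, oid, ver


def fmt_fid(fid: Fid) -> str:
    """Format a FID the way lfs path2fid prints it."""
    return "[0x%x:0x%x:0x%x]" % fid


def decode_filter_fid(hex_value: str) -> Fid:
    """trusted.fid on an OST object: the parent (MDT file) FID."""
    return decode_fid(bytes.fromhex(hex_value))


def decode_lma(hex_value: str) -> Tuple[int, int, Fid]:
    """trusted.lma: u32 compat, u32 incompat, then the object's own FID."""
    raw = bytes.fromhex(hex_value)
    compat, incompat = struct.unpack("<II", raw[:8])
    return compat, incompat, decode_fid(raw[8:24])


if __name__ == "__main__":
    parent = decode_filter_fid("00040000020000000100000001000000")
    print("filter_fid parent:", fmt_fid(parent))
    _, _, self_fid = decode_lma("000000000000000000000000010000000200000000000000")
    print("LMA self FID:", fmt_fid(self_fid))
```

On the sample values, the filter_fid parent decodes to [0x200000400:0x1:0x1]. This matches ll_decode_filter_fid: f0's FID, with what appears to be stripe index 1 in the version field. The LMA decodes to [0x100000000:0x2:0x0], which looks like the object's own IDIF FID for object 2 and says nothing about the parent. Only the first value would need to change on a layout swap.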

            bogl Bob Glossman (Inactive) added a comment:

            It does look like the problem still reproduces with the commit from LU-2677 in. My results correspond exactly to John's comment:

            [root@centos1 lustre]# lfs setstripe -c2 f0
            [root@centos1 lustre]# touch f1
            [root@centos1 lustre]# dd if=/dev/zero of=f0 bs=1M count=2
            2+0 records in
            2+0 records out
            2097152 bytes (2.1 MB) copied, 0.00989071 s, 212 MB/s
            [root@centos1 lustre]# lfs path2fid f0
            [0x200000400:0x1:0x0]
            [root@centos1 lustre]# lfs path2fid f1
            [0x200000400:0x2:0x0]
            [root@centos1 lustre]# lfs getstripe f0
            f0
            lmm_stripe_count:   2
            lmm_stripe_size:    1048576
            lmm_layout_gen:     0
            lmm_stripe_offset:  1
            	obdidx		 objid		 objid		 group
            	     1	             2	          0x2	             0
            	     0	             2	          0x2	             0

            [root@centos1 lustre]# umount /mnt/ost1
            [root@centos1 lustre]# mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
            [root@centos1 lustre]# ll_decode_filter_fid /mnt/ost1/O/0/d2/2
            /mnt/ost1/O/0/d2/2: parent=[0x200000400:0x1:0x0] stripe=1
            [root@centos1 lustre]# umount /mnt/ost1
            [root@centos1 lustre]# mount /tmp/lustre-ost1 /mnt/ost1 -t lustre -o loop
            [root@centos1 lustre]# lfs swap_layouts f0 f1
            [root@centos1 lustre]# ls -lh
            total 2.0M
            -rw-r--r-- 1 root root    0 Apr 11 11:33 f0
            -rw-r--r-- 1 root root 2.0M Apr 11 11:33 f1
            [root@centos1 lustre]# umount /mnt/ost1
            [root@centos1 lustre]# 
            [root@centos1 lustre]# mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
            [root@centos1 lustre]# ll_decode_filter_fid /mnt/ost1/O/0/d2/2
            /mnt/ost1/O/0/d2/2: parent=[0x200000400:0x1:0x0] stripe=1

            jhammond John Hammond added a comment (edited):

            Hi Bob. After the layout swap, ll_decode_filter_fid should return the FID of the new parent.

            # lfs setstripe -c2 f0
            # touch f1
            # dd if=/dev/zero of=f0 bs=1M count=2
            2+0 records in
            2+0 records out
            2097152 bytes (2.1 MB) copied, 0.02666 s, 78.7 MB/s
            # lfs path2fid f0 
            [0x200000400:0x5:0x0]
            # lfs path2fid f1
            [0x200000400:0x6:0x0]
            # lfs getstripe f0
            f0
            lmm_stripe_count:   2
            lmm_stripe_size:    1048576
            lmm_layout_gen:     0
            lmm_stripe_offset:  1
            	obdidx		 objid		 objid		 group
            	     1	             5	          0x5	             0
            	     0	             5	          0x5	             0
            
            # umount /mnt/ost1
            # mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
            # ls -l /mnt/ost1/O/0/d5/5
            -rw-rw-rw- 1 root root 1048576 Apr 10 10:20 /mnt/ost1/O/0/d5/5
            # ll_decode_filter_fid /mnt/ost1/O/0/d5/5
            /mnt/ost1/O/0/d5/5: parent=[0x200000400:0x5:0x0] stripe=1 ### Correct.
            # umount /mnt/ost1
            # mount /tmp/lustre-ost1 /mnt/ost1 -t lustre -o loop
            # lfs swap_layouts f0 f1
            # ls -lh 
            total 2.0M
            -rw-r--r-- 1 root root    0 Apr 10 10:22 f0
            -rw-r--r-- 1 root root 2.0M Apr 10 10:22 f1
            # umount /mnt/ost1
            # mount /tmp/lustre-ost1 /mnt/ost1 -t ldiskfs -o loop
            # ll_decode_filter_fid /mnt/ost1/O/0/d5/5
            /mnt/ost1/O/0/d5/5: parent=[0x200000400:0x5:0x0] stripe=1 ### Incorrect: parent should be [0x200000400:0x6:0x0].
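The check John describes can be scripted. Below is a sketch that parses the "path: parent=[seq:oid:ver] stripe=N" lines printed by this version of ll_decode_filter_fid. It does not handle the older "objid=... seq=..." form earlier in the ticket. It compares the parent's sequence and object ID with the FID that lfs path2fid reports for the file now owning the layout. The function names are hypothetical. The version field is ignored because the two output formats in this ticket use it differently.

```python
import re
from typing import Optional, Tuple

Fid = Tuple[int, int, int]  # (seq, oid, ver)

# Matches e.g. "/mnt/ost1/O/0/d5/5: parent=[0x200000400:0x5:0x0] stripe=1"
DECODE_RE = re.compile(
    r"^(?P<path>\S+): parent=\[(?P<seq>0x[0-9a-f]+):(?P<oid>0x[0-9a-f]+)"
    r":(?P<ver>0x[0-9a-f]+)\](?: stripe=(?P<stripe>\d+))?"
)
# Matches the "[seq:oid:ver]" FID printed by "lfs path2fid".
FID_RE = re.compile(r"\[(0x[0-9a-f]+):(0x[0-9a-f]+):(0x[0-9a-f]+)\]")


def parse_decode_line(line: str) -> Optional[Tuple[str, Fid, Optional[int]]]:
    """Split one ll_decode_filter_fid line into (path, parent FID, stripe)."""
    m = DECODE_RE.match(line.strip())
    if m is None:
        return None
    parent = (int(m.group("seq"), 16), int(m.group("oid"), 16), int(m.group("ver"), 16))
    stripe_text = m.group("stripe")
    stripe = int(stripe_text) if stripe_text is not None else None
    return m.group("path"), parent, stripe


def parse_path2fid(output: str) -> Fid:
    """Extract the FID from "lfs path2fid" output."""
    m = FID_RE.search(output)
    if m is None:
        raise ValueError("no FID in %r" % output)
    seq, oid, ver = (int(g, 16) for g in m.groups())
    return seq, oid, ver


def parent_is_current(decode_line: str, owner_path2fid: str) -> bool:
    """True if the object's parent seq:oid matches the owning file's FID."""
    parsed = parse_decode_line(decode_line)
    if parsed is None:
        raise ValueError("unrecognised line: %r" % decode_line)
    _, parent, _ = parsed
    owner = parse_path2fid(owner_path2fid)
    # Ignore the version field: this ticket shows it holding 0 or a stripe index.
    return parent[:2] == owner[:2]
```

With the post-swap line above, parent_is_current(line, "[0x200000400:0x6:0x0]") returns False, which is the bug. Against f0's FID [0x200000400:0x5:0x0], it returns True.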
            
            bogl Bob Glossman (Inactive) added a comment (edited):

            From the discussion in LU-2677, it isn't clear whether there is more to be done beyond the fix for LU-2677 in http://review.whamcloud.com/#change,5838.

            People

              tappro Mikhail Pershin
              jhammond John Hammond
              Votes:
              0
              Watchers:
              10
