[LU-4850] DNE Striped Directory - Changing default striping only works on MDT0

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.6.0
    • Affects Version/s: Lustre 2.6.0
    • Environment: Lustre master on CentOS clients and servers. Commit: Idcfb918e0d4e203ff0f9c6d838a68b1a204ee3bd (One commit behind current head.)
    • 3
    • 13361

    Description

      I've been doing some simple testing of the new DNE striped directories, and I'm seeing some strangely inconsistent behavior. This system has 2 MDSes, and 2 MDTs per MDS. Clients and servers are all running master as described in "Environment".

      Apologies in advance for the lengthy description; it takes a while to make the problem clear and show proof.

      I'm seeing inconsistencies when I change the default stripe of a directory.

      Create a striped directory, then set the default striping. Everything, at this point, works fine. All directories created get the correct striping:

      [root@centclient02 centssm1]# lfs setdirstripe -c 4 striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/{test1,test2,test3,test4,test5,test6,test7,test8,test9,test10,test11,test12}
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe test1 test2 test3 test4 test5 test6 test7 test8 test9 test10 test11 test12
      test1
      lmv_stripe_count: 4
      lmv_stripe_offset: 0
      mdtidx		 FID[seq:oid:ver]
           0		 [0x600000402:0x1d534:0x0]		
           1		 [0x640000400:0x1b:0x0]		
           2		 [0x680000400:0x1b:0x0]		
           3		 [0x6c0000400:0x1b:0x0]		
      
      test2
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0x21:0x0]		
           0		 [0x600000403:0x19:0x0]		
           2		 [0x680000403:0x19:0x0]		
           3		 [0x6c0000402:0x19:0x0]		
      
      test3
      lmv_stripe_count: 4
      lmv_stripe_offset: 2
      mdtidx		 FID[seq:oid:ver]
           2		 [0x680000402:0x1aa32:0x0]		
           0		 [0x600000404:0x39:0x0]		
           1		 [0x640000402:0x39:0x0]		
           3		 [0x6c0000403:0x39:0x0]		
      
      test4
      lmv_stripe_count: 4
      lmv_stripe_offset: 3
      mdtidx		 FID[seq:oid:ver]
           3		 [0x6c0000401:0x9e:0x0]		
           0		 [0x600000405:0x2d:0x0]		
           1		 [0x640000403:0x2d:0x0]		
           2		 [0x680000404:0x2d:0x0]		
      

      All the directories are created with the correct striping information set.

      The problems come in when I start changing the default striping. This only seems to work on MDT0.

      Note the hash function (see output above) puts test1 on MDT0, test2 on MDT1, test3 on MDT2, and test4 on MDT3, which is handy for showing this bug.

      If no subdirectories have been created before changing the default striping, everything works just fine:

      [root@centclient02 centssm1]# lfs setdirstripe -c 4 striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 -D striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 1 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/{test1,test2,test3,test4}
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe test1 test2 test3 test4
      test1
      lmv_stripe_count: 1
      lmv_stripe_offset: 0
      mdtidx		 FID[seq:oid:ver]
           0		 [0x600000402:0x1d709:0x0]		
      
      test2
      lmv_stripe_count: 1
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0x95:0x0]		
      
      test3
      lmv_stripe_count: 1
      lmv_stripe_offset: 2
      mdtidx		 FID[seq:oid:ver]
           2		 [0x680000402:0x1aa35:0x0]		
      
      test4
      lmv_stripe_count: 1
      lmv_stripe_offset: 3
      mdtidx		 FID[seq:oid:ver]
           3		 [0x6c0000401:0xa1:0x0]		
      
      [root@centclient02 striped_directory]# cd ..
      [root@centclient02 centssm1]# rm -rf *
      

      Changing the default striping after a subdirectory has been created only seems to work on MDT0.

      Here's all four directories at once:

      [root@centclient02 centssm1]# lfs setdirstripe -c 4 striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/{test1,test2,test3,test4}
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe test1 test2 test3 test4
      test1
      lmv_stripe_count: 4
      lmv_stripe_offset: 0
      mdtidx		 FID[seq:oid:ver]
           0		 [0x600000402:0x1d735:0x0]		
           1		 [0x640000400:0x8f:0x0]		
           2		 [0x680000400:0x8f:0x0]		
           3		 [0x6c0000400:0x8f:0x0]		
      
      test2
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0xa5:0x0]		
           0		 [0x600000403:0x62:0x0]		
           2		 [0x680000403:0x62:0x0]		
           3		 [0x6c0000402:0x62:0x0]		
      
      test3
      lmv_stripe_count: 4
      lmv_stripe_offset: 2
      mdtidx		 FID[seq:oid:ver]
           2		 [0x680000402:0x1aa3c:0x0]		
           0		 [0x600000404:0x40:0x0]		
           1		 [0x640000402:0x40:0x0]		
           3		 [0x6c0000403:0x40:0x0]		
      
      test4
      lmv_stripe_count: 4
      lmv_stripe_offset: 3
      mdtidx		 FID[seq:oid:ver]
           3		 [0x6c0000401:0xa5:0x0]		
           0		 [0x600000405:0x33:0x0]		
           1		 [0x640000403:0x33:0x0]		
           2		 [0x680000404:0x33:0x0]		
      
      [root@centclient02 striped_directory]# rm -rf *
      [root@centclient02 striped_directory]# cd ..
      [root@centclient02 centssm1]# lfs setdirstripe -c 1 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/{test1,test2,test3,test4}
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe test1 test2 test3 test4
      test1
      lmv_stripe_count: 1
      lmv_stripe_offset: 0
      mdtidx		 FID[seq:oid:ver]
           0		 [0x600000402:0x1d738:0x0]		
      
      test2
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0xa6:0x0]		
           0		 [0x600000403:0x63:0x0]		
           2		 [0x680000403:0x63:0x0]		
           3		 [0x6c0000402:0x63:0x0]		
      
      test3
      lmv_stripe_count: 4
      lmv_stripe_offset: 2
      mdtidx		 FID[seq:oid:ver]
           2		 [0x680000402:0x1aa3d:0x0]		
           0		 [0x600000404:0x41:0x0]		
           1		 [0x640000402:0x41:0x0]		
           3		 [0x6c0000403:0x41:0x0]		
      
      test4
      lmv_stripe_count: 4
      lmv_stripe_offset: 3
      mdtidx		 FID[seq:oid:ver]
           3		 [0x6c0000401:0xa6:0x0]		
           0		 [0x600000405:0x34:0x0]		
           1		 [0x640000403:0x34:0x0]		
           2		 [0x680000404:0x34:0x0]		
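
Comparing these blocks of `lfs getdirstripe` output by eye is error-prone, so a minimal sketch of a filter that condenses the output to one line per directory may help when repeating the test above. It assumes the output format shown in this ticket (a bare directory name, followed by `lmv_stripe_count:` and `lmv_stripe_offset:` lines); `check_stripes` is a hypothetical helper name, not part of Lustre.

```shell
#!/bin/sh
# check_stripes EXPECTED_COUNT
# Hypothetical helper: reads `lfs getdirstripe` output on stdin and prints
# one summary line per directory, marking any directory whose stripe count
# differs from EXPECTED_COUNT.
check_stripes() {
    awk -v expected="$1" '
        # Stripe count line: remember it until the offset line arrives.
        $1 == "lmv_stripe_count:"  { count = $2; next }
        # Offset line: all fields for this directory are known, so print.
        $1 == "lmv_stripe_offset:" {
            status = (count == expected) ? "ok" : "MISMATCH"
            printf "%s count=%s offset=%s %s\n", name, count, $2, status
            next
        }
        # A line with a single field is a directory name; the header and
        # per-stripe FID rows have two fields and are ignored.
        NF == 1 { name = $1 }
    '
}
```

For the failing run above, `lfs getdirstripe test1 test2 test3 test4 | check_stripes 1` would be expected to report test1 as ok and test2 through test4 as MISMATCH.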
      

      Here are the directories by themselves. Here's test1, on MDT0:

      [root@centclient02 centssm1]# DIR=test1
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR
      test1
      lmv_stripe_count: 4
      lmv_stripe_offset: 0
      mdtidx		 FID[seq:oid:ver]
           0		 [0x600000402:0x1d70d:0x0]		
           1		 [0x640000400:0x80:0x0]		
           2		 [0x680000400:0x80:0x0]		
           3		 [0x6c0000400:0x80:0x0]		
      
      [root@centclient02 striped_directory]# rm -rf *
      [root@centclient02 striped_directory]# cd ..
      [root@centclient02 centssm1]# lfs setdirstripe -c 1 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR
      test1
      lmv_stripe_count: 1
      lmv_stripe_offset: 0
      mdtidx		 FID[seq:oid:ver]
           0		 [0x600000402:0x1d70f:0x0]	
      

      But here's test2, which goes on MDT1:

      [root@centclient02 centssm1]# DIR=test2
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR
      test2
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0x9d:0x0]		
           0		 [0x600000403:0x5a:0x0]		
           2		 [0x680000403:0x5a:0x0]		
           3		 [0x6c0000402:0x5a:0x0]		
      
      [root@centclient02 striped_directory]# rm -rf *
      [root@centclient02 striped_directory]# cd ..
      [root@centclient02 centssm1]# lfs setdirstripe -c 1 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR
      test2
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0x9e:0x0]		
           0		 [0x600000403:0x5b:0x0]		
           2		 [0x680000403:0x5b:0x0]		
           3		 [0x6c0000402:0x5b:0x0]		
      

      We see the same results for test3 on MDT2, and test4 on MDT3.

      Also note that the problem is triggered by creating any directory that goes on the particular MDT. It is not limited to re-creating the same directory, as we can see from this example, where test6 and test2 both go on MDT1:

      [root@centclient02 centssm1]# DIR=test6
      [root@centclient02 centssm1]# DIR2=test2
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR
      test6
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0x9f:0x0]		
           0		 [0x600000403:0x5c:0x0]		
           2		 [0x680000403:0x5c:0x0]		
           3		 [0x6c0000402:0x5c:0x0]		
      
      [root@centclient02 striped_directory]# rm -rf *
      [root@centclient02 striped_directory]# cd ..
      [root@centclient02 centssm1]# lfs setdirstripe -c 1 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR2
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR2
      test2
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0xa0:0x0]		
           0		 [0x600000403:0x5d:0x0]		
           2		 [0x680000403:0x5d:0x0]		
           3		 [0x6c0000402:0x5d:0x0]		
      
      [root@centclient02 striped_directory]# cd ..
      [root@centclient02 centssm1]# rm -rf *
      

      And for additional confirmation, if I create test2 (which goes on MDT1) and then change default striping and create test3 (which goes on MDT2), we don't see the problem:

      [root@centclient02 centssm1]# DIR=test2
      [root@centclient02 centssm1]# DIR2=test3
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR
      test2
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0xa1:0x0]		
           0		 [0x600000403:0x5e:0x0]		
           2		 [0x680000403:0x5e:0x0]		
           3		 [0x6c0000402:0x5e:0x0]		
      
      [root@centclient02 striped_directory]# rm -rf *
      [root@centclient02 striped_directory]# cd ..
      [root@centclient02 centssm1]# lfs setdirstripe -c 1 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR2
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR2
      test3
      lmv_stripe_count: 1
      lmv_stripe_offset: 2
      mdtidx		 FID[seq:oid:ver]
           2		 [0x680000402:0x1aa3a:0x0]	
      

      When the hash function is changed to "all_char", this behavior continues, but the name-to-MDT mappings change, so we see it for different directories.

          Activity

            di.wang Di Wang added a comment -

            The patch has landed on master with 9511.

            di.wang Di Wang added a comment -

            The later LBUG problem was actually introduced by another fix (http://review.whamcloud.com/#/c/9511/), which has not landed on master yet, so I will add the fix to 9511. http://review.whamcloud.com/#/c/9862 and http://review.whamcloud.com/#/c/9883 will be used to track the other fixes for this ticket.

            di.wang Di Wang added a comment -

            Hi Patrick, I am having some trouble accessing git right now, so I cannot push the patch for review until next Monday. I have attached the fix above (patch.diff), which might fix the LBUG issue you hit; please try it. Thanks.

            di.wang Di Wang added a comment -

            Patrick, thanks for testing. According to the debug log, it seems the MDT mistook this striped_directory for a non-striped directory. I cannot figure out the reason without the client-side debug log. Unfortunately, even when I repeat what you did, I cannot reproduce the problem here. Could you please also add patch http://review.whamcloud.com/#/c/9862/ to your build, and collect the debug log (at -1 level) on the client side as well? Thanks again for all your help.

            My test

            echo 0 > /proc/sys/lnet/panic_on_lbug 
            + echo 0
            echo -1 > /proc/sys/lnet/debug
            + echo -1
            echo 50 > /proc/sys/lnet/debug_mb
            + echo 50
            cd /mnt/lustre
            + cd /mnt/lustre
            
            /home/work/lustre-release/lustre/utils/lfs setdirstripe -i 1 -c4 striped_directory
            + /home/work/lustre-release/lustre/utils/lfs setdirstripe -i 1 -c4 striped_directory
             /home/work/lustre-release/lustre/utils/lfs setdirstripe -c 4 -D striped_directory
            + /home/work/lustre-release/lustre/utils/lfs setdirstripe -c 4 -D striped_directory
            mkdir -p striped_directory/{test1,test2,test3,test4}
            + mkdir -p striped_directory/test1 striped_directory/test2 striped_directory/test3 striped_directory/test4
            cd striped_directory
            + cd striped_directory
            /home/work/lustre-release/lustre/utils/lfs getdirstripe test1 test2 test3 test4
            + /home/work/lustre-release/lustre/utils/lfs getdirstripe test1 test2 test3 test4
            test1
            lmv_stripe_count: 4
            lmv_stripe_offset: 1
            mdtidx		 FID[seq:oid:ver]
                 1		 [0x280000401:0x2:0x0]		
                 0		 [0x340000400:0x2:0x0]		
                 2		 [0x2c0000400:0x2:0x0]		
                 3		 [0x300000400:0x2:0x0]		
            test2
            lmv_stripe_count: 4
            lmv_stripe_offset: 2
            mdtidx		 FID[seq:oid:ver]
                 2		 [0x2c0000402:0x1:0x0]		
                 0		 [0x340000401:0x1:0x0]		
                 1		 [0x280000402:0x1:0x0]		
                 3		 [0x300000401:0x1:0x0]		
            test3
            lmv_stripe_count: 4
            lmv_stripe_offset: 3
            mdtidx		 FID[seq:oid:ver]
                 3		 [0x300000403:0x1:0x0]		
                 0		 [0x340000402:0x1:0x0]		
                 1		 [0x280000403:0x1:0x0]		
                 2		 [0x2c0000403:0x1:0x0]		
            test4
            lmv_stripe_count: 4
            lmv_stripe_offset: 0
            mdtidx		 FID[seq:oid:ver]
                 0		 [0x340000404:0x1:0x0]		
                 1		 [0x280000404:0x1:0x0]		
                 2		 [0x2c0000404:0x1:0x0]		
                 3		 [0x300000404:0x1:0x0]		
            rm -rf *
            + rm -rf test1 test2 test3 test4
            cd .. 
            + cd ..
            
            /home/work/lustre-release/lustre/utils/lfs setdirstripe -c 1 -D striped_directory
            + /home/work/lustre-release/lustre/utils/lfs setdirstripe -c 1 -D striped_directory
            
            mkdir -p striped_directory/{test1,test2,test3,test4}
            + mkdir -p striped_directory/test1 striped_directory/test2 striped_directory/test3 striped_directory/test4
            
            cd striped_directory
            + cd striped_directory
            
            /home/work/lustre-release/lustre/utils/lfs getdirstripe test1 test2 test3 test4
            + /home/work/lustre-release/lustre/utils/lfs getdirstripe test1 test2 test3 test4
            test1
            lmv_stripe_count: 1
            lmv_stripe_offset: 1
            mdtidx		 FID[seq:oid:ver]
                 1		 [0x280000401:0x3:0x0]		
            test2
            lmv_stripe_count: 1
            lmv_stripe_offset: 2
            mdtidx		 FID[seq:oid:ver]
                 2		 [0x2c0000402:0x2:0x0]		
            test3
            lmv_stripe_count: 1
            lmv_stripe_offset: 3
            mdtidx		 FID[seq:oid:ver]
                 3		 [0x300000403:0x2:0x0]		
            test4
            lmv_stripe_count: 1
            lmv_stripe_offset: 0
            mdtidx		 FID[seq:oid:ver]
                 0		 [0x340000404:0x2:0x0]		
            
            cd ..
            + cd ..
            rm -rf *
            + rm -rf striped_directory
            

            paf Patrick Farrell (Inactive) added a comment -

            With those three patches installed on client and server, I reformatted, set debug and debug_mb as requested (I used 1000 for debug_mb, rather than just 80), and ran the following commands about two to three minutes after startup:
            lfs setdirstripe -c 4 striped_directory
            lfs setdirstripe -c 4 -D striped_directory
            mkdir -p striped_directory/{test1,test2,test3,test4}
            cd striped_directory/; lfs getdirstripe test1 test2 test3 test4
            rm -rf *
            cd ..
            lfs setdirstripe -c 1 -D striped_directory
            mkdir -p striped_directory/{test1,test2,test3,test4}
            cd striped_directory/; lfs getdirstripe test1 test2 test3 test4
            cd ..
            rm -rf *

            The primary MDS crashed when I ran the last command.

            Here's the stack trace:
            <0>LustreError: 14169:0:(osd_handler.c:2471:osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed:
            <0>LustreError: 14169:0:(osd_handler.c:2471:osd_object_destroy()) LBUG
            <4>Pid: 14169, comm: mdt_rdpg00_001
            <4>
            <4>Call Trace:
            <4> [<ffffffffa089b895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            <4> [<ffffffffa089be97>] lbug_with_loc+0x47/0xb0 [libcfs]
            <4> [<ffffffffa17290a9>] osd_object_destroy+0x459/0x460 [osd_ldiskfs]
            <4> [<ffffffffa19c3490>] lod_object_destroy+0x360/0x800 [lod]
            <4> [<ffffffffa1a0e6e0>] mdd_close+0x8e0/0xb80 [mdd]
            <4> [<ffffffffa19000b9>] mdt_mfd_close+0x4a9/0x1ba0 [mdt]
            <4> [<ffffffffa08ac581>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            <4> [<ffffffffa1903b73>] mdt_close+0x743/0xae0 [mdt]
            <4> [<ffffffffa12b19ac>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
            <4> [<ffffffffa126098a>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
            <4> [<ffffffffa125fc70>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
            <4> [<ffffffff8109aee6>] kthread+0x96/0xa0
            <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
            <4> [<ffffffff8109ae50>] ? kthread+0x0/0xa0
            <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20

            Dump and logs from client and other servers will be here shortly:

            ftp.whamcloud.com
            uploads/LU-4850/LU-4850_140407.tar.gz

            paf Patrick Farrell (Inactive) added a comment -

            Di -

            Sure. Just a heads up, I'm going to LUG this week, and I don't think I'll get to testing this before leaving for Miami. In that case, I might not test again until after LUG.
            di.wang Di Wang added a comment -

            Hello Patrick, unfortunately I cannot reproduce the LBUG you mentioned here, and the dump log did not provide much information. I have added another fix based on 9511 and 9883, http://review.whamcloud.com/9890. It probably will not fix the problem you saw, but I also added some debug information there. Could you please repeat these steps on your system?

            Btw: before you start the test, could you please set

            lctl set_param debug_mb=80
            lctl set_param debug=-1

            so that the dump provides more information. Thanks again for all your help.

            WangDi
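
The requested settings and the log capture can be wrapped in one step. The following is a sketch only: it assumes `lctl clear` and `lctl dk <file>` are available to empty and dump the kernel debug buffer, and `capture_debug` and the output path are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# LCTL can be overridden (for example LCTL=echo) to preview the commands
# on a machine without Lustre installed.
LCTL="${LCTL:-lctl}"

# capture_debug OUTFILE CMD [ARGS...]
# Hypothetical helper: enable full debugging with an 80 MB buffer, clear the
# buffer, run CMD, then dump the buffer to OUTFILE. Returns CMD's exit status
# (or the status of the first setup step that failed).
capture_debug() {
    out="$1"
    shift
    $LCTL set_param debug=-1 &&
        $LCTL set_param debug_mb=80 &&
        $LCTL clear &&
        "$@"
    rc=$?
    # Dump the log even when the command failed; that is when it matters most.
    $LCTL dk "$out"
    return "$rc"
}
```

A possible invocation on the client would be `capture_debug /tmp/lu-4850-client.dk mkdir -p striped_directory/test1`, run as root.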


            paf Patrick Farrell (Inactive) added a comment - edited

            I tried 1 and 2 above, and everything worked fine. (Waiting a few minutes before creating any directories.)

            I was able to rm everything on the file system.

            Then I stopped the file system and started it (our 'start' process waits for all the target mount commands to return), and mounted it on the client.

            I immediately created some striped directories and set default striping and put some child directories in them. As expected, the default striping didn't work right away. I was able to delete those directories, then I created them again. Striping still didn't work.

            I waited a few minutes, then created a separate striped directory and set default striping and created child directories. The child directories got the correct striping from the parent. When I tried to delete all of the file system contents, I got the LBUG I gave above on the primary MDS.

            The -1 dk log dump for that is on the whamcloud FTP site here:

            uploads/LU-4850/LU-4850-140403_nlink_lbug.tar.gz

            Along with logs from the other MDS and client.

            Just a heads up - I'm stopping work for the night, so I won't test anything else until at least tomorrow.
            di.wang Di Wang added a comment -

            Hmm, yes, please grab the -1 dump log for me. Thanks. Just to be clear: when you create a striped directory, whether it inherits its stripes from the default stripe or is created with user-specified stripes, the MDT might adjust the stripe_count. In your case, it might be that not enough MDTs had fully started up yet. So you could try this after startup:

            1. create a directory and set default stripe.
            2. wait a few mins, then create sub-directories, to see whether they get correct stripes.

            Thanks.
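
The "wait a few mins" step can be made less of a guess by polling. As a rough sketch, and assuming the only symptom of MDTs not being ready is that a new child does not inherit the parent's default stripe count, the loop below retries a probe until it succeeds; `retry_until` and `probe_stripes` are hypothetical helpers, not Lustre tools.

```shell
#!/bin/sh
# retry_until TRIES DELAY CMD [ARGS...]
# Hypothetical helper: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds between attempts. Returns 0 on success, 1 otherwise.
retry_until() {
    tries="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        "$@" && return 0
        # Do not sleep after the final attempt.
        [ "$i" -lt "$tries" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# probe_stripes PARENT EXPECTED
# Hypothetical helper: create a throwaway child under PARENT and succeed only
# if it inherited EXPECTED stripes from PARENT's default layout.
probe_stripes() {
    probe="$1/.stripe_probe.$$"
    mkdir "$probe" || return 1
    count=$(lfs getdirstripe "$probe" |
        awk '$1 == "lmv_stripe_count:" { print $2; exit }')
    rmdir "$probe"
    [ "$count" = "$2" ]
}
```

For example, `retry_until 30 10 probe_stripes striped_directory 4` would poll every 10 seconds for up to about five minutes before giving up.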


            Hmmm. Di, what I saw was when I created a striped directory with default striping set in it right after start up, and created directories in it, that didn't work. The child directories didn't get the default stripe setting from the parent.

            I deleted all the directories and tried again, and got the same results. A minute or two later, I tried again, and it worked: The child directories got the striping settings from the parent.

            I then removed the child directories and changed the striping default for the parent, and re-created the child directories. The new default striping pattern was applied correctly to the children.

            However, I just tried to repeat this, to see if it was right, and the primary MDS crashed when trying to rm -rf * (deleting a striped directory and several children). I don't have logs from this, but I could grab a dump if needed:

            LustreError: 2957:0:(osd_handler.c:2471:osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed:
            <0>LustreError: 2957:0:(osd_handler.c:2471:osd_object_destroy()) LBUG
            <4>Pid: 2957, comm: mdt_rdpg00_002
            <4>
            <4>Call Trace:
            <4> [<ffffffffa0302895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            <4> [<ffffffffa0302e97>] lbug_with_loc+0x47/0xb0 [libcfs]
            <4> [<ffffffffa0c29fd9>] osd_object_destroy+0x459/0x460 [osd_ldiskfs]
            <4> [<ffffffffa0e90240>] lod_object_destroy+0x2c0/0x760 [lod]
            <4> [<ffffffffa0edc6e0>] mdd_close+0x8e0/0xb80 [mdd]
            <4> [<ffffffffa0dcf0b9>] mdt_mfd_close+0x4a9/0x1ba0 [mdt]
            <4> [<ffffffffa0313581>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            <4> [<ffffffffa0dd2b73>] mdt_close+0x743/0xae0 [mdt]
            <4> [<ffffffffa07059ac>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
            <4> [<ffffffffa06b498a>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
            <4> [<ffffffffa06b3c70>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
            <4> [<ffffffff8109aee6>] kthread+0x96/0xa0
            <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
            <4> [<ffffffff8109ae50>] ? kthread+0x0/0xa0
            <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
            <4>
            <0>Kernel panic - not syncing: LBUG
            <4>Pid: 2957, comm: mdt_rdpg00_002 Not tainted 2.6.32.431.5.1.el6_lustre #1
            <4>Call Trace:
            <4> [<ffffffff81527983>] ? panic+0xa7/0x16f
            <4> [<ffffffffa0302eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
            <4> [<ffffffffa0c29fd9>] ? osd_object_destroy+0x459/0x460 [osd_ldiskfs]
            <4> [<ffffffffa0e90240>] ? lod_object_destroy+0x2c0/0x760 [lod]
            <4> [<ffffffffa0edc6e0>] ? mdd_close+0x8e0/0xb80 [mdd]
            <4> [<ffffffffa0dcf0b9>] ? mdt_mfd_close+0x4a9/0x1ba0 [mdt]
            <4> [<ffffffffa0313581>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            <4> [<ffffffffa0dd2b73>] ? mdt_close+0x743/0xae0 [mdt]
            <4> [<ffffffffa07059ac>] ? tgt_request_handle+0x23c/0xac0 [ptlrpc]
            <4> [<ffffffffa06b498a>] ? ptlrpc_main+0xd1a/0x1980 [ptlrpc]
            <4> [<ffffffffa06b3c70>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
            <4> [<ffffffff8109aee6>] ? kthread+0x96/0xa0
            <4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
            <4> [<ffffffff8109ae50>] ? kthread+0x0/0xa0
            <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20


            People

              di.wang Di Wang
              paf Patrick Farrell (Inactive)