[LU-4850] DNE Striped Directory - Changing default striping only works on MDT0

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.6.0
    • Affects Version/s: Lustre 2.6.0
    • Environment: Lustre master on CentOS clients and servers. Commit: Idcfb918e0d4e203ff0f9c6d838a68b1a204ee3bd (One commit behind current head.)
    • Severity: 3
    • Rank (Obsolete): 13361

    Description

      I've been doing some simple testing of the new DNE striped directories, and I'm seeing some strangely inconsistent behavior. This system has 2 MDSes, and 2 MDTs per MDS. Clients and servers are all running master as described in "Environment".

      Apologies in advance for the lengthy description; it takes a while to make the problem clear and show proof.

      I'm seeing inconsistencies when I change the default stripe of a directory.

      Create a striped directory, then set the default striping. Everything, at this point, works fine. All directories created get the correct striping:

      [root@centclient02 centssm1]# lfs setdirstripe -c 4 striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/{test1,test2,test3,test4,test5,test6,test7,test8,test9,test10,test11,test12}
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe test1 test2 test3 test4 test5 test6 test7 test8 test9 test10 test11 test12
      test1
      lmv_stripe_count: 4
      lmv_stripe_offset: 0
      mdtidx		 FID[seq:oid:ver]
           0		 [0x600000402:0x1d534:0x0]		
           1		 [0x640000400:0x1b:0x0]		
           2		 [0x680000400:0x1b:0x0]		
           3		 [0x6c0000400:0x1b:0x0]		
      
      test2
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0x21:0x0]		
           0		 [0x600000403:0x19:0x0]		
           2		 [0x680000403:0x19:0x0]		
           3		 [0x6c0000402:0x19:0x0]		
      
      test3
      lmv_stripe_count: 4
      lmv_stripe_offset: 2
      mdtidx		 FID[seq:oid:ver]
           2		 [0x680000402:0x1aa32:0x0]		
           0		 [0x600000404:0x39:0x0]		
           1		 [0x640000402:0x39:0x0]		
           3		 [0x6c0000403:0x39:0x0]		
      
      test4
      lmv_stripe_count: 4
      lmv_stripe_offset: 3
      mdtidx		 FID[seq:oid:ver]
           3		 [0x6c0000401:0x9e:0x0]		
           0		 [0x600000405:0x2d:0x0]		
           1		 [0x640000403:0x2d:0x0]		
           2		 [0x680000404:0x2d:0x0]		
      

      All the directories are created with the correct striping information set.

      The problems come in when I start changing the default striping. This only seems to work on MDT0.

      Note the hash function (see output above) puts test1 on MDT0, test2 on MDT1, test3 on MDT2, and test4 on MDT3, which is handy for showing this bug.
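To make the name-to-MDT mapping easier to read off, here is a minimal Python sketch that parses `lfs getdirstripe` output in the layout shown above. The function name `parse_getdirstripe` and the dictionary keys are illustrative choices, not part of Lustre, and the parser assumes the output format matches the transcripts in this ticket.

```python
import re

# Patterns for the two kinds of data lines seen in the transcripts above.
_FIELD = re.compile(r"lmv_stripe_(count|offset):\s*(\d+)$")
_STRIPE = re.compile(r"(\d+)\s+\[(0x[0-9a-fA-F]+:0x[0-9a-fA-F]+:0x[0-9a-fA-F]+)\]$")


def parse_getdirstripe(text):
    """Parse `lfs getdirstripe` text into a list of per-directory dicts."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("mdtidx"):
            continue  # skip blank lines and the column header
        field = _FIELD.match(line)
        stripe = _STRIPE.match(line)
        if field and entries:
            entries[-1]["stripe_" + field.group(1)] = int(field.group(2))
        elif stripe and entries:
            entries[-1]["stripes"].append((int(stripe.group(1)), stripe.group(2)))
        else:
            # Any other line is assumed to be a directory name.
            entries.append({"name": line, "stripes": []})
    return entries


# Sample copied from the first transcript above (test1 and test2 only).
SAMPLE = """\
test1
lmv_stripe_count: 4
lmv_stripe_offset: 0
mdtidx     FID[seq:oid:ver]
     0     [0x600000402:0x1d534:0x0]
     1     [0x640000400:0x1b:0x0]
     2     [0x680000400:0x1b:0x0]
     3     [0x6c0000400:0x1b:0x0]

test2
lmv_stripe_count: 4
lmv_stripe_offset: 1
mdtidx     FID[seq:oid:ver]
     1     [0x640000401:0x21:0x0]
     0     [0x600000403:0x19:0x0]
     2     [0x680000403:0x19:0x0]
     3     [0x6c0000402:0x19:0x0]
"""

for entry in parse_getdirstripe(SAMPLE):
    print(entry["name"], "count", entry["stripe_count"],
          "master MDT", entry["stripe_offset"])
```

The `lmv_stripe_offset` value is what the rest of this description refers to as the MDT a directory "goes on".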

      If no subdirectories have been created before changing the default striping, everything works just fine:

      [root@centclient02 centssm1]# lfs setdirstripe -c 4 striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 -D striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 1 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/{test1,test2,test3,test4}
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe test1 test2 test3 test4
      test1
      lmv_stripe_count: 1
      lmv_stripe_offset: 0
      mdtidx		 FID[seq:oid:ver]
           0		 [0x600000402:0x1d709:0x0]		
      
      test2
      lmv_stripe_count: 1
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0x95:0x0]		
      
      test3
      lmv_stripe_count: 1
      lmv_stripe_offset: 2
      mdtidx		 FID[seq:oid:ver]
           2		 [0x680000402:0x1aa35:0x0]		
      
      test4
      lmv_stripe_count: 1
      lmv_stripe_offset: 3
      mdtidx		 FID[seq:oid:ver]
           3		 [0x6c0000401:0xa1:0x0]		
      
      [root@centclient02 striped_directory]# cd ..
      [root@centclient02 centssm1]# rm -rf *
      

      Changing the default striping after a subdirectory has been created only seems to work on MDT0.

      Here are all four directories at once:

      [root@centclient02 centssm1]# lfs setdirstripe -c 4 striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/{test1,test2,test3,test4}
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe test1 test2 test3 test4
      test1
      lmv_stripe_count: 4
      lmv_stripe_offset: 0
      mdtidx		 FID[seq:oid:ver]
           0		 [0x600000402:0x1d735:0x0]		
           1		 [0x640000400:0x8f:0x0]		
           2		 [0x680000400:0x8f:0x0]		
           3		 [0x6c0000400:0x8f:0x0]		
      
      test2
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0xa5:0x0]		
           0		 [0x600000403:0x62:0x0]		
           2		 [0x680000403:0x62:0x0]		
           3		 [0x6c0000402:0x62:0x0]		
      
      test3
      lmv_stripe_count: 4
      lmv_stripe_offset: 2
      mdtidx		 FID[seq:oid:ver]
           2		 [0x680000402:0x1aa3c:0x0]		
           0		 [0x600000404:0x40:0x0]		
           1		 [0x640000402:0x40:0x0]		
           3		 [0x6c0000403:0x40:0x0]		
      
      test4
      lmv_stripe_count: 4
      lmv_stripe_offset: 3
      mdtidx		 FID[seq:oid:ver]
           3		 [0x6c0000401:0xa5:0x0]		
           0		 [0x600000405:0x33:0x0]		
           1		 [0x640000403:0x33:0x0]		
           2		 [0x680000404:0x33:0x0]		
      
      [root@centclient02 striped_directory]# rm -rf *
      [root@centclient02 striped_directory]# cd ..
      [root@centclient02 centssm1]# lfs setdirstripe -c 1 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/{test1,test2,test3,test4}
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe test1 test2 test3 test4
      test1
      lmv_stripe_count: 1
      lmv_stripe_offset: 0
      mdtidx		 FID[seq:oid:ver]
           0		 [0x600000402:0x1d738:0x0]		
      
      test2
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0xa6:0x0]		
           0		 [0x600000403:0x63:0x0]		
           2		 [0x680000403:0x63:0x0]		
           3		 [0x6c0000402:0x63:0x0]		
      
      test3
      lmv_stripe_count: 4
      lmv_stripe_offset: 2
      mdtidx		 FID[seq:oid:ver]
           2		 [0x680000402:0x1aa3d:0x0]		
           0		 [0x600000404:0x41:0x0]		
           1		 [0x640000402:0x41:0x0]		
           3		 [0x6c0000403:0x41:0x0]		
      
      test4
      lmv_stripe_count: 4
      lmv_stripe_offset: 3
      mdtidx		 FID[seq:oid:ver]
           3		 [0x6c0000401:0xa6:0x0]		
           0		 [0x600000405:0x34:0x0]		
           1		 [0x640000403:0x34:0x0]		
           2		 [0x680000404:0x34:0x0]		
      

      Here are the directories by themselves. Here's test1, on MDT0:

      [root@centclient02 centssm1]# DIR=test1
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR
      test1
      lmv_stripe_count: 4
      lmv_stripe_offset: 0
      mdtidx		 FID[seq:oid:ver]
           0		 [0x600000402:0x1d70d:0x0]		
           1		 [0x640000400:0x80:0x0]		
           2		 [0x680000400:0x80:0x0]		
           3		 [0x6c0000400:0x80:0x0]		
      
      [root@centclient02 striped_directory]# rm -rf *
      [root@centclient02 striped_directory]# cd ..
      [root@centclient02 centssm1]# lfs setdirstripe -c 1 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR
      test1
      lmv_stripe_count: 1
      lmv_stripe_offset: 0
      mdtidx		 FID[seq:oid:ver]
           0		 [0x600000402:0x1d70f:0x0]	
      

      But here's test2, which goes on MDT1:

      [root@centclient02 centssm1]# DIR=test2
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR
      test2
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0x9d:0x0]		
           0		 [0x600000403:0x5a:0x0]		
           2		 [0x680000403:0x5a:0x0]		
           3		 [0x6c0000402:0x5a:0x0]		
      
      [root@centclient02 striped_directory]# rm -rf *
      [root@centclient02 striped_directory]# cd ..
      [root@centclient02 centssm1]# lfs setdirstripe -c 1 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR
      test2
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0x9e:0x0]		
           0		 [0x600000403:0x5b:0x0]		
           2		 [0x680000403:0x5b:0x0]		
           3		 [0x6c0000402:0x5b:0x0]		
      

      We see the same results for test3 on MDT2, and test4 on MDT3.
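The pattern can be summarized with a small Python sketch that groups directories by the MDT they landed on and reports those whose stripe count does not match the new default. The helper name `mdts_ignoring_default` is hypothetical; the input values are the `lmv_stripe_count` and `lmv_stripe_offset` pairs from the four-directory run above, taken after the default was changed to 1.

```python
from collections import defaultdict


def mdts_ignoring_default(observed, expected_count):
    """Map MDT index -> directories whose stripe count differs from the default.

    `observed` maps a directory name to (lmv_stripe_count, lmv_stripe_offset).
    """
    stale = defaultdict(list)
    for name, (count, offset) in sorted(observed.items()):
        if count != expected_count:
            stale[offset].append(name)
    return dict(stale)


# Values read from the transcript above, after `lfs setdirstripe -c 1 -D`.
OBSERVED = {
    "test1": (1, 0),
    "test2": (4, 1),
    "test3": (4, 2),
    "test4": (4, 3),
}

print(mdts_ignoring_default(OBSERVED, expected_count=1))
```

With these inputs only MDT0 is absent from the result, which matches the observation that the changed default is honored only there.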

      Also note that the problem is triggered by creating any directory that goes on the particular MDT; it is not limited to re-creating the same directory. In this example, test6 and test2 both go on MDT1:

      [root@centclient02 centssm1]# DIR=test6
      [root@centclient02 centssm1]# DIR2=test2
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR
      test6
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0x9f:0x0]		
           0		 [0x600000403:0x5c:0x0]		
           2		 [0x680000403:0x5c:0x0]		
           3		 [0x6c0000402:0x5c:0x0]		
      
      [root@centclient02 striped_directory]# rm -rf *
      [root@centclient02 striped_directory]# cd ..
      [root@centclient02 centssm1]# lfs setdirstripe -c 1 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR2
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR2
      test2
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0xa0:0x0]		
           0		 [0x600000403:0x5d:0x0]		
           2		 [0x680000403:0x5d:0x0]		
           3		 [0x6c0000402:0x5d:0x0]		
      
      [root@centclient02 striped_directory]# cd ..
      [root@centclient02 centssm1]# rm -rf *
      

      And for additional confirmation, if I create test2 (which goes on MDT1) and then change default striping and create test3 (which goes on MDT2), we don't see the problem:

      [root@centclient02 centssm1]# DIR=test2
      [root@centclient02 centssm1]# DIR2=test3
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 striped_directory
      [root@centclient02 centssm1]# lfs setdirstripe -c 4 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR
      test2
      lmv_stripe_count: 4
      lmv_stripe_offset: 1
      mdtidx		 FID[seq:oid:ver]
           1		 [0x640000401:0xa1:0x0]		
           0		 [0x600000403:0x5e:0x0]		
           2		 [0x680000403:0x5e:0x0]		
           3		 [0x6c0000402:0x5e:0x0]		
      
      [root@centclient02 striped_directory]# rm -rf *
      [root@centclient02 striped_directory]# cd ..
      [root@centclient02 centssm1]# lfs setdirstripe -c 1 -D striped_directory
      [root@centclient02 centssm1]# mkdir -p striped_directory/$DIR2
      [root@centclient02 centssm1]# cd striped_directory/; lfs getdirstripe $DIR2
      test3
      lmv_stripe_count: 1
      lmv_stripe_offset: 2
      mdtidx		 FID[seq:oid:ver]
           2		 [0x680000402:0x1aa3a:0x0]	
      

      When the hash function is changed to "all_char", this behavior continues, but the name-to-MDT mappings change, so we see it for different directories.

          Activity

            I tried steps 1 and 2 from Di's comment, and everything worked fine (waiting a few minutes before creating any directories).

            I was able to rm everything on the file system.

            Then I stopped the file system and started it (our 'start' process is waiting for all the target mount commands to return), and mounted it on the client.

            I immediately created some striped directories and set default striping and put some child directories in them. As expected, the default striping didn't work right away. I was able to delete those directories, then I created them again. Striping still didn't work.

            I waited a few minutes, then created a separate striped directory and set default striping and created child directories. The child directories got the correct striping from the parent. When I tried to delete all of the file system contents, I got the LBUG I gave above on the primary MDS.

            The -1 dk log dump for that is on the whamcloud FTP site here:

            uploads/LU-4850/LU-4850-140403_nlink_lbug.tar.gz

            Along with logs from the other MDS and client.

            Just a heads up - I'm stopping work for the night, so I won't test anything else until at least tomorrow.

            paf Patrick Farrell (Inactive) added a comment - edited

            Hmm, yes, please grab the -1 dump log for me. Thanks. Just to be clear: when you create a striped directory, whether it inherits its stripes from the default stripe or is created with user-specified stripes, the MDT might adjust the stripe_count. In your case, it might be that not enough MDTs had fully started up yet. So you could try this after startup:

            1. Create a directory and set the default stripe.
            2. Wait a few minutes, then create sub-directories, to see whether they get the correct stripes.

            Thanks.
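The "wait a few minutes, then check" step can be sketched as a polling loop in Python. This is only an illustration under stated assumptions: `wait_for_stripe_count` and its `probe` argument are hypothetical names, and a real `probe` would have to create a fresh sub-directory and read back its `lmv_stripe_count`, since the default only applies at creation time. The demo below uses a fake probe so it runs without a Lustre file system.

```python
import time


def wait_for_stripe_count(probe, expected, timeout=300.0, interval=10.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns `expected` or `timeout` seconds pass.

    Returns True if the expected stripe count was observed, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if probe() == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Fake probe: the first two new sub-directories get 1 stripe, the third gets 4.
readings = iter([1, 1, 4])
ok = wait_for_stripe_count(lambda: next(readings), expected=4,
                           timeout=60, interval=0)
print(ok)
```

Polling like this only works around the startup delay discussed in this thread; it does not address the underlying connection check.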

            di.wang Di Wang (Inactive) added a comment

            Hmmm. Di, what I saw was this: when I created a striped directory with default striping set on it right after startup, and then created directories in it, it didn't work. The child directories didn't get the default stripe setting from the parent.

            I deleted all the directories and tried again, and got the same results. A minute or two later, I tried again, and it worked: The child directories got the striping settings from the parent.

            I then removed the child directories and changed the striping default for the parent, and re-created the child directories. The new default striping pattern was applied correctly to the children.

            However, I just tried to repeat this, to see if it was right, and the primary MDS crashed when trying to rm -rf * (deleting a striped directory and several children). I don't have logs from this, but I could grab a dump if needed:

            LustreError: 2957:0:(osd_handler.c:2471:osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed:
            <0>LustreError: 2957:0:(osd_handler.c:2471:osd_object_destroy()) LBUG
            <4>Pid: 2957, comm: mdt_rdpg00_002
            <4>
            <4>Call Trace:
            <4> [<ffffffffa0302895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            <4> [<ffffffffa0302e97>] lbug_with_loc+0x47/0xb0 [libcfs]
            <4> [<ffffffffa0c29fd9>] osd_object_destroy+0x459/0x460 [osd_ldiskfs]
            <4> [<ffffffffa0e90240>] lod_object_destroy+0x2c0/0x760 [lod]
            <4> [<ffffffffa0edc6e0>] mdd_close+0x8e0/0xb80 [mdd]
            <4> [<ffffffffa0dcf0b9>] mdt_mfd_close+0x4a9/0x1ba0 [mdt]
            <4> [<ffffffffa0313581>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            <4> [<ffffffffa0dd2b73>] mdt_close+0x743/0xae0 [mdt]
            <4> [<ffffffffa07059ac>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
            <4> [<ffffffffa06b498a>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
            <4> [<ffffffffa06b3c70>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
            <4> [<ffffffff8109aee6>] kthread+0x96/0xa0
            <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
            <4> [<ffffffff8109ae50>] ? kthread+0x0/0xa0
            <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
            <4>
            <0>Kernel panic - not syncing: LBUG
            <4>Pid: 2957, comm: mdt_rdpg00_002 Not tainted 2.6.32.431.5.1.el6_lustre #1
            <4>Call Trace:
            <4> [<ffffffff81527983>] ? panic+0xa7/0x16f
            <4> [<ffffffffa0302eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
            <4> [<ffffffffa0c29fd9>] ? osd_object_destroy+0x459/0x460 [osd_ldiskfs]
            <4> [<ffffffffa0e90240>] ? lod_object_destroy+0x2c0/0x760 [lod]
            <4> [<ffffffffa0edc6e0>] ? mdd_close+0x8e0/0xb80 [mdd]
            <4> [<ffffffffa0dcf0b9>] ? mdt_mfd_close+0x4a9/0x1ba0 [mdt]
            <4> [<ffffffffa0313581>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            <4> [<ffffffffa0dd2b73>] ? mdt_close+0x743/0xae0 [mdt]
            <4> [<ffffffffa07059ac>] ? tgt_request_handle+0x23c/0xac0 [ptlrpc]
            <4> [<ffffffffa06b498a>] ? ptlrpc_main+0xd1a/0x1980 [ptlrpc]
            <4> [<ffffffffa06b3c70>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
            <4> [<ffffffff8109aee6>] ? kthread+0x96/0xa0
            <4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
            <4> [<ffffffff8109ae50>] ? kthread+0x0/0xa0
            <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20

            paf Patrick Farrell (Inactive) added a comment

            Hello, Patrick, 9883 is supposed to fix "the delay", i.e. you should not see the LBUG you mentioned above, even if you create a striped directory immediately after startup. Yes, the default striping only takes effect when you create a new directory.

            di.wang Di Wang (Inactive) added a comment

            So, I applied 9883 on top of 9511, reinstalled everywhere, and reformatted.

            For the first minute or two, setting the default striping on a directory didn't seem to affect its children, but after a minute or two, it did. [I was deleting and re-creating each time.] That now seems to be correct.

            After that minute or two of time, the problem I originally reported with adjusting default striping seems to be gone and there were no crashes.

            It sounds like fixing the delay on startup will be in a separate patch?

            paf Patrick Farrell (Inactive) added a comment - edited

            Hello, Patrick, I created another patch based on 9511: http://review.whamcloud.com/#/c/9883/

            di.wang Di Wang (Inactive) added a comment

            Hmm, this seems to be because the MDT did not check the intra-MDT connections (connections between MDTs) before handling such requests. Sigh, we already do this for the connection between MDT and OST. In the meantime, could you please wait a bit (say 2 or 3 minutes) after setting up the system? Anyway, I will cook a patch. Thanks again for testing.

            di.wang Di Wang (Inactive) added a comment

            Di - Installed latest version from this afternoon, reformatted, started, mounted on client, then when I run:

            lfs setdirstripe -c 4 striped_directory

            The MDS crashes:

            <0>LustreError: 2002:0:(fid_request.c:72:seq_client_rpc()) ASSERTION( exp != ((void *)0) && !IS_ERR(exp) ) failed:
            <0>LustreError: 2002:0:(fid_request.c:72:seq_client_rpc()) LBUG
            <4>Pid: 2002, comm: mdt00_004
            <4>
            <4>Call Trace:
            <4> [<ffffffffa0303895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            <4> [<ffffffffa0303e97>] lbug_with_loc+0x47/0xb0 [libcfs]
            <4> [<ffffffffa093a265>] seq_client_rpc+0x7a5/0x910 [fid]
            <4> [<ffffffffa0314581>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            <4> [<ffffffffa093a7cf>] seq_client_alloc_seq+0x3ff/0x480 [fid]
            <4> [<ffffffffa09399c3>] ? seq_fid_alloc_prep+0x43/0xc0 [fid]
            <4> [<ffffffffa093accf>] seq_client_alloc_fid+0xdf/0x470 [fid]
            <4> [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
            <4> [<ffffffffa0f3d7de>] osp_fid_alloc+0xbe/0x100 [osp]
            <4> [<ffffffffa0e943e7>] lod_declare_xattr_set_lmv+0x7c7/0x1ff0 [lod]
            <4> [<ffffffffa0e95e50>] lod_dir_striping_create_internal+0x240/0x1460 [lod]
            <4> [<ffffffffa0c536d8>] ? osd_declare_inode_qid+0x1e8/0x270 [osd_ldiskfs]
            <4> [<ffffffffa030e4d8>] ? libcfs_log_return+0x28/0x40 [libcfs]
            <4> [<ffffffffa0e97297>] lod_declare_object_create+0x227/0x390 [lod]
            <4> [<ffffffffa0eda8c4>] mdd_declare_object_create_internal+0xb4/0x1e0 [mdd]
            <4> [<ffffffffa0eec7f3>] mdd_create+0x813/0x18a0 [mdd]
            <4> [<ffffffffa0dc2f83>] mdt_reint_create+0xac3/0xfa0 [mdt]
            <4> [<ffffffffa0314581>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            <4> [<ffffffffa0dbe5e1>] mdt_reint_rec+0x41/0xe0 [mdt]
            <4> [<ffffffffa0da3e13>] mdt_reint_internal+0x4c3/0x7c0 [mdt]
            <4> [<ffffffffa0da469b>] mdt_reint+0x6b/0x120 [mdt]
            <4> [<ffffffffa07069ac>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
            <4> [<ffffffffa06b598a>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
            <4> [<ffffffffa06b4c70>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
            <4> [<ffffffff8109aee6>] kthread+0x96/0xa0
            <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
            <4> [<ffffffff8109ae50>] ? kthread+0x0/0xa0
            <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20

            Dump with dk logs from all nodes involved is uploading here, should be up in a few minutes:

            ftp.whamcloud.com
            uploads/LU-4850/LU-4850-140403.tar.gz

            paf Patrick Farrell (Inactive) added a comment

            Hi, Patrick, I updated the patch again, please check. Btw: we do not support default stripe index yet, only default stripe count for now. Please try. Thanks.

            di.wang Di Wang (Inactive) added a comment

            OK, after reformatting and trying again (with patched client and servers), I've got a new crash for you.

            This happened on the primary MDS when I did this sequence of commands:

            lfs setdirstripe -c 4 striped_directory
            lfs setdirstripe -c 4 -D striped_directory
            mkdir -p striped_directory/test1
            lfs getdirstripe striped_directory/test1
            rm -rf striped_directory/*

            lfs setdirstripe -c 1 -D striped_directory
            mkdir -p striped_directory/test1

            The crash occurred on the first MDS (MDT0 & MDT1), which is where test1 is created (lmv offset is 0), when I ran the last command in that sequence.

            <1>BUG: unable to handle kernel NULL pointer dereference at (null)
            <1>IP: [<ffffffffa0e8ca1a>] lod_dir_declare_xattr_set+0x25a/0x4b0 [lod]
            [......]
            <4>Call Trace:
            <4> [<ffffffffa0e94072>] lod_dir_striping_create_internal+0x452/0x1450 [lod]
            <4> [<ffffffffa0c516d8>] ? osd_declare_inode_qid+0x1e8/0x270 [osd_ldiskfs]
            <4> [<ffffffffa030d4d8>] ? libcfs_log_return+0x28/0x40 [libcfs]
            <4> [<ffffffffa0e95297>] lod_declare_object_create+0x227/0x390 [lod]
            <4> [<ffffffffa0ed88c4>] mdd_declare_object_create_internal+0xb4/0x1e0 [mdd]
            <4> [<ffffffffa0eea7a3>] mdd_create+0x813/0x18a0 [mdd]
            <4> [<ffffffffa0dc0f83>] mdt_reint_create+0xac3/0xfa0 [mdt]
            <4> [<ffffffffa0313581>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            <4> [<ffffffffa0dbc5e1>] mdt_reint_rec+0x41/0xe0 [mdt]
            <4> [<ffffffffa0da1e13>] mdt_reint_internal+0x4c3/0x7c0 [mdt]
            <4> [<ffffffffa0da269b>] mdt_reint+0x6b/0x120 [mdt]
            <4> [<ffffffffa07059ac>] tgt_request_handle+0x23c/0xac0 [ptlrpc]
            <4> [<ffffffffa06b498a>] ptlrpc_main+0xd1a/0x1980 [ptlrpc]
            <4> [<ffffffffa06b3c70>] ? ptlrpc_main+0x0/0x1980 [ptlrpc]
            <4> [<ffffffff8109aee6>] kthread+0x96/0xa0
            <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
            <4> [<ffffffff8109ae50>] ? kthread+0x0/0xa0
            <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20

            Dump of this with full debug on, as well as client logs and logs from the second MDS, is here:
            ftp.whamcloud.com/uploads/LU-4850/LU-4850_mds_LBUG-140402.tar.gz
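For repeating the command sequence from this comment with different directory names or stripe counts, a small Python sketch can generate it. The helper `repro_commands` and its parameters are hypothetical; it only uses the `lfs setdirstripe -c`, `-D`, and `lfs getdirstripe` invocations already shown in this ticket, and it prints the commands rather than running them.

```python
def repro_commands(parent="striped_directory", child="test1", first=4, second=1):
    """Return the reproduction sequence from this comment as shell command strings."""
    return [
        f"lfs setdirstripe -c {first} {parent}",      # create the striped parent
        f"lfs setdirstripe -c {first} -D {parent}",   # set its default striping
        f"mkdir -p {parent}/{child}",                 # child inherits the default
        f"lfs getdirstripe {parent}/{child}",         # inspect the child layout
        f"rm -rf {parent}/*",                         # destructive: clears the parent
        f"lfs setdirstripe -c {second} -D {parent}",  # change the default
        f"mkdir -p {parent}/{child}",                 # the step that crashed here
    ]


for command in repro_commands():
    print(command)
```

Passing `child="test2"` gives the variant where the child lands on MDT1 under the hash mapping described in the ticket. The printed commands include `rm -rf`, so they should only be run on a scratch file system.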

            paf Patrick Farrell (Inactive) added a comment - edited

            Ah, I was afraid of that. I will reformat before testing again. No need to look at the logs unless it interests you.

            paf Patrick Farrell (Inactive) added a comment

            People

              Assignee: di.wang Di Wang (Inactive)
              Reporter: paf Patrick Farrell (Inactive)