2.6: DNE stripe directory - 2.5.0 clients

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.6.0
    • Lustre 2.6.0
    • 3
    • 13345

    Description

      The DNE2 solution architecture states that 2.4 clients should return -ENOTSUPP when trying to access a master/DNE Phase 2 striped directory.

      I did a quick test of this, and did not receive this error. Instead, the client LBUGged. Perhaps this only applies to later versions of 2.4 - I tested 2.4.0.

      "striped_directory" is on 2 master MDSes, with two MDTs each. It was created like this, from a master client:
      lfs setdirstripe -c 4 -D striped_directory

      Here's what happened on my 2.4.0 client:
      [root@centclient18 centssm1]# ls
      striped_directory
      [root@centclient18 centssm1]# cd striped_directory/
      [root@centclient18 striped_directory]# ls
      file file10 file2 file3 file4 file5 file6 file7 file8 file9
      [root@centclient18 striped_directory]# touch file11
      [root@centclient18 striped_directory]# ls -la
      total 8
      drwxr-xr-x 2 root root 4096 Mar 31 04:47 .
      drwxr-xr-x 4 root root 4096 Mar 31 04:36 ..
      -rw-r--r-- 1 root root 0 Mar 31 04:47 file
      -rw-r--r-- 1 root root 0 Mar 31 04:47 file10
      -rw-r--r-- 1 root root 0 Mar 31 04:50 file11
      -rw-r--r-- 1 root root 0 Mar 31 04:47 file2
      -rw-r--r-- 1 root root 0 Mar 31 04:47 file3
      -rw-r--r-- 1 root root 0 Mar 31 04:47 file4
      -rw-r--r-- 1 root root 0 Mar 31 04:47 file5
      -rw-r--r-- 1 root root 0 Mar 31 04:47 file6
      -rw-r--r-- 1 root root 0 Mar 31 04:47 file7
      -rw-r--r-- 1 root root 0 Mar 31 04:47 file8
      -rw-r--r-- 1 root root 0 Mar 31 04:47 file9
      [root@centclient18 striped_directory]# lfs getdirstripe .
      .
      lmv_stripe_count: 1
      lmv_stripe_offset: 0
      mdtidx FID[seq:oid:ver]
      0 [0x600000400:0x1:0x0]

      [root@centclient18 striped_directory]# mkdir test
      [root@centclient18 striped_directory]# ls

      Message from syslogd@centclient18 at Mar 31 04:51:11 ...
      kernel:LustreError: 5906:0:(lmv_obd.c:2351:lmv_unpackmd()) ASSERTION( mea_size == lmm_size ) failed:

      Message from syslogd@centclient18 at Mar 31 04:51:11 ...
      kernel:LustreError: 5906:0:(lmv_obd.c:2351:lmv_unpackmd()) LBUG
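
The assertion above compares the metadata size the client expects (mea_size) with the size it actually received (lmm_size). As a rough illustration of the behaviour the architecture document asks for, the sketch below rejects a mismatched size with an error code instead of asserting. It is not the Lustre code: the struct, the per-stripe entry size, and every sketch_* name are hypothetical, and ENOTSUPP is defined locally because it is a kernel-internal value (524) that userspace errno.h does not provide.

```c
#include <stddef.h>
#include <stdint.h>

/* Kernel-internal ENOTSUPP is not in userspace <errno.h>; defined locally. */
#define SKETCH_ENOTSUPP 524

/* Hypothetical, simplified stand-in for a directory layout header. */
struct sketch_dir_md {
	uint32_t magic;
	uint32_t stripe_count;
};

/* Hypothetical per-stripe entry size; the real layout differs. */
#define SKETCH_STRIPE_ENTRY_SIZE 16u

/* Size this client expects for a layout with the given stripe count. */
static size_t sketch_expected_size(uint32_t stripe_count)
{
	return sizeof(struct sketch_dir_md) +
	       (size_t)stripe_count * SKETCH_STRIPE_ENTRY_SIZE;
}

/*
 * Return 0 when the received size matches the expected size, or
 * -SKETCH_ENOTSUPP when it does not, rather than asserting.
 */
static int sketch_check_md_size(const struct sketch_dir_md *md,
				size_t lmm_size)
{
	size_t mea_size;

	if (md == NULL || lmm_size < sizeof(*md))
		return -SKETCH_ENOTSUPP;

	mea_size = sketch_expected_size(md->stripe_count);
	if (mea_size != lmm_size)
		return -SKETCH_ENOTSUPP;

	return 0;
}
```

With a check of this shape, a client that receives a layout it cannot parse would fail the operation for the caller rather than LBUG.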

        Activity

          [LU-4843] 2.6: DNE stripe directory - 2.5.0 clients

          jlevi Jodi Levi (Inactive) added a comment - Patches landed to Master.
          di.wang Di Wang added a comment - http://review.whamcloud.com/10773
          di.wang Di Wang added a comment - Ah, in patch 9956 I used OBD_CONNECT_DISP_STRIPE to check whether the client understands striped directories, and I had thought this flag only landed for 2.6 clients. Unfortunately, it is also included in 2.5 clients (2.5.2). I need to find a new flag. Hmm.
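
The comment above describes gating striped-directory access on a capability flag that the client advertises at connect time. A minimal sketch of that kind of server-side gate follows, under stated assumptions: the flag bit, the export struct, and all sketch_* names are hypothetical and are not the real OBD_CONNECT_* definitions or the code in patch 9956.

```c
#include <stdbool.h>
#include <stdint.h>

/* Kernel-internal ENOTSUPP is not in userspace <errno.h>; defined locally. */
#define SKETCH_ENOTSUPP 524

/* Hypothetical capability bit; NOT a real OBD_CONNECT_* value. */
#define SKETCH_CONNECT_STRIPED_DIR (1ULL << 5)

/* Hypothetical per-client state holding the negotiated connect flags. */
struct sketch_export {
	uint64_t connect_flags;
};

/* True when the client advertised support for striped directories. */
static bool sketch_client_knows_striped_dir(const struct sketch_export *exp)
{
	return (exp->connect_flags & SKETCH_CONNECT_STRIPED_DIR) != 0;
}

/*
 * Server-side gate: refuse access to a striped directory for clients
 * that did not advertise the capability; allow everything else.
 */
static int sketch_check_dir_access(const struct sketch_export *exp,
				   bool dir_is_striped)
{
	if (dir_is_striped && !sketch_client_knows_striped_dir(exp))
		return -SKETCH_ENOTSUPP;

	return 0;
}
```

The gate is only as good as the flag it tests. If older clients also set the chosen bit, as happened with 2.5.2 here, they pass the check and still receive a layout they cannot unpack.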

          paf Patrick Farrell (Inactive) added a comment -

          Di,

          I was busy with other things when the patch landed and never tried it, but I just accidentally tried 2.5.1 clients with 2.6 servers (running master from Thursday) and striped directories, and hit this assertion again.

          Here's the back trace:
          2014-06-21T09:41:10.651307-05:00 c0-0c1s1n2 LustreError: 14818:0:(lmv_obd.c:2587:lmv_unpackmd()) ASSERTION( mea_size == lmm_size ) failed:
          2014-06-21T09:41:10.651344-05:00 c0-0c1s1n2 LustreError: 14818:0:(lmv_obd.c:2587:lmv_unpackmd()) LBUG
          2014-06-21T09:41:10.651357-05:00 c0-0c1s1n2 Pid: 14818, comm: LUSTRE_PERF
          2014-06-21T09:41:10.651370-05:00 c0-0c1s1n2 Call Trace:
          2014-06-21T09:41:10.677280-05:00 c0-0c1s1n2 [<ffffffff81005eb9>] try_stack_unwind+0x169/0x1b0
          2014-06-21T09:41:10.677319-05:00 c0-0c1s1n2 [<ffffffff81004939>] dump_trace+0x89/0x450
          2014-06-21T09:41:10.677331-05:00 c0-0c1s1n2 [<ffffffffa02eb8c7>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
          2014-06-21T09:41:10.704617-05:00 c0-0c1s1n2 [<ffffffffa02ebe27>] lbug_with_loc+0x47/0xc0 [libcfs]
          2014-06-21T09:41:10.704655-05:00 c0-0c1s1n2 [<ffffffffa0a034e1>] lmv_unpackmd+0x5d1/0x820 [lmv]
          2014-06-21T09:41:10.704666-05:00 c0-0c1s1n2 [<ffffffffa075f010>] obd_unpackmd+0xe0/0x360 [mdc]
          2014-06-21T09:41:10.732016-05:00 c0-0c1s1n2 [<ffffffffa076baec>] mdc_get_lustre_md+0xb6c/0x1430 [mdc]
          2014-06-21T09:41:10.732053-05:00 c0-0c1s1n2 [<ffffffffa09fa38b>] lmv_get_lustre_md+0xab/0x310 [lmv]
          2014-06-21T09:41:10.732064-05:00 c0-0c1s1n2 [<ffffffffa08f556d>] ll_prep_inode+0xdd/0xe40 [lustre]
          2014-06-21T09:41:10.759306-05:00 c0-0c1s1n2 [<ffffffffa0905396>] ll_lookup_it_finish+0x1d6/0xd50 [lustre]
          2014-06-21T09:41:10.784541-05:00 c0-0c1s1n2 [<ffffffffa09063e4>] ll_lookup_it+0x4d4/0xad0 [lustre]
          2014-06-21T09:41:10.784572-05:00 c0-0c1s1n2 [<ffffffffa0906a6c>] ll_lookup_nd+0x8c/0x3e0 [lustre]
          2014-06-21T09:41:10.809749-05:00 c0-0c1s1n2 [<ffffffff81159bac>] d_alloc_and_lookup+0x4c/0x80
          2014-06-21T09:41:10.809782-05:00 c0-0c1s1n2 [<ffffffff8115b36e>] do_lookup+0x2ae/0x3b0
          2014-06-21T09:41:10.809793-05:00 c0-0c1s1n2 [<ffffffff8115dba3>] path_lookupat+0xc3/0x5f0
          2014-06-21T09:41:10.834997-05:00 c0-0c1s1n2 [<ffffffff8115e105>] do_path_lookup+0x35/0xd0
          2014-06-21T09:41:10.835031-05:00 c0-0c1s1n2 [<ffffffff8115ee53>] user_path_at_empty+0x83/0xb0
          2014-06-21T09:41:10.835042-05:00 c0-0c1s1n2 [<ffffffff8115ee91>] user_path_at+0x11/0x20
          2014-06-21T09:41:10.835056-05:00 c0-0c1s1n2 [<ffffffff81153a25>] vfs_fstatat+0x55/0x90
          2014-06-21T09:41:10.860193-05:00 c0-0c1s1n2 [<ffffffff81153b8b>] vfs_stat+0x1b/0x20
          2014-06-21T09:41:10.860228-05:00 c0-0c1s1n2 [<ffffffff81153bb4>] sys_newstat+0x24/0x50
          2014-06-21T09:41:10.860239-05:00 c0-0c1s1n2 [<ffffffff815603ab>] system_call_fastpath+0x16/0x1b
          2014-06-21T09:41:10.860255-05:00 c0-0c1s1n2 [<00007f85933a7765>] 0x7f85933a7765

          I also took a dump with debug=-1 on the client if it's of interest.
          pjones Peter Jones added a comment - Landed for 2.6
          di.wang Di Wang added a comment - http://review.whamcloud.com/9956

          paf Patrick Farrell (Inactive) added a comment -

          Di - One other comment. The lfs setdirstripe command recognizes when the hash type specified with -t isn't valid, but I can't find the valid hash types anywhere in the documentation. I had to read the code to identify them. It would be nice if the error message listed the valid arguments (or, failing that, the man page definitely should).

          Since you're doing various patches in the DNE 2 area, could you add that information to lfs as part of one of them? If not, I could generate and push a patch. It just seems easier to fold it into another patch than to push it by itself.
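
The request above is for an error message that names the accepted values whenever the -t argument is rejected. One way this could look is sketched below. It is not the lfs source: the hash names "hash_a" and "hash_b" are placeholders rather than the real hash types, and all sketch_* names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Placeholder names only; the real hash type names live in the lfs source. */
static const char *const sketch_hash_names[] = { "hash_a", "hash_b" };

#define SKETCH_HASH_COUNT \
	(sizeof(sketch_hash_names) / sizeof(sketch_hash_names[0]))

/* Return the index of a known hash type name, or -1 if it is not valid. */
static int sketch_parse_hash_type(const char *arg)
{
	size_t i;

	if (arg == NULL)
		return -1;

	for (i = 0; i < SKETCH_HASH_COUNT; i++) {
		if (strcmp(arg, sketch_hash_names[i]) == 0)
			return (int)i;
	}
	return -1;
}

/*
 * Build an error message that names the rejected argument and lists
 * every valid choice. Output is truncated if the buffer is too small.
 */
static void sketch_format_hash_error(char *buf, size_t len, const char *arg)
{
	size_t used;
	size_t i;
	int n;

	if (buf == NULL || len == 0)
		return;

	n = snprintf(buf, len, "invalid hash type '%s'; valid types:",
		     arg != NULL ? arg : "(null)");
	if (n < 0) {
		buf[0] = '\0';
		return;
	}
	used = (size_t)n;

	/* Append each valid name while there is still room in the buffer. */
	for (i = 0; i < SKETCH_HASH_COUNT && used < len; i++) {
		n = snprintf(buf + used, len - used, " %s",
			     sketch_hash_names[i]);
		if (n < 0)
			break;
		used += (size_t)n;
	}
}
```

Driving both the parser and the message from the same table keeps the two from drifting apart when a hash type is added.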
          di.wang Di Wang added a comment - Ah, I need to add the check for striped directories on the 2.6 server.

          jlevi Jodi Levi (Inactive) added a comment - Di, Could you comment on this one? Thank you!

          paf Patrick Farrell (Inactive) added a comment - Sorry, I accidentally created this ticket before I finished tagging it correctly. My bad.

          People

            di.wang Di Wang
            paf Patrick Farrell (Inactive)