[LU-9766] DNE phase 2 - wrong directory inheritance

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.7.0, Lustre 2.9.0
    • 3

    Description

      Default directory striping set with "lfs setdirstripe -D" is not inherited consistently by new subdirectories:

      Setting the default MDT stripe count of a directory to 2:
      [root@vm4 test]# lfs setdirstripe -D -c 2 2-stripes
      
      Creating directories (which should inherit the striping from the parent):
      [root@vm4 test]# mkdir 2-stripes/foo.{0..9}
      
      Only some of them get the expected stripe count:
      [root@vm4 test]#  lfs getdirstripe 2-stripes/
      2-stripes/
      lmv_stripe_count: 2 lmv_stripe_offset: 0
      mdtidx           FID[seq:oid:ver]
          0           [0x300000401:0x60:0x0]
          1           [0x340000402:0x1:0x0]
      2-stripes//foo.2
      lmv_stripe_count: 2 lmv_stripe_offset: 1
      mdtidx           FID[seq:oid:ver]
          1           [0x340000401:0xb4:0x0]
          0           [0x300000402:0x2:0x0]
      2-stripes//foo.6
      lmv_stripe_count: 2 lmv_stripe_offset: 1
      mdtidx           FID[seq:oid:ver]
          1           [0x340000401:0xb6:0x0]
          0           [0x300000402:0x4:0x0]
      2-stripes//foo.5
      lmv_stripe_count: 1 lmv_stripe_offset: 0
      mdtidx           FID[seq:oid:ver]
          0           [0x300000401:0x63:0x0]
      2-stripes//foo.9
      lmv_stripe_count: 1 lmv_stripe_offset: 0
      mdtidx           FID[seq:oid:ver]
          0           [0x300000401:0x65:0x0]
      2-stripes//foo.0
      lmv_stripe_count: 2 lmv_stripe_offset: 1
      mdtidx           FID[seq:oid:ver]
          1           [0x340000401:0xb3:0x0]
          0           [0x300000402:0x1:0x0]
      2-stripes//foo.1
      lmv_stripe_count: 1 lmv_stripe_offset: 0
      mdtidx           FID[seq:oid:ver]
          0           [0x300000401:0x61:0x0]
      2-stripes//foo.8
      lmv_stripe_count: 2 lmv_stripe_offset: 1
      mdtidx           FID[seq:oid:ver]
          1           [0x340000401:0xb7:0x0]
          0           [0x300000402:0x5:0x0]
      2-stripes//foo.7
      lmv_stripe_count: 1 lmv_stripe_offset: 0
      mdtidx           FID[seq:oid:ver]
          0           [0x300000401:0x64:0x0]
      2-stripes//foo.3
      lmv_stripe_count: 1 lmv_stripe_offset: 0
      mdtidx           FID[seq:oid:ver]
          0           [0x300000401:0x62:0x0]
      2-stripes//foo.4
      lmv_stripe_count: 2 lmv_stripe_offset: 1
      mdtidx           FID[seq:oid:ver]
          1           [0x340000401:0xb5:0x0]
          0           [0x300000402:0x3:0x0]
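
To make the mismatch easier to see, the listing above can be tallied mechanically. The following is a minimal Python sketch, not part of any Lustre tooling: it assumes the `lfs getdirstripe` text layout shown in this report (a directory name line immediately followed by an `lmv_stripe_count: ... lmv_stripe_offset: ...` line), and the names `parse_getdirstripe`, `SAMPLE`, and `EXPECTED_COUNT` are hypothetical. The per-stripe FID rows are kept only for the parent directory to keep the sample short.

```python
import re
from collections import Counter

# Output copied from the report; FID rows kept only for the parent directory.
SAMPLE = """\
2-stripes/
lmv_stripe_count: 2 lmv_stripe_offset: 0
mdtidx           FID[seq:oid:ver]
    0           [0x300000401:0x60:0x0]
    1           [0x340000402:0x1:0x0]
2-stripes//foo.2
lmv_stripe_count: 2 lmv_stripe_offset: 1
2-stripes//foo.6
lmv_stripe_count: 2 lmv_stripe_offset: 1
2-stripes//foo.5
lmv_stripe_count: 1 lmv_stripe_offset: 0
2-stripes//foo.9
lmv_stripe_count: 1 lmv_stripe_offset: 0
2-stripes//foo.0
lmv_stripe_count: 2 lmv_stripe_offset: 1
2-stripes//foo.1
lmv_stripe_count: 1 lmv_stripe_offset: 0
2-stripes//foo.8
lmv_stripe_count: 2 lmv_stripe_offset: 1
2-stripes//foo.7
lmv_stripe_count: 1 lmv_stripe_offset: 0
2-stripes//foo.3
lmv_stripe_count: 1 lmv_stripe_offset: 0
2-stripes//foo.4
lmv_stripe_count: 2 lmv_stripe_offset: 1
"""

HEADER = re.compile(r"lmv_stripe_count:\s*(\d+)\s+lmv_stripe_offset:\s*(-?\d+)")


def parse_getdirstripe(text):
    """Return {directory: (stripe_count, stripe_offset)}.

    Assumption: the line right before each lmv_stripe_count line is the
    directory name, as in the listing above.
    """
    result = {}
    prev = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match and prev is not None:
            result[prev] = (int(match.group(1)), int(match.group(2)))
        prev = line
    return result


EXPECTED_COUNT = 2  # the default set with "lfs setdirstripe -D -c 2"
layout = parse_getdirstripe(SAMPLE)
by_count = Counter(count for count, _ in layout.values())
wrong = sorted(d for d, (count, _) in layout.items() if count != EXPECTED_COUNT)

print(by_count)
for name in wrong:
    print(name, layout[name])
```

On this sample, the directories that miss the default are foo.1, foo.3, foo.5, foo.7 and foo.9, each with one stripe at offset 0, while the even-numbered ones have two stripes starting at offset 1. Whether that alternation is significant is not established by this report.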
      

      On the MDS, the debug logs show that lod_cache_parent_striping does not always return the striping defined on the parent; sometimes it returns the default filesystem striping instead:

       57168 00000004:00000001:1.0:1499850278.062218:0:8981:0:(lod_object.c:3008:lod_cache_parent_lmv_striping()) Process leaving
       57169 00000004:00000001:1.0:1499850278.062219:0:8981:0:(lod_object.c:3053:lod_cache_parent_striping()) Process leaving (rc=0 : 0 : 0)
       57170 00000004:00000040:1.0:1499850278.062220:0:8981:0:(lod_object.c:3155:lod_ah_init()) inherit default EA nr:1 off:-1 t2
       57171 00000004:00000040:1.0:1499850278.062220:0:8981:0:(lod_object.c:3187:lod_ah_init()) inherit EA nr:1 off:-1
       57172 00000004:00000040:1.0:1499850278.062221:0:8981:0:(lod_object.c:3195:lod_ah_init()) final striping count:1, offset:-1
       57173 00000004:00000001:1.0:1499850278.062221:0:8981:0:(lod_object.c:3246:lod_ah_init()) Process leaving
      
      
      581753 00000004:00000001:1.0:1499850525.180299:0:9133:0:(lod_object.c:3053:lod_cache_parent_striping()) Process leaving (rc=0 : 0 : 0)
      581754 00000004:00000040:1.0:1499850525.180299:0:9133:0:(lod_object.c:3155:lod_ah_init()) inherit default EA nr:1 off:-1 t2
      581755 00000004:00000040:1.0:1499850525.180300:0:9133:0:(lod_object.c:3175:lod_ah_init()) set stripe EA nr:2 off:0
      581756 00000004:00000040:1.0:1499850525.180300:0:9133:0:(lod_object.c:3195:lod_ah_init()) final striping count:2, offset:0
      581757 00000004:00000001:1.0:1499850525.180301:0:9133:0:(lod_object.c:3246:lod_ah_init()) Process leaving
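
The two log excerpts can be compared per process with a small parser. This is a rough Python sketch under stated assumptions: it treats the sixth colon-separated field of each debug line as the PID, and it only recognises the three `lod_ah_init()` messages that appear in this report ("inherit EA", "set stripe EA", "final striping count"). The names `LINE`, `FINAL`, and `summarize` are hypothetical helpers, not Lustre interfaces.

```python
import re

# Lines copied from the two excerpts above.
LOG = """\
57170 00000004:00000040:1.0:1499850278.062220:0:8981:0:(lod_object.c:3155:lod_ah_init()) inherit default EA nr:1 off:-1 t2
57171 00000004:00000040:1.0:1499850278.062220:0:8981:0:(lod_object.c:3187:lod_ah_init()) inherit EA nr:1 off:-1
57172 00000004:00000040:1.0:1499850278.062221:0:8981:0:(lod_object.c:3195:lod_ah_init()) final striping count:1, offset:-1
581754 00000004:00000040:1.0:1499850525.180299:0:9133:0:(lod_object.c:3155:lod_ah_init()) inherit default EA nr:1 off:-1 t2
581755 00000004:00000040:1.0:1499850525.180300:0:9133:0:(lod_object.c:3175:lod_ah_init()) set stripe EA nr:2 off:0
581756 00000004:00000040:1.0:1499850525.180300:0:9133:0:(lod_object.c:3195:lod_ah_init()) final striping count:2, offset:0
"""

# Assumed field layout: seq subsys:mask:cpu:time:stack:pid:extpid:(file:line:func()) msg
LINE = re.compile(
    r"^\s*\d+\s+[0-9a-f]+:[0-9a-f]+:[\d.]+:[\d.]+:\d+:(?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+(?P<msg>.*)$"
)
FINAL = re.compile(r"final striping count:(\d+), offset:(-?\d+)")


def summarize(log):
    """Return {pid: {"path": message tag, "final": (count, offset)}}."""
    per_pid = {}
    for raw in log.splitlines():
        match = LINE.match(raw)
        if not match or match.group("func") != "lod_ah_init":
            continue
        entry = per_pid.setdefault(
            int(match.group("pid")), {"path": None, "final": None}
        )
        msg = match.group("msg")
        # "inherit default EA ..." is logged in both cases, so it is ignored here.
        if msg.startswith("set stripe EA"):
            entry["path"] = "set stripe EA"
        elif msg.startswith("inherit EA"):
            entry["path"] = "inherit EA"
        final = FINAL.search(msg)
        if final:
            entry["final"] = (int(final.group(1)), int(final.group(2)))
    return per_pid


for pid, info in sorted(summarize(LOG).items()):
    print(pid, info)
```

Both processes log the same "inherit default EA nr:1 off:-1" line; they differ only in the next message, which is what the sketch records as `path`.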
      

      This is a problem because whenever the stripe count is wrong, the directory is placed on MDT0, so MDT0 fills up faster than the other MDTs.

      Also, "lfs mkdir -i 1" does not work: it creates a directory with a stripe count of 0 and a single MDT index. A workaround is to run "lfs setdirstripe -D -c 1" on the parent directory and then create directories with mkdir.

      When creating directories where default striping was specified, I sometimes get timeouts in 2.7 and client panics in 2.9.

      2.7:

      [root@vm4]# mkdir 1-stripe-1/foo.0/foo.{0..9}
      mkdir: cannot create directory `1-stripe-1/foo.0/foo.0': Input/output error
      mkdir: cannot create directory `1-stripe-1/foo.0/foo.1': Cannot send after transport endpoint shutdown
      mkdir: cannot create directory `1-stripe-1/foo.0/foo.2': Cannot send after transport endpoint shutdown
      mkdir: cannot create directory `1-stripe-1/foo.0/foo.3': Cannot send after transport endpoint shutdown
      mkdir: cannot create directory `1-stripe-1/foo.0/foo.4': Cannot send after transport endpoint shutdown
      mkdir: cannot create directory `1-stripe-1/foo.0/foo.5': Cannot send after transport endpoint shutdown
      mkdir: cannot create directory `1-stripe-1/foo.0/foo.6': Cannot send after transport endpoint shutdown
      mkdir: cannot create directory `1-stripe-1/foo.0/foo.7': Cannot send after transport endpoint shutdown
      mkdir: cannot create directory `1-stripe-1/foo.0/foo.8': Cannot send after transport endpoint shutdown
      mkdir: cannot create directory `1-stripe-1/foo.0/foo.9': Cannot send after transport endpoint shutdown
      [root@vm4]# mkdir 1-stripe-1/foo.0/foo.{0..9}
      mkdir: cannot create directory `1-stripe-1/foo.0/foo.0': Input/output error
      

      2.9:

      crash> bt 2135
      PID: 2135   TASK: ffff880035860000  CPU: 1   COMMAND: "mkdir"
       #0 [ffff880016c4b670] machine_kexec at ffffffff81059cdb
       #1 [ffff880016c4b6d0] __crash_kexec at ffffffff81105182
       #2 [ffff880016c4b7a0] crash_kexec at ffffffff81105270
       #3 [ffff880016c4b7b8] oops_end at ffffffff8168efc8
       #4 [ffff880016c4b7e0] no_context at ffffffff8167ebd3
       #5 [ffff880016c4b830] __bad_area_nosemaphore at ffffffff8167ec69
       #6 [ffff880016c4b878] bad_area at ffffffff8167ef8d
       #7 [ffff880016c4b8a0] __do_page_fault at ffffffff81691e5f
       #8 [ffff880016c4b900] do_page_fault at ffffffff81691f05
       #9 [ffff880016c4b930] page_fault at ffffffff8168e1c8
          [exception RIP: memcpy+22]
          RIP: ffffffff813269a6  RSP: ffff880016c4b9e0  RFLAGS: 00010283
          RAX: ffff8800395fb4c0  RBX: ffff880016c4baf8  RCX: ffff880016c4bfd8
          RDX: ffffffffffffffe5  RSI: 0000000000000000  RDI: ffff8800395fb4c0
          RBP: ffff880016c4bab8   R8: 0000000000019a80   R9: 0000000000000000
          R10: ffff8800395fb4c0  R11: 0000000000aaaaaa  R12: 0000000000000025
          R13: ffff880016c4bae8  R14: ffff8800358789a0  R15: 0000000000000025
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      #10 [ffff880016c4b9e0] ll_lookup_it_finish at ffffffffa0ab5715 [lustre]
      #11 [ffff880016c4bac0] ll_lookup_it at ffffffffa0ab70ae [lustre]
      #12 [ffff880016c4bb78] ll_lookup_nd at ffffffffa0ab89dd [lustre]
      #13 [ffff880016c4bc10] lookup_real at ffffffff812083dd
      #14 [ffff880016c4bc30] __lookup_hash at ffffffff81208d52
      #15 [ffff880016c4bc60] lookup_slow at ffffffff816833cb
      #16 [ffff880016c4bc98] link_path_walk at ffffffff8120b96f
      #17 [ffff880016c4bd48] path_lookupat at ffffffff8120bb6b
      #18 [ffff880016c4bde0] filename_lookup at ffffffff8120c2cb
      #19 [ffff880016c4be18] filename_create at ffffffff8120c3a2
      #20 [ffff880016c4bee8] user_path_create at ffffffff8120eee1
      #21 [ffff880016c4bf18] sys_mkdirat at ffffffff812101f6
      #22 [ffff880016c4bf70] sys_mkdir at ffffffff812102a9
      #23 [ffff880016c4bf80] system_call_fastpath at ffffffff81696709
          RIP: 00007f9ddb6d29a7  RSP: 00007ffcb2ec5690  RFLAGS: 00010246
          RAX: 0000000000000053  RBX: ffffffff81696709  RCX: 00007ffcb2ec57f0
          RDX: 00000000000001ff  RSI: 00000000000001ff  RDI: 00007ffcb2ec9790
          RBP: 00007ffcb2ec87d0   R8: 00000000000001ff   R9: 00000000004029f0
          R10: 000000000000000b  R11: 0000000000000206  R12: ffffffff812102a9
          R13: ffff880016c4bf78  R14: 00000000000001ff  R15: 00007ffcb2ec8820
          ORIG_RAX: 0000000000000053  CS: 0033  SS: 002b
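
The register dump of the faulting memcpy frame is easier to read once the values are decoded. The Python sketch below only does arithmetic on the values printed above; `to_signed64` is a hypothetical helper. The final check rests on an assumption this report does not confirm: that this kernel's memcpy subtracts 0x20 from the length twice before its first load from the source pointer.

```python
def to_signed64(value):
    """Interpret an unsigned 64-bit register value as two's-complement signed."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value & (1 << 63) else value


# Values copied from the "exception RIP: memcpy+22" frame above.
regs = {
    "RAX": 0xFFFF8800395FB4C0,
    "RDI": 0xFFFF8800395FB4C0,  # destination pointer (first argument)
    "RSI": 0x0,                 # source pointer (second argument)
    "RDX": 0xFFFFFFFFFFFFFFE5,  # length register (third argument)
    "R12": 0x25,
    "R15": 0x25,
}

rdx_signed = to_signed64(regs["RDX"])
source_is_null = regs["RSI"] == 0

# Assumption (not verified against this kernel build): the length has been
# decremented by 0x20 twice by the time of the fault.
consistent_with_len_37 = regs["R12"] - 2 * 0x20 == rdx_signed

print("RDX as signed:", rdx_signed)
print("source pointer is NULL:", source_is_null)
print("R12 as decimal:", regs["R12"])
print("R12 - 0x40 == RDX:", consistent_with_len_37)
```

Read this way, the dump is consistent with a 37-byte copy from a NULL source pointer inside ll_lookup_it_finish, but that interpretation is a guess from the registers alone, not a confirmed diagnosis.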
      

        Activity

          adilger Andreas Dilger added a comment -

          Tested: this is working properly in (at least) 2.14.0 and later.

          riauxjb Jean-Baptiste Riaux (Inactive) added a comment -

          Thanks for the input.
          No, I do not have the logs anymore, but I can reproduce the problem.

          All MDTs were on the same MDS node, and the network failures looked more like a consequence of the test than its cause.
          I will rerun the test with a small Lustre setup on a single node to avoid network traffic.

          di.wang Di Wang added a comment -

          Do you still have the debug log? There seem to be communication issues between the MDTs, which would explain why the stripe is only created on MDT0.

          According to the debug log you posted, the parent's default stripe count is 1:

          57168 00000004:00000001:1.0:1499850278.062218:0:8981:0:(lod_object.c:3008:lod_cache_parent_lmv_striping()) Process leaving
           57169 00000004:00000001:1.0:1499850278.062219:0:8981:0:(lod_object.c:3053:lod_cache_parent_striping()) Process leaving (rc=0 : 0 : 0)
           57170 00000004:00000040:1.0:1499850278.062220:0:8981:0:(lod_object.c:3155:lod_ah_init()) inherit default EA nr:1 off:-1 t2
           57171 00000004:00000040:1.0:1499850278.062220:0:8981:0:(lod_object.c:3187:lod_ah_init()) inherit EA nr:1 off:-1
           57172 00000004:00000040:1.0:1499850278.062221:0:8981:0:(lod_object.c:3195:lod_ah_init()) final striping count:1, offset:-1
           57173 00000004:00000001:1.0:1499850278.062221:0:8981:0:(lod_object.c:3246:lod_ah_init()) Process leaving
          

          So the child inherits stripe count correctly.

          In the bottom half, however:

          581753 00000004:00000001:1.0:1499850525.180299:0:9133:0:(lod_object.c:3053:lod_cache_parent_striping()) Process leaving (rc=0 : 0 : 0)
          581754 00000004:00000040:1.0:1499850525.180299:0:9133:0:(lod_object.c:3155:lod_ah_init()) inherit default EA nr:1 off:-1 t2
          581755 00000004:00000040:1.0:1499850525.180300:0:9133:0:(lod_object.c:3175:lod_ah_init()) set stripe EA nr:2 off:0
          581756 00000004:00000040:1.0:1499850525.180300:0:9133:0:(lod_object.c:3195:lod_ah_init()) final striping count:2, offset:0
          581757 00000004:00000001:1.0:1499850525.180301:0:9133:0:(lod_object.c:3246:lod_ah_init(.....
          

          The child seems to have been created by "setdirstripe -c2", which overrides the default striping, so the directory is created with 2 stripes.

          [root@vm4]# mkdir 1-stripe-1/foo.0/foo.{0..9}
          mkdir: cannot create directory `1-stripe-1/foo.0/foo.0': Input/output error
          mkdir: cannot create directory `1-stripe-1/foo.0/foo.1': Cannot send after transport endpoint shutdown
          mkdir: cannot create directory `1-stripe-1/foo.0/foo.2': Cannot send after transport en...
          

          These failures also suggest that there are communication issues between the MDTs.

          pjones Peter Jones added a comment -

          Di/Lai

          Do you have any advice here?

          Peter


          People

            riauxjb Jean-Baptiste Riaux (Inactive)