  1. Lustre
  2. LU-13730

Check needed for mirror extend on a WRITE_PENDING (wp) FLR file


Details

    • New Feature
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0

    Description

      The following test script triggers a kernel panic:

              local file=$DIR/$tfile
      
              $LFS mirror create -N -S 4M -c 2 -N -S 1M -c -1 $file ||
                      error "create mirrored file $file failed"
              echo -n pccro_as_mirror_layout > $file
              echo "FLR layout before PCC-RO attach '$file':"
              $LFS getstripe -v $file
      
              $LFS mirror extend -N -S 8M -c -1 $file ||
                      error "mirror extend $file failed"
              echo -e "\nFLR layout after extend a mirror:"
              $LFS getstripe -v $file
      
              echo -e "\nWrite '$file' after mirror extend:"
              echo -n write_after_mirror_extend > $file
              $LFS getstripe -v $file
      

      The output for the test script:

      /mnt/lustre/f21i.sanity-pcc
      composite_header:
      lcm_magic: 0x0BD60BD0
      lcm_size: 288
      lcm_flags: wp
      lcm_layout_gen: 3
      lcm_mirror_count: 2
      lcm_entry_count: 2
      components:
      - lcme_id: 65537
      lcme_mirror_id: 1
      lcme_flags: init
      lcme_extent.e_start: 0
      lcme_extent.e_end: EOF
      lcme_offset: 128
      lcme_size: 80
      sub_layout:
      lmm_magic: 0x0BD10BD0
      lmm_seq: 0x200000401
      lmm_object_id: 0x3
      lmm_fid: [0x200000401:0x3:0x0]
      lmm_stripe_count: 2
      lmm_stripe_size: 4194304
      lmm_pattern: raid0
      lmm_layout_gen: 0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x3:0x0] }
      - 1: { l_ost_idx: 1, l_fid: [0x100010000:0x2:0x0] }
      - lcme_id: 131074
      lcme_mirror_id: 2
      lcme_flags: init,stale
      lcme_extent.e_start: 0
      lcme_extent.e_end: EOF
      lcme_offset: 208
      lcme_size: 80
      sub_layout:
      lmm_magic: 0x0BD10BD0
      lmm_seq: 0x200000401
      lmm_object_id: 0x3
      lmm_fid: [0x200000401:0x3:0x0]
      lmm_stripe_count: 2
      lmm_stripe_size: 1048576
      lmm_pattern: raid0
      lmm_layout_gen: 0
      lmm_stripe_offset: 1
      lmm_objects:
      - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x3:0x0] }
      - 1: { l_ost_idx: 0, l_fid: [0x100000000:0x4:0x0] }
      FLR layout after extend a mirror:
      /mnt/lustre/f21i.sanity-pcc
      composite_header:
      lcm_magic: 0x0BD60BD0
      lcm_size: 416
      lcm_flags: wp
      lcm_layout_gen: 4
      lcm_mirror_count: 3
      lcm_entry_count: 3
      components:
      - lcme_id: 65537
      lcme_mirror_id: 1
      lcme_flags: init
      lcme_extent.e_start: 0
      lcme_extent.e_end: EOF
      lcme_offset: 176
      lcme_size: 80
      sub_layout:
      lmm_magic: 0x0BD10BD0
      lmm_seq: 0x200000401
      lmm_object_id: 0x3
      lmm_fid: [0x200000401:0x3:0x0]
      lmm_stripe_count: 2
      lmm_stripe_size: 4194304
      lmm_pattern: raid0
      lmm_layout_gen: 0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x3:0x0] }
      - 1: { l_ost_idx: 1, l_fid: [0x100010000:0x2:0x0] }
      - lcme_id: 131074
      lcme_mirror_id: 2
      lcme_flags: init,stale
      lcme_extent.e_start: 0
      lcme_extent.e_end: EOF
      lcme_offset: 256
      lcme_size: 80
      sub_layout:
      lmm_magic: 0x0BD10BD0
      lmm_seq: 0x200000401
      lmm_object_id: 0x3
      lmm_fid: [0x200000401:0x3:0x0]
      lmm_stripe_count: 2
      lmm_stripe_size: 1048576
      lmm_pattern: raid0
      lmm_layout_gen: 0
      lmm_stripe_offset: 1
      lmm_objects:
      - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x3:0x0] }
      - 1: { l_ost_idx: 0, l_fid: [0x100000000:0x4:0x0] }
      - lcme_id: 196609
      lcme_mirror_id: 3
      lcme_flags: init
      lcme_extent.e_start: 0
      lcme_extent.e_end: EOF
      lcme_offset: 336
      lcme_size: 80
      sub_layout:
      lmm_magic: 0x0BD10BD0
      lmm_seq: 0x200000401
      lmm_object_id: 0x3
      lmm_fid: [0x200000401:0x3:0x0]
      lmm_stripe_count: 2
      lmm_stripe_size: 8388608
      lmm_pattern: raid0
      lmm_layout_gen: 0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x5:0x0] }
      - 1: { l_ost_idx: 1, l_fid: [0x100010000:0x4:0x0] }
      Write '/mnt/lustre/f21i.sanity-pcc' after mirror extend:
      

      The panic crash dump:

      [27530.518341] LustreError: 70249:0:(lod_object.c:7733:lod_declare_update_write_pending()) ASSERTION( primary < 0 ) failed: [0x200000401:0x3:0x0] has multiple primary: 3 / 1
      [27530.519329] LustreError: 70249:0:(lod_object.c:7733:lod_declare_update_write_pending()) LBUG
      [27530.519834] Pid: 70249, comm: mdt00_003 3.10.0-957.12.2.el7_lustre.2.12.55_47_gf6497eb.x86_64 #1 SMP Mon Jul 1 20:06:03 CST 2019
      [27530.519835] Call Trace:
      [27530.519842] [<ffffffffc06e762c>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [27530.519848] [<ffffffffc06e794c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [27530.519851] [<ffffffffc13a9c89>] lod_declare_update_write_pending+0x969/0xa40 [lod]
      [27530.519860] [<ffffffffc13aa5b8>] lod_declare_layout_change+0x858/0xe20 [lod]
      [27530.519864] [<ffffffffc1244bf3>] mdd_declare_layout_change+0x63/0x130 [mdd]
      [27530.519870] [<ffffffffc124e419>] mdd_layout_change+0x9d9/0x18e0 [mdd]
      

      It seems that FLR can only safely extend a mirror on an FLR file that is in the LCM_FL_RDONLY state.

      When the file is in the LCM_FL_WRITE_PENDING state, a write issued after the mirror extend finds more than one primary mirror (mirrors 3 and 1 in the assertion above), which triggers the LBUG and the panic.
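The condition can be spotted from userspace before the write is issued. The following is a minimal sketch, not the check the MDS performs: it assumes the `lfs getstripe -v` text format shown above, and `flr_nonstale_mirrors` is a hypothetical helper name. It prints the IDs of mirrors that have no component flagged `stale`; more than one ID on a `wp` file corresponds to the situation the assertion rejects.

```shell
# Hypothetical helper: read `lfs getstripe -v` text on stdin and print the
# ID of every mirror that has no component carrying the "stale" flag.
# This only approximates the kernel's notion of a primary candidate.
flr_nonstale_mirrors() {
	awk '
	# Remember each mirror ID in order of first appearance.
	$1 == "lcme_mirror_id:" {
		id = $2
		if (!(id in seen)) { seen[id] = 1; order[++n] = id }
	}
	# Any stale component marks its whole mirror as stale.
	$1 == "lcme_flags:" && $2 ~ /(^|,)stale(,|$)/ { stale[id] = 1 }
	END {
		for (i = 1; i <= n; i++)
			if (!(order[i] in stale)) print order[i]
	}'
}
```

For the layout printed after the extend, `$LFS getstripe -v $file | flr_nonstale_mirrors` would list mirrors 1 and 3, matching the "3 / 1" in the assertion message.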

      Two possible solutions are:

      1. Check whether the file is in the LCM_FL_RDONLY state before extending a mirror on it. If it is not, return an invalid-argument error code immediately.
      2. In lod_declare_update_write_pending(), when more than one candidate for primary is found, pick one as the primary and mark the other in-sync mirrors as stale.
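Until one of these fixes is in place, a test script could apply the idea behind solution 1 on the client side. The sketch below is an illustration under assumptions, not the server-side fix: `flr_state_from_getstripe` and `flr_can_extend` are hypothetical helper names, and it assumes that `lfs getstripe -v` prints the read-only state as `lcm_flags: ro`, in the same way it prints `wp` in the output above.

```shell
# Hypothetical helper: print the lcm_flags value found in
# `lfs getstripe -v` text read from stdin.
flr_state_from_getstripe() {
	awk '$1 == "lcm_flags:" { print $2; exit }'
}

# Hypothetical guard: succeed only when the file state is "ro"
# (assumed textual form of LCM_FL_RDONLY).
flr_can_extend() {
	local state
	state=$(flr_state_from_getstripe)
	[ "$state" = "ro" ]
}
```

In the test script above this could be used as `$LFS getstripe -v $file | flr_can_extend || error "$file is not read-only, refusing mirror extend"` just before the `$LFS mirror extend` call. A client-side guard like this is racy, because the state can change between the check and the extend, so it does not replace a check on the server.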


            People

              bzzz Alex Zhuravlev
              qian_wc Qian Yingjin