Details
-
New Feature
-
Resolution: Fixed
-
Minor
Description
The following test script causes a kernel panic:
local file=$DIR/$tfile
$LFS mirror create -N -S 4M -c 2 -N -S 1M -c -1 $file || error "create mirrored file $file failed"
echo -n pccro_as_mirror_layout > $file
echo "FLR layout before PCC-RO attach '$file':"
$LFS getstripe -v $file
$LFS mirror extend -N -S 8M -c -1 $file || error "mirror extend $file failed"
echo -e "\nFLR layout after extend a mirror:"
$LFS getstripe -v $file
echo -e "\nWrite '$file' after mirror extend:"
echo -n write_after_mirror_extend > $file
$LFS getstripe -v $file
The output for the test script:
/mnt/lustre/f21i.sanity-pcc
composite_header:
lcm_magic: 0x0BD60BD0
lcm_size: 288
lcm_flags: wp
lcm_layout_gen: 3
lcm_mirror_count: 2
lcm_entry_count: 2
components:
- lcme_id: 65537
lcme_mirror_id: 1
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lcme_offset: 128
lcme_size: 80
sub_layout:
lmm_magic: 0x0BD10BD0
lmm_seq: 0x200000401
lmm_object_id: 0x3
lmm_fid: [0x200000401:0x3:0x0]
lmm_stripe_count: 2
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 0
lmm_objects:
- 0: { l_ost_idx: 0, l_fid: [0x100000000:0x3:0x0] }
- 1: { l_ost_idx: 1, l_fid: [0x100010000:0x2:0x0] }
- lcme_id: 131074
lcme_mirror_id: 2
lcme_flags: init,stale
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lcme_offset: 208
lcme_size: 80
sub_layout:
lmm_magic: 0x0BD10BD0
lmm_seq: 0x200000401
lmm_object_id: 0x3
lmm_fid: [0x200000401:0x3:0x0]
lmm_stripe_count: 2
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 1
lmm_objects:
- 0: { l_ost_idx: 1, l_fid: [0x100010000:0x3:0x0] }
- 1: { l_ost_idx: 0, l_fid: [0x100000000:0x4:0x0] }
FLR layout after extend a mirror:
/mnt/lustre/f21i.sanity-pcc
composite_header:
lcm_magic: 0x0BD60BD0
lcm_size: 416
lcm_flags: wp
lcm_layout_gen: 4
lcm_mirror_count: 3
lcm_entry_count: 3
components:
- lcme_id: 65537
lcme_mirror_id: 1
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lcme_offset: 176
lcme_size: 80
sub_layout:
lmm_magic: 0x0BD10BD0
lmm_seq: 0x200000401
lmm_object_id: 0x3
lmm_fid: [0x200000401:0x3:0x0]
lmm_stripe_count: 2
lmm_stripe_size: 4194304
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 0
lmm_objects:
- 0: { l_ost_idx: 0, l_fid: [0x100000000:0x3:0x0] }
- 1: { l_ost_idx: 1, l_fid: [0x100010000:0x2:0x0] }
- lcme_id: 131074
lcme_mirror_id: 2
lcme_flags: init,stale
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lcme_offset: 256
lcme_size: 80
sub_layout:
lmm_magic: 0x0BD10BD0
lmm_seq: 0x200000401
lmm_object_id: 0x3
lmm_fid: [0x200000401:0x3:0x0]
lmm_stripe_count: 2
lmm_stripe_size: 1048576
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 1
lmm_objects:
- 0: { l_ost_idx: 1, l_fid: [0x100010000:0x3:0x0] }
- 1: { l_ost_idx: 0, l_fid: [0x100000000:0x4:0x0] }
- lcme_id: 196609
lcme_mirror_id: 3
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: EOF
lcme_offset: 336
lcme_size: 80
sub_layout:
lmm_magic: 0x0BD10BD0
lmm_seq: 0x200000401
lmm_object_id: 0x3
lmm_fid: [0x200000401:0x3:0x0]
lmm_stripe_count: 2
lmm_stripe_size: 8388608
lmm_pattern: raid0
lmm_layout_gen: 0
lmm_stripe_offset: 0
lmm_objects:
- 0: { l_ost_idx: 0, l_fid: [0x100000000:0x5:0x0] }
- 1: { l_ost_idx: 1, l_fid: [0x100010000:0x4:0x0] }
Write '/mnt/lustre/f21i.sanity-pcc' after mirror extend:
The panic crash dump:
[27530.518341] LustreError: 70249:0:(lod_object.c:7733:lod_declare_update_write_pending()) ASSERTION( primary < 0 ) failed: [0x200000401:0x3:0x0] has multiple primary: 3 / 1
[27530.519329] LustreError: 70249:0:(lod_object.c:7733:lod_declare_update_write_pending()) LBUG
[27530.519834] Pid: 70249, comm: mdt00_003 3.10.0-957.12.2.el7_lustre.2.12.55_47_gf6497eb.x86_64 #1 SMP Mon Jul 1 20:06:03 CST 2019
[27530.519835] Call Trace:
[27530.519842] [<ffffffffc06e762c>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[27530.519848] [<ffffffffc06e794c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[27530.519851] [<ffffffffc13a9c89>] lod_declare_update_write_pending+0x969/0xa40 [lod]
[27530.519860] [<ffffffffc13aa5b8>] lod_declare_layout_change+0x858/0xe20 [lod]
[27530.519864] [<ffffffffc1244bf3>] mdd_declare_layout_change+0x63/0x130 [mdd]
[27530.519870] [<ffffffffc124e419>] mdd_layout_change+0x9d9/0x18e0 [mdd]
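It seems that FLR can only safely extend a mirror on a file whose layout is in the LCM_FL_RDONLY state.
When the file is in the LCM_FL_WRITE_PENDING state (lcm_flags: wp in the output above), the newly added mirror is not marked stale, so the next write finds more than one primary mirror (mirrors 1 and 3 here), which triggers the assertion and the panic.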
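The condition can be checked from user space before writing. The sketch below is a hypothetical helper (count_nonstale_mirrors is not part of Lustre or its test framework) that reads `lfs getstripe -v` output on stdin and counts the mirrors that have no component flagged stale. It assumes, based on the assertion message and the layouts shown above, that every non-stale mirror is treated as a primary candidate.

```shell
# Hypothetical helper: count mirrors with no stale component.
# Input: `lfs getstripe -v` output on stdin.
# Assumption: each non-stale mirror counts as a primary candidate.
count_nonstale_mirrors() {
    awk '
        # Remember which mirror the following component lines belong to.
        $1 == "lcme_mirror_id:" { id = $2 }
        # A mirror is stale if any of its components carries the stale flag.
        $1 == "lcme_flags:" {
            seen[id] = 1
            if ($2 ~ /(^|,)stale(,|$)/) stale[id] = 1
        }
        END {
            n = 0
            for (m in seen) if (!(m in stale)) n++
            print n
        }
    '
}
```

For the layout printed after the mirror extend, `$LFS getstripe -v $file | count_nonstale_mirrors` would report 2 (mirrors 1 and 3), while the layout before the extend yields 1.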
Two possible solutions are:
- Check whether the file is in the LCM_FL_RDONLY state before extending a mirror on it. If it is not, return an error code immediately.
- In lod_declare_update_write_pending, when more than one candidate for primary is found, pick one as the primary and mark the other in-sync mirrors stale.
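Until one of these is implemented, a test script could apply the first check itself. The following is a minimal sketch, not the proposed kernel or lfs fix: flr_state and flr_can_extend are hypothetical helper names, and the assumption that a read-only layout is printed as "ro" in lcm_flags (the output above only shows "wp") should be verified against the installed lfs.

```shell
# Hypothetical helper: print the lcm_flags value found in
# `lfs getstripe -v` output read from stdin.
flr_state() {
    awk '$1 == "lcm_flags:" { print $2; exit }'
}

# Hypothetical guard: succeed only when the layout is read-only.
# Assumption: LCM_FL_RDONLY is displayed as "ro".
flr_can_extend() {
    [ "$(flr_state)" = "ro" ]
}
```

A guarded call would then look like `$LFS getstripe -v $file | flr_can_extend && $LFS mirror extend -N -S 8M -c -1 $file`, which skips the extend while the file is still write-pending.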