Lustre / LU-15702

(lov_lock.c:206:lov_lock_sub_init()) ASSERTION( (!(result == 0) || (nr == lovlck->lls_nr)) ) failed:

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.0
    • Lustre 2.15.0
    • None

    Description

      Seen on v2_15_0-RC2-25-g9884f37985. The failed assertion from LU-14618 is still reachable. To reproduce:

      $LUSTRE/tests/llmount.sh
      lfs setstripe \
        -E 1M -c 4 \
        -E 2M -c 4 \
        -E 3M -c 4 \
        -E 4M -c 4 \
        -E 5M -c 4 \
        -E 6M -c 4 \
        -E 7M -c 4 \
        -E 8M -c 4 \
        -E 9M -c 4 \
        -E 10M -c 4 \
        -E 11M -c 4 \
        -E 12M -c 4 \
        -E 13M -c 4 \
        -E 14M -c 4 \
        -E 15M -c 4 \
        -E eof -c 4 \
        /mnt/lustre
      $LUSTRE/tests/racer.sh
      
      [  248.230101] LustreError: 50541:0:(lov_lock.c:206:lov_lock_sub_init()) LBUG
      [  248.230626] Pid: 50541, comm: truncate 4.18.0-348.7.1.el8.x86_64+debug #1 SMP Mon Mar 7 11:31:18 CST 2022
      [  248.231401] Call Trace TBD:
      [  248.231636] [<0>] libcfs_call_trace+0x99/0x140 [libcfs]
      [  248.232050] [<0>] lbug_with_loc+0x8f/0x100 [libcfs]
      [  248.232664] [<0>] lov_lock_sub_init+0x2115/0x2330 [lov]
      [  248.233081] [<0>] lov_lock_init_composite+0xf1/0x1f0 [lov]
      [  248.233560] [<0>] cl_lock_init+0x23d/0x410 [obdclass]
      [  248.234029] [<0>] cl_lock_request+0x11f/0x370 [obdclass]
      [  248.234619] [<0>] cl_io_lock+0x36e/0xa20 [obdclass]
      [  248.235045] [<0>] cl_io_loop+0x16d/0x480 [obdclass]
      [  248.235471] [<0>] cl_setattr_ost+0x77e/0xaa0 [lustre]
      [  248.235876] [<0>] ll_setattr_raw+0x15fe/0x2d30 [lustre]
      [  248.236290] [<0>] notify_change+0x743/0xd78
      [  248.236613] [<0>] do_truncate+0xe2/0x180
      [  248.236912] [<0>] vfs_truncate+0x368/0x400
      [  248.237217] [<0>] do_sys_truncate.part.10+0xe0/0x100
      [  248.237576] [<0>] do_syscall_64+0xa5/0x430
      [  248.237877] [<0>] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
      [  248.238279] Kernel panic - not syncing: LBUG
      [  248.238596] CPU: 2 PID: 50541 Comm: truncate Kdump: loaded Tainted: G        W  OE    

      This is caused by a data race on the lo_trunc_stripeno member of struct lov_layout_raid0.

      It is possible to cause slab out of bounds errors (and hence memory corruption) in the same way:

      [ 1494.352844] ==================================================================
      [ 1494.353591] BUG: KASAN: slab-out-of-bounds in lov_lock_sub_init+0x201b/0x2640 [lov]
      [ 1494.354310] Read of size 8 at addr ffff888059512700 by task truncate/28602
      [ 1494.354928]
      [ 1494.355058] CPU: 3 PID: 28602 Comm: truncate Kdump: loaded Tainted: G        W  OE    --------- -  - 4.18.0-348.7.1.el8.x86_64+debug #1
      [ 1494.356192] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
      [ 1494.356990] Call Trace:
      [ 1494.357253]  dump_stack+0x8e/0xd0
      [ 1494.357549]  ? lov_lock_sub_init+0x201b/0x2640 [lov]
      [ 1494.357984]  print_address_description.constprop.5+0x1e/0x230
      [ 1494.358521]  ? kmsg_dump_rewind_nolock+0xd9/0xd9
      [ 1494.359006]  ? lov_lock_sub_init+0x201b/0x2640 [lov]
      [ 1494.359455]  ? lov_lock_sub_init+0x201b/0x2640 [lov]
      [ 1494.359917]  ? lov_lock_sub_init+0x201b/0x2640 [lov]
      [ 1494.360389]  __kasan_report.cold.7+0x37/0x86
      [ 1494.360757]  ? lov_lock_sub_init+0x201b/0x2640 [lov]
      [ 1494.361211]  kasan_report+0x37/0x50
      [ 1494.361493]  lov_lock_sub_init+0x201b/0x2640 [lov]
      [ 1494.361895]  ? lov_lock_fini+0x810/0x810 [lov]
      [ 1494.362269]  ? lov_io_iter_init+0x901/0x1960 [lov]
      [ 1494.362684]  ? libcfs_debug_msg+0x9d/0xd0 [libcfs]
      [ 1494.362866] LustreError: 27720:0:(lov_lock.c:209:lov_lock_sub_init()) result=0, nr=28, lls_nr=27
      [ 1494.363141]  lov_lock_init_composite+0xf5/0x200 [lov]
      [ 1494.363183]  cl_lock_init+0x23d/0x3f0 [obdclass]
      [ 1494.364977]  cl_lock_request+0x120/0x350 [obdclass]
      [ 1494.365397]  cl_io_lock+0x85c/0x1240 [obdclass]
      [ 1494.365795]  ? vvp_io_init+0x5aa/0xb90 [lustre]
      [ 1494.366261]  cl_io_loop+0x16d/0x490 [obdclass]
      [ 1494.366718]  cl_setattr_ost+0x76a/0xa90 [lustre]
      [ 1494.367136]  ? cl_glimpse_size0+0x680/0x680 [lustre]
      [ 1494.367621]  ? _raw_spin_unlock+0x1f/0x30
      [ 1494.367987]  ? __ptlrpc_req_finished+0x50f/0x1320 [ptlrpc]
      [ 1494.368483]  ? up_write+0x15c/0x490
      [ 1494.368795]  ll_setattr_raw+0x1a1d/0x2ee0 [lustre]
      [ 1494.369265]  ? ll_finish_md_op_data+0x210/0x210 [lustre]
      [ 1494.369687]  ? ktime_get_coarse_real_ts64+0x127/0x1b0
      [ 1494.370119]  notify_change+0x743/0xd78
      [ 1494.370421]  ? libcfs_debug_vmsg2+0x23e0/0x23e0 [libcfs]
      [ 1494.370866]  do_truncate+0xe2/0x180
      [ 1494.371187]  ? file_open_root+0x1b0/0x1b0
      [ 1494.371507]  ? inode_permission+0x25b/0x390
      [ 1494.371847]  vfs_truncate+0x368/0x400
      [ 1494.372193]  ? do_truncate+0x180/0x180
      [ 1494.372499]  do_sys_truncate.part.10+0xe0/0x100
      [ 1494.372863]  ? vfs_truncate+0x400/0x400
      [ 1494.373224]  ? do_syscall_64+0x22/0x430
      [ 1494.373520]  do_syscall_64+0xa5/0x430
      [ 1494.373814]  entry_SYSCALL_64_after_hwframe+0x6a/0xdf
      [ 1494.374257] RIP: 0033:0x7f87d92a5e2b
      [ 1494.374550] Code: 8b 15 61 90 2c 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 d6 e9 f0 fe ff ff f3 0f 1e fa b8 4c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 29 90 2c 00 f7
      

      This information (lo_trunc_stripeno) must be moved out of the object and into the io. We do not have locking available to protect it between its initialization in lov_io_iter_init() and its use in lov_lock_sub_init().
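
      As a rough illustration of why per-io state avoids the race, here is a minimal user-space sketch in C. It is not the Lustre code or the landed patch: demo_object, demo_io, demo_count() and the other demo_* names are hypothetical, and the count-then-fill two-pass pattern is an assumption inferred from the nr == lovlck->lls_nr assertion. The concurrent truncate is modelled deterministically as a write that lands between the two passes.

```c
#include <assert.h>

#define DEMO_NR_STRIPES 4

/* Hypothetical stand-in for the shared layout object
 * (NOT the real struct lov_layout_raid0). */
struct demo_object {
    int shared_trunc_stripeno; /* racy placement: every io on the object writes here */
};

/* Hypothetical stand-in for per-io state. */
struct demo_io {
    struct demo_object *obj;
    int trunc_stripeno;        /* safe placement: private to this io, -1 means none */
};

/* Count the sub-locks one pass would need. Assumption for the demo:
 * stripes 0 and 1 always intersect the lock extent, and the truncate
 * stripe is counted as well if it is not already included. */
static int demo_count(int trunc_stripeno)
{
    int nr = 0;

    assert(trunc_stripeno >= -1 && trunc_stripeno < DEMO_NR_STRIPES);
    for (int i = 0; i < DEMO_NR_STRIPES; i++) {
        int intersects = (i < 2);

        if (intersects || i == trunc_stripeno)
            nr++;
    }
    return nr;
}

/* Racy variant: both passes read the field stored in the shared object.
 * other_stripeno models another truncate's write landing between them.
 * Returns 1 if the two passes agree, 0 if the assertion would fire. */
static int demo_passes_agree_shared(struct demo_io *io, int other_stripeno)
{
    int lls_nr = demo_count(io->obj->shared_trunc_stripeno); /* pass 1: size the array */

    io->obj->shared_trunc_stripeno = other_stripeno;         /* concurrent io's write */

    int nr = demo_count(io->obj->shared_trunc_stripeno);     /* pass 2: fill the array */

    return nr == lls_nr;
}

/* Per-io variant: the other io only touches its own copy, so both
 * passes of this io see the same value. */
static int demo_passes_agree_per_io(struct demo_io *io, struct demo_io *other,
                                    int other_stripeno)
{
    int lls_nr = demo_count(io->trunc_stripeno);

    other->trunc_stripeno = other_stripeno;                  /* no shared state touched */

    int nr = demo_count(io->trunc_stripeno);

    return nr == lls_nr;
}
```

      With the shared field, a write between the passes makes the two counts disagree, which corresponds to the nr != lls_nr condition reported above; a pass that counts more entries than were allocated would also index past the end of the array. With the per-io field the counts always match, and no extra locking is needed.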

            People

              jhammond John Hammond
