Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5362

(mdc_locks.c:235:mdc_clear_replay_flag()) LBUG

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • None
    • Lustre 2.7.0, Lustre 2.12.0, Lustre 2.13.0
    • 3
    • 14951

    Description

      In mdc_enqueue() if ldlm_cli_enqueue() fails but the request has a transno then I see the folllowing:

      [  257.072572] LustreError: 19779:0:(mdc_locks.c:904:mdc_enqueue()) @@@ transno returned on error rc -12  req@ffff8801f6191330 x1473823878678260/t4294967302(4294967302) o101->lustre-MDT0000-mdc-ffff8801f03e8ee0@0@lo:12/10 lens 632/544 e 0 to 0 dl 1405548101 ref 2 fl Complete:R/4/0 rc 0/0
      [  257.076802] LustreError: 19779:0:(mdc_locks.c:904:mdc_enqueue()) LBUG
      [  257.077920] Pid: 19779, comm: sys_open
      [  257.078613] 
      [  257.078614] Call Trace:
      [  257.079369]  [<ffffffffa02be8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [  257.080581]  [<ffffffffa02beec7>] lbug_with_loc+0x47/0xb0 [libcfs]
      [  257.081681]  [<ffffffffa090ec36>] mdc_enqueue+0x1a26/0x1ab0 [mdc]
      [  257.082731]  [<ffffffffa090eea2>] mdc_intent_lock+0x1e2/0x52f [mdc]
      [  257.083853]  [<ffffffffa0e79110>] ? ll_md_blocking_ast+0x0/0x800 [lustre]
      [  257.085084]  [<ffffffffa0661b70>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
      [  257.086282]  [<ffffffffa08c8d1b>] lmv_intent_open+0x31b/0x9f0 [lmv]
      [  257.087380]  [<ffffffffa0e79110>] ? ll_md_blocking_ast+0x0/0x800 [lustre]
      [  257.088552]  [<ffffffffa08c96d0>] lmv_intent_lock+0x2e0/0x1180 [lmv]
      [  257.089691]  [<ffffffffa041c5e6>] ? obd_alloc_fail+0x56/0x4e0 [obdclass]
      [  257.090846]  [<ffffffff81186c34>] ? kmem_cache_alloc_trace+0x254/0x2b0
      [  257.092020]  [<ffffffffa0e76f79>] ? ll_i2suppgid+0x19/0x30 [lustre]
      [  257.093145]  [<ffffffffa0e76fbe>] ? ll_i2gids+0x2e/0xd0 [lustre]
      [  257.094185]  [<ffffffffa0e79110>] ? ll_md_blocking_ast+0x0/0x800 [lustre]
      [  257.095381]  [<ffffffffa0e7ad1a>] ll_lookup_it+0x25a/0xad0 [lustre]
      [  257.096489]  [<ffffffffa0e7b61c>] ll_lookup_nd+0x8c/0x4a0 [lustre]
      [  257.097564]  [<ffffffff811b29b5>] do_lookup+0x1a5/0x230
      [  257.098477]  [<ffffffff811b2fc4>] __link_path_walk+0x584/0x840
      [  257.099492]  [<ffffffff811b398a>] path_walk+0x6a/0xe0
      [  257.100401]  [<ffffffff811b3b9b>] filename_lookup+0x6b/0xc0
      [  257.101374]  [<ffffffff811b5074>] do_filp_open+0x104/0xd30
      [  257.102348]  [<ffffffff811c255b>] ? alloc_fd+0x3b/0x160
      [  257.103249]  [<ffffffff811c255b>] ? alloc_fd+0x3b/0x160
      [  257.104156]  [<ffffffff815547cb>] ? _spin_unlock+0x2b/0x40
      [  257.105111]  [<ffffffff811c25cb>] ? alloc_fd+0xab/0x160
      [  257.105978]  [<ffffffff8119f809>] do_sys_open+0x69/0x140
      [  257.106901]  [<ffffffff81554222>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [  257.108009]  [<ffffffff8119f920>] sys_open+0x20/0x30
      [  257.108903]  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      [  257.109934] 
      

      Note that there are several ways that ldlm_cli_enqueue() may fail even if the request succeeded. Failure from ldlm_cli_enqueue_fini() is one example. This was found via memory allocation fault injection, specifically failing in ldlm_resource_new().

      Attachments

        Issue Links

          Activity

            [LU-5362] (mdc_locks.c:235:mdc_clear_replay_flag()) LBUG

            It looks like a duplicate of LU-15120

            askulysh Andriy Skulysh added a comment - It looks like a duplicate of LU-15120
            green Oleg Drokin added a comment -

            This hits on master/master-next still

            and the frequency somewhat increases with LU-11623 patches so we really need to do something here I guess.

            green Oleg Drokin added a comment - This hits on master/master-next still and the frequency somewhat increases with LU-11623 patches so we really need to do something here I guess.
            green Oleg Drokin added a comment -

            Still happening with 2.12 too.

            [  491.544077] LustreError: 25274:0:(mdc_locks.c:202:mdc_clear_replay_flag()) @@@ transno returned on error rc -5  req@ffff8800912e6c80 x1610652424649696/t4295299960(4295299960) o101->lustre-MDT0000-mdc-ffff88005ad40800@0@lo:12/10 lens 376/312 e 0 to 0 dl 1536038359 ref 1 fl Complete:R/0/0 rc 0/0
            [  491.544079] LustreError: 25274:0:(mdc_locks.c:203:mdc_clear_replay_flag()) LBUG
            [  491.544080] Pid: 25274, comm: dd 3.10.0-7.5-debug #2 SMP Sat Jun 2 18:41:57 EDT 2018
            [  491.544081] Call Trace:
            [  491.544097]  [<ffffffffa01a67dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
            [  491.544102]  [<ffffffffa01a688c>] lbug_with_loc+0x4c/0xa0 [libcfs]
            [  491.544108]  [<ffffffffa0853049>] osc_env_info.part.16+0x0/0x39 [mdc]
            [  491.544112]  [<ffffffffa084ac11>] mdc_enqueue_base+0x1771/0x19d0 [mdc]
            [  491.544117]  [<ffffffffa084b5bf>] mdc_intent_lock+0x12f/0x550 [mdc]
            [  491.544122]  [<ffffffffa0514586>] lmv_intent_lookup+0x226/0xb40 [lmv]
            [  491.544125]  [<ffffffffa0515209>] lmv_intent_lock+0x369/0xba0 [lmv]
            [  491.544137]  [<ffffffffa0e238cf>] ll_layout_intent+0x20f/0x770 [lustre]
            [  491.544147]  [<ffffffffa0e34d98>] ll_layout_write_intent+0x58/0x110 [lustre]
            [  491.544159]  [<ffffffffa0e72d7d>] vvp_io_fini+0x20d/0x6b0 [lustre]
            [  491.544169]  [<ffffffffa0e732a5>] vvp_io_setattr_fini+0x45/0x80 [lustre]
            [  491.544204]  [<ffffffffa036ab85>] cl_io_fini+0x75/0x250 [obdclass]
            [  491.544213]  [<ffffffffa0e6bda4>] cl_setattr_ost+0x154/0x3c0 [lustre]
            [  491.544221]  [<ffffffffa0e48408>] ll_setattr_raw+0xe78/0xed0 [lustre]
            [  491.544228]  [<ffffffffa0e484cc>] ll_setattr+0x6c/0xd0 [lustre]
            [  491.544231]  [<ffffffff8122cd41>] notify_change+0x2c1/0x420
            [  491.544233]  [<ffffffff8120b7f5>] do_truncate+0x75/0xc0
            [  491.544235]  [<ffffffff8120bbf0>] do_sys_ftruncate.constprop.19+0x120/0x180
            [  491.544236]  [<ffffffff8120bc8e>] SyS_ftruncate+0xe/0x10
            [  491.544239]  [<ffffffff817838e9>] system_call_fastpath+0x16/0x1b
            [  491.544242]  [<ffffffffffffffff>] 0xffffffffffffffff
            
            green Oleg Drokin added a comment - Still happening with 2.12 too. [ 491.544077] LustreError: 25274:0:(mdc_locks.c:202:mdc_clear_replay_flag()) @@@ transno returned on error rc -5 req@ffff8800912e6c80 x1610652424649696/t4295299960(4295299960) o101->lustre-MDT0000-mdc-ffff88005ad40800@0@lo:12/10 lens 376/312 e 0 to 0 dl 1536038359 ref 1 fl Complete:R/0/0 rc 0/0 [ 491.544079] LustreError: 25274:0:(mdc_locks.c:203:mdc_clear_replay_flag()) LBUG [ 491.544080] Pid: 25274, comm: dd 3.10.0-7.5-debug #2 SMP Sat Jun 2 18:41:57 EDT 2018 [ 491.544081] Call Trace: [ 491.544097] [<ffffffffa01a67dc>] libcfs_call_trace+0x8c/0xc0 [libcfs] [ 491.544102] [<ffffffffa01a688c>] lbug_with_loc+0x4c/0xa0 [libcfs] [ 491.544108] [<ffffffffa0853049>] osc_env_info.part.16+0x0/0x39 [mdc] [ 491.544112] [<ffffffffa084ac11>] mdc_enqueue_base+0x1771/0x19d0 [mdc] [ 491.544117] [<ffffffffa084b5bf>] mdc_intent_lock+0x12f/0x550 [mdc] [ 491.544122] [<ffffffffa0514586>] lmv_intent_lookup+0x226/0xb40 [lmv] [ 491.544125] [<ffffffffa0515209>] lmv_intent_lock+0x369/0xba0 [lmv] [ 491.544137] [<ffffffffa0e238cf>] ll_layout_intent+0x20f/0x770 [lustre] [ 491.544147] [<ffffffffa0e34d98>] ll_layout_write_intent+0x58/0x110 [lustre] [ 491.544159] [<ffffffffa0e72d7d>] vvp_io_fini+0x20d/0x6b0 [lustre] [ 491.544169] [<ffffffffa0e732a5>] vvp_io_setattr_fini+0x45/0x80 [lustre] [ 491.544204] [<ffffffffa036ab85>] cl_io_fini+0x75/0x250 [obdclass] [ 491.544213] [<ffffffffa0e6bda4>] cl_setattr_ost+0x154/0x3c0 [lustre] [ 491.544221] [<ffffffffa0e48408>] ll_setattr_raw+0xe78/0xed0 [lustre] [ 491.544228] [<ffffffffa0e484cc>] ll_setattr+0x6c/0xd0 [lustre] [ 491.544231] [<ffffffff8122cd41>] notify_change+0x2c1/0x420 [ 491.544233] [<ffffffff8120b7f5>] do_truncate+0x75/0xc0 [ 491.544235] [<ffffffff8120bbf0>] do_sys_ftruncate.constprop.19+0x120/0x180 [ 491.544236] [<ffffffff8120bc8e>] SyS_ftruncate+0xe/0x10 [ 491.544239] [<ffffffff817838e9>] system_call_fastpath+0x16/0x1b [ 491.544242] [<ffffffffffffffff>] 0xffffffffffffffff

            People

              wc-triage WC Triage
              jhammond John Hammond
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated: