Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5362

(mdc_locks.c:235:mdc_clear_replay_flag()) LBUG

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • None
    • Lustre 2.7.0, Lustre 2.12.0, Lustre 2.13.0
    • 3
    • 14951

    Description

      In mdc_enqueue() if ldlm_cli_enqueue() fails but the request has a transno then I see the folllowing:

      [  257.072572] LustreError: 19779:0:(mdc_locks.c:904:mdc_enqueue()) @@@ transno returned on error rc -12  req@ffff8801f6191330 x1473823878678260/t4294967302(4294967302) o101->lustre-MDT0000-mdc-ffff8801f03e8ee0@0@lo:12/10 lens 632/544 e 0 to 0 dl 1405548101 ref 2 fl Complete:R/4/0 rc 0/0
      [  257.076802] LustreError: 19779:0:(mdc_locks.c:904:mdc_enqueue()) LBUG
      [  257.077920] Pid: 19779, comm: sys_open
      [  257.078613] 
      [  257.078614] Call Trace:
      [  257.079369]  [<ffffffffa02be8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      [  257.080581]  [<ffffffffa02beec7>] lbug_with_loc+0x47/0xb0 [libcfs]
      [  257.081681]  [<ffffffffa090ec36>] mdc_enqueue+0x1a26/0x1ab0 [mdc]
      [  257.082731]  [<ffffffffa090eea2>] mdc_intent_lock+0x1e2/0x52f [mdc]
      [  257.083853]  [<ffffffffa0e79110>] ? ll_md_blocking_ast+0x0/0x800 [lustre]
      [  257.085084]  [<ffffffffa0661b70>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
      [  257.086282]  [<ffffffffa08c8d1b>] lmv_intent_open+0x31b/0x9f0 [lmv]
      [  257.087380]  [<ffffffffa0e79110>] ? ll_md_blocking_ast+0x0/0x800 [lustre]
      [  257.088552]  [<ffffffffa08c96d0>] lmv_intent_lock+0x2e0/0x1180 [lmv]
      [  257.089691]  [<ffffffffa041c5e6>] ? obd_alloc_fail+0x56/0x4e0 [obdclass]
      [  257.090846]  [<ffffffff81186c34>] ? kmem_cache_alloc_trace+0x254/0x2b0
      [  257.092020]  [<ffffffffa0e76f79>] ? ll_i2suppgid+0x19/0x30 [lustre]
      [  257.093145]  [<ffffffffa0e76fbe>] ? ll_i2gids+0x2e/0xd0 [lustre]
      [  257.094185]  [<ffffffffa0e79110>] ? ll_md_blocking_ast+0x0/0x800 [lustre]
      [  257.095381]  [<ffffffffa0e7ad1a>] ll_lookup_it+0x25a/0xad0 [lustre]
      [  257.096489]  [<ffffffffa0e7b61c>] ll_lookup_nd+0x8c/0x4a0 [lustre]
      [  257.097564]  [<ffffffff811b29b5>] do_lookup+0x1a5/0x230
      [  257.098477]  [<ffffffff811b2fc4>] __link_path_walk+0x584/0x840
      [  257.099492]  [<ffffffff811b398a>] path_walk+0x6a/0xe0
      [  257.100401]  [<ffffffff811b3b9b>] filename_lookup+0x6b/0xc0
      [  257.101374]  [<ffffffff811b5074>] do_filp_open+0x104/0xd30
      [  257.102348]  [<ffffffff811c255b>] ? alloc_fd+0x3b/0x160
      [  257.103249]  [<ffffffff811c255b>] ? alloc_fd+0x3b/0x160
      [  257.104156]  [<ffffffff815547cb>] ? _spin_unlock+0x2b/0x40
      [  257.105111]  [<ffffffff811c25cb>] ? alloc_fd+0xab/0x160
      [  257.105978]  [<ffffffff8119f809>] do_sys_open+0x69/0x140
      [  257.106901]  [<ffffffff81554222>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [  257.108009]  [<ffffffff8119f920>] sys_open+0x20/0x30
      [  257.108903]  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      [  257.109934] 
      

      Note that there are several ways that ldlm_cli_enqueue() may fail even if the request succeeded. Failure from ldlm_cli_enqueue_fini() is one example. This was found via memory allocation fault injection, specifically failing in ldlm_resource_new().

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              jhammond John Hammond
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated: