LU-13263 (project: Lustre)

"(lu_ref.c:257:lu_ref_del()) ASSERTION( 0 ) failed" triggered by lu_ref_del() in osc_ldlm_glimpse_ast() because no corresponding lu_ref_link was added, with recent master configured with USE_LU_REF defined


Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.14.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      This problem occurred while running "auster sanity" on the latest master, configured with USE_LU_REF defined ("configure --enable-lu_ref").
      Here are the console log and the backtrace for this crash (LBUG):

      crash> dmesg | less
      ………………………………….
      [73238.236340] Lustre: DEBUG MARKER: == sanity test 56w: check lfs_migrate -c stripe_count works ========================================== 11:58:03 (1582027083)
      [73240.207988] LustreError: 132905:0:(lu_ref.c:96:lu_ref_print()) lu_ref: ffff9757ff247ca0 4 0 ldlm_lock_new:495
      [73240.208120] LustreError: 146472:0:(lu_ref.c:98:lu_ref_print())      link: hash ffff9757ff242d00
      [73240.235163] LustreError: 132905:0:(lu_ref.c:96:lu_ref_print()) Skipped 11 previous similar messages
      [73240.294633] LustreError: 146472:0:(lu_ref.c:257:lu_ref_del()) ASSERTION( 0 ) failed:
      [73240.294636] LustreError: 146472:0:(lu_ref.c:257:lu_ref_del()) LBUG
      [73240.294639] Pid: 146472, comm: ldlm_cb01_009 3.10.0-862.14.4.el7_lustre_ClientSymlink_279c264.x86_64 #1 SMP Thu Oct 17 10:54:24 UTC 2019
      [73240.294639] Call Trace:
      [73240.294674]  [<ffffffffc0d670ec>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [73240.294685]  [<ffffffffc0d6719c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [73240.294764]  [<ffffffffc0aceab0>] lu_ref_set_at+0x0/0x160 [obdclass]
      [73240.294787]  [<ffffffffc104bae8>] osc_ldlm_glimpse_ast+0x128/0x510 [osc]
      [73240.294872]  [<ffffffffc0e36dcb>] ldlm_callback_handler.part.27+0xb0b/0x1e30 [ptlrpc]
      [73240.294934]  [<ffffffffc0e38127>] ldlm_callback_handler+0x37/0xd0 [ptlrpc]
      [73240.294994]  [<ffffffffc0e67d96>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
      [73240.295056]  [<ffffffffc0e6be24>] ptlrpc_main+0xbb4/0x1550 [ptlrpc]
      [73240.295062]  [<ffffffffbdcbdf21>] kthread+0xd1/0xe0
      [73240.295068]  [<ffffffffbe3255f7>] ret_from_fork_nospec_end+0x0/0x39
      [73240.295095]  [<ffffffffffffffff>] 0xffffffffffffffff
      [73240.295096] Kernel panic - not syncing: LBUG
      [73240.295101] CPU: 26 PID: 146472 Comm: ldlm_cb01_009 Kdump: loaded Tainted: G        W IOE  ------------   3.10.0-862.14.4.el7_lustre_ClientSymlink_279c264.x86_64 #1
      [73240.295102] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
      [73240.295103] Call Trace:
      [73240.295109]  [<ffffffffbe313754>] dump_stack+0x19/0x1b
      [73240.295114]  [<ffffffffbe30d29f>] panic+0xe8/0x21f
      [73240.295127]  [<ffffffffc0d671eb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [73240.295159]  [<ffffffffc0aceab0>] lu_ref_del+0x230/0x230 [obdclass]
      [73240.295172]  [<ffffffffc104bae8>] osc_ldlm_glimpse_ast+0x128/0x510 [osc]
      [73240.295216]  [<ffffffffc0e36dcb>] ldlm_callback_handler.part.27+0xb0b/0x1e30 [ptlrpc]
      [73240.295284]  [<ffffffffc0e38127>] ldlm_callback_handler+0x37/0xd0 [ptlrpc]
      [73240.295338]  [<ffffffffc0e67d96>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
      [73240.295388]  [<ffffffffc0e6be24>] ptlrpc_main+0xbb4/0x1550 [ptlrpc]
      [73240.295440]  [<ffffffffc0e6b270>] ? ptlrpc_register_service+0xf90/0xf90 [ptlrpc]
      [73240.295443]  [<ffffffffbdcbdf21>] kthread+0xd1/0xe0
      [73240.295446]  [<ffffffffbdcbde50>] ? insert_kthread_work+0x40/0x40
      [73240.295465]  [<ffffffffbe3255f7>] ret_from_fork_nospec_begin+0x21/0x21
      [73240.295467]  [<ffffffffbdcbde50>] ? insert_kthread_work+0x40/0x40
      (END)
      crash> bt
      PID: 146472  TASK: ffff9758562c4f10  CPU: 26  COMMAND: "ldlm_cb01_009"
       #0 [ffff9757d6a279e8] machine_kexec at ffffffffbdc62a0a
       #1 [ffff9757d6a27a48] __crash_kexec at ffffffffbdd166c2
       #2 [ffff9757d6a27b18] panic at ffffffffbe30d2aa
       #3 [ffff9757d6a27b98] lbug_with_loc at ffffffffc0d671eb [libcfs]
       #4 [ffff9757d6a27bf0] osc_ldlm_glimpse_ast at ffffffffc104bae8 [osc]
       #5 [ffff9757d6a27ca8] ldlm_callback_handler at ffffffffc0e36dcb [ptlrpc]
       #6 [ffff9757d6a27d20] ldlm_callback_handler at ffffffffc0e38127 [ptlrpc]
       #7 [ffff9757d6a27d38] ptlrpc_server_handle_request at ffffffffc0e67d96 [ptlrpc]
       #8 [ffff9757d6a27df0] ptlrpc_main at ffffffffc0e6be24 [ptlrpc]
       #9 [ffff9757d6a27ec8] kthread at ffffffffbdcbdf21
      crash> 
      

      Crash-dump analysis, together with a review of the relevant source code, indicates that this problem was probably introduced by commit b3461d11dcb from LU-11670. In osc_ldlm_glimpse_ast() (lustre/osc/osc_lock.c), a lock reference taken with LDLM_LOCK_GET() is dropped with LDLM_LOCK_PUT(), which calls lu_ref_del() even though no matching lu_ref_add() was ever done. Either LDLM_LOCK_RELEASE() should be used instead, or the LDLM_LOCK_PUT() must be preceded by an lu_ref_add().
      I will push a tentative fix for this problem soon.


            People

            Assignee: Bruno Faccini (Inactive)
            Reporter: Bruno Faccini (Inactive)
            Votes: 0
            Watchers: 3
