Lustre / LU-11223

Changed resource in completion ast

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.12.0
    • None
    • 3
    • 9223372036854775807

    Description

      Hit this in master-next, but searching the crash history reveals this was also hit in April and May. The current prime suspect is the lock conversion code.

      [10408.160302] Lustre: DEBUG MARKER: == sanityn test 90: open/create and unlink striped directory ========================================= 14:47:16 (1533581236)
      [10463.288146] rm (14401) used greatest stack depth: 10120 bytes left
      [10489.045922] LustreError: 2355:0:(ldlm_lockd.c:1799:ldlm_handle_cp_callback()) change resource!
      [10489.051802] LustreError: 2355:0:(ldlm_lock.c:1056:ldlm_granted_list_add_lock()) ASSERTION( list_empty(&lock->l_res_link) ) failed: 
      [10489.054140] LustreError: 2355:0:(ldlm_lock.c:1056:ldlm_granted_list_add_lock()) LBUG
      [10489.055686] CPU: 15 PID: 2355 Comm: ldlm_cb07_001 Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-7.5-debug #1
      [10489.057781] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [10489.058791] Call Trace:
      [10489.059667]  [<ffffffff8176fc9a>] dump_stack+0x19/0x1b
      [10489.060709]  [<ffffffffa020f7c2>] libcfs_call_trace+0x72/0x80 [libcfs]
      [10489.061590]  [<ffffffffa020f84c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      [10489.062666]  [<ffffffffa070c8f9>] ldlm_grant_lock_with_skiplist+0x6b9/0x760 [ptlrpc]
      [10489.065043]  [<ffffffffa070ca88>] ldlm_grant_lock+0xe8/0x270 [ptlrpc]
      [10489.072174]  [<ffffffffa07324c1>] ldlm_handle_cp_callback+0x281/0xb70 [ptlrpc]
      [10489.074293]  [<ffffffffa073a69e>] ldlm_callback_handler.part.27+0x154e/0x1de0 [ptlrpc]
      [10489.075988]  [<ffffffffa0215f97>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [10489.078543]  [<ffffffffa073af67>] ldlm_callback_handler+0x37/0xd0 [ptlrpc]
      [10489.079906]  [<ffffffffa0767ec6>] ptlrpc_server_handle_request+0x256/0xad0 [ptlrpc]
      [10489.084875]  [<ffffffff810b9398>] ? __wake_up_common+0x58/0x90
      [10489.086238]  [<ffffffff813ccd2b>] ? do_raw_spin_unlock+0x4b/0x90
      [10489.087514]  [<ffffffffa076bcbe>] ptlrpc_main+0xabe/0x1f80 [ptlrpc]
      [10489.090181]  [<ffffffffa076b200>] ? ptlrpc_register_service+0xeb0/0xeb0 [ptlrpc]
      [10489.091872]  [<ffffffff810ae864>] kthread+0xe4/0xf0
      [10489.092674]  [<ffffffff810ae780>] ? kthread_create_on_node+0x140/0x140
      [10489.093572]  [<ffffffff81783777>] ret_from_fork_nospec_begin+0x21/0x21
      [10489.094415]  [<ffffffff810ae780>] ? kthread_create_on_node+0x140/0x140
      [10489.095692] Kernel panic - not syncing: LBUG
      

      Checking the crash logs, this appears to be a race between the completion AST and the grant from the client enqueue, both of which are doing a resource change (so there should be no CP AST, as we are obviously performing an intent request):

      00010000:00010000:15.0:1533581317.463955:0:2355:0:(ldlm_lockd.c:1729:ldlm_handle_cp_callback()) ### client completion callback handler START ns: lustre-MDT0000-mdc-ffff8802672ed800 lock: ffff880066b22d80/0x3366c6fd8fb32490 lrc: 5/1,0 mode: --/CR res: [0x200000007:0x1:0x0].0x0 bits 0x1/0x0 rrc: 3 type: IBT flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 15684 timeout: 0 lvb_type: 0
      00010000:00010000:13.0:1533581317.463959:0:11560:0:(ldlm_lock.c:748:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(CR) ns: ?? lock: ffff8801fe1c7d80/0x3366c6fd8fb324cf lrc: 3/1,0 mode: --/CR res: ?? rrc=?? type: ??? flags: 0x10000000000000 nid: local remote: 0x0 expref: -99 pid: 11560 timeout: 0 lvb_type: 0
      00010000:00010000:13.0:1533581317.463962:0:11560:0:(ldlm_request.c:942:ldlm_cli_enqueue()) ### client-side enqueue START, flags 0x1000 ns: lustre-MDT0000-mdc-ffff8802e1767800 lock: ffff8801fe1c7d80/0x3366c6fd8fb324cf lrc: 3/1,0 mode: --/CR res: [0x200000007:0x1:0x0].0x0 bits 0x2/0x0 rrc: 3 type: IBT flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 11560 timeout: 0 lvb_type: 0
      00010000:00010000:15.0:1533581317.463962:0:2355:0:(ldlm_lockd.c:1776:ldlm_handle_cp_callback()) ### completion AST, new lock mode ns: ?? lock: ffff880066b22d80/0x3366c6fd8fb32490 lrc: 5/1,0 mode: --/PR res: ?? rrc=?? type: ??? flags: 0x10000000000000 nid: local remote: 0x0 expref: -99 pid: 15684 timeout: 0 lvb_type: 0
      00010000:00010000:15.0:1533581317.463965:0:2355:0:(ldlm_lockd.c:1784:ldlm_handle_cp_callback()) ### completion AST, new policy data ns: ?? lock: ffff880066b22d80/0x3366c6fd8fb32490 lrc: 5/1,0 mode: --/PR res: ?? rrc=?? type: ??? flags: 0x10000000000000 nid: local remote: 0x0 expref: -99 pid: 15684 timeout: 0 lvb_type: 0
      00010000:00010000:13.0:1533581317.463967:0:11560:0:(ldlm_request.c:1014:ldlm_cli_enqueue()) ### sending request ns: lustre-MDT0000-mdc-ffff8802e1767800 lock: ffff8801fe1c7d80/0x3366c6fd8fb324cf lrc: 3/1,0 mode: --/CR res: [0x200000007:0x1:0x0].0x0 bits 0x2/0x0 rrc: 3 type: IBT flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 11560 timeout: 0 lvb_type: 0
      00010000:00010000:15.0:1533581317.463974:0:2355:0:(ldlm_lockd.c:1798:ldlm_handle_cp_callback()) ### completion AST, new resource ns: lustre-MDT0000-mdc-ffff8802672ed800 lock: ffff880066b22d80/0x3366c6fd8fb32490 lrc: 5/1,0 mode: --/PR res: [0x240000403:0xc6d:0x0].0x0 bits 0x1/0x0 rrc: 2 type: IBT flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 15684 timeout: 0 lvb_type: 0
      00000100:00100000:13.0:1533581317.463975:0:11560:0:(client.c:1625:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc bash:67cbe2ec-b14d-235a-d5d5-d57298cdcb90:11560:1608073853012208:0@lo:101
      00010000:00020000:15.0:1533581317.463977:0:2355:0:(ldlm_lockd.c:1799:ldlm_handle_cp_callback()) change resource!
      

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Oleg Drokin (green)