Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-7232

racer deadlock

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.8.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      I am hitting deadlocks in racer now (no DNE) that started very recently, though notging in the affected path seems to have changed for quite a while which is really strange.

      Traces look like this:

      [282420.297213] LNet: Service thread pid 19612 was inactive for 62.01s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [282420.298976] Pid: 19612, comm: mdt01_009
      [282420.301584] 
      [282420.301585] Call Trace:
      [282420.302572]  [<ffffffffa189d750>] ? _ldlm_lock_debug+0x300/0x690 [ptlrpc]
      [282420.303111]  [<ffffffff81530c14>] ? _spin_lock_irqsave+0x24/0x30
      [282420.307584]  [<ffffffff8152d911>] schedule_timeout+0x191/0x2e0
      [282420.308111]  [<ffffffff81088290>] ? process_timeout+0x0/0x10
      [282420.309183]  [<ffffffffa18beac0>] ? ldlm_expired_completion_wait+0x0/0x370 [ptlrpc]
      [282420.310721]  [<ffffffffa18c3709>] ldlm_completion_ast+0x609/0x9b0 [ptlrpc]
      [282420.311557]  [<ffffffff81063a80>] ? default_wake_function+0x0/0x20
      [282420.312337]  [<ffffffffa18c2a6e>] ldlm_cli_enqueue_local+0x21e/0x8b0 [ptlrpc]
      [282420.313710]  [<ffffffffa18c3100>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
      [282420.314663]  [<ffffffffa09f30d0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [282420.315355]  [<ffffffffa09ff63b>] mdt_object_local_lock+0x1bb/0xa80 [mdt]
      [282420.316058]  [<ffffffffa09f30d0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [282420.316914]  [<ffffffffa18c3100>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
      [282420.317818]  [<ffffffffa0f750f1>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
      [282420.318513]  [<ffffffffa0a0026f>] mdt_object_lock_internal+0x5f/0x2d0 [mdt]
      [282420.319335]  [<ffffffffa0a005a1>] mdt_object_lock+0x11/0x20 [mdt]
      [282420.320011]  [<ffffffffa0a18371>] mdt_reint_unlink+0x831/0x10e0 [mdt]
      [282420.320838]  [<ffffffffa0f90100>] ? lu_ucred+0x20/0x30 [obdclass]
      [282420.321531]  [<ffffffffa09f2245>] ? mdt_ucred+0x15/0x20 [mdt]
      [282420.322188]  [<ffffffffa0a0b71c>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
      [282420.323033]  [<ffffffffa0a0f85d>] mdt_reint_rec+0x5d/0x200 [mdt]
      [282420.323843]  [<ffffffffa09f8c1b>] mdt_reint_internal+0x62b/0xa40 [mdt]
      [282420.324529]  [<ffffffffa09f94cb>] mdt_reint+0x6b/0x120 [mdt]
      [282420.325248]  [<ffffffffa195685c>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
      [282420.326144]  [<ffffffffa1901b74>] ptlrpc_main+0xd74/0x1850 [ptlrpc]
      [282420.327006]  [<ffffffffa1900e00>] ? ptlrpc_main+0x0/0x1850 [ptlrpc]
      [282420.327769]  [<ffffffff8109f82e>] kthread+0x9e/0xc0
      [282420.328515]  [<ffffffff8100c2ca>] child_rip+0xa/0x20
      [282420.333000]  [<ffffffff8109f790>] ? kthread+0x0/0xc0
      [282420.333702]  [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
      [282420.334388] 
      [282420.341475] LustreError: dumping log to /tmp/lustre-log.1443344508.19612
      [282422.536531] LNet: Service thread pid 28020 was inactive for 62.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [282422.538633] Pid: 28020, comm: mdt00_000
      [282422.539095] 
      [282422.539095] Call Trace:
      [282422.550545]  [<ffffffffa189d750>] ? _ldlm_lock_debug+0x300/0x690 [ptlrpc]
      [282422.551181]  [<ffffffff81530c14>] ? _spin_lock_irqsave+0x24/0x30
      [282422.551798]  [<ffffffff8152d911>] schedule_timeout+0x191/0x2e0
      [282422.552330]  [<ffffffff81088290>] ? process_timeout+0x0/0x10
      [282422.552959]  [<ffffffffa18beac0>] ? ldlm_expired_completion_wait+0x0/0x370 [ptlrpc]
      [282422.554046]  [<ffffffffa18c3709>] ldlm_completion_ast+0x609/0x9b0 [ptlrpc]
      [282422.554605]  [<ffffffff81063a80>] ? default_wake_function+0x0/0x20
      [282422.555195]  [<ffffffffa18c2a6e>] ldlm_cli_enqueue_local+0x21e/0x8b0 [ptlrpc]
      [282422.556246]  [<ffffffffa18c3100>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
      [282422.557344]  [<ffffffffa09f30d0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [282422.557890]  [<ffffffffa09ff63b>] mdt_object_local_lock+0x1bb/0xa80 [mdt]
      [282422.558695]  [<ffffffffa09f30d0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [282422.559262]  [<ffffffffa18c3100>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
      [282422.559915]  [<ffffffffa0f750f1>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
      [282422.560732]  [<ffffffffa0a0026f>] mdt_object_lock_internal+0x5f/0x2d0 [mdt]
      [282422.561378]  [<ffffffffa0a005a1>] mdt_object_lock+0x11/0x20 [mdt]
      [282422.561941]  [<ffffffffa0a06ee6>] mdt_getattr_name_lock+0xf16/0x1910 [mdt]
      [282422.562501]  [<ffffffffa0a0be99>] ? old_init_ucred+0x1b9/0x390 [mdt]
      [282422.563036]  [<ffffffffa0a07e02>] mdt_intent_getattr+0x292/0x470 [mdt]
      [282422.563592]  [<ffffffffa09f7694>] mdt_intent_policy+0x494/0xc40 [mdt]
      [282422.564269]  [<ffffffffa18a311f>] ldlm_lock_enqueue+0x12f/0x860 [ptlrpc]
      [282422.564834]  [<ffffffffa0844c01>] ? cfs_hash_for_each_enter+0x11/0xa0 [libcfs]
      [282422.565795]  [<ffffffffa18cf067>] ldlm_handle_enqueue0+0x807/0x1580 [ptlrpc]
      [282422.566376]  [<ffffffffa1943c61>] ? tgt_lookup_reply+0x31/0x190 [ptlrpc]
      [282422.566950]  [<ffffffffa1955dd1>] tgt_enqueue+0x61/0x230 [ptlrpc]
      [282422.567501]  [<ffffffffa195685c>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
      [282422.568062]  [<ffffffffa1901b74>] ptlrpc_main+0xd74/0x1850 [ptlrpc]
      [282422.568619]  [<ffffffffa1900e00>] ? ptlrpc_main+0x0/0x1850 [ptlrpc]
      [282422.569140]  [<ffffffff8109f82e>] kthread+0x9e/0xc0
      [282422.569757]  [<ffffffff8100c2ca>] child_rip+0xa/0x20
      [282422.570246]  [<ffffffff8109f790>] ? kthread+0x0/0xc0
      [282422.570836]  [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
      [282422.571377] 
      [282422.571926] LustreError: dumping log to /tmp/lustre-log.1443344510.28020
      ...
      [283258.816371] LustreError: 0:0:(ldlm_lockd.c:342:waiting_locks_callback()) ### lock callback timer expired after 901s: evicting client at 0@lo  ns: mdt-lustre-MDT0000_UUID lock: ffff8800b7997db8/0x116f98c7376ef4db lrc: 3/0,0 mode: PR/PR res: [0x200000402:0x256c:0x0].0 bits 0x1b rrc: 3 type: IBT flags: 0x60200000000020 nid: 0@lo remote: 0x116f98c7376ef4b1 expref: 1380 pid: 28373 timeout: 4365706867 lvb_type: 0
      [283258.831059] LustreError: 28025:0:(ldlm_lockd.c:1404:ldlm_handle_enqueue0()) ### lock on destroyed export ffff880070a307f0 ns: mdt-lustre-MDT0000_UUID lock: ffff880036b72db8/0x116f98c737716a4f lrc: 3/0,0 mode: PR/PR res: [0x200000401:0x25ee:0x0].0 bits 0x13 rrc: 18 type: IBT flags: 0x50200000000000 nid: 0@lo remote: 0x116f98c737716a33 expref: 373 pid: 28025 timeout: 0 lvb_type: 0
      [283258.831132] Lustre: 19612:0:(service.c:2097:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (600:301s); client may timeout.  req@ffff88008508ace8 x1513454483028456/t4295116216(0) o36->b54de1b7-f374-b4f0-2bb1-81a9b6732569@0@lo:221/0 lens 608/424 e 23 to 0 dl 1443345046 ref 1 fl Complete:/0/0 rc 0/0
      [283258.832328] LNet: Service thread pid 19612 completed after 900.54s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      

      This actually might indicate that something on the client is holdign the lock, I guess.

      Another instance:

      [14352.684182] Pid: 32225, comm: mdt00_005
      [14352.685471] 
      [14352.685473] Call Trace:
      [14352.687868]  [<ffffffffa13ac750>] ? _ldlm_lock_debug+0x300/0x690 [ptlrpc]
      [14352.689215]  [<ffffffff81530c14>] ? _spin_lock_irqsave+0x24/0x30
      [14352.690435]  [<ffffffff8152d911>] schedule_timeout+0x191/0x2e0
      [14352.691589]  [<ffffffff81088290>] ? process_timeout+0x0/0x10
      [14352.692847]  [<ffffffffa13cdac0>] ? ldlm_expired_completion_wait+0x0/0x370 [ptlrpc]
      [14352.695227]  [<ffffffffa13d2709>] ldlm_completion_ast+0x609/0x9b0 [ptlrpc]
      [14352.696517]  [<ffffffff81063a80>] ? default_wake_function+0x0/0x20
      [14352.697885]  [<ffffffffa13d1a6e>] ldlm_cli_enqueue_local+0x21e/0x8b0 [ptlrpc]
      [14352.699005]  [<ffffffffa13d2100>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
      [14352.700185]  [<ffffffffa08a00d0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [14352.701376]  [<ffffffffa08ac63b>] mdt_object_local_lock+0x1bb/0xa80 [mdt]
      [14352.702496]  [<ffffffffa08a00d0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [14352.703696]  [<ffffffffa13d2100>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
      [14352.704971]  [<ffffffffa11e00f1>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
      [14352.707110]  [<ffffffffa08ad26f>] mdt_object_lock_internal+0x5f/0x2d0 [mdt]
      [14352.708140]  [<ffffffffa08ad5a1>] mdt_object_lock+0x11/0x20 [mdt]
      [14352.709256]  [<ffffffffa08c5371>] mdt_reint_unlink+0x831/0x10e0 [mdt]
      [14352.710261]  [<ffffffffa11fb100>] ? lu_ucred+0x20/0x30 [obdclass]
      [14352.711259]  [<ffffffffa089f245>] ? mdt_ucred+0x15/0x20 [mdt]
      [14352.712261]  [<ffffffffa08b871c>] ? mdt_root_squash+0x2c/0x3f0 [mdt]
      [14352.713320]  [<ffffffffa08bc85d>] mdt_reint_rec+0x5d/0x200 [mdt]
      [14352.714268]  [<ffffffffa08a5c1b>] mdt_reint_internal+0x62b/0xa40 [mdt]
      [14352.715290]  [<ffffffffa08a64cb>] mdt_reint+0x6b/0x120 [mdt]
      [14352.716257]  [<ffffffffa146585c>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
      [14352.717317]  [<ffffffffa1410b74>] ptlrpc_main+0xd74/0x1850 [ptlrpc]
      [14352.718264]  [<ffffffffa140fe00>] ? ptlrpc_main+0x0/0x1850 [ptlrpc]
      [14352.719187]  [<ffffffff8109f82e>] kthread+0x9e/0xc0
      [14352.720006]  [<ffffffff8100c2ca>] child_rip+0xa/0x20
      [14352.720856]  [<ffffffff8109f790>] ? kthread+0x0/0xc0
      [14352.721748]  [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
      [14352.722571] 
      [14352.723270] LustreError: dumping log to /tmp/lustre-log.1443076451.32225
      [14352.860186] Pid: 30999, comm: mdt00_000
      [14352.860651] 
      [14352.860652] Call Trace:
      [14352.861604]  [<ffffffffa13ac750>] ? _ldlm_lock_debug+0x300/0x690 [ptlrpc]
      [14352.862129]  [<ffffffff81530c14>] ? _spin_lock_irqsave+0x24/0x30
      [14352.862628]  [<ffffffff8152d911>] schedule_timeout+0x191/0x2e0
      [14352.863133]  [<ffffffff81088290>] ? process_timeout+0x0/0x10
      [14352.863656]  [<ffffffffa13cdac0>] ? ldlm_expired_completion_wait+0x0/0x370 [ptlrpc]
      [14352.864646]  [<ffffffffa13d2709>] ldlm_completion_ast+0x609/0x9b0 [ptlrpc]
      [14352.865201]  [<ffffffff81063a80>] ? default_wake_function+0x0/0x20
      [14352.865735]  [<ffffffffa13d1a6e>] ldlm_cli_enqueue_local+0x21e/0x8b0 [ptlrpc]
      [14352.866279]  [<ffffffffa13d2100>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
      [14352.866815]  [<ffffffffa08a00d0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [14352.867337]  [<ffffffffa08ac63b>] mdt_object_local_lock+0x1bb/0xa80 [mdt]
      [14352.867861]  [<ffffffffa08a00d0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [14352.868392]  [<ffffffffa13d2100>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
      [14352.868942]  [<ffffffffa11e00f1>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
      [14352.869511]  [<ffffffffa08ad26f>] mdt_object_lock_internal+0x5f/0x2d0 [mdt]
      [14352.870048]  [<ffffffffa08ad5a1>] mdt_object_lock+0x11/0x20 [mdt]
      [14352.870577]  [<ffffffffa08b3ee6>] mdt_getattr_name_lock+0xf16/0x1910 [mdt]
      [14352.871098]  [<ffffffffa08b8e99>] ? old_init_ucred+0x1b9/0x390 [mdt]
      [14352.871632]  [<ffffffffa08b4e02>] mdt_intent_getattr+0x292/0x470 [mdt]
      [14352.872171]  [<ffffffffa08a4694>] mdt_intent_policy+0x494/0xc40 [mdt]
      [14352.872731]  [<ffffffffa13b211f>] ldlm_lock_enqueue+0x12f/0x860 [ptlrpc]
      [14352.873280]  [<ffffffffa10d6c01>] ? cfs_hash_for_each_enter+0x11/0xa0 [libcfs]
      [14352.874264]  [<ffffffffa13de067>] ldlm_handle_enqueue0+0x807/0x1580 [ptlrpc]
      [14352.874852]  [<ffffffffa1452c61>] ? tgt_lookup_reply+0x31/0x190 [ptlrpc]
      [14352.875387]  [<ffffffffa1464dd1>] tgt_enqueue+0x61/0x230 [ptlrpc]
      [14352.875916]  [<ffffffffa146585c>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
      [14352.876447]  [<ffffffffa1410b74>] ptlrpc_main+0xd74/0x1850 [ptlrpc]
      [14352.876972]  [<ffffffffa140fe00>] ? ptlrpc_main+0x0/0x1850 [ptlrpc]
      [14352.877486]  [<ffffffff8109f82e>] kthread+0x9e/0xc0
      [14352.878027]  [<ffffffff8100c2ca>] child_rip+0xa/0x20
      [14352.878501]  [<ffffffff8109f790>] ? kthread+0x0/0xc0
      [14352.878999]  [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
      [14352.879508] 
      [14352.904289] Pid: 32083, comm: mdt00_004
      [14352.905061] 
      [14352.905062] Call Trace:
      [14352.906744]  [<ffffffffa13ac750>] ? _ldlm_lock_debug+0x300/0x690 [ptlrpc]
      [14352.907771]  [<ffffffff81530c14>] ? _spin_lock_irqsave+0x24/0x30
      [14352.908749]  [<ffffffff8152d911>] schedule_timeout+0x191/0x2e0
      [14352.909837]  [<ffffffff81088290>] ? process_timeout+0x0/0x10
      [14352.929740]  [<ffffffffa13cdac0>] ? ldlm_expired_completion_wait+0x0/0x370 [ptlrpc]
      [14352.931364]  [<ffffffffa13d2709>] ldlm_completion_ast+0x609/0x9b0 [ptlrpc]
      [14352.932198]  [<ffffffff81063a80>] ? default_wake_function+0x0/0x20
      [14352.933066]  [<ffffffffa13d1a6e>] ldlm_cli_enqueue_local+0x21e/0x8b0 [ptlrpc]
      [14352.934005]  [<ffffffffa13d2100>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
      [14352.934895]  [<ffffffffa08a00d0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [14352.935734]  [<ffffffffa08ac63b>] mdt_object_local_lock+0x1bb/0xa80 [mdt]
      [14352.936567]  [<ffffffffa08a00d0>] ? mdt_blocking_ast+0x0/0x2a0 [mdt]
      [14352.937429]  [<ffffffffa13d2100>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
      [14352.938325]  [<ffffffffa11e00f1>] ? lu_object_find_at+0xb1/0xe0 [obdclass]
      [14352.939119]  [<ffffffffa08ad26f>] mdt_object_lock_internal+0x5f/0x2d0 [mdt]
      [14352.939954]  [<ffffffffa08ad5a1>] mdt_object_lock+0x11/0x20 [mdt]
      [14352.940772]  [<ffffffffa08b3ee6>] mdt_getattr_name_lock+0xf16/0x1910 [mdt]
      [14352.941575]  [<ffffffffa08b8e99>] ? old_init_ucred+0x1b9/0x390 [mdt]
      [14352.942437]  [<ffffffffa08b4e02>] mdt_intent_getattr+0x292/0x470 [mdt]
      [14352.943210]  [<ffffffffa08a4694>] mdt_intent_policy+0x494/0xc40 [mdt]
      [14352.944072]  [<ffffffffa13b211f>] ldlm_lock_enqueue+0x12f/0x860 [ptlrpc]
      [14352.944902]  [<ffffffffa10d6c01>] ? cfs_hash_for_each_enter+0x11/0xa0 [libcfs]
      [14352.946718]  [<ffffffffa13de067>] ldlm_handle_enqueue0+0x807/0x1580 [ptlrpc]
      [14352.947607]  [<ffffffffa1452c61>] ? tgt_lookup_reply+0x31/0x190 [ptlrpc]
      [14352.948420]  [<ffffffffa1464dd1>] tgt_enqueue+0x61/0x230 [ptlrpc]
      [14352.949236]  [<ffffffffa146585c>] tgt_request_handle+0x8bc/0x12e0 [ptlrpc]
      [14352.950061]  [<ffffffffa1410b74>] ptlrpc_main+0xd74/0x1850 [ptlrpc]
      [14352.950873]  [<ffffffffa140fe00>] ? ptlrpc_main+0x0/0x1850 [ptlrpc]
      [14352.951644]  [<ffffffff8109f82e>] kthread+0x9e/0xc0
      [14352.952376]  [<ffffffff8100c2ca>] child_rip+0xa/0x20
      [14352.953105]  [<ffffffff8109f790>] ? kthread+0x0/0xc0
      [14352.953879]  [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
      [14352.954613] 
      ...
      [14885.680137] Lustre: 2487:0:(service.c:1336:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
      [14885.680141]   req@ffff880079253ce8 x1513175160155132/t0(0) o36->a8b0df4a-0659-2218-5d9c-1d72e2e0bf80@0@lo:189/0 lens 608/3128 e 23 to 0 dl 1443076989 ref 2 fl Interpret:/0/0 rc 0/0
      [14891.680180] Lustre: 28828:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1443076389/real 1443076389]  req@ffff88002fe14ce8 x1513175160155132/t0(0) o36->lustre-MDT0000-mdc-ffff8800871fa7f0@0@lo:12/10 lens 608/856 e 23 to 1 dl 1443076990 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
      [14891.688920] Lustre: lustre-MDT0000-mdc-ffff8800871fa7f0: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
      [14891.693285] Lustre: lustre-MDT0000: Client a8b0df4a-0659-2218-5d9c-1d72e2e0bf80 (at 0@lo) reconnecting
      [14891.695699] Lustre: lustre-MDT0000: Connection restored to 192.168.10.218@tcp (at 0@lo)
      [14891.697793] Lustre: Skipped 5 previous similar messages
      [15190.816179] LustreError: 0:0:(ldlm_lockd.c:342:waiting_locks_callback()) ### lock callback timer expired after 900s: evicting client at 0@lo  ns: mdt-lustre-MDT0000_UUID lock: ffff8800140f0db8/0x24d4ac7f4cf853c0 lrc: 3/0,0 mode: PR/PR res: [0x200000401:0xcc7:0x0].0 bits 0x1b rrc: 3 type: IBT flags: 0x60200000000020 nid: 0@lo remote: 0x24d4ac7f4cf8538f expref: 431 pid: 32765 timeout: 4298689966 lvb_type: 0
      [15190.835968] Lustre: 32225:0:(service.c:2097:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (600:300s); client may timeout.  req@ffff880079253ce8 x1513175160155132/t4295000184(0) o36->a8b0df4a-0659-2218-5d9c-1d72e2e0bf80@0@lo:189/0 lens 608/424 e 23 to 0 dl 1443076989 ref 1 fl Complete:/0/0 rc 0/0
      [15190.837500] LustreError: 30999:0:(ldlm_lockd.c:1404:ldlm_handle_enqueue0()) ### lock on destroyed export ffff88009a5237f0 ns: mdt-lustre-MDT0000_UUID lock: ffff8800198b2db8/0x24d4ac7f4cf86bde lrc: 3/0,0 mode: PR/PR res: [0x200000401:0xc2b:0x0].0 bits 0x13 rrc: 2 type: IBT flags: 0x50200000000000 nid: 0@lo remote: 0x24d4ac7f4cf86bc2 expref: 5 pid: 30999 timeout: 0 lvb_type: 0
      [15190.837757] LustreError: 11-0: lustre-MDT0000-mdc-ffff8800871fa7f0: operation ldlm_enqueue to node 0@lo failed: rc = -107
      [15190.837770] Lustre: lustre-MDT0000-mdc-ffff8800871fa7f0: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
      [15190.837776] Lustre: Skipped 1 previous similar message
      [15190.837864] LNet: Service thread pid 30999 completed after 900.01s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [15190.838528] Lustre: lustre-MDT0000: Connection restored to 192.168.10.218@tcp (at 0@lo)
      [15190.838532] Lustre: Skipped 3 previous similar messages
      [15190.838756] LustreError: 167-0: lustre-MDT0000-mdc-ffff8800871fa7f0: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      [15190.842929] LustreError: 29000:0:(lmv_obd.c:1323:lmv_fid_alloc()) Can't alloc new fid, rc -19
      [15190.850803] LustreError: 29117:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff8800871fa7f0: inode [0x200000402:0xcee:0x0] mdc close failed: rc = -108
      [15190.850806] LustreError: 29117:0:(file.c:184:ll_close_inode_openhandle()) Skipped 2 previous similar messages
      [15190.853624] LustreError: 29065:0:(mdc_request.c:1283:mdc_read_page()) lustre-MDT0000-mdc-ffff8800871fa7f0: [0x200000401:0x1:0x0] lock enqueue fails: rc = -108
      [15190.858848] LustreError: 29117:0:(ldlm_resource.c:887:ldlm_resource_complain()) lustre-MDT0000-mdc-ffff8800871fa7f0: namespace resource [0x200000401:0xcc7:0x0].0 (ffff880006543ef8) refcount nonzero (1) after lock cleanup; forcing cleanup.
      [15190.858853] LustreError: 29117:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x200000401:0xcc7:0x0].0 (ffff880006543ef8) refcount = 2
      [15190.858855] LustreError: 29117:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
      [15190.858861] LustreError: 29117:0:(ldlm_resource.c:1508:ldlm_resource_dump()) ### ### ns: lustre-MDT0000-mdc-ffff8800871fa7f0 lock: ffff88003fd3ddb8/0x24d4ac7f4cf8538f lrc: 2/1,0 mode: PR/PR res: [0x200000401:0xcc7:0x0].0 bits 0x1b rrc: 2 type: IBT flags: 0x526400000000 nid: local remote: 0x24d4ac7f4cf853c0 expref: -99 pid: 28865 timeout: 0 lvb_type: 3
      [15190.869860] Lustre: 32225:0:(service.c:2097:ptlrpc_server_handle_request()) Skipped 2 previous similar messages
      [15197.481908] LustreError: 31219:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff8800871fa7f0: inode [0x200000402:0xdd4:0x0] mdc close failed: rc = -13
      [15197.483545] LustreError: 31219:0:(file.c:184:ll_close_inode_openhandle()) Skipped 2 previous similar messages
      [15200.905742] LustreError: 817:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000403:0x1f5:0x0]: -16
      [15210.349962] LustreError: 4266:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff8800871fa7f0: inode [0x200000402:0x10e7:0x0] mdc close failed: rc = -13
      [15210.369363] LustreError: 4266:0:(file.c:184:ll_close_inode_openhandle()) Skipped 2 previous similar messages
      [15228.546840] LustreError: 11861:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000402:0x1477:0x0]: -16
      [15253.218613] LustreError: 21121:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff8800871fa7f0: inode [0x200000403:0xac1:0x0] mdc close failed: rc = -13
      [15253.220182] LustreError: 21121:0:(file.c:184:ll_close_inode_openhandle()) Skipped 6 previous similar messages
      [15258.216446] LustreError: 23425:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000402:0x1a05:0x0]: -16
      [15289.724580] LustreError: 2807:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff8800b04e17f0: inode [0x200000402:0x1fad:0x0] mdc close failed: rc = -13
      [15289.726205] LustreError: 2807:0:(file.c:184:ll_close_inode_openhandle()) Skipped 6 previous similar messages
      [15373.365463] LustreError: 2566:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff8800871fa7f0: inode [0x200000402:0x2e62:0x0] mdc close failed: rc = -13
      [15373.367206] LustreError: 2566:0:(file.c:184:ll_close_inode_openhandle()) Skipped 12 previous similar messages
      [15395.126758] LustreError: 1585:0:(mdt_handler.c:895:mdt_getattr_internal()) lustre-MDT0000: getattr error for [0x200000403:0x237e:0x0]: rc = -2
      [15474.339855] LustreError: 8844:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000402:0x4094:0x0]: -16
      [15501.430962] LustreError: 18674:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff8800871fa7f0: inode [0x200000402:0x441d:0x0] mdc close failed: rc = -13
      [15501.433615] LustreError: 18674:0:(file.c:184:ll_close_inode_openhandle()) Skipped 24 previous similar messages
      [15648.026942] LustreError: 22568:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000403:0x46d9:0x0]: -16
      [15773.082317] LustreError: 21578:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff8800871fa7f0: inode [0x200000402:0x6332:0x0] mdc close failed: rc = -13
      [15773.085200] LustreError: 21578:0:(file.c:184:ll_close_inode_openhandle()) Skipped 37 previous similar messages
      [15795.694420] LustreError: 32225:0:(mdd_object.c:70:mdd_la_get()) lustre-MDD0000: object [0x200000402:0x66f6:0x0] not found: rc = -2
      [15985.390214] LustreError: 1601:0:(mdd_object.c:70:mdd_la_get()) lustre-MDD0000: object [0x200000402:0x7cf5:0x0] not found: rc = -2
      [16285.219995] LustreError: 15290:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff8800b04e17f0: inode [0x200000403:0x9b4a:0x0] mdc close failed: rc = -13
      [16285.223121] LustreError: 15290:0:(file.c:184:ll_close_inode_openhandle()) Skipped 75 previous similar messages
      [16329.835537] LustreError: 32323:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000403:0xa3e7:0x0]: -16
      [16445.032943] LustreError: 9902:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000403:0xb71a:0x0]: -16
      [16506.129728] LustreError: 31397:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000403:0xc166:0x0]: -16
      [16575.379907] LustreError: 23646:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000403:0xcce2:0x0]: -16
      [16616.252674] LustreError: 6409:0:(lcommon_cl.c:185:cl_file_inode_init()) Failure to initialize cl object [0x200000403:0xd417:0x0]: -16
      [16770.245965] 4[29243]: segfault at 8 ip 00007fea27c3c3a3 sp 00007ffef4109070 error 4 in ld-2.12.so[7fea27c31000+20000]
      [16771.988668] 17[29822]: segfault at 8 ip 00007fc052daa3a3 sp 00007ffe66d195f0 error 4 in ld-2.12.so[7fc052d9f000+20000]
      [16889.764587] LustreError: 7954:0:(file.c:184:ll_close_inode_openhandle()) lustre-clilmv-ffff8800b04e17f0: inode [0x200000402:0x10e7e:0x0] mdc close failed: rc = -13
      [16889.766871] LustreError: 7954:0:(file.c:184:ll_close_inode_openhandle()) Skipped 96 previous similar messages
      [16928.134762] Lustre: DEBUG MARKER: == racer test complete, duration 2752 sec == 03:17:06 (1443079026)
      [16928.705843] Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
      [16928.871543] LustreError: 20126:0:(ldlm_resource.c:887:ldlm_resource_complain()) lustre-MDT0000-mdc-ffff8800871fa7f0: namespace resource [0x200000401:0xcc7:0x0].0 (ffff880006543ef8) refcount nonzero (1) after lock cleanup; forcing cleanup.
      [16928.874285] LustreError: 20126:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x200000401:0xcc7:0x0].0 (ffff880006543ef8) refcount = 2
      [16928.875716] LustreError: 20126:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
      [16928.876930] LustreError: 20126:0:(ldlm_resource.c:1508:ldlm_resource_dump()) ### ### ns: lustre-MDT0000-mdc-ffff8800871fa7f0 lock: ffff88003fd3ddb8/0x24d4ac7f4cf8538f lrc: 2/1,0 mode: PR/PR res: [0x200000401:0xcc7:0x0].0 bits 0x1b rrc: 2 type: IBT flags: 0x526400000000 nid: local remote: 0x24d4ac7f4cf853c0 expref: -99 pid: 28865 timeout: 0 lvb_type: 3
      [16929.220780] LustreError: 20126:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x200000401:0xcc7:0x0].0 (ffff880006543ef8) refcount = 2
      [16929.222171] LustreError: 20126:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
      [16934.220047] LustreError: 0-0: Forced cleanup waiting for lustre-MDT0000-mdc-ffff8800871fa7f0 namespace with 1 resources in use, (rc=-110)
      [16939.220157] LustreError: 0-0: Forced cleanup waiting for lustre-MDT0000-mdc-ffff8800871fa7f0 namespace with 1 resources in use, (rc=-110)
      [16944.224128] LustreError: 0-0: Forced cleanup waiting for lustre-MDT0000-mdc-ffff8800871fa7f0 namespace with 1 resources in use, (rc=-110)
      [16949.228140] LustreError: 0-0: Forced cleanup waiting for lustre-MDT0000-mdc-ffff8800871fa7f0 namespace with 1 resources in use, (rc=-110)
      [16954.228154] LustreError: 0-0: Forced cleanup waiting for lustre-MDT0000-mdc-ffff8800871fa7f0 namespace with 1 resources in use, (rc=-110)
      [16964.228106] LustreError: 0-0: Forced cleanup waiting for lustre-MDT0000-mdc-ffff8800871fa7f0 namespace with 1 resources in use, (rc=-110)
      [16964.231892] LustreError: Skipped 1 previous similar message
      [16981.388377] Lustre: lustre-OST0000: haven't heard from client a8b0df4a-0659-2218-5d9c-1d72e2e0bf80 (at 0@lo) in 55 seconds. I think it's dead, and I am evicting it. exp ffff8800634167f0, cur 1443079080 expire 1443079050 last 1443079025
      [16982.108960] Lustre: lustre-MDT0000: haven't heard from client a8b0df4a-0659-2218-5d9c-1d72e2e0bf80 (at 0@lo) in 54 seconds. I think it's dead, and I am evicting it. exp ffff88003feef7f0, cur 1443079080 expire 1443079050 last 1443079026
      [16982.112411] Lustre: Skipped 1 previous similar message
      [16984.232172] LustreError: 0-0: Forced cleanup waiting for lustre-MDT0000-mdc-ffff8800871fa7f0 namespace with 1 resources in use, (rc=-110)
      [16984.236232] LustreError: Skipped 3 previous similar messages
      [17019.236099] LustreError: 0-0: Forced cleanup waiting for lustre-MDT0000-mdc-ffff8800871fa7f0 namespace with 1 resources in use, (rc=-110)
      

      The cleanup waiting is happening every time at the end

      Attachments

        Activity

          People

            laisiyao Lai Siyao
            green Oleg Drokin
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: