  1. Lustre
  2. LU-6777

replay-single test 34 hung in tgt_txn_stop_cb

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • None
    • None
    • 3

    Description

      Hit this today; other tests might be affected too:

      [35924.999839] Lustre: DEBUG MARKER: == replay-single test 34: abort recovery before client does replay (test mds_cleanup_orphans) == 20:34:40 (1435624480)
      [35932.384654] Lustre: lustre-OST0000: Export ffff8800b45b67f0 already connecting from 0@lo
      [35937.384931] Lustre: lustre-OST0000: Export ffff8800b45b67f0 already connecting from 0@lo
      [35942.385306] Lustre: lustre-OST0000: Export ffff8800b45b67f0 already connecting from 0@lo
      [35947.384695] Lustre: lustre-OST0000: Export ffff8800b45b67f0 already connecting from 0@lo
      [35952.385167] Lustre: lustre-OST0000: Export ffff8800b45b67f0 already connecting from 0@lo
      [35955.528070] LNet: Service thread pid 31469 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [35955.528898] Pid: 31469, comm: ll_ost00_006
      [35955.529197] 
      [35955.529198] Call Trace:
      [35955.529613]  [<ffffffff8117394a>] ? cache_alloc_debugcheck_after+0x14a/0x210
      [35955.529898]  [<ffffffff8151ff50>] __mutex_lock_slowpath+0x120/0x2e0
      [35955.530201]  [<ffffffff81520141>] mutex_lock+0x31/0x50
      [35955.530511]  [<ffffffffa175e970>] tgt_txn_stop_cb+0x380/0xd60 [ptlrpc]
      [35955.530788]  [<ffffffff8151fabe>] ? mutex_unlock+0xe/0x10
      [35955.531104]  [<ffffffffa0e6eace>] dt_txn_hook_stop+0x5e/0x90 [obdclass]
      [35955.531402]  [<ffffffffa0771730>] osd_trans_stop+0x190/0x590 [osd_ldiskfs]
      [35955.531692]  [<ffffffffa0784c85>] ? osd_object_destroy+0x295/0x680 [osd_ldiskfs]
      [35955.532199]  [<ffffffffa0a9ecef>] ofd_trans_stop+0x1f/0x60 [ofd]
      [35955.532489]  [<ffffffffa0aa10c1>] ofd_object_destroy+0x2d1/0x8e0 [ofd]
      [35955.532774]  [<ffffffffa0a9b15d>] ofd_destroy_by_fid+0x35d/0x620 [ofd]
      [35955.533139]  [<ffffffffa16d88d0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
      [35955.533457]  [<ffffffffa16da210>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
      [35955.533749]  [<ffffffffa0a949da>] ofd_destroy_hdl+0x2fa/0xb60 [ofd]
      [35955.534076]  [<ffffffffa176bbfe>] tgt_request_handle+0xa2e/0x1230 [ptlrpc]
      [35955.534398]  [<ffffffffa1718d64>] ptlrpc_main+0xe94/0x19e0 [ptlrpc]
      [35955.534699]  [<ffffffffa1717ed0>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
      [35955.534972]  [<ffffffff8109ce4e>] kthread+0x9e/0xc0
      [35955.535293]  [<ffffffff8100c24a>] child_rip+0xa/0x20
      [35955.535553]  [<ffffffff8109cdb0>] ? kthread+0x0/0xc0
      [35955.535802]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
      [35955.536051] 
      [35955.536307] LustreError: dumping log to /tmp/lustre-log.1435624510.31469
      [35955.558142] Pid: 30313, comm: ll_ost00_005
      [35955.558431] 
      [35955.558432] Call Trace:
      [35955.558878]  [<ffffffff8104e658>] ? __change_page_attr_set_clr+0x808/0xcc0
      [35955.559167]  [<ffffffffa037a0f1>] start_this_handle+0x291/0x4b0 [jbd2]
      [35955.559475]  [<ffffffffa037a4b1>] ? jbd2_journal_start+0x81/0x100 [jbd2]
      [35955.559762]  [<ffffffff8109d2d0>] ? autoremove_wake_function+0x0/0x40
      [35955.560067]  [<ffffffffa037a4e5>] jbd2_journal_start+0xb5/0x100 [jbd2]
      [35955.560397]  [<ffffffffa0722296>] ldiskfs_journal_start_sb+0x56/0xe0 [ldiskfs]
      [35955.560977]  [<ffffffffa0771d0f>] osd_trans_start+0x1df/0x660 [osd_ldiskfs]
      [35955.561326]  [<ffffffffa175fabd>] tgt_client_data_update+0x29d/0x680 [ptlrpc]
      [35955.561710]  [<ffffffffa1760122>] tgt_client_del+0x282/0x600 [ptlrpc]
      [35955.562072]  [<ffffffffa0aacb83>] ? ofd_grant_discard+0xb3/0x1c0 [ofd]
      [35955.562360]  [<ffffffffa0a95f8b>] ofd_obd_disconnect+0x1bb/0x200 [ofd]
      [35955.562695]  [<ffffffffa16c27f1>] target_handle_disconnect+0x1b1/0x480 [ptlrpc]
      [35955.563244]  [<ffffffffa176a729>] tgt_disconnect+0x39/0x160 [ptlrpc]
      [35955.563569]  [<ffffffffa176bbfe>] tgt_request_handle+0xa2e/0x1230 [ptlrpc]
      [35955.563900]  [<ffffffffa1718d64>] ptlrpc_main+0xe94/0x19e0 [ptlrpc]
      [35955.564228]  [<ffffffffa1717ed0>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
      [35955.564511]  [<ffffffff8109ce4e>] kthread+0x9e/0xc0
      [35955.564823]  [<ffffffff8100c24a>] child_rip+0xa/0x20
      [35955.565124]  [<ffffffff8109cdb0>] ? kthread+0x0/0xc0
      [35955.565400]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
      [35955.566732] 
      [35955.566924] Pid: 31471, comm: ll_ost00_008
      [35955.567187] 
      [35955.567188] Call Trace:
      [35955.567590]  [<ffffffffa0381ab5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
      [35955.567899]  [<ffffffff8109d2d0>] ? autoremove_wake_function+0x0/0x40
      [35955.568211]  [<ffffffff8152245e>] ? _spin_unlock+0xe/0x10
      [35955.568495]  [<ffffffffa03799c4>] jbd2_journal_stop+0x1e4/0x2b0 [jbd2]
      [35955.568857]  [<ffffffffa0722208>] __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
      [35955.569215]  [<ffffffffa0771772>] osd_trans_stop+0x1d2/0x590 [osd_ldiskfs]
      [35955.569549]  [<ffffffffa0784c85>] ? osd_object_destroy+0x295/0x680 [osd_ldiskfs]
      [35955.570076]  [<ffffffffa0a9ecef>] ofd_trans_stop+0x1f/0x60 [ofd]
      [35955.570370]  [<ffffffffa0aa10c1>] ofd_object_destroy+0x2d1/0x8e0 [ofd]
      [35955.570653]  [<ffffffffa0a9b15d>] ofd_destroy_by_fid+0x35d/0x620 [ofd]
      [35955.570998]  [<ffffffffa16d88d0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
      [35955.571342]  [<ffffffffa16da210>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
      [35955.572135]  [<ffffffffa0a949da>] ofd_destroy_hdl+0x2fa/0xb60 [ofd]
      [35955.572478]  [<ffffffffa176bbfe>] tgt_request_handle+0xa2e/0x1230 [ptlrpc]
      [35955.572837]  [<ffffffffa1718d64>] ptlrpc_main+0xe94/0x19e0 [ptlrpc]
      [35955.573166]  [<ffffffffa1717ed0>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
      [35955.573487]  [<ffffffff8109ce4e>] kthread+0x9e/0xc0
      [35955.573755]  [<ffffffff8100c24a>] child_rip+0xa/0x20
      [35955.574008]  [<ffffffff8109cdb0>] ? kthread+0x0/0xc0
      [35955.574290]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
      [35955.574546] 
      [35955.574735] Pid: 25759, comm: ll_ost01_004
      [35955.574964] 
      [35955.574965] Call Trace:
      [35955.575404]  [<ffffffffa0381ab5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
      [35955.575680]  [<ffffffff8109d2d0>] ? autoremove_wake_function+0x0/0x40
      [35955.575953]  [<ffffffff8152245e>] ? _spin_unlock+0xe/0x10
      [35955.576247]  [<ffffffffa03799c4>] jbd2_journal_stop+0x1e4/0x2b0 [jbd2]
      [35955.576564]  [<ffffffffa0722208>] __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
      [35955.576888]  [<ffffffffa0771772>] osd_trans_stop+0x1d2/0x590 [osd_ldiskfs]
      [35955.577182]  [<ffffffffa0784c85>] ? osd_object_destroy+0x295/0x680 [osd_ldiskfs]
      [35955.577715]  [<ffffffffa0a9ecef>] ofd_trans_stop+0x1f/0x60 [ofd]
      [35955.577982]  [<ffffffffa0aa10c1>] ofd_object_destroy+0x2d1/0x8e0 [ofd]
      [35955.578259]  [<ffffffffa0a9b15d>] ofd_destroy_by_fid+0x35d/0x620 [ofd]
      [35955.578604]  [<ffffffffa16d88d0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
      [35955.578910]  [<ffffffffa16da210>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
      [35955.579200]  [<ffffffffa0a949da>] ofd_destroy_hdl+0x2fa/0xb60 [ofd]
      [35955.579570]  [<ffffffffa176bbfe>] tgt_request_handle+0xa2e/0x1230 [ptlrpc]
      [35955.579883]  [<ffffffffa1718d64>] ptlrpc_main+0xe94/0x19e0 [ptlrpc]
      [35955.580184]  [<ffffffffa1717ed0>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
      [35955.580524]  [<ffffffff8109ce4e>] kthread+0x9e/0xc0
      [35955.580801]  [<ffffffff8100c24a>] child_rip+0xa/0x20
      [35955.581051]  [<ffffffff8109cdb0>] ? kthread+0x0/0xc0
      [35955.581325]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
      [35955.581636] 
      [35962.384528] Lustre: lustre-OST0000: Export ffff8800b45b67f0 already connecting from 0@lo
      [35962.385358] Lustre: Skipped 1 previous similar message
      [35977.385175] Lustre: lustre-OST0000: haven't heard from client lustre-MDT0000-mdtlov_UUID (at (no nid)) in 55 seconds. I think it's dead, and I am evicting it. exp ffff8800b45b67f0, cur 1435624532 expire 1435624502 last 1435624477
      [35984.376066] LNet: Service thread pid 31470 was inactive for 62.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [35984.377655] LNet: Skipped 3 previous similar messages
      [35984.378107] Pid: 31470, comm: ll_ost00_007
      [35984.378547] 
      [35984.378548] Call Trace:
      [35984.379309]  [<ffffffff8104e658>] ? __change_page_attr_set_clr+0x808/0xcc0
      [35984.379829]  [<ffffffff8109d5fe>] ? prepare_to_wait+0x4e/0x80
      [35984.380314]  [<ffffffffa037a0f1>] start_this_handle+0x291/0x4b0 [jbd2]
      [35984.380888]  [<ffffffffa037a4b1>] ? jbd2_journal_start+0x81/0x100 [jbd2]
      [35984.381462]  [<ffffffff8109d2d0>] ? autoremove_wake_function+0x0/0x40
      [35984.381898]  [<ffffffffa037a4e5>] jbd2_journal_start+0xb5/0x100 [jbd2]
      [35984.382408]  [<ffffffffa0722296>] ldiskfs_journal_start_sb+0x56/0xe0 [ldiskfs]
      [35984.383116]  [<ffffffffa0771d0f>] osd_trans_start+0x1df/0x660 [osd_ldiskfs]
      [35984.383603]  [<ffffffffa175fabd>] tgt_client_data_update+0x29d/0x680 [ptlrpc]
      [35984.384208]  [<ffffffffa1760bcc>] tgt_client_new+0x41c/0x600 [ptlrpc]
      [35984.384660]  [<ffffffffa0a97ec3>] ofd_obd_connect+0x363/0x400 [ofd]
      [35984.385225]  [<ffffffffa16c7d84>] target_handle_connect+0xe94/0x2d60 [ptlrpc]
      [35984.385699]  [<ffffffff8152245e>] ? _spin_unlock+0xe/0x10
      [35984.386116]  [<ffffffffa0d4e34f>] ? cfs_trace_unlock_tcd+0x3f/0xa0 [libcfs]
      [35984.386554]  [<ffffffff812918f0>] ? string+0x40/0x100
      [35984.387012]  [<ffffffffa176b762>] tgt_request_handle+0x592/0x1230 [ptlrpc]
      [35984.387494]  [<ffffffffa1718d64>] ptlrpc_main+0xe94/0x19e0 [ptlrpc]
      [35984.387977]  [<ffffffffa1717ed0>] ? ptlrpc_main+0x0/0x19e0 [ptlrpc]
      [35984.388423]  [<ffffffff8109ce4e>] kthread+0x9e/0xc0
      [35984.388862]  [<ffffffff8100c24a>] child_rip+0xa/0x20
      [35984.389257]  [<ffffffff8109cdb0>] ? kthread+0x0/0xc0
      [35984.389692]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
      [35984.390097] 
      [35984.390408] LustreError: dumping log to /tmp/lustre-log.1435624539.31470
      

      This is basically current master with a couple of patches that don't seem related (tagged master-20150629 in my source tree).
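
      For triage, the dumped stacks above can be condensed to the first few reliable frames per thread. The following is a minimal Python sketch and not part of the original report: the function name `summarize_stacks` and its `depth` parameter are hypothetical, and the regexes assume the console format shown in this ticket (a `Pid: N, comm: NAME` header followed by `[<addr>] func+0xOFF/0xLEN` frames, with `?` marking unreliable frames).

```python
import re
from collections import OrderedDict

# Header line of a dumped stack, e.g. "Pid: 31469, comm: ll_ost00_006".
# The "Service thread pid NNN was inactive" line does not match (no "Pid:").
PID_RE = re.compile(r"Pid:\s*(\d+),\s*comm:\s*(\S+)")

# One stack frame, e.g. "[<ffffffff81520141>] mutex_lock+0x31/0x50".
# Group 1 is set when the frame is marked unreliable with "?".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_stacks(log_text, depth=3):
    """Return {(pid, comm): [top reliable frames]} in order of appearance."""
    stacks = OrderedDict()
    current = None
    for line in log_text.splitlines():
        header = PID_RE.search(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            stacks[current] = []
            continue
        frame = FRAME_RE.search(line)
        if frame is None or current is None:
            continue
        if frame.group(1):  # skip "?" frames, they may be stale stack data
            continue
        if len(stacks[current]) < depth:
            stacks[current].append(frame.group(2))
    return stacks


if __name__ == "__main__":
    import sys

    # Usage (assumed): python summarize_stacks.py < console.log
    for (pid, comm), frames in summarize_stacks(sys.stdin.read()).items():
        print(f"{pid:>6} {comm:<14} {' <- '.join(frames)}")
```

      Run against the log in this ticket, the innermost reliable frames should come out as `__mutex_lock_slowpath` (called from `tgt_txn_stop_cb`) for ll_ost00_006, `start_this_handle` for ll_ost00_005 and ll_ost00_007, and `jbd2_log_wait_commit` for ll_ost00_008 and ll_ost01_004.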

          People

            wc-triage WC Triage
            green Oleg Drokin