Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-10572

Hang in conf-sanity test 69 on zfs

    XMLWordPrintable

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      I introduced conf-sanity testing with hang-detection in my testing recently and immediately hit a problem in conf-sanity 69 with zfs.

      Console output looks like this:

      [ 8960.194121] Lustre: DEBUG MARKER: == conf-sanity test 69: replace an OST with the same index =========================================== 13:23:05 (1517163785)
      [ 8961.087061] Lustre: Lustre: Build Version: 2.10.57_58_gb6ec609
      [ 8961.516996] LNet: Added LNI 192.168.123.104@tcp [8/512/0/180] 
      [ 8961.518530] LNet: Accept all, port 988
      [ 8963.178685] Lustre: 7186:0:(gss_svc_upcall.c:1185:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active
      [ 8963.181215] Lustre: 7186:0:(gss_mech_switch.c:71:lgss_mech_register()) Register gssnull mechanism
      [ 8963.182155] Key type lgssc registered
      [ 8964.538931] Lustre: Echo OBD driver; http://www.lustre.org/
      [ 8970.072662] Lustre: MGS: Connection restored to MGC192.168.123.104@tcp_0 (at 0@lo)
      [ 8971.975498] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180
      [ 8975.966431] Lustre: lustre-MDT0000: Connection restored to 192.168.123.104@tcp (at 0@lo)
      [ 8976.096272] Lustre: lustre-OST0000: Imperative Recovery not enabled, recovery window 60-180
      [ 8977.021749] Lustre: lustre-OST0000: Connection restored to 192.168.123.104@tcp (at 0@lo)
      [ 8977.023981] Lustre: Skipped 1 previous similar message
      [ 8977.042514] Lustre: lustre-OST0000: deleting orphan objects from 0x0:3 to 0x0:33
      [ 8977.556824] Lustre: Mounted lustre-client
      [ 8979.082258] Lustre: DEBUG MARKER: On OST0, 2176 inodes available. Want 99936.
      [ 9329.642647] LNet: Service thread pid 18129 was inactive for 40.06s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [ 9329.645105] Pid: 18129, comm: mdt03_003
      [ 9329.645761] 
      Call Trace:
      [ 9329.648546]  [<ffffffff81704319>] schedule+0x29/0x70
      [ 9329.649263]  [<ffffffff81700952>] schedule_timeout+0x162/0x2a0
      [ 9329.650032]  [<ffffffffa0227ea5>] ? cfs_block_sigsinv+0x45/0xa0 [libcfs]
      [ 9329.651617]  [<ffffffff810879d0>] ? process_timeout+0x0/0x10
      [ 9329.652397]  [<ffffffffa0227ed1>] ? cfs_block_sigsinv+0x71/0xa0 [libcfs]
      [ 9329.653319]  [<ffffffffa13be2c8>] osp_precreate_reserve+0x318/0x830 [osp]
      [ 9329.654066]  [<ffffffff810b7cc0>] ? default_wake_function+0x0/0x20
      [ 9329.654770]  [<ffffffffa13b3b53>] osp_declare_create+0x193/0x590 [osp]
      [ 9329.655491]  [<ffffffffa1367ab2>] lod_sub_declare_create+0xe2/0x210 [lod]
      [ 9329.656237]  [<ffffffffa1360dce>] lod_qos_declare_object_on+0xbe/0x3a0 [lod]
      [ 9329.668533]  [<ffffffffa136688d>] lod_qos_prep_create+0x10dd/0x1800 [lod]
      [ 9329.669452]  [<ffffffffa13674e5>] lod_prepare_create+0x255/0x350 [lod]
      [ 9329.670180]  [<ffffffffa13588de>] lod_declare_striped_create+0x1ee/0x980 [lod]
      [ 9329.671284]  [<ffffffffa1367ab2>] ? lod_sub_declare_create+0xe2/0x210 [lod]
      [ 9329.672301]  [<ffffffffa135d074>] lod_declare_create+0x204/0x5a0 [lod]
      [ 9329.672874]  [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [ 9329.673435]  [<ffffffffa12300ff>] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd]
      [ 9329.674527]  [<ffffffffa1220953>] mdd_declare_create+0x53/0xe30 [mdd]
      [ 9329.675323]  [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs]
      [ 9329.676039]  [<ffffffffa1224c69>] mdd_create+0x859/0x1410 [mdd]
      [ 9329.676746]  [<ffffffffa12a75f4>] mdt_reint_open+0x2144/0x3190 [mdt]
      [ 9329.677500]  [<ffffffffa129b900>] mdt_reint_rec+0x80/0x210 [mdt]
      [ 9329.679262]  [<ffffffffa127b5cb>] mdt_reint_internal+0x5fb/0x990 [mdt]
      [ 9329.681088]  [<ffffffffa1287927>] mdt_intent_reint+0x157/0x420 [mdt]
      [ 9329.682780]  [<ffffffffa127e6a2>] mdt_intent_opc+0x442/0xad0 [mdt]
      [ 9329.684883]  [<ffffffffa05e2490>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc]
      [ 9329.686409]  [<ffffffffa1286116>] mdt_intent_policy+0x1a6/0x360 [mdt]
      [ 9329.688190]  [<ffffffffa059209a>] ldlm_lock_enqueue+0x35a/0x8d0 [ptlrpc]
      [ 9329.693351]  [<ffffffffa05ba543>] ldlm_handle_enqueue0+0x913/0x13f0 [ptlrpc]
      [ 9329.694593]  [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [ 9329.696037]  [<ffffffffa063ec32>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [ 9329.697482]  [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc]
      [ 9329.698628]  [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
      [ 9329.700755]  [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
      [ 9329.702545]  [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc]
      [ 9329.704139]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [ 9329.705436]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [ 9329.706325]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [ 9329.707374]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [ 9329.708383] 
      [ 9329.709145] LustreError: dumping log to /tmp/lustre-log.1517164156.18129
      [ 9331.070281] LNet: Service thread pid 18129 completed after 41.49s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [10031.724772] LNet: Service thread pid 29224 was inactive for 40.02s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [10031.727847] Pid: 29224, comm: ll_ost00_007
      [10031.736752] 
      Call Trace:
      [10031.738429]  [<ffffffff81704319>] schedule+0x29/0x70
      [10031.739283]  [<ffffffffa09a4fad>] cv_wait_common+0x17d/0x240 [spl]
      [10031.740453]  [<ffffffff810a4070>] ? autoremove_wake_function+0x0/0x40
      [10031.740960]  [<ffffffffa09a5085>] __cv_wait+0x15/0x20 [spl]
      [10031.741456]  [<ffffffffa0b5d293>] txg_wait_open+0xe3/0x1a0 [zfs]
      [10031.741957]  [<ffffffffa0b065d2>] dmu_tx_wait+0x5c2/0x5d0 [zfs]
      [10031.742442]  [<ffffffffa0b06675>] dmu_tx_assign+0x95/0x5a0 [zfs]
      [10031.742969]  [<ffffffffa09bad87>] osd_trans_start+0xa7/0x3c0 [osd_zfs]
      [10031.743500]  [<ffffffffa1415596>] ofd_precreate_objects+0xa46/0x1aa0 [ofd]
      [10031.743995]  [<ffffffffa1408cda>] ofd_create_hdl+0x47a/0x20b0 [ofd]
      [10031.744519]  [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs]
      [10031.808155]  [<ffffffffa05e07a0>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc]
      [10031.809367]  [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc]
      [10031.810419]  [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
      [10031.811779]  [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
      [10031.812493]  [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc]
      [10031.813192]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [10031.813831]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [10031.814592]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [10031.815264]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [10031.815936] 
      [10031.816501] LustreError: dumping log to /tmp/lustre-log.1517164858.29224
      [10043.338621] LNet: Service thread pid 29224 completed after 51.63s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [10281.450667] LNet: Service thread pid 8784 was inactive for 40.10s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [10281.453783] Pid: 8784, comm: ll_ost06_001
      [10281.455676] 
      Call Trace:
      [10281.456876]  [<ffffffff81704319>] schedule+0x29/0x70
      [10281.470252]  [<ffffffffa09a4fad>] cv_wait_common+0x17d/0x240 [spl]
      [10281.471091]  [<ffffffff810a4070>] ? autoremove_wake_function+0x0/0x40
      [10281.471942]  [<ffffffffa09a5085>] __cv_wait+0x15/0x20 [spl]
      [10281.472837]  [<ffffffffa0b5d293>] txg_wait_open+0xe3/0x1a0 [zfs]
      [10281.473563]  [<ffffffffa0b065d2>] dmu_tx_wait+0x5c2/0x5d0 [zfs]
      [10281.483714]  [<ffffffffa0b06675>] dmu_tx_assign+0x95/0x5a0 [zfs]
      [10281.484251]  [<ffffffffa09bad87>] osd_trans_start+0xa7/0x3c0 [osd_zfs]
      [10281.485140]  [<ffffffffa1415596>] ofd_precreate_objects+0xa46/0x1aa0 [ofd]
      [10281.485926]  [<ffffffffa1408cda>] ofd_create_hdl+0x47a/0x20b0 [ofd]
      [10281.486792]  [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs]
      [10281.496076]  [<ffffffffa05e07a0>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc]
      [10281.497057]  [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc]
      [10281.497794]  [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
      [10281.499265]  [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
      [10281.512041]  [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc]
      [10281.512751]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [10281.513402]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [10281.514076]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [10281.514786]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [10281.515472] 
      [10281.516186] LustreError: dumping log to /tmp/lustre-log.1517165107.8784
      [10293.208563] LNet: Service thread pid 8784 completed after 51.86s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [10656.875336] LNet: Service thread pid 687 was inactive for 40.11s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [10656.878646] Pid: 687, comm: ll_ost01_004
      [10656.879276] 
      Call Trace:
      [10656.880437]  [<ffffffff81704319>] schedule+0x29/0x70
      [10656.883153]  [<ffffffffa09a4fad>] cv_wait_common+0x17d/0x240 [spl]
      [10656.883758]  [<ffffffff810a4070>] ? autoremove_wake_function+0x0/0x40
      [10656.884387]  [<ffffffffa09a5085>] __cv_wait+0x15/0x20 [spl]
      [10656.884938]  [<ffffffffa0b5d293>] txg_wait_open+0xe3/0x1a0 [zfs]
      [10656.885448]  [<ffffffffa0b065d2>] dmu_tx_wait+0x5c2/0x5d0 [zfs]
      [10656.885962]  [<ffffffffa0b06675>] dmu_tx_assign+0x95/0x5a0 [zfs]
      [10656.886435]  [<ffffffffa09bad87>] osd_trans_start+0xa7/0x3c0 [osd_zfs]
      [10656.886957]  [<ffffffffa1415596>] ofd_precreate_objects+0xa46/0x1aa0 [ofd]
      [10656.887439]  [<ffffffffa1408cda>] ofd_create_hdl+0x47a/0x20b0 [ofd]
      [10656.888005]  [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs]
      [10656.888573]  [<ffffffffa05e07a0>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc]
      [10656.889143]  [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc]
      [10656.910113]  [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
      [10656.911377]  [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
      [10656.912038]  [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc]
      [10656.912543]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [10656.913030]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [10656.913633]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [10656.914298]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [10656.914770] 
      [10656.915235] LustreError: dumping log to /tmp/lustre-log.1517165483.687
      [10664.977508] LNet: Service thread pid 687 completed after 48.21s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [10817.394032] LNet: Service thread pid 3812 was inactive for 40.05s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [10817.396885] Pid: 3812, comm: ll_ost05_007
      [10817.397532] 
      Call Trace:
      [10817.407582]  [<ffffffff81704319>] schedule+0x29/0x70
      [10817.414462]  [<ffffffffa09a4fad>] cv_wait_common+0x17d/0x240 [spl]
      [10817.415219]  [<ffffffff810a4070>] ? autoremove_wake_function+0x0/0x40
      [10817.415935]  [<ffffffffa09a5085>] __cv_wait+0x15/0x20 [spl]
      [10817.416663]  [<ffffffffa0b5d293>] txg_wait_open+0xe3/0x1a0 [zfs]
      [10817.417376]  [<ffffffffa0b065d2>] dmu_tx_wait+0x5c2/0x5d0 [zfs]
      [10817.434514]  [<ffffffffa0b06675>] dmu_tx_assign+0x95/0x5a0 [zfs]
      [10817.435237]  [<ffffffffa09bad87>] osd_trans_start+0xa7/0x3c0 [osd_zfs]
      [10817.435958]  [<ffffffffa1415596>] ofd_precreate_objects+0xa46/0x1aa0 [ofd]
      [10817.436663]  [<ffffffffa1408cda>] ofd_create_hdl+0x47a/0x20b0 [ofd]
      [10817.437367]  [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs]
      [10817.453458]  [<ffffffffa05e07a0>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc]
      [10817.459190]  [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc]
      [10817.460135]  [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
      [10817.461755]  [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
      [10817.462857]  [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc]
      [10817.463724]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [10817.464563]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [10817.465320]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [10817.474397]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [10817.475266] 
      [10817.475867] LustreError: dumping log to /tmp/lustre-log.1517165643.3812
      [10828.195354] LNet: Service thread pid 3812 completed after 50.85s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [10875.628351] LNet: Service thread pid 7993 was inactive for 40.03s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [10875.631117] Pid: 7993, comm: mdt06_002
      [10875.631734] 
      Call Trace:
      [10875.633730]  [<ffffffff81704319>] schedule+0x29/0x70
      [10875.634417]  [<ffffffff81700952>] schedule_timeout+0x162/0x2a0
      [10875.636968]  [<ffffffffa0227ea5>] ? cfs_block_sigsinv+0x45/0xa0 [libcfs]
      [10875.637682]  [<ffffffff810879d0>] ? process_timeout+0x0/0x10
      [10875.638380]  [<ffffffffa0227ed1>] ? cfs_block_sigsinv+0x71/0xa0 [libcfs]
      [10875.639103]  [<ffffffffa13be2c8>] osp_precreate_reserve+0x318/0x830 [osp]
      [10875.639799]  [<ffffffff810b7cc0>] ? default_wake_function+0x0/0x20
      [10875.648290]  [<ffffffffa13b3b53>] osp_declare_create+0x193/0x590 [osp]
      [10875.649029]  [<ffffffffa1367ab2>] lod_sub_declare_create+0xe2/0x210 [lod]
      [10875.649755]  [<ffffffffa1360dce>] lod_qos_declare_object_on+0xbe/0x3a0 [lod]
      [10875.650483]  [<ffffffffa136688d>] lod_qos_prep_create+0x10dd/0x1800 [lod]
      [10875.651199]  [<ffffffffa13674e5>] lod_prepare_create+0x255/0x350 [lod]
      [10875.651900]  [<ffffffffa13588de>] lod_declare_striped_create+0x1ee/0x980 [lod]
      [10875.653563]  [<ffffffffa1367ab2>] ? lod_sub_declare_create+0xe2/0x210 [lod]
      [10875.654291]  [<ffffffffa135d074>] lod_declare_create+0x204/0x5a0 [lod]
      [10875.655017]  [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [10875.655885]  [<ffffffffa12300ff>] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd]
      [10875.668080]  [<ffffffffa1220953>] mdd_declare_create+0x53/0xe30 [mdd]
      [10875.677779]  [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs]
      [10875.678567]  [<ffffffffa1224c69>] mdd_create+0x859/0x1410 [mdd]
      [10875.679110]  [<ffffffffa12a75f4>] mdt_reint_open+0x2144/0x3190 [mdt]
      [10875.679778]  [<ffffffffa129b900>] mdt_reint_rec+0x80/0x210 [mdt]
      [10875.680363]  [<ffffffffa127b5cb>] mdt_reint_internal+0x5fb/0x990 [mdt]
      [10875.680878]  [<ffffffffa1287927>] mdt_intent_reint+0x157/0x420 [mdt]
      [10875.682050]  [<ffffffffa127e6a2>] mdt_intent_opc+0x442/0xad0 [mdt]
      [10875.682629]  [<ffffffffa05e2490>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc]
      [10875.683321]  [<ffffffffa1286116>] mdt_intent_policy+0x1a6/0x360 [mdt]
      [10875.683850]  [<ffffffffa059209a>] ldlm_lock_enqueue+0x35a/0x8d0 [ptlrpc]
      [10875.684545]  [<ffffffffa05ba543>] ldlm_handle_enqueue0+0x913/0x13f0 [ptlrpc]
      [10875.685128]  [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [10875.685825]  [<ffffffffa063ec32>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [10875.686373]  [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc]
      [10875.686909]  [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
      [10875.688364]  [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
      [10875.688910]  [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc]
      [10875.689628]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [10875.690224]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [10875.690698]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [10875.691349]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [10875.691818] 
      [10875.692431] LustreError: dumping log to /tmp/lustre-log.1517165701.7993
      [10875.723487] LNet: Service thread pid 7993 completed after 40.13s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [11386.093776] LNet: Service thread pid 7976 was inactive for 40.10s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [11386.096569] Pid: 7976, comm: mdt01_000
      [11386.099291] 
      Call Trace:
      [11386.100643]  [<ffffffff81704319>] schedule+0x29/0x70
      [11386.101324]  [<ffffffff81700952>] schedule_timeout+0x162/0x2a0
      [11386.102030]  [<ffffffffa0227ea5>] ? cfs_block_sigsinv+0x45/0xa0 [libcfs]
      [11386.102855]  [<ffffffff810879d0>] ? process_timeout+0x0/0x10
      [11386.103616]  [<ffffffffa0227ed1>] ? cfs_block_sigsinv+0x71/0xa0 [libcfs]
      [11386.104349]  [<ffffffffa13be2c8>] osp_precreate_reserve+0x318/0x830 [osp]
      [11386.105054]  [<ffffffff810b7cc0>] ? default_wake_function+0x0/0x20
      [11386.105869]  [<ffffffffa13b3b53>] osp_declare_create+0x193/0x590 [osp]
      [11386.106720]  [<ffffffffa1367ab2>] lod_sub_declare_create+0xe2/0x210 [lod]
      [11386.108825]  [<ffffffffa1360dce>] lod_qos_declare_object_on+0xbe/0x3a0 [lod]
      [11386.163398]  [<ffffffffa136688d>] lod_qos_prep_create+0x10dd/0x1800 [lod]
      [11386.164178]  [<ffffffffa13674e5>] lod_prepare_create+0x255/0x350 [lod]
      [11386.166825]  [<ffffffffa13588de>] lod_declare_striped_create+0x1ee/0x980 [lod]
      [11386.168090]  [<ffffffffa1367ab2>] ? lod_sub_declare_create+0xe2/0x210 [lod]
      [11386.168794]  [<ffffffffa135d074>] lod_declare_create+0x204/0x5a0 [lod]
      [11386.178636]  [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [11386.180684]  [<ffffffffa12300ff>] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd]
      [11386.197534]  [<ffffffffa1220953>] mdd_declare_create+0x53/0xe30 [mdd]
      [11386.202676]  [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs]
      [11386.203980]  [<ffffffffa1224c69>] mdd_create+0x859/0x1410 [mdd]
      [11386.213411]  [<ffffffffa12a75f4>] mdt_reint_open+0x2144/0x3190 [mdt]
      [11386.218277]  [<ffffffffa129b900>] mdt_reint_rec+0x80/0x210 [mdt]
      [11386.228289]  [<ffffffffa127b5cb>] mdt_reint_internal+0x5fb/0x990 [mdt]
      [11386.245324]  [<ffffffffa1287927>] mdt_intent_reint+0x157/0x420 [mdt]
      [11386.298037]  [<ffffffffa127e6a2>] mdt_intent_opc+0x442/0xad0 [mdt]
      [11386.298932]  [<ffffffffa05e2490>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc]
      [11386.300428]  [<ffffffffa1286116>] mdt_intent_policy+0x1a6/0x360 [mdt]
      [11386.312540]  [<ffffffffa059209a>] ldlm_lock_enqueue+0x35a/0x8d0 [ptlrpc]
      [11386.313331]  [<ffffffffa05ba543>] ldlm_handle_enqueue0+0x913/0x13f0 [ptlrpc]
      [11386.314102]  [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [11386.314873]  [<ffffffffa063ec32>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [11386.371225]  [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc]
      [11386.372055]  [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
      [11386.373355]  [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
      [11386.392312]  [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc]
      [11386.393037]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [11386.393686]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [11386.394358]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [11386.395042]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [11386.395735] 
      [11386.396429] LustreError: dumping log to /tmp/lustre-log.1517166212.7976
      [11391.441778] LNet: Service thread pid 7976 completed after 45.44s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [11646.381483] LNet: Service thread pid 27607 was inactive for 40.14s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [11646.406010] Pid: 27607, comm: ll_ost04_009
      [11646.406793] 
      Call Trace:
      [11646.408187]  [<ffffffff81704319>] schedule+0x29/0x70
      [11646.427239]  [<ffffffffa09a4fad>] cv_wait_common+0x17d/0x240 [spl]
      [11646.427962]  [<ffffffff810a4070>] ? autoremove_wake_function+0x0/0x40
      [11646.428667]  [<ffffffffa09a5085>] __cv_wait+0x15/0x20 [spl]
      [11646.429380]  [<ffffffffa0b5d293>] txg_wait_open+0xe3/0x1a0 [zfs]
      [11646.483045]  [<ffffffffa0b065d2>] dmu_tx_wait+0x5c2/0x5d0 [zfs]
      [11646.483791]  [<ffffffffa0b06675>] dmu_tx_assign+0x95/0x5a0 [zfs]
      [11646.487195]  [<ffffffffa09bad87>] osd_trans_start+0xa7/0x3c0 [osd_zfs]
      [11646.488017]  [<ffffffffa1415596>] ofd_precreate_objects+0xa46/0x1aa0 [ofd]
      [11646.488726]  [<ffffffffa1408cda>] ofd_create_hdl+0x47a/0x20b0 [ofd]
      [11646.489443]  [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs]
      [11646.490213]  [<ffffffffa05e07a0>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc]
      [11646.490978]  [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc]
      [11646.491713]  [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
      [11646.585560]  [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
      [11646.587445]  [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc]
      [11646.588380]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [11646.589045]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [11646.589502]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [11646.589984]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [11646.590438] 
      [11646.590834] LustreError: dumping log to /tmp/lustre-log.1517166472.27607
      [11650.848143] Pid: 7986, comm: mdt04_001
      [11650.911957] 
      Call Trace:
      [11650.915395]  [<ffffffff81704319>] schedule+0x29/0x70
      [11650.915855]  [<ffffffff81700952>] schedule_timeout+0x162/0x2a0
      [11650.916386]  [<ffffffffa0227ea5>] ? cfs_block_sigsinv+0x45/0xa0 [libcfs]
      [11650.917146]  [<ffffffff810879d0>] ? process_timeout+0x0/0x10
      [11650.917808]  [<ffffffffa0227ed1>] ? cfs_block_sigsinv+0x71/0xa0 [libcfs]
      [11650.919882]  [<ffffffffa13be2c8>] osp_precreate_reserve+0x318/0x830 [osp]
      [11650.920577]  [<ffffffff810b7cc0>] ? default_wake_function+0x0/0x20
      [11650.927677]  [<ffffffffa13b3b53>] osp_declare_create+0x193/0x590 [osp]
      [11650.928576]  [<ffffffffa1367ab2>] lod_sub_declare_create+0xe2/0x210 [lod]
      [11650.929423]  [<ffffffffa1360dce>] lod_qos_declare_object_on+0xbe/0x3a0 [lod]
      [11650.930299]  [<ffffffffa136688d>] lod_qos_prep_create+0x10dd/0x1800 [lod]
      [11650.931383]  [<ffffffffa13674e5>] lod_prepare_create+0x255/0x350 [lod]
      [11650.932126]  [<ffffffffa13588de>] lod_declare_striped_create+0x1ee/0x980 [lod]
      [11650.933401]  [<ffffffffa1367ab2>] ? lod_sub_declare_create+0xe2/0x210 [lod]
      [11650.942676]  [<ffffffffa135d074>] lod_declare_create+0x204/0x5a0 [lod]
      [11650.943445]  [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [11650.944187]  [<ffffffffa12300ff>] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd]
      [11650.945472]  [<ffffffffa1220953>] mdd_declare_create+0x53/0xe30 [mdd]
      [11650.955257]  [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs]
      [11650.955987]  [<ffffffffa1224c69>] mdd_create+0x859/0x1410 [mdd]
      [11650.956711]  [<ffffffffa12a75f4>] mdt_reint_open+0x2144/0x3190 [mdt]
      [11650.957500]  [<ffffffffa129b900>] mdt_reint_rec+0x80/0x210 [mdt]
      [11650.958246]  [<ffffffffa127b5cb>] mdt_reint_internal+0x5fb/0x990 [mdt]
      [11650.958967]  [<ffffffffa1287927>] mdt_intent_reint+0x157/0x420 [mdt]
      [11650.959667]  [<ffffffffa127e6a2>] mdt_intent_opc+0x442/0xad0 [mdt]
      [11650.960434]  [<ffffffffa05e2490>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc]
      [11650.961232]  [<ffffffffa1286116>] mdt_intent_policy+0x1a6/0x360 [mdt]
      [11650.961996]  [<ffffffffa059209a>] ldlm_lock_enqueue+0x35a/0x8d0 [ptlrpc]
      [11650.962893]  [<ffffffffa05ba543>] ldlm_handle_enqueue0+0x913/0x13f0 [ptlrpc]
      [11650.963881]  [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
      [11650.964779]  [<ffffffffa063ec32>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [11650.965676]  [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc]
      [11650.967478]  [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
      [11650.968910]  [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
      [11650.969707]  [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc]
      [11650.970237]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [11650.982955]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [11650.983796]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [11651.002062]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [11651.002815] 
      [11651.003438] LustreError: dumping log to /tmp/lustre-log.1517166477.7986
      [11657.848212] LNet: Service thread pid 7986 completed after 47.10s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [12003.206088] LNet: Service thread pid 29404 was inactive for 40.16s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [12003.248977] LNet: Skipped 1 previous similar message
      [12003.249773] Pid: 29404, comm: ll_ost00_008
      [12003.250417] 
      Call Trace:
      [12003.251575]  [<ffffffff81704319>] schedule+0x29/0x70
      [12003.252248]  [<ffffffffa09a4fad>] cv_wait_common+0x17d/0x240 [spl]
      [12003.266969]  [<ffffffff810a4070>] ? autoremove_wake_function+0x0/0x40
      [12003.267718]  [<ffffffffa09a5085>] __cv_wait+0x15/0x20 [spl]
      [12003.268463]  [<ffffffffa0b5d293>] txg_wait_open+0xe3/0x1a0 [zfs]
      [12003.270220]  [<ffffffffa0b065d2>] dmu_tx_wait+0x5c2/0x5d0 [zfs]
      [12003.270931]  [<ffffffffa0b06675>] dmu_tx_assign+0x95/0x5a0 [zfs]
      [12003.271637]  [<ffffffffa09bad87>] osd_trans_start+0xa7/0x3c0 [osd_zfs]
      [12003.278620]  [<ffffffffa1415596>] ofd_precreate_objects+0xa46/0x1aa0 [ofd]
      [12003.279417]  [<ffffffffa1408cda>] ofd_create_hdl+0x47a/0x20b0 [ofd]
      [12003.288353]  [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs]
      [12003.289170]  [<ffffffffa05e07a0>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc]
      [12003.289937]  [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc]
      [12003.290673]  [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc]
      [12003.298125]  [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc]
      [12003.298851]  [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc]
      [12003.299707]  [<ffffffff810a2eba>] kthread+0xea/0xf0
      [12003.309644]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [12003.310163]  [<ffffffff8170fb98>] ret_from_fork+0x58/0x90
      [12003.310627]  [<ffffffff810a2dd0>] ? kthread+0x0/0xf0
      [12003.311173] 
      [12003.311602] LustreError: dumping log to /tmp/lustre-log.1517166829.29404
      [12008.410465] LNet: Service thread pid 29404 completed after 45.36s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [12008.414754] LNet: Skipped 1 previous similar message
      [12037.466656] SysRq : Trigger a crash
      

      As far as I can tell ldiskfs is unaffected.

      I checked in maloo and it passes there in like 600 seconds, so it's not a regular zfs slowness or anything of the sort.

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              green Oleg Drokin
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: