Details
-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
None
-
None
-
3
-
9223372036854775807
Description
I introduced conf-sanity testing with hang-detection in my testing recently and immediately hit a problem in conf-sanity 69 with zfs.
Console output looks like this:
[ 8960.194121] Lustre: DEBUG MARKER: == conf-sanity test 69: replace an OST with the same index =========================================== 13:23:05 (1517163785) [ 8961.087061] Lustre: Lustre: Build Version: 2.10.57_58_gb6ec609 [ 8961.516996] LNet: Added LNI 192.168.123.104@tcp [8/512/0/180] [ 8961.518530] LNet: Accept all, port 988 [ 8963.178685] Lustre: 7186:0:(gss_svc_upcall.c:1185:gss_init_svc_upcall()) Init channel is not opened by lsvcgssd, following request might be dropped until lsvcgssd is active [ 8963.181215] Lustre: 7186:0:(gss_mech_switch.c:71:lgss_mech_register()) Register gssnull mechanism [ 8963.182155] Key type lgssc registered [ 8964.538931] Lustre: Echo OBD driver; http://www.lustre.org/ [ 8970.072662] Lustre: MGS: Connection restored to MGC192.168.123.104@tcp_0 (at 0@lo) [ 8971.975498] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180 [ 8975.966431] Lustre: lustre-MDT0000: Connection restored to 192.168.123.104@tcp (at 0@lo) [ 8976.096272] Lustre: lustre-OST0000: Imperative Recovery not enabled, recovery window 60-180 [ 8977.021749] Lustre: lustre-OST0000: Connection restored to 192.168.123.104@tcp (at 0@lo) [ 8977.023981] Lustre: Skipped 1 previous similar message [ 8977.042514] Lustre: lustre-OST0000: deleting orphan objects from 0x0:3 to 0x0:33 [ 8977.556824] Lustre: Mounted lustre-client [ 8979.082258] Lustre: DEBUG MARKER: On OST0, 2176 inodes available. Want 99936. [ 9329.642647] LNet: Service thread pid 18129 was inactive for 40.06s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [ 9329.645105] Pid: 18129, comm: mdt03_003 [ 9329.645761] Call Trace: [ 9329.648546] [<ffffffff81704319>] schedule+0x29/0x70 [ 9329.649263] [<ffffffff81700952>] schedule_timeout+0x162/0x2a0 [ 9329.650032] [<ffffffffa0227ea5>] ? cfs_block_sigsinv+0x45/0xa0 [libcfs] [ 9329.651617] [<ffffffff810879d0>] ? process_timeout+0x0/0x10 [ 9329.652397] [<ffffffffa0227ed1>] ? cfs_block_sigsinv+0x71/0xa0 [libcfs] [ 9329.653319] [<ffffffffa13be2c8>] osp_precreate_reserve+0x318/0x830 [osp] [ 9329.654066] [<ffffffff810b7cc0>] ? default_wake_function+0x0/0x20 [ 9329.654770] [<ffffffffa13b3b53>] osp_declare_create+0x193/0x590 [osp] [ 9329.655491] [<ffffffffa1367ab2>] lod_sub_declare_create+0xe2/0x210 [lod] [ 9329.656237] [<ffffffffa1360dce>] lod_qos_declare_object_on+0xbe/0x3a0 [lod] [ 9329.668533] [<ffffffffa136688d>] lod_qos_prep_create+0x10dd/0x1800 [lod] [ 9329.669452] [<ffffffffa13674e5>] lod_prepare_create+0x255/0x350 [lod] [ 9329.670180] [<ffffffffa13588de>] lod_declare_striped_create+0x1ee/0x980 [lod] [ 9329.671284] [<ffffffffa1367ab2>] ? lod_sub_declare_create+0xe2/0x210 [lod] [ 9329.672301] [<ffffffffa135d074>] lod_declare_create+0x204/0x5a0 [lod] [ 9329.672874] [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [ 9329.673435] [<ffffffffa12300ff>] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd] [ 9329.674527] [<ffffffffa1220953>] mdd_declare_create+0x53/0xe30 [mdd] [ 9329.675323] [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs] [ 9329.676039] [<ffffffffa1224c69>] mdd_create+0x859/0x1410 [mdd] [ 9329.676746] [<ffffffffa12a75f4>] mdt_reint_open+0x2144/0x3190 [mdt] [ 9329.677500] [<ffffffffa129b900>] mdt_reint_rec+0x80/0x210 [mdt] [ 9329.679262] [<ffffffffa127b5cb>] mdt_reint_internal+0x5fb/0x990 [mdt] [ 9329.681088] [<ffffffffa1287927>] mdt_intent_reint+0x157/0x420 [mdt] [ 9329.682780] [<ffffffffa127e6a2>] mdt_intent_opc+0x442/0xad0 [mdt] [ 9329.684883] [<ffffffffa05e2490>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc] [ 9329.686409] [<ffffffffa1286116>] mdt_intent_policy+0x1a6/0x360 [mdt] [ 9329.688190] [<ffffffffa059209a>] ldlm_lock_enqueue+0x35a/0x8d0 [ptlrpc] [ 9329.693351] [<ffffffffa05ba543>] ldlm_handle_enqueue0+0x913/0x13f0 [ptlrpc] [ 9329.694593] [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [ 9329.696037] [<ffffffffa063ec32>] tgt_enqueue+0x62/0x210 [ptlrpc] [ 9329.697482] [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc] [ 9329.698628] [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc] [ 9329.700755] [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc] [ 9329.702545] [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc] [ 9329.704139] [<ffffffff810a2eba>] kthread+0xea/0xf0 [ 9329.705436] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [ 9329.706325] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90 [ 9329.707374] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [ 9329.708383] [ 9329.709145] LustreError: dumping log to /tmp/lustre-log.1517164156.18129 [ 9331.070281] LNet: Service thread pid 18129 completed after 41.49s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [10031.724772] LNet: Service thread pid 29224 was inactive for 40.02s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [10031.727847] Pid: 29224, comm: ll_ost00_007 [10031.736752] Call Trace: [10031.738429] [<ffffffff81704319>] schedule+0x29/0x70 [10031.739283] [<ffffffffa09a4fad>] cv_wait_common+0x17d/0x240 [spl] [10031.740453] [<ffffffff810a4070>] ? autoremove_wake_function+0x0/0x40 [10031.740960] [<ffffffffa09a5085>] __cv_wait+0x15/0x20 [spl] [10031.741456] [<ffffffffa0b5d293>] txg_wait_open+0xe3/0x1a0 [zfs] [10031.741957] [<ffffffffa0b065d2>] dmu_tx_wait+0x5c2/0x5d0 [zfs] [10031.742442] [<ffffffffa0b06675>] dmu_tx_assign+0x95/0x5a0 [zfs] [10031.742969] [<ffffffffa09bad87>] osd_trans_start+0xa7/0x3c0 [osd_zfs] [10031.743500] [<ffffffffa1415596>] ofd_precreate_objects+0xa46/0x1aa0 [ofd] [10031.743995] [<ffffffffa1408cda>] ofd_create_hdl+0x47a/0x20b0 [ofd] [10031.744519] [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs] [10031.808155] [<ffffffffa05e07a0>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc] [10031.809367] [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc] [10031.810419] [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc] [10031.811779] [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc] [10031.812493] [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc] [10031.813192] [<ffffffff810a2eba>] kthread+0xea/0xf0 [10031.813831] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [10031.814592] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90 [10031.815264] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [10031.815936] [10031.816501] LustreError: dumping log to /tmp/lustre-log.1517164858.29224 [10043.338621] LNet: Service thread pid 29224 completed after 51.63s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [10281.450667] LNet: Service thread pid 8784 was inactive for 40.10s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [10281.453783] Pid: 8784, comm: ll_ost06_001 [10281.455676] Call Trace: [10281.456876] [<ffffffff81704319>] schedule+0x29/0x70 [10281.470252] [<ffffffffa09a4fad>] cv_wait_common+0x17d/0x240 [spl] [10281.471091] [<ffffffff810a4070>] ? autoremove_wake_function+0x0/0x40 [10281.471942] [<ffffffffa09a5085>] __cv_wait+0x15/0x20 [spl] [10281.472837] [<ffffffffa0b5d293>] txg_wait_open+0xe3/0x1a0 [zfs] [10281.473563] [<ffffffffa0b065d2>] dmu_tx_wait+0x5c2/0x5d0 [zfs] [10281.483714] [<ffffffffa0b06675>] dmu_tx_assign+0x95/0x5a0 [zfs] [10281.484251] [<ffffffffa09bad87>] osd_trans_start+0xa7/0x3c0 [osd_zfs] [10281.485140] [<ffffffffa1415596>] ofd_precreate_objects+0xa46/0x1aa0 [ofd] [10281.485926] [<ffffffffa1408cda>] ofd_create_hdl+0x47a/0x20b0 [ofd] [10281.486792] [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs] [10281.496076] [<ffffffffa05e07a0>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc] [10281.497057] [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc] [10281.497794] [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc] [10281.499265] [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc] [10281.512041] [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc] [10281.512751] [<ffffffff810a2eba>] kthread+0xea/0xf0 [10281.513402] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [10281.514076] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90 [10281.514786] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [10281.515472] [10281.516186] LustreError: dumping log to /tmp/lustre-log.1517165107.8784 [10293.208563] LNet: Service thread pid 8784 completed after 51.86s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [10656.875336] LNet: Service thread pid 687 was inactive for 40.11s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [10656.878646] Pid: 687, comm: ll_ost01_004 [10656.879276] Call Trace: [10656.880437] [<ffffffff81704319>] schedule+0x29/0x70 [10656.883153] [<ffffffffa09a4fad>] cv_wait_common+0x17d/0x240 [spl] [10656.883758] [<ffffffff810a4070>] ? autoremove_wake_function+0x0/0x40 [10656.884387] [<ffffffffa09a5085>] __cv_wait+0x15/0x20 [spl] [10656.884938] [<ffffffffa0b5d293>] txg_wait_open+0xe3/0x1a0 [zfs] [10656.885448] [<ffffffffa0b065d2>] dmu_tx_wait+0x5c2/0x5d0 [zfs] [10656.885962] [<ffffffffa0b06675>] dmu_tx_assign+0x95/0x5a0 [zfs] [10656.886435] [<ffffffffa09bad87>] osd_trans_start+0xa7/0x3c0 [osd_zfs] [10656.886957] [<ffffffffa1415596>] ofd_precreate_objects+0xa46/0x1aa0 [ofd] [10656.887439] [<ffffffffa1408cda>] ofd_create_hdl+0x47a/0x20b0 [ofd] [10656.888005] [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs] [10656.888573] [<ffffffffa05e07a0>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc] [10656.889143] [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc] [10656.910113] [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc] [10656.911377] [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc] [10656.912038] [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc] [10656.912543] [<ffffffff810a2eba>] kthread+0xea/0xf0 [10656.913030] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [10656.913633] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90 [10656.914298] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [10656.914770] [10656.915235] LustreError: dumping log to /tmp/lustre-log.1517165483.687 [10664.977508] LNet: Service thread pid 687 completed after 48.21s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [10817.394032] LNet: Service thread pid 3812 was inactive for 40.05s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [10817.396885] Pid: 3812, comm: ll_ost05_007 [10817.397532] Call Trace: [10817.407582] [<ffffffff81704319>] schedule+0x29/0x70 [10817.414462] [<ffffffffa09a4fad>] cv_wait_common+0x17d/0x240 [spl] [10817.415219] [<ffffffff810a4070>] ? autoremove_wake_function+0x0/0x40 [10817.415935] [<ffffffffa09a5085>] __cv_wait+0x15/0x20 [spl] [10817.416663] [<ffffffffa0b5d293>] txg_wait_open+0xe3/0x1a0 [zfs] [10817.417376] [<ffffffffa0b065d2>] dmu_tx_wait+0x5c2/0x5d0 [zfs] [10817.434514] [<ffffffffa0b06675>] dmu_tx_assign+0x95/0x5a0 [zfs] [10817.435237] [<ffffffffa09bad87>] osd_trans_start+0xa7/0x3c0 [osd_zfs] [10817.435958] [<ffffffffa1415596>] ofd_precreate_objects+0xa46/0x1aa0 [ofd] [10817.436663] [<ffffffffa1408cda>] ofd_create_hdl+0x47a/0x20b0 [ofd] [10817.437367] [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs] [10817.453458] [<ffffffffa05e07a0>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc] [10817.459190] [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc] [10817.460135] [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc] [10817.461755] [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc] [10817.462857] [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc] [10817.463724] [<ffffffff810a2eba>] kthread+0xea/0xf0 [10817.464563] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [10817.465320] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90 [10817.474397] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [10817.475266] [10817.475867] LustreError: dumping log to /tmp/lustre-log.1517165643.3812 [10828.195354] LNet: Service thread pid 3812 completed after 50.85s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [10875.628351] LNet: Service thread pid 7993 was inactive for 40.03s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [10875.631117] Pid: 7993, comm: mdt06_002 [10875.631734] Call Trace: [10875.633730] [<ffffffff81704319>] schedule+0x29/0x70 [10875.634417] [<ffffffff81700952>] schedule_timeout+0x162/0x2a0 [10875.636968] [<ffffffffa0227ea5>] ? cfs_block_sigsinv+0x45/0xa0 [libcfs] [10875.637682] [<ffffffff810879d0>] ? process_timeout+0x0/0x10 [10875.638380] [<ffffffffa0227ed1>] ? cfs_block_sigsinv+0x71/0xa0 [libcfs] [10875.639103] [<ffffffffa13be2c8>] osp_precreate_reserve+0x318/0x830 [osp] [10875.639799] [<ffffffff810b7cc0>] ? default_wake_function+0x0/0x20 [10875.648290] [<ffffffffa13b3b53>] osp_declare_create+0x193/0x590 [osp] [10875.649029] [<ffffffffa1367ab2>] lod_sub_declare_create+0xe2/0x210 [lod] [10875.649755] [<ffffffffa1360dce>] lod_qos_declare_object_on+0xbe/0x3a0 [lod] [10875.650483] [<ffffffffa136688d>] lod_qos_prep_create+0x10dd/0x1800 [lod] [10875.651199] [<ffffffffa13674e5>] lod_prepare_create+0x255/0x350 [lod] [10875.651900] [<ffffffffa13588de>] lod_declare_striped_create+0x1ee/0x980 [lod] [10875.653563] [<ffffffffa1367ab2>] ? lod_sub_declare_create+0xe2/0x210 [lod] [10875.654291] [<ffffffffa135d074>] lod_declare_create+0x204/0x5a0 [lod] [10875.655017] [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [10875.655885] [<ffffffffa12300ff>] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd] [10875.668080] [<ffffffffa1220953>] mdd_declare_create+0x53/0xe30 [mdd] [10875.677779] [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs] [10875.678567] [<ffffffffa1224c69>] mdd_create+0x859/0x1410 [mdd] [10875.679110] [<ffffffffa12a75f4>] mdt_reint_open+0x2144/0x3190 [mdt] [10875.679778] [<ffffffffa129b900>] mdt_reint_rec+0x80/0x210 [mdt] [10875.680363] [<ffffffffa127b5cb>] mdt_reint_internal+0x5fb/0x990 [mdt] [10875.680878] [<ffffffffa1287927>] mdt_intent_reint+0x157/0x420 [mdt] [10875.682050] [<ffffffffa127e6a2>] mdt_intent_opc+0x442/0xad0 [mdt] [10875.682629] [<ffffffffa05e2490>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc] [10875.683321] [<ffffffffa1286116>] mdt_intent_policy+0x1a6/0x360 [mdt] [10875.683850] [<ffffffffa059209a>] ldlm_lock_enqueue+0x35a/0x8d0 [ptlrpc] [10875.684545] [<ffffffffa05ba543>] ldlm_handle_enqueue0+0x913/0x13f0 [ptlrpc] [10875.685128] [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [10875.685825] [<ffffffffa063ec32>] tgt_enqueue+0x62/0x210 [ptlrpc] [10875.686373] [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc] [10875.686909] [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc] [10875.688364] [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc] [10875.688910] [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc] [10875.689628] [<ffffffff810a2eba>] kthread+0xea/0xf0 [10875.690224] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [10875.690698] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90 [10875.691349] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [10875.691818] [10875.692431] LustreError: dumping log to /tmp/lustre-log.1517165701.7993 [10875.723487] LNet: Service thread pid 7993 completed after 40.13s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [11386.093776] LNet: Service thread pid 7976 was inactive for 40.10s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [11386.096569] Pid: 7976, comm: mdt01_000 [11386.099291] Call Trace: [11386.100643] [<ffffffff81704319>] schedule+0x29/0x70 [11386.101324] [<ffffffff81700952>] schedule_timeout+0x162/0x2a0 [11386.102030] [<ffffffffa0227ea5>] ? cfs_block_sigsinv+0x45/0xa0 [libcfs] [11386.102855] [<ffffffff810879d0>] ? process_timeout+0x0/0x10 [11386.103616] [<ffffffffa0227ed1>] ? cfs_block_sigsinv+0x71/0xa0 [libcfs] [11386.104349] [<ffffffffa13be2c8>] osp_precreate_reserve+0x318/0x830 [osp] [11386.105054] [<ffffffff810b7cc0>] ? default_wake_function+0x0/0x20 [11386.105869] [<ffffffffa13b3b53>] osp_declare_create+0x193/0x590 [osp] [11386.106720] [<ffffffffa1367ab2>] lod_sub_declare_create+0xe2/0x210 [lod] [11386.108825] [<ffffffffa1360dce>] lod_qos_declare_object_on+0xbe/0x3a0 [lod] [11386.163398] [<ffffffffa136688d>] lod_qos_prep_create+0x10dd/0x1800 [lod] [11386.164178] [<ffffffffa13674e5>] lod_prepare_create+0x255/0x350 [lod] [11386.166825] [<ffffffffa13588de>] lod_declare_striped_create+0x1ee/0x980 [lod] [11386.168090] [<ffffffffa1367ab2>] ? lod_sub_declare_create+0xe2/0x210 [lod] [11386.168794] [<ffffffffa135d074>] lod_declare_create+0x204/0x5a0 [lod] [11386.178636] [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [11386.180684] [<ffffffffa12300ff>] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd] [11386.197534] [<ffffffffa1220953>] mdd_declare_create+0x53/0xe30 [mdd] [11386.202676] [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs] [11386.203980] [<ffffffffa1224c69>] mdd_create+0x859/0x1410 [mdd] [11386.213411] [<ffffffffa12a75f4>] mdt_reint_open+0x2144/0x3190 [mdt] [11386.218277] [<ffffffffa129b900>] mdt_reint_rec+0x80/0x210 [mdt] [11386.228289] [<ffffffffa127b5cb>] mdt_reint_internal+0x5fb/0x990 [mdt] [11386.245324] [<ffffffffa1287927>] mdt_intent_reint+0x157/0x420 [mdt] [11386.298037] [<ffffffffa127e6a2>] mdt_intent_opc+0x442/0xad0 [mdt] [11386.298932] [<ffffffffa05e2490>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc] [11386.300428] [<ffffffffa1286116>] mdt_intent_policy+0x1a6/0x360 [mdt] [11386.312540] [<ffffffffa059209a>] ldlm_lock_enqueue+0x35a/0x8d0 [ptlrpc] [11386.313331] [<ffffffffa05ba543>] ldlm_handle_enqueue0+0x913/0x13f0 [ptlrpc] [11386.314102] [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [11386.314873] [<ffffffffa063ec32>] tgt_enqueue+0x62/0x210 [ptlrpc] [11386.371225] [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc] [11386.372055] [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc] [11386.373355] [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc] [11386.392312] [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc] [11386.393037] [<ffffffff810a2eba>] kthread+0xea/0xf0 [11386.393686] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [11386.394358] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90 [11386.395042] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [11386.395735] [11386.396429] LustreError: dumping log to /tmp/lustre-log.1517166212.7976 [11391.441778] LNet: Service thread pid 7976 completed after 45.44s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [11646.381483] LNet: Service thread pid 27607 was inactive for 40.14s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [11646.406010] Pid: 27607, comm: ll_ost04_009 [11646.406793] Call Trace: [11646.408187] [<ffffffff81704319>] schedule+0x29/0x70 [11646.427239] [<ffffffffa09a4fad>] cv_wait_common+0x17d/0x240 [spl] [11646.427962] [<ffffffff810a4070>] ? autoremove_wake_function+0x0/0x40 [11646.428667] [<ffffffffa09a5085>] __cv_wait+0x15/0x20 [spl] [11646.429380] [<ffffffffa0b5d293>] txg_wait_open+0xe3/0x1a0 [zfs] [11646.483045] [<ffffffffa0b065d2>] dmu_tx_wait+0x5c2/0x5d0 [zfs] [11646.483791] [<ffffffffa0b06675>] dmu_tx_assign+0x95/0x5a0 [zfs] [11646.487195] [<ffffffffa09bad87>] osd_trans_start+0xa7/0x3c0 [osd_zfs] [11646.488017] [<ffffffffa1415596>] ofd_precreate_objects+0xa46/0x1aa0 [ofd] [11646.488726] [<ffffffffa1408cda>] ofd_create_hdl+0x47a/0x20b0 [ofd] [11646.489443] [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs] [11646.490213] [<ffffffffa05e07a0>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc] [11646.490978] [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc] [11646.491713] [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc] [11646.585560] [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc] [11646.587445] [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc] [11646.588380] [<ffffffff810a2eba>] kthread+0xea/0xf0 [11646.589045] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [11646.589502] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90 [11646.589984] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [11646.590438] [11646.590834] LustreError: dumping log to /tmp/lustre-log.1517166472.27607 [11650.848143] Pid: 7986, comm: mdt04_001 [11650.911957] Call Trace: [11650.915395] [<ffffffff81704319>] schedule+0x29/0x70 [11650.915855] [<ffffffff81700952>] schedule_timeout+0x162/0x2a0 [11650.916386] [<ffffffffa0227ea5>] ? cfs_block_sigsinv+0x45/0xa0 [libcfs] [11650.917146] [<ffffffff810879d0>] ? process_timeout+0x0/0x10 [11650.917808] [<ffffffffa0227ed1>] ? cfs_block_sigsinv+0x71/0xa0 [libcfs] [11650.919882] [<ffffffffa13be2c8>] osp_precreate_reserve+0x318/0x830 [osp] [11650.920577] [<ffffffff810b7cc0>] ? default_wake_function+0x0/0x20 [11650.927677] [<ffffffffa13b3b53>] osp_declare_create+0x193/0x590 [osp] [11650.928576] [<ffffffffa1367ab2>] lod_sub_declare_create+0xe2/0x210 [lod] [11650.929423] [<ffffffffa1360dce>] lod_qos_declare_object_on+0xbe/0x3a0 [lod] [11650.930299] [<ffffffffa136688d>] lod_qos_prep_create+0x10dd/0x1800 [lod] [11650.931383] [<ffffffffa13674e5>] lod_prepare_create+0x255/0x350 [lod] [11650.932126] [<ffffffffa13588de>] lod_declare_striped_create+0x1ee/0x980 [lod] [11650.933401] [<ffffffffa1367ab2>] ? lod_sub_declare_create+0xe2/0x210 [lod] [11650.942676] [<ffffffffa135d074>] lod_declare_create+0x204/0x5a0 [lod] [11650.943445] [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [11650.944187] [<ffffffffa12300ff>] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd] [11650.945472] [<ffffffffa1220953>] mdd_declare_create+0x53/0xe30 [mdd] [11650.955257] [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs] [11650.955987] [<ffffffffa1224c69>] mdd_create+0x859/0x1410 [mdd] [11650.956711] [<ffffffffa12a75f4>] mdt_reint_open+0x2144/0x3190 [mdt] [11650.957500] [<ffffffffa129b900>] mdt_reint_rec+0x80/0x210 [mdt] [11650.958246] [<ffffffffa127b5cb>] mdt_reint_internal+0x5fb/0x990 [mdt] [11650.958967] [<ffffffffa1287927>] mdt_intent_reint+0x157/0x420 [mdt] [11650.959667] [<ffffffffa127e6a2>] mdt_intent_opc+0x442/0xad0 [mdt] [11650.960434] [<ffffffffa05e2490>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc] [11650.961232] [<ffffffffa1286116>] mdt_intent_policy+0x1a6/0x360 [mdt] [11650.961996] [<ffffffffa059209a>] ldlm_lock_enqueue+0x35a/0x8d0 [ptlrpc] [11650.962893] [<ffffffffa05ba543>] ldlm_handle_enqueue0+0x913/0x13f0 [ptlrpc] [11650.963881] [<ffffffffa0232f47>] ? libcfs_debug_msg+0x57/0x80 [libcfs] [11650.964779] [<ffffffffa063ec32>] tgt_enqueue+0x62/0x210 [ptlrpc] [11650.965676] [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc] [11650.967478] [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc] [11650.968910] [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc] [11650.969707] [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc] [11650.970237] [<ffffffff810a2eba>] kthread+0xea/0xf0 [11650.982955] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [11650.983796] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90 [11651.002062] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [11651.002815] [11651.003438] LustreError: dumping log to /tmp/lustre-log.1517166477.7986 [11657.848212] LNet: Service thread pid 7986 completed after 47.10s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [12003.206088] LNet: Service thread pid 29404 was inactive for 40.16s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [12003.248977] LNet: Skipped 1 previous similar message [12003.249773] Pid: 29404, comm: ll_ost00_008 [12003.250417] Call Trace: [12003.251575] [<ffffffff81704319>] schedule+0x29/0x70 [12003.252248] [<ffffffffa09a4fad>] cv_wait_common+0x17d/0x240 [spl] [12003.266969] [<ffffffff810a4070>] ? autoremove_wake_function+0x0/0x40 [12003.267718] [<ffffffffa09a5085>] __cv_wait+0x15/0x20 [spl] [12003.268463] [<ffffffffa0b5d293>] txg_wait_open+0xe3/0x1a0 [zfs] [12003.270220] [<ffffffffa0b065d2>] dmu_tx_wait+0x5c2/0x5d0 [zfs] [12003.270931] [<ffffffffa0b06675>] dmu_tx_assign+0x95/0x5a0 [zfs] [12003.271637] [<ffffffffa09bad87>] osd_trans_start+0xa7/0x3c0 [osd_zfs] [12003.278620] [<ffffffffa1415596>] ofd_precreate_objects+0xa46/0x1aa0 [ofd] [12003.279417] [<ffffffffa1408cda>] ofd_create_hdl+0x47a/0x20b0 [ofd] [12003.288353] [<ffffffffa022f734>] ? libcfs_log_return+0x24/0x30 [libcfs] [12003.289170] [<ffffffffa05e07a0>] ? lustre_pack_reply_v2+0x1a0/0x2a0 [ptlrpc] [12003.289937] [<ffffffffa0646e5b>] tgt_request_handle+0x93b/0x13e0 [ptlrpc] [12003.290673] [<ffffffffa05ebc71>] ptlrpc_server_handle_request+0x261/0xaf0 [ptlrpc] [12003.298125] [<ffffffffa05efa28>] ptlrpc_main+0xa58/0x1df0 [ptlrpc] [12003.298851] [<ffffffffa05eefd0>] ? ptlrpc_main+0x0/0x1df0 [ptlrpc] [12003.299707] [<ffffffff810a2eba>] kthread+0xea/0xf0 [12003.309644] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [12003.310163] [<ffffffff8170fb98>] ret_from_fork+0x58/0x90 [12003.310627] [<ffffffff810a2dd0>] ? kthread+0x0/0xf0 [12003.311173] [12003.311602] LustreError: dumping log to /tmp/lustre-log.1517166829.29404 [12008.410465] LNet: Service thread pid 29404 completed after 45.36s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources). [12008.414754] LNet: Skipped 1 previous similar message [12037.466656] SysRq : Trigger a crash
As far as I can tell ldiskfs is unaffected.
I checked in maloo and it passes there in like 600 seconds, so it's not a regular zfs slowness or anything of the sort.
Attachments
Issue Links
- is related to
-
LU-10250 replay-single test_74: hang and timed out
- Open