Mar 21 20:53:33 localhost kernel: [192552.341237] Lustre: iliad-OST0017-osc-MDT0000: Connection to iliad-OST0017 (at 172.16.25.11@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Mar 21 20:53:35 localhost kernel: [192553.451263] Lustre: Found index 25 for iliad-OST0019, updating log
Mar 21 20:53:35 localhost kernel: [192553.473394] Lustre: iliad-MDT0000: Connection restored to 172.16.25.11@o2ib (at 172.16.25.11@o2ib)
Mar 21 20:53:41 localhost kernel: [192560.350206] LNet: 4834:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 172.16.24.220@o2ib: 0 seconds
Mar 21 20:53:41 localhost kernel: [192560.350213] LNet: 4834:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Skipped 9 previous similar messages
Mar 21 20:53:46 localhost kernel: [192565.309370] Lustre: iliad-OST0017-osc-MDT0000: Connection restored to 172.16.25.11@o2ib (at 172.16.25.11@o2ib)
Mar 21 20:53:57 localhost dbus[2396]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Mar 21 20:53:57 localhost dbus[2396]: [system] Successfully activated service 'org.freedesktop.problems'
Mar 21 20:54:06 localhost kernel: [192584.518333] Lustre: iliad-OST0019-osc-MDT0000: Connection to iliad-OST0019 (at 172.16.25.11@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Mar 21 20:54:06 localhost kernel: [192584.518942] LustreError: 167-0: iliad-OST0019-osc-MDT0000: This client was evicted by iliad-OST0019; in progress operations using this service will fail.
Mar 21 20:54:06 localhost kernel: [192584.534593] Lustre: iliad-OST0019-osc-MDT0000: Connection restored to 172.16.25.11@o2ib (at 172.16.25.11@o2ib)
Mar 21 20:54:09 localhost kernel: [192587.941838] Lustre: iliad-OST0018-osc-MDT0000: Connection to iliad-OST0018 (at 172.16.25.11@o2ib) was lost; in progress operations using this service will wait for recovery to complete
Mar 21 20:54:09 localhost kernel: [192587.942399] LustreError: 167-0: iliad-OST0018-osc-MDT0000: This client was evicted by iliad-OST0018; in progress operations using this service will fail.
Mar 21 20:59:42 localhost kernel: [192921.178317] LNet: Service thread pid 4963 was inactive for 212.77s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Mar 21 20:59:42 localhost kernel: [192921.178323] Pid: 4963, comm: mdt01_001
Mar 21 20:59:42 localhost kernel: [192921.178325]
Mar 21 20:59:42 localhost kernel: [192921.178325] Call Trace:
Mar 21 20:59:42 localhost kernel: [192921.178337] [] schedule+0x29/0x70
Mar 21 20:59:42 localhost kernel: [192921.178341] [] schedule_timeout+0x174/0x2c0
Mar 21 20:59:42 localhost kernel: [192921.178348] [] ? process_timeout+0x0/0x10
Mar 21 20:59:42 localhost kernel: [192921.178365] [] osp_precreate_reserve+0x2e8/0x800 [osp]
Mar 21 20:59:42 localhost kernel: [192921.178372] [] ? default_wake_function+0x0/0x20
Mar 21 20:59:42 localhost kernel: [192921.178382] [] osp_declare_create+0x193/0x590 [osp]
Mar 21 20:59:42 localhost kernel: [192921.178432] [] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
Mar 21 20:59:42 localhost kernel: [192921.178453] [] lod_sub_declare_create+0xdc/0x210 [lod]
Mar 21 20:59:42 localhost kernel: [192921.178468] [] lod_qos_declare_object_on+0xbe/0x3a0 [lod]
Mar 21 20:59:42 localhost kernel: [192921.178483] [] lod_alloc_qos.constprop.17+0xea2/0x1590 [lod]
Mar 21 20:59:42 localhost kernel: [192921.178487] [] ? wake_up_q+0x5b/0x80
Mar 21 20:59:42 localhost kernel: [192921.178501] [] lod_qos_prep_create+0x1291/0x17f0 [lod]
Mar 21 20:59:42 localhost kernel: [192921.178518] [] ? qsd_op_begin+0xb0/0x4d0 [lquota]
Mar 21 20:59:42 localhost kernel: [192921.178542] [] ? osd_scrub_check_update+0x5b1/0x10c0 [osd_ldiskfs]
Mar 21 20:59:42 localhost kernel: [192921.178558] [] lod_prepare_create+0x298/0x3f0 [lod]
Mar 21 20:59:42 localhost kernel: [192921.178568] [] ? osd_idc_find_and_init+0x7e/0x100 [osd_ldiskfs]
Mar 21 20:59:42 localhost kernel: [192921.178582] [] lod_declare_striped_create+0x1ee/0x970 [lod]
Mar 21 20:59:42 localhost kernel: [192921.178595] [] lod_declare_create+0x1e4/0x540 [lod]
Mar 21 20:59:42 localhost kernel: [192921.178612] [] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd]
Mar 21 20:59:42 localhost kernel: [192921.178623] [] mdd_declare_create+0x53/0xe20 [mdd]
Mar 21 20:59:42 localhost kernel: [192921.178634] [] mdd_create+0x7d9/0x1320 [mdd]
Mar 21 20:59:42 localhost kernel: [192921.178660] [] mdt_reint_open+0x218c/0x31a0 [mdt]
Mar 21 20:59:42 localhost kernel: [192921.178695] [] ? upcall_cache_get_entry+0x20e/0x8f0 [obdclass]
Mar 21 20:59:42 localhost kernel: [192921.178712] [] ? ucred_set_jobid+0x53/0x70 [mdt]
Mar 21 20:59:42 localhost kernel: [192921.178730] [] mdt_reint_rec+0x80/0x210 [mdt]
Mar 21 20:59:42 localhost kernel: [192921.178745] [] mdt_reint_internal+0x5fb/0x9c0 [mdt]
Mar 21 20:59:42 localhost kernel: [192921.178759] [] mdt_intent_reint+0x162/0x430 [mdt]
Mar 21 20:59:42 localhost kernel: [192921.178774] [] mdt_intent_policy+0x43e/0xc70 [mdt]
Mar 21 20:59:42 localhost kernel: [192921.178836] [] ? ldlm_resource_get+0x5e2/0xa30 [ptlrpc]
Mar 21 20:59:42 localhost kernel: [192921.178876] [] ldlm_lock_enqueue+0x387/0x970 [ptlrpc]
Mar 21 20:59:42 localhost kernel: [192921.178924] [] ldlm_handle_enqueue0+0x9c3/0x1680 [ptlrpc]
Mar 21 20:59:42 localhost kernel: [192921.178981] [] ? lustre_swab_ldlm_request+0x0/0x30 [ptlrpc]
Mar 21 20:59:42 localhost kernel: [192921.179052] [] tgt_enqueue+0x62/0x210 [ptlrpc]
Mar 21 20:59:42 localhost kernel: [192921.179117] [] tgt_request_handle+0x925/0x1370 [ptlrpc]
Mar 21 20:59:42 localhost kernel: [192921.179172] [] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
Mar 21 20:59:42 localhost kernel: [192921.179271] [] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
Mar 21 20:59:42 localhost kernel: [192921.179285] [] ? default_wake_function+0x12/0x20
Mar 21 20:59:42 localhost kernel: [192921.179299] [] ? __wake_up_common+0x58/0x90
Mar 21 20:59:42 localhost kernel: [192921.179358] [] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
Mar 21 20:59:42 localhost kernel: [192921.179411] [] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
Mar 21 20:59:42 localhost kernel: [192921.179416] [] kthread+0xcf/0xe0
Mar 21 20:59:42 localhost kernel: [192921.179419] [] ? kthread+0x0/0xe0
Mar 21 20:59:42 localhost kernel: [192921.179425] [] ret_from_fork+0x58/0x90
Mar 21 20:59:42 localhost kernel: [192921.179428] [] ? kthread+0x0/0xe0
Mar 21 20:59:42 localhost kernel: [192921.179430]
Mar 21 20:59:42 localhost kernel: [192921.179434] LustreError: dumping log to /tmp/lustre-log.1521665982.4963
Mar 21 21:00:20 localhost kernel: [192959.106052] INFO: task mdt01_006:5415 blocked for more than 120 seconds.
Mar 21 21:00:20 localhost kernel: [192959.113884] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 21 21:00:20 localhost kernel: [192959.123191] mdt01_006 D ffff881032e82f70 0 5415 2 0x00000080
************
Mar 21 21:00:20 localhost kernel: [192959.141454] Call Trace:
Mar 21 21:00:20 localhost kernel: [192959.141472] [] ? cfs_hash_multi_bd_unlock+0x62/0x80 [libcfs]
Mar 21 21:00:20 localhost kernel: [192959.141476] [] schedule+0x29/0x70
Mar 21 21:00:20 localhost kernel: [192959.141480] [] rwsem_down_write_failed+0x225/0x3a0
Mar 21 21:00:20 localhost kernel: [192959.141494] [] call_rwsem_down_write_failed+0x17/0x30
Mar 21 21:00:20 localhost kernel: [192959.141507] [] down_write+0x2d/0x3d
Mar 21 21:00:20 localhost kernel: [192959.141523] [] lod_qos_prep_create+0xaa4/0x17f0 [lod]
Mar 21 21:00:20 localhost kernel: [192959.141541] [] ? qsd_op_begin+0xb0/0x4d0 [lquota]
Mar 21 21:00:20 localhost kernel: [192959.141571] [] ? osd_declare_qid+0x1f0/0x480 [osd_ldiskfs]
Mar 21 21:00:20 localhost kernel: [192959.141592] [] lod_prepare_create+0x298/0x3f0 [lod]
Mar 21 21:00:20 localhost kernel: [192959.141604] [] ? osd_idc_find_and_init+0x7e/0x100 [osd_ldiskfs]
Mar 21 21:00:20 localhost kernel: [192959.141619] [] lod_declare_striped_create+0x1ee/0x970 [lod]
Mar 21 21:00:20 localhost kernel: [192959.141637] [] lod_declare_create+0x1e4/0x540 [lod]
Mar 21 21:00:20 localhost kernel: [192959.141652] [] mdd_declare_create_object_internal+0xdf/0x2f0 [mdd]
Mar 21 21:00:20 localhost kernel: [192959.141666] [] mdd_declare_create+0x53/0xe20 [mdd]
Mar 21 21:00:20 localhost kernel: [192959.141679] [] mdd_create+0x7d9/0x1320 [mdd]
Mar 21 21:00:20 localhost kernel: [192959.141699] [] mdt_reint_open+0x218c/0x31a0 [mdt]
Mar 21 21:00:20 localhost kernel: [192959.141738] [] ? upcall_cache_get_entry+0x20e/0x8f0 [obdclass]
Mar 21 21:00:20 localhost kernel: [192959.141759] [] ? ucred_set_jobid+0x53/0x70 [mdt]
Mar 21 21:00:20 localhost kernel: [192959.141778] [] mdt_reint_rec+0x80/0x210 [mdt]
Mar 21 21:00:20 localhost kernel: [192959.141797] [] mdt_reint_internal+0x5fb/0x9c0 [mdt]
Mar 21 21:00:20 localhost kernel: [192959.141823] [] mdt_intent_reint+0x162/0x430 [mdt]
Mar 21 21:00:20 localhost kernel: [192959.141837] [] mdt_intent_policy+0x43e/0xc70 [mdt]
Mar 21 21:00:20 localhost kernel: [192959.141873] [] ? ldlm_resource_get+0x9f/0xa30 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.141909] [] ldlm_lock_enqueue+0x387/0x970 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.141950] [] ldlm_handle_enqueue0+0x9c3/0x1680 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.142006] [] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.142069] [] tgt_enqueue+0x62/0x210 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.142132] [] tgt_request_handle+0x925/0x1370 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.142190] [] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.142245] [] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.142257] [] ? default_wake_function+0x12/0x20
Mar 21 21:00:20 localhost kernel: [192959.142261] [] ? __wake_up_common+0x58/0x90
Mar 21 21:00:20 localhost kernel: [192959.142305] [] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.142354] [] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.142359] [] kthread+0xcf/0xe0
Mar 21 21:00:20 localhost kernel: [192959.142364] [] ? insert_kthread_work+0x40/0x40
Mar 21 21:00:20 localhost kernel: [192959.142368] [] ret_from_fork+0x58/0x90
Mar 21 21:00:20 localhost kernel: [192959.142371] [] ? insert_kthread_work+0x40/0x40
Mar 21 21:00:20 localhost kernel: [192959.142374] INFO: task mdt00_009:6345 blocked for more than 120 seconds.
Mar 21 21:00:20 localhost kernel: [192959.150198] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 21 21:00:20 localhost kernel: [192959.159504] mdt00_009 D ffff881034722f70 0 6345 2 0x00000080
Mar 21 21:00:20 localhost kernel: [192959.177797] [] ? ldlm_resource_get+0x5e2/0xa30 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.177841] [] ldlm_lock_enqueue+0x387/0x970 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.177890] [] ldlm_handle_enqueue0+0x9c3/0x1680 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.177952] [] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.178003] [] tgt_enqueue+0x62/0x210 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.178061] [] tgt_request_handle+0x925/0x1370 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.178113] [] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.178164] [] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.178180] [] ? default_wake_function+0x12/0x20
Mar 21 21:00:20 localhost kernel: [192959.178189] [] ? __wake_up_common+0x58/0x90
Mar 21 21:00:20 localhost kernel: [192959.178227] [] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.178265] [] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
Mar 21 21:00:20 localhost kernel: [192959.178270] [] kthread+0xcf/0xe0
Mar 21 21:00:20 localhost kernel: [192959.178274] [] ? insert_kthread_work+0x40/0x40
Mar 21 21:00:20 localhost kernel: [192959.178277] [] ret_from_fork+0x58/0x90
Mar 21 21:00:20 localhost kernel: [192959.178280] [] ? insert_kthread_work+0x40/0x40
Mar 21 21:00:20 localhost kernel: [192959.178290] INFO: task mdt01_010:6359 blocked for more than 120 seconds.
Mar 21 21:00:20 localhost kernel: [192959.186111] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 21 21:00:20 localhost kernel: [192959.195417] mdt01_010 D ffff880fcf3acf10 0 6359 2 0x00000080
************
Mar 21 21:02:20 localhost kernel: [193079.195092] INFO: task mdt01_000:4962 blocked for more than 120 seconds.
Mar 21 21:02:20 localhost kernel: [193079.202921] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 21 21:02:20 localhost kernel: [193079.212234] mdt01_000 D ffff88101cd00fd0 0 4962 2 0x00000080
Mar 21 21:02:20 localhost kernel: [193079.212240] Call Trace:
Mar 21 21:02:20 localhost kernel: [193079.212255] [] ? __find_get_block+0xbc/0x120
Mar 21 21:02:20 localhost kernel: [193079.212267] [] schedule+0x29/0x70
Mar 21 21:02:20 localhost kernel: [193079.212279] [] rwsem_down_write_failed+0x225/0x3a0
Mar 21 21:02:20 localhost kernel: [193079.212307] [] ? __ldiskfs_get_inode_loc+0x110/0x3e0 [ldiskfs]
Mar 21 21:02:20 localhost kernel: [193079.212313] [] call_rwsem_down_write_failed+0x17/0x30
Mar 21 21:02:20 localhost kernel: [193079.212321] [] down_write+0x2d/0x3d
Mar 21 21:02:20 localhost kernel: [193079.212343] [] lod_qos_prep_create+0xaa4/0x17f0 [lod]
Mar 21 21:02:20 localhost kernel: [193079.212361] [] ? ldiskfs_xattr_get+0x7d/0x2e0 [ldiskfs]
Mar 21 21:02:20 localhost kernel: [193079.212379] [] ? osd_xattr_get+0x231/0x820 [osd_ldiskfs]
Mar 21 21:02:20 localhost kernel: [193079.212398] [] lod_prepare_create+0x298/0x3f0 [lod]
Mar 21 21:02:20 localhost kernel: [193079.212415] [] lod_declare_striped_create+0x1ee/0x970 [lod]
Mar 21 21:02:20 localhost kernel: [193079.212430] [] lod_declare_xattr_set+0x221/0xe40 [lod]
Mar 21 21:02:20 localhost kernel: [193079.212447] [] mdd_create_data+0x487/0x720 [mdd]
Mar 21 21:02:20 localhost kernel: [193079.212474] [] mdt_mfd_open+0xc5a/0xe70 [mdt]
Mar 21 21:02:20 localhost kernel: [193079.212508] [] mdt_finish_open+0x57b/0x690 [mdt]
Mar 21 21:02:20 localhost kernel: [193079.212531] [] mdt_reint_open+0x179c/0x31a0 [mdt]
Mar 21 21:02:20 localhost kernel: [193079.212547] [] ? cfs_match_nid+0x96/0xd0 [lnet]
Mar 21 21:02:20 localhost kernel: [193079.212569] [] ? mdt_root_squash+0xc3/0x430 [mdt]
Mar 21 21:02:20 localhost kernel: [193079.212590] [] mdt_reint_rec+0x80/0x210 [mdt]
Mar 21 21:02:20 localhost kernel: [193079.212610] [] mdt_reint_internal+0x5fb/0x9c0 [mdt]
Mar 21 21:02:20 localhost kernel: [193079.212627] [] mdt_intent_reint+0x162/0x430 [mdt]
Mar 21 21:02:20 localhost kernel: [193079.212648] [] mdt_intent_policy+0x43e/0xc70 [mdt]
Mar 21 21:02:20 localhost kernel: [193079.212709] [] ? ldlm_resource_get+0x9f/0xa30 [ptlrpc]
Mar 21 21:02:20 localhost kernel: [193079.212761] [] ldlm_lock_enqueue+0x387/0x970 [ptlrpc]
Mar 21 21:02:20 localhost kernel: [193079.212819] [] ldlm_handle_enqueue0+0x9c3/0x1680 [ptlrpc]
Mar 21 21:02:20 localhost kernel: [193079.212884] [] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
Mar 21 21:02:20 localhost kernel: [193079.212961] [] tgt_enqueue+0x62/0x210 [ptlrpc]
Mar 21 21:02:20 localhost kernel: [193079.213040] [] tgt_request_handle+0x925/0x1370 [ptlrpc]
Mar 21 21:02:20 localhost kernel: [193079.213114] [] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
Mar 21 21:02:20 localhost kernel: [193079.213189] [] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
Mar 21 21:02:20 localhost kernel: [193079.213207] [] ? default_wake_function+0x12/0x20
Mar 21 21:02:20 localhost kernel: [193079.213219] [] ? __wake_up_common+0x58/0x90
Mar 21 21:02:20 localhost kernel: [193079.213284] [] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
Mar 21 21:02:20 localhost kernel: [193079.213349] [] ? ptlrpc_register_service+0xe30/0xe30 [ptlrpc]
Mar 21 21:02:20 localhost kernel: [193079.213357] [] kthread+0xcf/0xe0
Mar 21 21:02:20 localhost kernel: [193079.213361] [] ? insert_kthread_work+0x40/0x40
Mar 21 21:02:20 localhost kernel: [193079.213367] [] ret_from_fork+0x58/0x90
Mar 21 21:02:20 localhost kernel: [193079.213371] [] ? insert_kthread_work+0x40/0x40