Details
-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
Lustre 2.9.0
-
4
-
9223372036854775807
Description
We had 4 Lustre clients wolf-6-9 and the Lustre server nodes had crashed, taken a dump and come back up. When Lustre servers were back up and the file system was restored all the clients started asserting with:
[22181.389692] Lustre: 6338:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 82 previous similar messages [22806.001932] Lustre: 6338:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1492387170/real 1492387170] req@ffff8809df2aa100 x1564860107186640/t0(0) o38->lsdraid-MDT0000-mdc-ffff880832a4b000@192.168.1.5@o2ib:12/10 lens 520/544 e 0 to 1 dl 1492387225 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1 [22806.044008] Lustre: 6338:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 82 previous similar messages [23130.818757] LustreError: 167-0: lsdraid-OST0000-osc-ffff880832a4b000: This client was evicted by lsdraid-OST0000; in progress operations using this service will fail. [23131.304299] Lustre: 6376:0:(llite_lib.c:2619:ll_dirty_page_discard_warn()) lsdraid: dirty page discard: 192.168.1.5@o2ib:/lsdraid/fid: [0x200000401:0x18:0x0]// may get corrupted (rc -108) [23131.304639] Lustre: lsdraid-OST0000-osc-ffff880832a4b000: Connection restored to 192.168.1.3@o2ib (at 192.168.1.3@o2ib) [23131.315453] Lustre: 6353:0:(llite_lib.c:2619:ll_dirty_page_discard_warn()) lsdraid: dirty page discard: 192.168.1.5@o2ib:/lsdraid/fid: [0x200000401:0x16:0x0]// may get corrupted (rc -108) [23131.321505] Lustre: 6377:0:(llite_lib.c:2619:ll_dirty_page_discard_warn()) lsdraid: dirty page discard: 192.168.1.5@o2ib:/lsdraid/fid: [0x20000040c:0xb43b:0x0]// may get corrupted (rc -108) [23155.804921] Lustre: Evicted from MGS (at 192.168.1.5@o2ib) after server handle changed from 0x9b83d124804f915d to 0x3fceeb65e42b218b [23155.824502] LustreError: 167-0: lsdraid-MDT0000-mdc-ffff880832a4b000: This client was evicted by lsdraid-MDT0000; in progress operations using this service will fail. [23155.824529] Lustre: MGC192.168.1.5@o2ib: Connection restored to 192.168.1.5@o2ib (at 192.168.1.5@o2ib) [23155.824532] Lustre: Skipped 1 previous similar message [23155.869652] LustreError: Skipped 1 previous similar message [23155.879415] LustreError: 35603:0:(file.c:180:ll_close_inode_openhandle()) lsdraid-clilmv-ffff880832a4b000: inode [0x200000401:0x59:0x0] mdc close failed: rc = -5 [23155.879478] LustreError: 48874:0:(file.c:3403:ll_inode_revalidate_fini()) lsdraid: revalidate FID [0x200000007:0x1:0x0] error: rc = -5 [23155.879537] LustreError: 27229:0:(vvp_io.c:1429:vvp_io_init()) lsdraid: refresh file layout [0x20000040c:0xc035:0x0] error -5. [23155.880185] traps: fsx[6464] general protection ip:7f42c4af1ab4 sp:7ffd375bc060 error:0 in libc-2.17.so[7f42c4aa9000+1b7000] [23155.880204] traps: fsx[6463] general protection ip:7f45b5297ab4 sp:7ffc1eae97e0 error:0 in libc-2.17.so[7f45b524f000+1b7000] [23155.969495] LustreError: 35603:0:(file.c:180:ll_close_inode_openhandle()) Skipped 7 previous similar messages [23156.181966] LustreError: 48928:0:(lcommon_cl.c:148:cl_file_inode_init()) ASSERTION( (((inode->i_mode) & 00170000) == 0100000) ) failed: [23156.181969] LustreError: 48928:0:(lcommon_cl.c:148:cl_file_inode_init()) LBUG [23156.181970] Pid: 48928, comm: prove [23156.181970] Call Trace: [23156.181994] [<ffffffffa09507d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs] [23156.182001] [<ffffffffa0950d75>] lbug_with_loc+0x45/0xc0 [libcfs] [23156.182041] [<ffffffffa0dc28f8>] cl_file_inode_init+0x248/0x300 [lustre] [23156.182054] [<ffffffffa0d9ae65>] ll_update_inode+0x305/0x5d0 [lustre] [23156.182064] [<ffffffffa0b68c29>] ? lmv_get_lustre_md+0x149/0x2d0 [lmv] [23156.182077] [<ffffffffa0d9d08d>] ll_prep_inode+0x12d/0xc90 [lustre] [23156.182090] [<ffffffffa0da9753>] ? ll_i2gids+0x33/0xb0 [lustre] [23156.182110] [<ffffffff81101302>] ? from_kgid+0x12/0x20 [23156.182121] [<ffffffffa0d687c1>] ll_revalidate_it_finish+0x71/0x170 [lustre] [23156.182132] [<ffffffffa0d83b9b>] __ll_inode_revalidate+0x19b/0xc00 [lustre] [23156.182143] [<ffffffffa0d84a22>] ll_getattr+0x62/0x7c0 [lustre] [23156.182150] [<ffffffff811e3c66>] vfs_getattr+0x46/0x80 [23156.182153] [<ffffffff811e3d95>] vfs_fstatat+0x75/0xc0 [23156.182156] [<ffffffff811e42ee>] SYSC_newstat+0x2e/0x60 [23156.182163] [<ffffffff81642363>] ? do_page_fault+0x23/0x80 [23156.182165] [<ffffffff8163e648>] ? page_fault+0x28/0x30 [23156.182168] [<ffffffff811e45ce>] SyS_newstat+0xe/0x10 [23156.182174] [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b [23156.182175] [23156.182316] Kernel panic - not syncing: LBUG [23156.182319] CPU: 38 PID: 48928 Comm: prove Tainted: G IOE ------------ 3.10.0-327.36.3.el7.x86_64 #1 [23156.182320] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015 [23156.182326] ffffffffa096ddef 0000000056b71082 ffff88103be23ad0 ffffffff81636431 [23156.182331] ffff88103be23b50 ffffffff8162fcc0 ffffffff00000008 ffff88103be23b60 [23156.182336] ffff88103be23b00 0000000056b71082 ffffffffa0dd3780 0000000000000000 [23156.182336] Call Trace: [23156.182346] [<ffffffff81636431>] dump_stack+0x19/0x1b [23156.182349] [<ffffffff8162fcc0>] panic+0xd8/0x1e7 [23156.182357] [<ffffffffa0950ddb>] lbug_with_loc+0xab/0xc0 [libcfs] [23156.182373] [<ffffffffa0dc28f8>] cl_file_inode_init+0x248/0x300 [lustre] [23156.182385] [<ffffffffa0d9ae65>] ll_update_inode+0x305/0x5d0 [lustre] [23156.182391] [<ffffffffa0b68c29>] ? lmv_get_lustre_md+0x149/0x2d0 [lmv] [23156.182403] [<ffffffffa0d9d08d>] ll_prep_inode+0x12d/0xc90 [lustre] [23156.182416] [<ffffffffa0da9753>] ? ll_i2gids+0x33/0xb0 [lustre] [23156.182419] [<ffffffff81101302>] ? from_kgid+0x12/0x20 [23156.182429] [<ffffffffa0d687c1>] ll_revalidate_it_finish+0x71/0x170 [lustre] [23156.182440] [<ffffffffa0d83b9b>] __ll_inode_revalidate+0x19b/0xc00 [lustre] [23156.182451] [<ffffffffa0d84a22>] ll_getattr+0x62/0x7c0 [lustre] [23156.182455] [<ffffffff811e3c66>] vfs_getattr+0x46/0x80 [23156.182458] [<ffffffff811e3d95>] vfs_fstatat+0x75/0xc0 [23156.182461] [<ffffffff811e42ee>] SYSC_newstat+0x2e/0x60 [23156.182464] [<ffffffff81642363>] ? do_page_fault+0x23/0x80 [23156.182467] [<ffffffff8163e648>] ? page_fault+0x28/0x30 [23156.182470] [<ffffffff811e45ce>] SyS_newstat+0xe/0x10 [23156.182473] [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b
Before this we did see this on the various clients
[ 7621.463514] Lustre: 6338:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 2 previous similar messages [ 7680.841904] INFO: task fsx:6463 blocked for more than 120 seconds. [ 7680.850637] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7680.861238] fsx D ffff880830affeb0 0 6463 6459 0x00000080 [ 7680.870886] ffff880830affae0 0000000000000082 ffff8808501a7300 ffff880830afffd8 [ 7680.880858] ffff880830afffd8 ffff880830afffd8 ffff8808501a7300 ffff88103c4c5c10 [ 7680.890701] ffff88103c4c5c18 7fffffffffffffff ffff8808501a7300 ffff880830affeb0 [ 7680.900454] Call Trace: [ 7680.904601] [<ffffffff8163bb39>] schedule+0x29/0x70 [ 7680.911469] [<ffffffff81639829>] schedule_timeout+0x209/0x2d0 [ 7680.919304] [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc] [ 7680.928615] [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc] [ 7680.937161] [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc] [ 7680.945840] [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc] [ 7680.955264] [<ffffffff8163bf06>] wait_for_completion+0x116/0x170 [ 7680.963198] [<ffffffff810b8940>] ? wake_up_state+0x20/0x20 [ 7680.970537] [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc] [ 7680.978788] [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov] [ 7680.987737] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 7680.995551] [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov] [ 7681.003654] [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov] [ 7681.011803] [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov] [ 7681.019088] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 7681.026856] [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass] [ 7681.034706] [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre] [ 7681.042840] [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre] [ 7681.051174] [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre] [ 7681.058714] [<ffffffff811fbe29>] notify_change+0x279/0x3d0 [ 7681.065875] [<ffffffff811dcf13>] do_truncate+0x73/0xc0 [ 7681.072636] [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110 [ 7681.080185] [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180 [ 7681.088901] [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10 [ 7681.095774] [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b [ 7681.103428] INFO: task fsx:6464 blocked for more than 120 seconds. [ 7681.111278] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7681.120987] fsx D ffff88083b5efeb0 0 6464 6459 0x00000080 [ 7681.129878] ffff88083b5efae0 0000000000000086 ffff8808501a3980 ffff88083b5effd8 [ 7681.139209] ffff88083b5effd8 ffff88083b5effd8 ffff8808501a3980 ffff88103c5ba650 [ 7681.148535] ffff88103c5ba658 7fffffffffffffff ffff8808501a3980 ffff88083b5efeb0 [ 7681.157861] Call Trace: [ 7681.161544] [<ffffffff8163bb39>] schedule+0x29/0x70 [ 7681.168024] [<ffffffff81639829>] schedule_timeout+0x209/0x2d0 [ 7681.175467] [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc] [ 7681.184430] [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc] [ 7681.192611] [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc] [ 7681.200963] [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc] [ 7681.210050] [<ffffffff8163bf06>] wait_for_completion+0x116/0x170 [ 7681.217661] [<ffffffff810b8940>] ? wake_up_state+0x20/0x20 [ 7681.224683] [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc] [ 7681.232649] [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov] [ 7681.241303] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 7681.248866] [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov] [ 7681.256715] [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov] [ 7681.264660] [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov] [ 7681.271743] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 7681.279302] [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass] [ 7681.286951] [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre] [ 7681.294894] [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre] [ 7681.303025] [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre] [ 7681.310366] [<ffffffff811fbe29>] notify_change+0x279/0x3d0 [ 7681.317316] [<ffffffff811dcf13>] do_truncate+0x73/0xc0 [ 7681.323870] [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110 [ 7681.331196] [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180 [ 7681.339693] [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10 [ 7681.346340] [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b [ 7707.260211] Lustre: 6361:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1492372056/real 1492372056] req@ffff88016636a400 x1564860107137792/t0(0) o103->lsdraid-MDT0000-mdc-ffff880832a4b000@192.168.1.5@o2ib:17/18 lens 416/224 e 0 to 1 dl 1492372063 ref 1 fl Rpc:X/0/ffffffff rc 0/-1 [ 7707.297453] Lustre: 6361:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 5 previous similar messages [ 7707.309810] Lustre: lsdraid-MDT0000-mdc-ffff880832a4b000: Connection to lsdraid-MDT0000 (at 192.168.1.5@o2ib) was lost; in progress operations using this service will wait for recovery to complete [ 7739.314486] LustreError: 166-1: MGC192.168.1.5@o2ib: Connection to MGS (at 192.168.1.5@o2ib) was lost; in progress operations using this service will fail [ 7801.288244] INFO: task fsx:6463 blocked for more than 120 seconds. [ 7801.296906] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7801.307434] fsx D ffff880830affeb0 0 6463 6459 0x00000080 [ 7801.317015] ffff880830affae0 0000000000000082 ffff8808501a7300 ffff880830afffd8 [ 7801.326920] ffff880830afffd8 ffff880830afffd8 ffff8808501a7300 ffff88103c4c5c10 [ 7801.336722] ffff88103c4c5c18 7fffffffffffffff ffff8808501a7300 ffff880830affeb0 [ 7801.346435] Call Trace: [ 7801.350508] [<ffffffff8163bb39>] schedule+0x29/0x70 [ 7801.357404] [<ffffffff81639829>] schedule_timeout+0x209/0x2d0 [ 7801.365221] [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc] [ 7801.374549] [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc] [ 7801.383050] [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc] [ 7801.391743] [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc] [ 7801.401132] [<ffffffff8163bf06>] wait_for_completion+0x116/0x170 [ 7801.409054] [<ffffffff810b8940>] ? wake_up_state+0x20/0x20 [ 7801.416405] [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc] [ 7801.424667] [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov] [ 7801.433634] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 7801.441454] [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov] [ 7801.449568] [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov] [ 7801.457722] [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov] [ 7801.465010] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 7801.472742] [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass] [ 7801.480564] [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre] [ 7801.488662] [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre] [ 7801.496959] [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre] [ 7801.504464] [<ffffffff811fbe29>] notify_change+0x279/0x3d0 [ 7801.511594] [<ffffffff811dcf13>] do_truncate+0x73/0xc0 [ 7801.518325] [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110 [ 7801.525833] [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180 [ 7801.534511] [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10 [ 7801.541335] [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b [ 7801.548948] INFO: task fsx:6464 blocked for more than 120 seconds. [ 7801.556758] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7801.566424] fsx D ffff88083b5efeb0 0 6464 6459 0x00000080 [ 7801.575267] ffff88083b5efae0 0000000000000086 ffff8808501a3980 ffff88083b5effd8 [ 7801.584558] ffff88083b5effd8 ffff88083b5effd8 ffff8808501a3980 ffff88103c5ba650 [ 7801.593846] ffff88103c5ba658 7fffffffffffffff ffff8808501a3980 ffff88083b5efeb0 [ 7801.603132] Call Trace: [ 7801.606799] [<ffffffff8163bb39>] schedule+0x29/0x70 [ 7801.613263] [<ffffffff81639829>] schedule_timeout+0x209/0x2d0 [ 7801.620689] [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc] [ 7801.629636] [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc] [ 7801.637804] [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc] [ 7801.646153] [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc] [ 7801.655416] [<ffffffff8163bf06>] wait_for_completion+0x116/0x170 [ 7801.663196] [<ffffffff810b8940>] ? wake_up_state+0x20/0x20 [ 7801.670366] [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc] [ 7801.678492] [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov] [ 7801.687310] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 7801.695042] [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov] [ 7801.703072] [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov] [ 7801.711180] [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov] [ 7801.718425] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 7801.725989] [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass] [ 7801.733633] [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre] [ 7801.741573] [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre] [ 7801.749702] [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre] [ 7801.757038] [<ffffffff811fbe29>] notify_change+0x279/0x3d0 [ 7801.763987] [<ffffffff811dcf13>] do_truncate+0x73/0xc0 [ 7801.770540] [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110 [ 7801.777865] [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180 [ 7801.786350] [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10 [ 7801.792996] [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b [ 7839.277447] Lustre: 6338:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1492372195/real 1492372195] req@ffff8809df2a9b00 x1564860107138192/t0(0) o38->lsdraid-MDT0000-mdc-ffff880832a4b000@192.168.1.5@o2ib:12/10 lens 520/544 e 0 to 1 dl 1492372226 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1 [ 7839.315041] Lustre: 6338:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 20 previous similar messages [ 7921.733573] INFO: task fsx:6463 blocked for more than 120 seconds. [ 7921.742066] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7921.752431] fsx D ffff880830affeb0 0 6463 6459 0x00000080 [ 7921.761858] ffff880830affae0 0000000000000082 ffff8808501a7300 ffff880830afffd8 [ 7921.771634] ffff880830afffd8 ffff880830afffd8 ffff8808501a7300 ffff88103c4c5c10 [ 7921.781416] ffff88103c4c5c18 7fffffffffffffff ffff8808501a7300 ffff880830affeb0 [ 7921.791098] Call Trace: [ 7921.795146] [<ffffffff8163bb39>] schedule+0x29/0x70 [ 7921.801932] [<ffffffff81639829>] schedule_timeout+0x209/0x2d0 [ 7921.809719] [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc] [ 7921.818969] [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc] [ 7921.827468] [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc] [ 7921.836091] [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc] [ 7921.845456] [<ffffffff8163bf06>] wait_for_completion+0x116/0x170 [ 7921.853295] [<ffffffff810b8940>] ? wake_up_state+0x20/0x20 [ 7921.860546] [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc] [ 7921.868721] [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov] [ 7921.877597] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 7921.885333] [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov] [ 7921.893362] [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov] [ 7921.901445] [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov] [ 7921.908666] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 7921.916316] [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass] [ 7921.924069] [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre] [ 7921.932109] [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre] [ 7921.940338] [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre] [ 7921.947769] [<ffffffff811fbe29>] notify_change+0x279/0x3d0 [ 7921.954813] [<ffffffff811dcf13>] do_truncate+0x73/0xc0 [ 7921.961460] [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110 [ 7921.968883] [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180 [ 7921.977476] [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10 [ 7921.984224] [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b [ 7921.991749] INFO: task fsx:6464 blocked for more than 120 seconds. [ 7921.999474] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 7922.009057] fsx D ffff88083b5efeb0 0 6464 6459 0x00000080 [ 7922.017819] ffff88083b5efae0 0000000000000086 ffff8808501a3980 ffff88083b5effd8 [ 7922.027030] ffff88083b5effd8 ffff88083b5effd8 ffff8808501a3980 ffff88103c5ba650 [ 7922.036242] ffff88103c5ba658 7fffffffffffffff ffff8808501a3980 ffff88083b5efeb0 [ 7922.045439] Call Trace: [ 7922.049028] [<ffffffff8163bb39>] schedule+0x29/0x70 [ 7922.055436] [<ffffffff81639829>] schedule_timeout+0x209/0x2d0 [ 7922.062832] [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc] [ 7922.071769] [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc] [ 7922.079951] [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc] [ 7922.088305] [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc] [ 7922.097396] [<ffffffff8163bf06>] wait_for_completion+0x116/0x170 [ 7922.105008] [<ffffffff810b8940>] ? wake_up_state+0x20/0x20 [ 7922.112014] [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc] [ 7922.119982] [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov] [ 7922.128637] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 7922.136386] [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov] [ 7922.144462] [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov] [ 7922.152623] [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov] [ 7922.159869] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 7922.167613] [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass] [ 7922.175421] [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre] [ 7922.183526] [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre] [ 7922.191819] [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre] [ 7922.199319] [<ffffffff811fbe29>] notify_change+0x279/0x3d0 [ 7922.206268] [<ffffffff811dcf13>] do_truncate+0x73/0xc0 [ 7922.212823] [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110 [ 7922.220147] [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180 [ 7922.228632] [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10 [ 7922.235281] [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b [ 7923.592450] LustreError: 27229:0:(osc_cache.c:951:osc_extent_wait()) extent ffff88012eaeb068@{[69632 -> 73727/73727], [3|0|+|rpc|wiY|ffff8806c50ac850], [17432576|1|+|-|ffff8802ae377000|4096|ffff88104d932280]} lsdraid-OST0001-osc-ffff880832a4b000: wait ext to 0 timedout, recovery in progress? [ 7923.625757] LustreError: 27229:0:(osc_cache.c:951:osc_extent_wait()) ### extent: ffff88012eaeb068 ns: lsdraid-OST0001-osc-ffff880832a4b000 lock: ffff8802ae377000/0x766280d790dde765 lrc: 11/0,1 mode: PW/PW res: [0x85588:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x20000000000 nid: local remote: 0x17e15c0cc7f9c1bf expref: -99 pid: 26746 timeout: 0 lvb_type: 1 [ 8042.175864] INFO: task fsx:6463 blocked for more than 120 seconds. [ 8042.184502] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 8042.195019] fsx D ffff880830affeb0 0 6463 6459 0x00000080 [ 8042.204575] ffff880830affae0 0000000000000082 ffff8808501a7300 ffff880830afffd8 [ 8042.214464] ffff880830afffd8 ffff880830afffd8 ffff8808501a7300 ffff88103c4c5c10 [ 8042.224251] ffff88103c4c5c18 7fffffffffffffff ffff8808501a7300 ffff880830affeb0 [ 8042.233932] Call Trace: [ 8042.238032] [<ffffffff8163bb39>] schedule+0x29/0x70 [ 8042.244837] [<ffffffff81639829>] schedule_timeout+0x209/0x2d0 [ 8042.252625] [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc] [ 8042.261883] [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc] [ 8042.270308] [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc] [ 8042.278917] [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc] [ 8042.288229] [<ffffffff8163bf06>] wait_for_completion+0x116/0x170 [ 8042.296078] [<ffffffff810b8940>] ? wake_up_state+0x20/0x20 [ 8042.303299] [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc] [ 8042.311491] [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov] [ 8042.320343] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 8042.328108] [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov] [ 8042.336118] [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov] [ 8042.344221] [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov] [ 8042.351457] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 8042.359196] [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass] [ 8042.367001] [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre] [ 8042.375099] [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre] [ 8042.383383] [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre] [ 8042.390874] [<ffffffff811fbe29>] notify_change+0x279/0x3d0 [ 8042.397978] [<ffffffff811dcf13>] do_truncate+0x73/0xc0 [ 8042.404692] [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110 [ 8042.412185] [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180 [ 8042.420849] [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10 [ 8042.427670] [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b [ 8042.435264] INFO: task fsx:6464 blocked for more than 120 seconds. [ 8042.443057] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 8042.452712] fsx D ffff88083b5efeb0 0 6464 6459 0x00000080 [ 8042.461542] ffff88083b5efae0 0000000000000086 ffff8808501a3980 ffff88083b5effd8 [ 8042.470819] ffff88083b5effd8 ffff88083b5effd8 ffff8808501a3980 ffff88103c5ba650 [ 8042.480091] ffff88103c5ba658 7fffffffffffffff ffff8808501a3980 ffff88083b5efeb0 [ 8042.489357] Call Trace: [ 8042.493008] [<ffffffff8163bb39>] schedule+0x29/0x70 [ 8042.499480] [<ffffffff81639829>] schedule_timeout+0x209/0x2d0 [ 8042.506915] [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc] [ 8042.515873] [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc] [ 8042.524049] [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc] [ 8042.532586] [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc] [ 8042.541853] [<ffffffff8163bf06>] wait_for_completion+0x116/0x170 [ 8042.549637] [<ffffffff810b8940>] ? wake_up_state+0x20/0x20 [ 8042.556810] [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc] [ 8042.564949] [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov] [ 8042.573771] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 8042.581502] [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov] [ 8042.589512] [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov] [ 8042.597456] [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov] [ 8042.604533] [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass] [ 8042.612090] [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass] [ 8042.619734] [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre] [ 8042.627675] [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre] [ 8042.635802] [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre] [ 8042.643140] [<ffffffff811fbe29>] notify_change+0x279/0x3d0 [ 8042.650088] [<ffffffff811dcf13>] do_truncate+0x73/0xc0 [ 8042.656639] [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110 [ 8042.663963] [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180 [ 8042.672448] [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10 [ 8042.679094] [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b [ 8114.126337] Lustre: 6338:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1492372470/real 1492372470] req@ffff8809df2acb00 x1564860107139152/t0(0) o38->lsdraid-MDT0000-mdc-ffff880832a4b000@192.168.1.5@o2ib:12/10 lens 520/544 e 0 to 1 dl 1492372525 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
All clients that had the Lustre FS mounted crashed with the same error.