Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9352

Lustre clients asserting (lcommon_cl.c:148:cl_file_inode_init()) ASSERTION( (((inode->i_mode) & 00170000) == 0100000) ) failed

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.9.0
    • 4
    • 9223372036854775807

    Description

      We had 4 Lustre clients wolf-6-9 and the Lustre server nodes had crashed, taken a dump and come back up. When Lustre servers were back up and the file system was restored all the clients started asserting with:

      [22181.389692] Lustre: 6338:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 82 previous similar messages
      [22806.001932] Lustre: 6338:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1492387170/real 1492387170]  req@ffff8809df2aa100 x1564860107186640/t0(0) o38->lsdraid-MDT0000-mdc-ffff880832a4b000@192.168.1.5@o2ib:12/10 lens 520/544 e 0 to 1 dl 1492387225 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
      [22806.044008] Lustre: 6338:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 82 previous similar messages
      [23130.818757] LustreError: 167-0: lsdraid-OST0000-osc-ffff880832a4b000: This client was evicted by lsdraid-OST0000; in progress operations using this service will fail.
      [23131.304299] Lustre: 6376:0:(llite_lib.c:2619:ll_dirty_page_discard_warn()) lsdraid: dirty page discard: 192.168.1.5@o2ib:/lsdraid/fid: [0x200000401:0x18:0x0]// may get corrupted (rc -108)
      [23131.304639] Lustre: lsdraid-OST0000-osc-ffff880832a4b000: Connection restored to 192.168.1.3@o2ib (at 192.168.1.3@o2ib)
      [23131.315453] Lustre: 6353:0:(llite_lib.c:2619:ll_dirty_page_discard_warn()) lsdraid: dirty page discard: 192.168.1.5@o2ib:/lsdraid/fid: [0x200000401:0x16:0x0]// may get corrupted (rc -108)
      [23131.321505] Lustre: 6377:0:(llite_lib.c:2619:ll_dirty_page_discard_warn()) lsdraid: dirty page discard: 192.168.1.5@o2ib:/lsdraid/fid: [0x20000040c:0xb43b:0x0]// may get corrupted (rc -108)
      [23155.804921] Lustre: Evicted from MGS (at 192.168.1.5@o2ib) after server handle changed from 0x9b83d124804f915d to 0x3fceeb65e42b218b
      [23155.824502] LustreError: 167-0: lsdraid-MDT0000-mdc-ffff880832a4b000: This client was evicted by lsdraid-MDT0000; in progress operations using this service will fail.
      [23155.824529] Lustre: MGC192.168.1.5@o2ib: Connection restored to 192.168.1.5@o2ib (at 192.168.1.5@o2ib)
      [23155.824532] Lustre: Skipped 1 previous similar message
      [23155.869652] LustreError: Skipped 1 previous similar message
      [23155.879415] LustreError: 35603:0:(file.c:180:ll_close_inode_openhandle()) lsdraid-clilmv-ffff880832a4b000: inode [0x200000401:0x59:0x0] mdc close failed: rc = -5
      [23155.879478] LustreError: 48874:0:(file.c:3403:ll_inode_revalidate_fini()) lsdraid: revalidate FID [0x200000007:0x1:0x0] error: rc = -5
      [23155.879537] LustreError: 27229:0:(vvp_io.c:1429:vvp_io_init()) lsdraid: refresh file layout [0x20000040c:0xc035:0x0] error -5.
      [23155.880185] traps: fsx[6464] general protection ip:7f42c4af1ab4 sp:7ffd375bc060 error:0 in libc-2.17.so[7f42c4aa9000+1b7000]
      [23155.880204] traps: fsx[6463] general protection ip:7f45b5297ab4 sp:7ffc1eae97e0 error:0 in libc-2.17.so[7f45b524f000+1b7000]
      [23155.969495] LustreError: 35603:0:(file.c:180:ll_close_inode_openhandle()) Skipped 7 previous similar messages
      [23156.181966] LustreError: 48928:0:(lcommon_cl.c:148:cl_file_inode_init()) ASSERTION( (((inode->i_mode) & 00170000) == 0100000) ) failed: 
      [23156.181969] LustreError: 48928:0:(lcommon_cl.c:148:cl_file_inode_init()) LBUG
      [23156.181970] Pid: 48928, comm: prove
      [23156.181970] 
      Call Trace:
      [23156.181994]  [<ffffffffa09507d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
      [23156.182001]  [<ffffffffa0950d75>] lbug_with_loc+0x45/0xc0 [libcfs]
      [23156.182041]  [<ffffffffa0dc28f8>] cl_file_inode_init+0x248/0x300 [lustre]
      [23156.182054]  [<ffffffffa0d9ae65>] ll_update_inode+0x305/0x5d0 [lustre]
      [23156.182064]  [<ffffffffa0b68c29>] ? lmv_get_lustre_md+0x149/0x2d0 [lmv]
      [23156.182077]  [<ffffffffa0d9d08d>] ll_prep_inode+0x12d/0xc90 [lustre]
      [23156.182090]  [<ffffffffa0da9753>] ? ll_i2gids+0x33/0xb0 [lustre]
      [23156.182110]  [<ffffffff81101302>] ? from_kgid+0x12/0x20
      [23156.182121]  [<ffffffffa0d687c1>] ll_revalidate_it_finish+0x71/0x170 [lustre]
      [23156.182132]  [<ffffffffa0d83b9b>] __ll_inode_revalidate+0x19b/0xc00 [lustre]
      [23156.182143]  [<ffffffffa0d84a22>] ll_getattr+0x62/0x7c0 [lustre]
      [23156.182150]  [<ffffffff811e3c66>] vfs_getattr+0x46/0x80
      [23156.182153]  [<ffffffff811e3d95>] vfs_fstatat+0x75/0xc0
      [23156.182156]  [<ffffffff811e42ee>] SYSC_newstat+0x2e/0x60
      [23156.182163]  [<ffffffff81642363>] ? do_page_fault+0x23/0x80
      [23156.182165]  [<ffffffff8163e648>] ? page_fault+0x28/0x30
      [23156.182168]  [<ffffffff811e45ce>] SyS_newstat+0xe/0x10
      [23156.182174]  [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b
      [23156.182175] 
      [23156.182316] Kernel panic - not syncing: LBUG
      [23156.182319] CPU: 38 PID: 48928 Comm: prove Tainted: G          IOE  ------------   3.10.0-327.36.3.el7.x86_64 #1
      [23156.182320] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
      [23156.182326]  ffffffffa096ddef 0000000056b71082 ffff88103be23ad0 ffffffff81636431
      [23156.182331]  ffff88103be23b50 ffffffff8162fcc0 ffffffff00000008 ffff88103be23b60
      [23156.182336]  ffff88103be23b00 0000000056b71082 ffffffffa0dd3780 0000000000000000
      [23156.182336] Call Trace:
      [23156.182346]  [<ffffffff81636431>] dump_stack+0x19/0x1b
      [23156.182349]  [<ffffffff8162fcc0>] panic+0xd8/0x1e7
      [23156.182357]  [<ffffffffa0950ddb>] lbug_with_loc+0xab/0xc0 [libcfs]
      [23156.182373]  [<ffffffffa0dc28f8>] cl_file_inode_init+0x248/0x300 [lustre]
      [23156.182385]  [<ffffffffa0d9ae65>] ll_update_inode+0x305/0x5d0 [lustre]
      [23156.182391]  [<ffffffffa0b68c29>] ? lmv_get_lustre_md+0x149/0x2d0 [lmv]
      [23156.182403]  [<ffffffffa0d9d08d>] ll_prep_inode+0x12d/0xc90 [lustre]
      [23156.182416]  [<ffffffffa0da9753>] ? ll_i2gids+0x33/0xb0 [lustre]
      [23156.182419]  [<ffffffff81101302>] ? from_kgid+0x12/0x20
      [23156.182429]  [<ffffffffa0d687c1>] ll_revalidate_it_finish+0x71/0x170 [lustre]
      [23156.182440]  [<ffffffffa0d83b9b>] __ll_inode_revalidate+0x19b/0xc00 [lustre]
      [23156.182451]  [<ffffffffa0d84a22>] ll_getattr+0x62/0x7c0 [lustre]
      [23156.182455]  [<ffffffff811e3c66>] vfs_getattr+0x46/0x80
      [23156.182458]  [<ffffffff811e3d95>] vfs_fstatat+0x75/0xc0
      [23156.182461]  [<ffffffff811e42ee>] SYSC_newstat+0x2e/0x60
      [23156.182464]  [<ffffffff81642363>] ? do_page_fault+0x23/0x80
      [23156.182467]  [<ffffffff8163e648>] ? page_fault+0x28/0x30
      [23156.182470]  [<ffffffff811e45ce>] SyS_newstat+0xe/0x10
      [23156.182473]  [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b
      

      Before this we did see this on the various clients

       
      [ 7621.463514] Lustre: 6338:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
      [ 7680.841904] INFO: task fsx:6463 blocked for more than 120 seconds.
      [ 7680.850637] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7680.861238] fsx             D ffff880830affeb0     0  6463   6459 0x00000080
      [ 7680.870886]  ffff880830affae0 0000000000000082 ffff8808501a7300 ffff880830afffd8
      [ 7680.880858]  ffff880830afffd8 ffff880830afffd8 ffff8808501a7300 ffff88103c4c5c10
      [ 7680.890701]  ffff88103c4c5c18 7fffffffffffffff ffff8808501a7300 ffff880830affeb0
      [ 7680.900454] Call Trace:
      [ 7680.904601]  [<ffffffff8163bb39>] schedule+0x29/0x70
      [ 7680.911469]  [<ffffffff81639829>] schedule_timeout+0x209/0x2d0
      [ 7680.919304]  [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc]
      [ 7680.928615]  [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc]
      [ 7680.937161]  [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc]
      [ 7680.945840]  [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc]
      [ 7680.955264]  [<ffffffff8163bf06>] wait_for_completion+0x116/0x170
      [ 7680.963198]  [<ffffffff810b8940>] ? wake_up_state+0x20/0x20
      [ 7680.970537]  [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc]
      [ 7680.978788]  [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov]
      [ 7680.987737]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 7680.995551]  [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov]
      [ 7681.003654]  [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov]
      [ 7681.011803]  [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov]
      [ 7681.019088]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 7681.026856]  [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass]
      [ 7681.034706]  [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre]
      [ 7681.042840]  [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre]
      [ 7681.051174]  [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre]
      [ 7681.058714]  [<ffffffff811fbe29>] notify_change+0x279/0x3d0
      [ 7681.065875]  [<ffffffff811dcf13>] do_truncate+0x73/0xc0
      [ 7681.072636]  [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110
      [ 7681.080185]  [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180
      [ 7681.088901]  [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10
      [ 7681.095774]  [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b
      [ 7681.103428] INFO: task fsx:6464 blocked for more than 120 seconds.
      [ 7681.111278] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7681.120987] fsx             D ffff88083b5efeb0     0  6464   6459 0x00000080
      [ 7681.129878]  ffff88083b5efae0 0000000000000086 ffff8808501a3980 ffff88083b5effd8
      [ 7681.139209]  ffff88083b5effd8 ffff88083b5effd8 ffff8808501a3980 ffff88103c5ba650
      [ 7681.148535]  ffff88103c5ba658 7fffffffffffffff ffff8808501a3980 ffff88083b5efeb0
      [ 7681.157861] Call Trace:
      [ 7681.161544]  [<ffffffff8163bb39>] schedule+0x29/0x70
      [ 7681.168024]  [<ffffffff81639829>] schedule_timeout+0x209/0x2d0
      [ 7681.175467]  [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc]
      [ 7681.184430]  [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc]
      [ 7681.192611]  [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc]
      [ 7681.200963]  [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc]
      [ 7681.210050]  [<ffffffff8163bf06>] wait_for_completion+0x116/0x170
      [ 7681.217661]  [<ffffffff810b8940>] ? wake_up_state+0x20/0x20
      [ 7681.224683]  [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc]
      [ 7681.232649]  [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov]
      [ 7681.241303]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 7681.248866]  [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov]
      [ 7681.256715]  [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov]
      [ 7681.264660]  [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov]
      [ 7681.271743]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 7681.279302]  [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass]
      [ 7681.286951]  [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre]
      [ 7681.294894]  [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre]
      [ 7681.303025]  [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre]
      [ 7681.310366]  [<ffffffff811fbe29>] notify_change+0x279/0x3d0
      [ 7681.317316]  [<ffffffff811dcf13>] do_truncate+0x73/0xc0
      [ 7681.323870]  [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110
      [ 7681.331196]  [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180
      [ 7681.339693]  [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10
      [ 7681.346340]  [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b
      [ 7707.260211] Lustre: 6361:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1492372056/real 1492372056]  req@ffff88016636a400 x1564860107137792/t0(0) o103->lsdraid-MDT0000-mdc-ffff880832a4b000@192.168.1.5@o2ib:17/18 lens 416/224 e 0 to 1 dl 1492372063 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
      [ 7707.297453] Lustre: 6361:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
      [ 7707.309810] Lustre: lsdraid-MDT0000-mdc-ffff880832a4b000: Connection to lsdraid-MDT0000 (at 192.168.1.5@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      [ 7739.314486] LustreError: 166-1: MGC192.168.1.5@o2ib: Connection to MGS (at 192.168.1.5@o2ib) was lost; in progress operations using this service will fail
      [ 7801.288244] INFO: task fsx:6463 blocked for more than 120 seconds.
      [ 7801.296906] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7801.307434] fsx             D ffff880830affeb0     0  6463   6459 0x00000080
      [ 7801.317015]  ffff880830affae0 0000000000000082 ffff8808501a7300 ffff880830afffd8
      [ 7801.326920]  ffff880830afffd8 ffff880830afffd8 ffff8808501a7300 ffff88103c4c5c10
      [ 7801.336722]  ffff88103c4c5c18 7fffffffffffffff ffff8808501a7300 ffff880830affeb0
      [ 7801.346435] Call Trace:
      [ 7801.350508]  [<ffffffff8163bb39>] schedule+0x29/0x70
      [ 7801.357404]  [<ffffffff81639829>] schedule_timeout+0x209/0x2d0
      [ 7801.365221]  [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc]
      [ 7801.374549]  [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc]
      [ 7801.383050]  [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc]
      [ 7801.391743]  [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc]
      [ 7801.401132]  [<ffffffff8163bf06>] wait_for_completion+0x116/0x170
      [ 7801.409054]  [<ffffffff810b8940>] ? wake_up_state+0x20/0x20
      [ 7801.416405]  [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc]
      [ 7801.424667]  [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov]
      [ 7801.433634]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 7801.441454]  [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov]
      [ 7801.449568]  [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov]
      [ 7801.457722]  [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov]
      [ 7801.465010]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 7801.472742]  [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass]
      [ 7801.480564]  [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre]
      [ 7801.488662]  [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre]
      [ 7801.496959]  [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre]
      [ 7801.504464]  [<ffffffff811fbe29>] notify_change+0x279/0x3d0
      [ 7801.511594]  [<ffffffff811dcf13>] do_truncate+0x73/0xc0
      [ 7801.518325]  [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110
      [ 7801.525833]  [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180
      [ 7801.534511]  [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10
      [ 7801.541335]  [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b
      [ 7801.548948] INFO: task fsx:6464 blocked for more than 120 seconds.
      [ 7801.556758] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7801.566424] fsx             D ffff88083b5efeb0     0  6464   6459 0x00000080
      [ 7801.575267]  ffff88083b5efae0 0000000000000086 ffff8808501a3980 ffff88083b5effd8
      [ 7801.584558]  ffff88083b5effd8 ffff88083b5effd8 ffff8808501a3980 ffff88103c5ba650
      [ 7801.593846]  ffff88103c5ba658 7fffffffffffffff ffff8808501a3980 ffff88083b5efeb0
      [ 7801.603132] Call Trace:
      [ 7801.606799]  [<ffffffff8163bb39>] schedule+0x29/0x70
      [ 7801.613263]  [<ffffffff81639829>] schedule_timeout+0x209/0x2d0
      [ 7801.620689]  [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc]
      [ 7801.629636]  [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc]
      [ 7801.637804]  [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc]
      [ 7801.646153]  [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc]
      [ 7801.655416]  [<ffffffff8163bf06>] wait_for_completion+0x116/0x170
      [ 7801.663196]  [<ffffffff810b8940>] ? wake_up_state+0x20/0x20
      [ 7801.670366]  [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc]
      [ 7801.678492]  [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov]
      [ 7801.687310]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 7801.695042]  [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov]
      [ 7801.703072]  [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov]
      [ 7801.711180]  [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov]
      [ 7801.718425]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 7801.725989]  [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass]
      [ 7801.733633]  [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre]
      [ 7801.741573]  [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre]
      [ 7801.749702]  [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre]
      [ 7801.757038]  [<ffffffff811fbe29>] notify_change+0x279/0x3d0
      [ 7801.763987]  [<ffffffff811dcf13>] do_truncate+0x73/0xc0
      [ 7801.770540]  [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110
      [ 7801.777865]  [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180
      [ 7801.786350]  [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10
      [ 7801.792996]  [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b
      [ 7839.277447] Lustre: 6338:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1492372195/real 1492372195]  req@ffff8809df2a9b00 x1564860107138192/t0(0) o38->lsdraid-MDT0000-mdc-ffff880832a4b000@192.168.1.5@o2ib:12/10 lens 520/544 e 0 to 1 dl 1492372226 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
      [ 7839.315041] Lustre: 6338:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 20 previous similar messages
      [ 7921.733573] INFO: task fsx:6463 blocked for more than 120 seconds.
      [ 7921.742066] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7921.752431] fsx             D ffff880830affeb0     0  6463   6459 0x00000080
      [ 7921.761858]  ffff880830affae0 0000000000000082 ffff8808501a7300 ffff880830afffd8
      [ 7921.771634]  ffff880830afffd8 ffff880830afffd8 ffff8808501a7300 ffff88103c4c5c10
      [ 7921.781416]  ffff88103c4c5c18 7fffffffffffffff ffff8808501a7300 ffff880830affeb0
      [ 7921.791098] Call Trace:
      [ 7921.795146]  [<ffffffff8163bb39>] schedule+0x29/0x70
      [ 7921.801932]  [<ffffffff81639829>] schedule_timeout+0x209/0x2d0
      [ 7921.809719]  [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc]
      [ 7921.818969]  [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc]
      [ 7921.827468]  [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc]
      [ 7921.836091]  [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc]
      [ 7921.845456]  [<ffffffff8163bf06>] wait_for_completion+0x116/0x170
      [ 7921.853295]  [<ffffffff810b8940>] ? wake_up_state+0x20/0x20
      [ 7921.860546]  [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc]
      [ 7921.868721]  [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov]
      [ 7921.877597]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 7921.885333]  [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov]
      [ 7921.893362]  [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov]
      [ 7921.901445]  [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov]
      [ 7921.908666]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 7921.916316]  [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass]
      [ 7921.924069]  [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre]
      [ 7921.932109]  [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre]
      [ 7921.940338]  [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre]
      [ 7921.947769]  [<ffffffff811fbe29>] notify_change+0x279/0x3d0
      [ 7921.954813]  [<ffffffff811dcf13>] do_truncate+0x73/0xc0
      [ 7921.961460]  [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110
      [ 7921.968883]  [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180
      [ 7921.977476]  [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10
      [ 7921.984224]  [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b
      [ 7921.991749] INFO: task fsx:6464 blocked for more than 120 seconds.
      [ 7921.999474] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7922.009057] fsx             D ffff88083b5efeb0     0  6464   6459 0x00000080
      [ 7922.017819]  ffff88083b5efae0 0000000000000086 ffff8808501a3980 ffff88083b5effd8
      [ 7922.027030]  ffff88083b5effd8 ffff88083b5effd8 ffff8808501a3980 ffff88103c5ba650
      [ 7922.036242]  ffff88103c5ba658 7fffffffffffffff ffff8808501a3980 ffff88083b5efeb0
      [ 7922.045439] Call Trace:
      [ 7922.049028]  [<ffffffff8163bb39>] schedule+0x29/0x70
      [ 7922.055436]  [<ffffffff81639829>] schedule_timeout+0x209/0x2d0
      [ 7922.062832]  [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc]
      [ 7922.071769]  [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc]
      [ 7922.079951]  [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc]
      [ 7922.088305]  [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc]
      [ 7922.097396]  [<ffffffff8163bf06>] wait_for_completion+0x116/0x170
      [ 7922.105008]  [<ffffffff810b8940>] ? wake_up_state+0x20/0x20
      [ 7922.112014]  [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc]
      [ 7922.119982]  [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov]
      [ 7922.128637]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 7922.136386]  [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov]
      [ 7922.144462]  [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov]
      [ 7922.152623]  [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov]
      [ 7922.159869]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 7922.167613]  [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass]
      [ 7922.175421]  [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre]
      [ 7922.183526]  [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre]
      [ 7922.191819]  [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre]
      [ 7922.199319]  [<ffffffff811fbe29>] notify_change+0x279/0x3d0
      [ 7922.206268]  [<ffffffff811dcf13>] do_truncate+0x73/0xc0
      [ 7922.212823]  [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110
      [ 7922.220147]  [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180
      [ 7922.228632]  [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10
      [ 7922.235281]  [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b
      [ 7923.592450] LustreError: 27229:0:(osc_cache.c:951:osc_extent_wait()) extent ffff88012eaeb068@{[69632 -> 73727/73727], [3|0|+|rpc|wiY|ffff8806c50ac850], [17432576|1|+|-|ffff8802ae377000|4096|ffff88104d932280]} lsdraid-OST0001-osc-ffff880832a4b000: wait ext to 0 timedout, recovery in progress?
      [ 7923.625757] LustreError: 27229:0:(osc_cache.c:951:osc_extent_wait()) ### extent: ffff88012eaeb068 ns: lsdraid-OST0001-osc-ffff880832a4b000 lock: ffff8802ae377000/0x766280d790dde765 lrc: 11/0,1 mode: PW/PW res: [0x85588:0x0:0x0].0x0 rrc: 1 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x20000000000 nid: local remote: 0x17e15c0cc7f9c1bf expref: -99 pid: 26746 timeout: 0 lvb_type: 1
      [ 8042.175864] INFO: task fsx:6463 blocked for more than 120 seconds.
      [ 8042.184502] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 8042.195019] fsx             D ffff880830affeb0     0  6463   6459 0x00000080
      [ 8042.204575]  ffff880830affae0 0000000000000082 ffff8808501a7300 ffff880830afffd8
      [ 8042.214464]  ffff880830afffd8 ffff880830afffd8 ffff8808501a7300 ffff88103c4c5c10
      [ 8042.224251]  ffff88103c4c5c18 7fffffffffffffff ffff8808501a7300 ffff880830affeb0
      [ 8042.233932] Call Trace:
      [ 8042.238032]  [<ffffffff8163bb39>] schedule+0x29/0x70
      [ 8042.244837]  [<ffffffff81639829>] schedule_timeout+0x209/0x2d0
      [ 8042.252625]  [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc]
      [ 8042.261883]  [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc]
      [ 8042.270308]  [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc]
      [ 8042.278917]  [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc]
      [ 8042.288229]  [<ffffffff8163bf06>] wait_for_completion+0x116/0x170
      [ 8042.296078]  [<ffffffff810b8940>] ? wake_up_state+0x20/0x20
      [ 8042.303299]  [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc]
      [ 8042.311491]  [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov]
      [ 8042.320343]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 8042.328108]  [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov]
      [ 8042.336118]  [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov]
      [ 8042.344221]  [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov]
      [ 8042.351457]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 8042.359196]  [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass]
      [ 8042.367001]  [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre]
      [ 8042.375099]  [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre]
      [ 8042.383383]  [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre]
      [ 8042.390874]  [<ffffffff811fbe29>] notify_change+0x279/0x3d0
      [ 8042.397978]  [<ffffffff811dcf13>] do_truncate+0x73/0xc0
      [ 8042.404692]  [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110
      [ 8042.412185]  [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180
      [ 8042.420849]  [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10
      [ 8042.427670]  [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b
      [ 8042.435264] INFO: task fsx:6464 blocked for more than 120 seconds.
      [ 8042.443057] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 8042.452712] fsx             D ffff88083b5efeb0     0  6464   6459 0x00000080
      [ 8042.461542]  ffff88083b5efae0 0000000000000086 ffff8808501a3980 ffff88083b5effd8
      [ 8042.470819]  ffff88083b5effd8 ffff88083b5effd8 ffff8808501a3980 ffff88103c5ba650
      [ 8042.480091]  ffff88103c5ba658 7fffffffffffffff ffff8808501a3980 ffff88083b5efeb0
      [ 8042.489357] Call Trace:
      [ 8042.493008]  [<ffffffff8163bb39>] schedule+0x29/0x70
      [ 8042.499480]  [<ffffffff81639829>] schedule_timeout+0x209/0x2d0
      [ 8042.506915]  [<ffffffffa0a51a53>] ? ptlrpc_set_add_new_req+0xe3/0x160 [ptlrpc]
      [ 8042.515873]  [<ffffffffa0e47e80>] ? osc_io_ladvise_end+0x50/0x50 [osc]
      [ 8042.524049]  [<ffffffffa0a80c2c>] ? ptlrpcd_add_req+0x1ec/0x2e0 [ptlrpc]
      [ 8042.532586]  [<ffffffffa0a63f30>] ? lustre_swab_niobuf_remote+0x30/0x30 [ptlrpc]
      [ 8042.541853]  [<ffffffff8163bf06>] wait_for_completion+0x116/0x170
      [ 8042.549637]  [<ffffffff810b8940>] ? wake_up_state+0x20/0x20
      [ 8042.556810]  [<ffffffffa0e48044>] osc_io_setattr_end+0xc4/0x180 [osc]
      [ 8042.564949]  [<ffffffffa08ee3b0>] ? lov_io_iter_fini_wrapper+0x50/0x50 [lov]
      [ 8042.573771]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 8042.581502]  [<ffffffffa08ee48b>] lov_io_end_wrapper+0xdb/0xe0 [lov]
      [ 8042.589512]  [<ffffffffa08ee9a6>] lov_io_call.isra.9+0x86/0x140 [lov]
      [ 8042.597456]  [<ffffffffa08eea96>] lov_io_end+0x36/0xb0 [lov]
      [ 8042.604533]  [<ffffffffa0c1db3d>] cl_io_end+0x5d/0x150 [obdclass]
      [ 8042.612090]  [<ffffffffa0c202e3>] cl_io_loop+0xb3/0x190 [obdclass]
      [ 8042.619734]  [<ffffffffa0dc2550>] cl_setattr_ost+0x240/0x3a0 [lustre]
      [ 8042.627675]  [<ffffffffa0d9c7f9>] ll_setattr_raw+0x12a9/0x1330 [lustre]
      [ 8042.635802]  [<ffffffffa0d9c8e3>] ll_setattr+0x63/0xc0 [lustre]
      [ 8042.643140]  [<ffffffff811fbe29>] notify_change+0x279/0x3d0
      [ 8042.650088]  [<ffffffff811dcf13>] do_truncate+0x73/0xc0
      [ 8042.656639]  [<ffffffff811e17a8>] ? __sb_start_write+0x58/0x110
      [ 8042.663963]  [<ffffffff811dd2bc>] do_sys_ftruncate.constprop.17+0x11c/0x180
      [ 8042.672448]  [<ffffffff811dd35e>] SyS_ftruncate+0xe/0x10
      [ 8042.679094]  [<ffffffff81646b49>] system_call_fastpath+0x16/0x1b
      [ 8114.126337] Lustre: 6338:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1492372470/real 1492372470]  req@ffff8809df2acb00 x1564860107139152/t0(0) o38->lsdraid-MDT0000-mdc-ffff880832a4b000@192.168.1.5@o2ib:12/10 lens 520/544 e 0 to 1 dl 1492372525 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
      

      All clients that had the Lustre FS mounted crashed with the same error.

      Attachments

        Issue Links

          Activity

            People

              hongchao.zhang Hongchao Zhang
              jsalians_intel John Salinas (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated: