We are running our Lustre file system on 1 MDS and 8 OSS nodes, with Lustre 2.10.6 on both the servers and the clients.
One of the clients re-exports Lustre over NFSv3 and SMB. This setup worked fine for more than a year, but recently that client has started crashing on the following Lustre LBUG:
[ 2014.148312] LustreError: 19435:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1209601876 [1209597952, 1210056704)
[ 2014.148338] LustreError: 19435:0:(vvp_io.c:1056:vvp_io_write_start()) LBUG
[ 2014.148352] Pid: 19435, comm: nfsd 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019
[ 2014.148353] Call Trace:
[ 2014.148376] [<ffffffffc0a0d7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 2014.148389] [<ffffffffc0a0d87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 2014.148394] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
[ 2014.148419] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
[ 2014.148449] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
[ 2014.148462] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
[ 2014.148470] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
[ 2014.148476] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
[ 2014.148480] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
[ 2014.148482] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
[ 2014.148484] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
[ 2014.148492] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
[ 2014.148498] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
[ 2014.148504] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
[ 2014.148509] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
[ 2014.148523] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
[ 2014.148531] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
[ 2014.148535] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
[ 2014.148539] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
[ 2014.148543] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2014.148551] Kernel panic - not syncing: LBUG
[ 2014.148561] CPU: 2 PID: 19435 Comm: nfsd Kdump: loaded Tainted: G OE ------------ 3.10.0-957.21.3.el7.x86_64 #1
[ 2014.148579] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.4.8 05/21/2018
[ 2014.148592] Call Trace:
[ 2014.148603] [<ffffffff8d563107>] dump_stack+0x19/0x1b
[ 2014.148615] [<ffffffff8d55c810>] panic+0xe8/0x21f
[ 2014.148629] [<ffffffffc0a0d8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[ 2014.148650] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
[ 2014.148675] [<ffffffffc0cb3357>] ? cl_lock_request+0x67/0x1f0 [obdclass]
[ 2014.148699] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
[ 2014.148722] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
[ 2014.148739] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
[ 2014.148753] [<ffffffff8ced3250>] ? check_preempt_curr+0x80/0xa0
[ 2014.148771] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
[ 2014.148784] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
[ 2014.148914] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
[ 2014.149049] [<ffffffffc1017eb0>] ? ll_file_splice_read+0x1e0/0x1e0 [lustre]
[ 2014.149185] [<ffffffffc1018440>] ? ll_file_aio_write+0x590/0x590 [lustre]
[ 2014.149318] [<ffffffff8d11e003>] ? ima_get_action+0x23/0x30
[ 2014.149447] [<ffffffff8d11d51e>] ? process_measurement+0x8e/0x250
[ 2014.149578] [<ffffffff8d03f087>] ? do_dentry_open+0x1e7/0x2e0
[ 2014.149708] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
[ 2014.149841] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
[ 2014.149975] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
[ 2014.150109] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
[ 2014.150243] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
[ 2014.150381] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
[ 2014.150489] LustreError: 19462:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1211699028 [1211695104, 1212153856)
[ 2014.150491] LustreError: 19462:0:(vvp_io.c:1056:vvp_io_write_start()) LBUG
[ 2014.150492] Pid: 19462, comm: nfsd 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019
[ 2014.150492] Call Trace:
[ 2014.150514] [<ffffffffc0a0d7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 2014.150519] [<ffffffffc0a0d87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 2014.150533] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
[ 2014.150551] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
[ 2014.150564] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
[ 2014.150571] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
[ 2014.150577] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
[ 2014.150580] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
[ 2014.150581] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
[ 2014.150583] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
[ 2014.150589] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
[ 2014.150594] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
[ 2014.150599] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
[ 2014.150603] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
[ 2014.150613] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
[ 2014.150621] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
[ 2014.150625] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
[ 2014.150627] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
[ 2014.150630] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
[ 2014.150634] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2014.152515] LustreError: 19480:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1213796180 [1213792256, 1214251008)
[ 2014.152517] LustreError: 19480:0:(vvp_io.c:1056:vvp_io_write_start()) LBUG
[ 2014.152518] Pid: 19480, comm: nfsd 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019
[ 2014.152519] Call Trace:
[ 2014.152542] [<ffffffffc0a0d7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 2014.152548] [<ffffffffc0a0d87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 2014.152569] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
[ 2014.152593] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
[ 2014.152610] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
[ 2014.152620] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
[ 2014.152630] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
[ 2014.152632] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
[ 2014.152634] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
[ 2014.152635] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
[ 2014.152643] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
[ 2014.152649] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
[ 2014.152655] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
[ 2014.152661] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
[ 2014.152671] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
[ 2014.152679] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
[ 2014.152685] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
[ 2014.152687] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
[ 2014.152689] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
[ 2014.152693] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2014.157437] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
[ 2014.157572] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
[ 2014.157704] [<ffffffffc0694090>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 2014.157835] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
[ 2014.157963] [<ffffffff8cec1cd0>] ? insert_kthread_work+0x40/0x40
[ 2014.158094] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
[ 2014.158224] [<ffffffff8cec1cd0>] ? insert_kthread_work+0x40/0x40
We have since upgraded that client to Lustre 2.12.2, but that did not help.
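To compare the three assertion failures side by side, we put together a small Python sketch. It pulls the numbers out of the LustreError lines and computes, for each crashing nfsd thread, how far ki_pos is from pos and how long the write is. It assumes the bracketed pair in the message is the [pos, pos + count) range of the write being started. The regex, the 4096-byte PAGE_SIZE and the decode() helper are our own illustrative choices, not anything taken from Lustre.

```python
import re

# The three ASSERTION lines copied verbatim from the console log above.
LOG = """\
[ 2014.148312] LustreError: 19435:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1209601876 [1209597952, 1210056704)
[ 2014.150489] LustreError: 19462:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1211699028 [1211695104, 1212153856)
[ 2014.152515] LustreError: 19480:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1213796180 [1213792256, 1214251008)
"""

# Captures: PID, ki_pos, and the bracketed range.
# Assumption: the range is printed as [pos, pos + count).
PATTERN = re.compile(r"LustreError: (\d+):.*ki_pos (\d+) \[(\d+), (\d+)\)")

PAGE_SIZE = 4096  # assumption: x86_64 base page size


def decode(log):
    """Return one dict per assertion line with the derived offsets."""
    rows = []
    for m in PATTERN.finditer(log):
        pid, ki_pos, start, end = (int(g) for g in m.groups())
        rows.append({
            "pid": pid,
            "start": start,
            "skew": ki_pos - start,    # bytes by which ki_pos is past pos
            "length": end - start,     # size of the write being started
            "start_page_aligned": start % PAGE_SIZE == 0,
        })
    return rows


if __name__ == "__main__":
    rows = decode(LOG)
    for r in rows:
        print(r["pid"], r["skew"], r["length"], r["start_page_aligned"])
    # Distance between the write offsets of consecutive crashing threads.
    starts = [r["start"] for r in rows]
    print([b - a for a, b in zip(starts, starts[1:])])
```

On the three messages above, every thread shows the same 3924-byte skew and the same 458752-byte (448 KiB) write length. Each write starts on a page boundary, and consecutive starts are exactly 2 MiB (2097152 bytes) apart. This may be worth including when reporting the bug.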