Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.10.6, Lustre 2.12.2
-
None
-
Server: PowerEdge R640 with 64 GB memory and Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
OS: CentOS 7.5.1804
Lustre client: 2.12.2
-
3
Description
We run our Lustre file system on 1 MDS and 8 OSS nodes, with Lustre 2.10.6 on both the servers and the clients.
One of the clients exports Lustre via NFSv3 and SMB. This worked fine for more than a year, but recently that client started crashing with the following Lustre LBUG:
[ 2014.148312] LustreError: 19435:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1209601876 [1209597952, 1210056704)
[ 2014.148338] LustreError: 19435:0:(vvp_io.c:1056:vvp_io_write_start()) LBUG
[ 2014.148352] Pid: 19435, comm: nfsd 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019
[ 2014.148353] Call Trace:
[ 2014.148376] [<ffffffffc0a0d7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 2014.148389] [<ffffffffc0a0d87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 2014.148394] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
[ 2014.148419] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
[ 2014.148449] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
[ 2014.148462] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
[ 2014.148470] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
[ 2014.148476] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
[ 2014.148480] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
[ 2014.148482] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
[ 2014.148484] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
[ 2014.148492] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
[ 2014.148498] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
[ 2014.148504] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
[ 2014.148509] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
[ 2014.148523] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
[ 2014.148531] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
[ 2014.148535] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
[ 2014.148539] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
[ 2014.148543] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2014.148551] Kernel panic - not syncing: LBUG
[ 2014.148561] CPU: 2 PID: 19435 Comm: nfsd Kdump: loaded Tainted: G OE ------------ 3.10.0-957.21.3.el7.x86_64 #1
[ 2014.148579] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.4.8 05/21/2018
[ 2014.148592] Call Trace:
[ 2014.148603] [<ffffffff8d563107>] dump_stack+0x19/0x1b
[ 2014.148615] [<ffffffff8d55c810>] panic+0xe8/0x21f
[ 2014.148629] [<ffffffffc0a0d8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[ 2014.148650] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
[ 2014.148675] [<ffffffffc0cb3357>] ? cl_lock_request+0x67/0x1f0 [obdclass]
[ 2014.148699] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
[ 2014.148722] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
[ 2014.148739] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
[ 2014.148753] [<ffffffff8ced3250>] ? check_preempt_curr+0x80/0xa0
[ 2014.148771] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
[ 2014.148784] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
[ 2014.148914] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
[ 2014.149049] [<ffffffffc1017eb0>] ? ll_file_splice_read+0x1e0/0x1e0 [lustre]
[ 2014.149185] [<ffffffffc1018440>] ? ll_file_aio_write+0x590/0x590 [lustre]
[ 2014.149318] [<ffffffff8d11e003>] ? ima_get_action+0x23/0x30
[ 2014.149447] [<ffffffff8d11d51e>] ? process_measurement+0x8e/0x250
[ 2014.149578] [<ffffffff8d03f087>] ? do_dentry_open+0x1e7/0x2e0
[ 2014.149708] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
[ 2014.149841] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
[ 2014.149975] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
[ 2014.150109] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
[ 2014.150243] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
[ 2014.150381] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
[ 2014.150489] LustreError: 19462:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1211699028 [1211695104, 1212153856)
[ 2014.150491] LustreError: 19462:0:(vvp_io.c:1056:vvp_io_write_start()) LBUG
[ 2014.150492] Pid: 19462, comm: nfsd 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019
[ 2014.150492] Call Trace:
[ 2014.150514] [<ffffffffc0a0d7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 2014.150519] [<ffffffffc0a0d87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 2014.150533] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
[ 2014.150551] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
[ 2014.150564] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
[ 2014.150571] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
[ 2014.150577] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
[ 2014.150580] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
[ 2014.150581] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
[ 2014.150583] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
[ 2014.150589] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
[ 2014.150594] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
[ 2014.150599] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
[ 2014.150603] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
[ 2014.150613] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
[ 2014.150621] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
[ 2014.150625] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
[ 2014.150627] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
[ 2014.150630] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
[ 2014.150634] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2014.152515] LustreError: 19480:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1213796180 [1213792256, 1214251008)
[ 2014.152517] LustreError: 19480:0:(vvp_io.c:1056:vvp_io_write_start()) LBUG
[ 2014.152518] Pid: 19480, comm: nfsd 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019
[ 2014.152519] Call Trace:
[ 2014.152542] [<ffffffffc0a0d7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 2014.152548] [<ffffffffc0a0d87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 2014.152569] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
[ 2014.152593] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
[ 2014.152610] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
[ 2014.152620] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
[ 2014.152630] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
[ 2014.152632] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
[ 2014.152634] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
[ 2014.152635] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
[ 2014.152643] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
[ 2014.152649] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
[ 2014.152655] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
[ 2014.152661] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
[ 2014.152671] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
[ 2014.152679] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
[ 2014.152685] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
[ 2014.152687] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
[ 2014.152689] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
[ 2014.152693] [<ffffffffffffffff>] 0xffffffffffffffff
[ 2014.157437] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
[ 2014.157572] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
[ 2014.157704] [<ffffffffc0694090>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 2014.157835] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
[ 2014.157963] [<ffffffff8cec1cd0>] ? insert_kthread_work+0x40/0x40
[ 2014.158094] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
[ 2014.158224] [<ffffffff8cec1cd0>] ? insert_kthread_work+0x40/0x40
We have since updated that client to Lustre 2.12.2, but the crash still occurs.
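To make the three assertion failures easier to compare, here is a minimal Python sketch that extracts the numbers from the LustreError lines in the log and computes how far ki_pos is from the start of each range. It assumes the bracketed pair is the half-open [pos, pos + count) range of the write being processed; the names PATTERN, summarize, skew and length are only illustrative and are not part of Lustre.

```python
import re

# The three assertion messages, copied from the console log in this report.
LOG = """
ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1209601876 [1209597952, 1210056704)
ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1211699028 [1211695104, 1212153856)
ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1213796180 [1213792256, 1214251008)
"""

# Matches "ki_pos <n> [<start>, <end>)" as printed by the failed assertion.
PATTERN = re.compile(r"ki_pos (\d+) \[(\d+), (\d+)\)")


def summarize(log):
    """Return one dict per assertion line with the derived offsets."""
    rows = []
    for match in PATTERN.finditer(log):
        ki_pos, start, end = (int(g) for g in match.groups())
        rows.append({
            "ki_pos": ki_pos,
            "start": start,
            "end": end,
            "skew": ki_pos - start,   # how far ki_pos is ahead of the range start
            "length": end - start,    # size of the range (assumed: pos + count - pos)
        })
    return rows


rows = summarize(LOG)
for row in rows:
    print(row["ki_pos"], row["skew"], row["length"])
```

For the three failures in this log, ki_pos is 3924 bytes past the start of the range every time, each range is 458752 bytes (448 KiB) long, and consecutive range starts are 2097152 bytes (2 MiB) apart. The constant offset suggests the mismatch is systematic rather than random corruption.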
Issue Links
- is related to LU-11825 Remove LU-8964/pio feature & supporting framework (Resolved)