Details
Type: Bug
Resolution: Unresolved
Priority: Major
Affects Version/s: Lustre 2.7.0, Lustre 2.8.0, Lustre 2.5.4
Description
kernel:LustreError: 29035:0:(vvp_io.c:573:vvp_io_update_iov()) ASSERTION( vio->vui_tot_nrsegs >= vio->vui_iter->nr_segs ) failed: tot_nrsegs: 1, nrsegs: 2
Message from syslogd@test1 at Nov 4 13:01:37 ...
kernel:LustreError: 29035:0:(vvp_io.c:573:vvp_io_update_iov()) LBUG
Issue Links
- is duplicated by: LU-9106 ASSERTION( vio->vui_tot_nrsegs >= vio->vui_iter->nr_segs ) (Resolved)
- is related to: LU-6260 more support for 3.16 linux kernel (Resolved)
- is related to: LU-7067 vvp_io.c:1076:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed (Resolved)
- is related to: LU-4257 parallel dds are slower than serial dds (Resolved)
Activity
Just as a "me too", we hit that same LBUG (trace below) with IEEL 3.0
# cat /proc/fs/lustre/version
lustre: 2.7.15.3
kernel: patchless_client
build: jenkins-arch=x86_64,build_type=client,distro=el6.7,ib_stack=inkernel-3843-ga11db72-PRISTINE-2.6.32-573.12.1.el6.x86_64
2017-02-24 08:29:10 [3183317.732931] LustreError: 50671:0:(vvp_io.c:573:vvp_io_update_iov()) ASSERTION( vio->vui_tot_nrsegs >= vio->vui_iter->nr_segs ) failed: tot_nrsegs: 1, nrsegs: 2
2017-02-24 08:29:10 [3183317.749876] LustreError: 50671:0:(vvp_io.c:573:vvp_io_update_iov()) LBUG
2017-02-24 08:29:10 [3183317.757841] Pid: 50671, comm: jellyfish
2017-02-24 08:29:11 [3183317.762581]
2017-02-24 08:29:11 [3183317.762581] Call Trace:
2017-02-24 08:29:11 [3183317.767941] [<ffffffffa030c895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2017-02-24 08:29:11 [3183317.776222] [<ffffffffa030ce97>] lbug_with_loc+0x47/0xb0 [libcfs]
2017-02-24 08:29:11 [3183317.783662] [<ffffffffa09e1d19>] vvp_io_rw_lock+0x6f9/0x790 [lustre]
2017-02-24 08:29:11 [3183317.791356] [<ffffffffa09e1de5>] vvp_io_write_lock+0x35/0x40 [lustre]
2017-02-24 08:29:11 [3183317.799184] [<ffffffffa0526893>] cl_io_lock+0x63/0x3c0 [obdclass]
2017-02-24 08:29:11 [3183317.806589] [<ffffffffa0526c92>] cl_io_loop+0xa2/0x1b0 [obdclass]
2017-02-24 08:29:11 [3183317.813985] [<ffffffffa097d470>] ll_file_io_generic+0x5d0/0xae0 [lustre]
2017-02-24 08:29:11 [3183317.822061] [<ffffffff8105e173>] ? __wake_up+0x53/0x70
2017-02-24 08:29:11 [3183317.828391] [<ffffffffa0987dbb>] ll_file_aio_write+0x21b/0x9d0 [lustre]
2017-02-24 08:29:11 [3183317.836376] [<ffffffffa0987ba0>] ? ll_file_aio_write+0x0/0x9d0 [lustre]
2017-02-24 08:29:11 [3183317.844351] [<ffffffff811917db>] do_sync_readv_writev+0xfb/0x140
2017-02-24 08:29:11 [3183317.851649] [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
2017-02-24 08:29:11 [3183317.859347] [<ffffffffa051ae0d>] ? cl_env_put+0x16d/0x200 [obdclass]
2017-02-24 08:29:11 [3183317.867019] [<ffffffff81231a56>] ? security_file_permission+0x16/0x20
2017-02-24 08:29:11 [3183317.874791] [<ffffffff81192886>] do_readv_writev+0xd6/0x1f0
2017-02-24 08:29:11 [3183317.881616] [<ffffffffa098a4d3>] ? ll_file_read+0x143/0x260 [lustre]
2017-02-24 08:29:11 [3183317.889288] [<ffffffff811929e6>] vfs_writev+0x46/0x60
2017-02-24 08:29:11 [3183317.895509] [<ffffffff81192b11>] sys_writev+0x51/0xd0
2017-02-24 08:29:11 [3183317.901747] [<ffffffff8153c64e>] ? do_device_not_available+0xe/0x10
2017-02-24 08:29:11 [3183317.909337] [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
2017-02-24 08:29:11 [3183317.916521]
2017-02-24 08:29:11 [3183317.919054] Kernel panic - not syncing: LBUG
2017-02-24 08:29:11 [3183317.924280] Pid: 50671, comm: jellyfish Not tainted 2.6.32-573.12.1.el6.noc0w.x86_64 #1
2017-02-24 08:29:11 [3183317.933932] Call Trace:
2017-02-24 08:29:11 [3183317.937145] [<ffffffff81538271>] ? panic+0xa7/0x16f
2017-02-24 08:29:11 [3183317.943174] [<ffffffffa030ceeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
2017-02-24 08:29:11 [3183317.950774] [<ffffffffa09e1d19>] ? vvp_io_rw_lock+0x6f9/0x790 [lustre]
2017-02-24 08:29:11 [3183317.958640] [<ffffffffa09e1de5>] ? vvp_io_write_lock+0x35/0x40 [lustre]
2017-02-24 08:29:11 [3183317.966634] [<ffffffffa0526893>] ? cl_io_lock+0x63/0x3c0 [obdclass]
2017-02-24 08:29:11 [3183317.974209] [<ffffffffa0526c92>] ? cl_io_loop+0xa2/0x1b0 [obdclass]
2017-02-24 08:29:11 [3183317.981778] [<ffffffffa097d470>] ? ll_file_io_generic+0x5d0/0xae0 [lustre]
2017-02-24 08:29:11 [3183317.990012] [<ffffffff8105e173>] ? __wake_up+0x53/0x70
2017-02-24 08:29:11 [3183317.996318] [<ffffffffa0987dbb>] ? ll_file_aio_write+0x21b/0x9d0 [lustre]
2017-02-24 08:29:11 [3183318.004469] [<ffffffffa0987ba0>] ? ll_file_aio_write+0x0/0x9d0 [lustre]
2017-02-24 08:29:11 [3183318.012424] [<ffffffff811917db>] ? do_sync_readv_writev+0xfb/0x140
2017-02-24 08:29:11 [3183318.019891] [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
2017-02-24 08:29:11 [3183318.027561] [<ffffffffa051ae0d>] ? cl_env_put+0x16d/0x200 [obdclass]
2017-02-24 08:29:11 [3183318.035212] [<ffffffff81231a56>] ? security_file_permission+0x16/0x20
2017-02-24 08:29:11 [3183318.042963] [<ffffffff81192886>] ? do_readv_writev+0xd6/0x1f0
2017-02-24 08:29:11 [3183318.049950] [<ffffffffa098a4d3>] ? ll_file_read+0x143/0x260 [lustre]
2017-02-24 08:29:11 [3183318.057606] [<ffffffff811929e6>] ? vfs_writev+0x46/0x60
2017-02-24 08:29:11 [3183318.063997] [<ffffffff81192b11>] ? sys_writev+0x51/0xd0
2017-02-24 08:29:11 [3183318.070389] [<ffffffff8153c64e>] ? do_device_not_available+0xe/0x10
2017-02-24 08:29:11 [3183318.077955] [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
Cheers,
--
Kilian
I pushed a b2_8_fe patch of LU-4257 at http://review.whamcloud.com/#/c/21198. Do you have a good reproducer? We only see it once in a while and haven't figured out what causes the LBUG or whether it can be triggered consistently.
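On the reproducer question: every trace in this ticket enters through sys_writev, and the failed assertion reports nrsegs: 2, so the affected call appears to be a writev() carrying two iovec segments. The sketch below only issues such a call from user space. It is not a confirmed reproducer (none is known in this ticket), and the function name, segment sizes, and target path are assumptions chosen for illustration. To exercise the Lustre client path, the target file would have to live on a Lustre mount.

```python
import os
import tempfile


def writev_two_segments(path, first=b"A" * 4096, second=b"B" * 100):
    """Issue a single writev() with two iovec segments.

    Returns the number of bytes the kernel reports as written.
    The segment sizes are arbitrary (assumed), not taken from the ticket.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # os.writev maps to the writev(2) system call; each buffer in the
        # list becomes one iovec segment.
        return os.writev(fd, [first, second])
    finally:
        os.close(fd)


if __name__ == "__main__":
    # A temporary directory is used here so the sketch runs anywhere;
    # replace it with a directory on a Lustre mount to hit the client code.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "writev_test.dat")
        written = writev_two_segments(target)
        print(f"wrote {written} bytes in one writev() call")
```

Since the failure is reported as intermittent, a single call like this is unlikely to be enough on its own; it is only a starting point for building a stress test around the same system call.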
I have a patch that cleans up this piece of code: commit 1101120d3258509fa74f952cd8664bfdc17bd97d in the master branch. It would be worth porting that patch over to see whether it fixes the problem.
This problem still exists on the b2_8_fe branch. We have hit it twice while running in production.
2016-07-01T20:09:21.362206-04:00 c11-3c1s2n3 LustreError:
32026:0:(vvp_io.c:573:vvp_io_update_iov()) ASSERTION( vio->vui_tot_nrsegs
>= vio->vui_iter->nr_segs ) failed: tot_nrsegs: 1,
nrsegs: 2
2016-07-01T20:09:21.362293-04:00 c11-3c1s2n3 LustreError:
32026:0:(vvp_io.c:573:vvp_io_update_iov()) LBUG
2016-07-01T20:09:21.362336-04:00 c11-3c1s2n3 Pid: 32026, comm:
bowtie2-build-s
2016-07-01T20:09:21.362344-04:00 c11-3c1s2n3 Call Trace:
2016-07-01T20:09:21.362359-04:00 c11-3c1s2n3 [<ffffffff81006651>]
try_stack_unwind+0x161/0x1a0
2016-07-01T20:09:21.362367-04:00 c11-3c1s2n3 [<ffffffff81004eb9>]
dump_trace+0x89/0x430
2016-07-01T20:09:21.391856-04:00 c11-3c1s2n3 [<ffffffffa011aac0>]
lbug_with_loc+0x90/0x1d0 [libcfs]
2016-07-01T20:09:21.391877-04:00 c11-3c1s2n3 [<ffffffffa06cb3f8>]
vvp_io_rw_lock+0x738/0x860 [lustre]
2016-07-01T20:09:21.391893-04:00 c11-3c1s2n3 [<ffffffffa06cb556>]
vvp_io_write_lock+0x36/0x40 [lustre]
2016-07-01T20:09:21.391902-04:00 c11-3c1s2n3 [<ffffffffa030a514>]
cl_io_lock+0x74/0x400 [obdclass]
2016-07-01T20:09:21.422510-04:00 c11-3c1s2n3 [<ffffffffa030be27>]
cl_io_loop+0x2b7/0x710 [obdclass]
2016-07-01T20:09:21.422529-04:00 c11-3c1s2n3 [<ffffffffa0675574>]
ll_file_io_generic+0x364/0xab0 [lustre]
2016-07-01T20:09:21.422544-04:00 c11-3c1s2n3 [<ffffffffa0676290>]
ll_file_aio_write+0x5d0/0x6a0 [lustre]
2016-07-01T20:09:21.422580-04:00 c11-3c1s2n3 [<ffffffff8114097b>]
do_sync_readv_writev+0xdb/0x120
2016-07-01T20:09:21.422594-04:00 c11-3c1s2n3 [<ffffffff81141854>]
do_readv_writev+0xd4/0x1e0
2016-07-01T20:09:21.422601-04:00 c11-3c1s2n3 [<ffffffff8114199e>]
vfs_writev+0x3e/0x60
2016-07-01T20:09:21.422608-04:00 c11-3c1s2n3 [<ffffffff81141ae5>]
sys_writev+0x55/0xc0
2016-07-01T20:09:21.503878-04:00 c11-3c1s2n3 [<ffffffff8133ac2b>]
system_call_fastpath+0x16/0x1b
2016-07-01T20:09:21.503911-04:00 c11-3c1s2n3 [<00002aaaab7581be>]
0x2aaaab7581be
2016-07-01T20:09:21.503920-04:00 c11-3c1s2n3 Kernel panic - not syncing:
LBUG
2016-07-01T20:09:21.503944-04:00 c11-3c1s2n3 Pid: 32026, comm:
bowtie2-build-s Tainted: P
3.0.101-0.46.1_1.0502.8871-cray_gem_c #1
2016-07-01T20:09:21.503961-04:00 c11-3c1s2n3 Call Trace:
2016-07-01T20:09:21.503970-04:00 c11-3c1s2n3 [<ffffffff81006651>]
try_stack_unwind+0x161/0x1a0
2016-07-01T20:09:21.503977-04:00 c11-3c1s2n3 [<ffffffff81004eb9>]
dump_trace+0x89/0x430
2016-07-01T20:09:21.503990-04:00 c11-3c1s2n3 [<ffffffff810060bc>]
show_trace_log_lvl+0x5c/0x80
2016-07-01T20:09:21.503996-04:00 c11-3c1s2n3 [<ffffffff810060f5>]
show_trace+0x15/0x20
2016-07-01T20:09:21.504011-04:00 c11-3c1s2n3 [<ffffffff81336d32>]
dump_stack+0x79/0x84
2016-07-01T20:09:21.504018-04:00 c11-3c1s2n3 [<ffffffff81336dd1>]
panic+0x94/0x1da
2016-07-01T20:09:21.504051-04:00 c11-3c1s2n3 [<ffffffffa011abf1>]
lbug_with_loc+0x1c1/0x1d0 [libcfs]
2016-07-01T20:09:21.504070-04:00 c11-3c1s2n3 [<ffffffffa06cb3f8>]
vvp_io_rw_lock+0x738/0x860 [lustre]
2016-07-01T20:09:21.504079-04:00 c11-3c1s2n3 [<ffffffffa06cb556>]
vvp_io_write_lock+0x36/0x40 [lustre]
2016-07-01T20:09:21.504121-04:00 c11-3c1s2n3 [<ffffffffa030a514>]
cl_io_lock+0x74/0x400 [obdclass]
2016-07-01T20:09:21.504139-04:00 c11-3c1s2n3 [<ffffffffa030be27>]
cl_io_loop+0x2b7/0x710 [obdclass]
2016-07-01T20:09:21.504157-04:00 c11-3c1s2n3 [<ffffffffa0675574>]
ll_file_io_generic+0x364/0xab0 [lustre]
2016-07-01T20:09:21.504174-04:00 c11-3c1s2n3 [<ffffffffa0676290>]
ll_file_aio_write+0x5d0/0x6a0 [lustre]
2016-07-01T20:09:21.504185-04:00 c11-3c1s2n3 [<ffffffff8114097b>]
do_sync_readv_writev+0xdb/0x120
2016-07-01T20:09:21.532355-04:00 c11-3c1s2n3 [<ffffffff81141854>]
do_readv_writev+0xd4/0x1e0
2016-07-01T20:09:21.532377-04:00 c11-3c1s2n3 [<ffffffff8114199e>]
vfs_writev+0x3e/0x60
2016-07-01T20:09:21.532387-04:00 c11-3c1s2n3 [<ffffffff81141ae5>]
sys_writev+0x55/0xc0
2016-07-01T20:09:21.532405-04:00 c11-3c1s2n3 [<ffffffff8133ac2b>]
system_call_fastpath+0x16/0x1b
2016-07-01T20:09:21.532417-04:00 c11-3c1s2n3 [<00002aaaab7581be>]
0x2aaaab7581bd
2016-07-01T20:09:21.662130-04:00 c6-0c2s1n2 LustreError:
24943:0:(vvp_io.c:573:vvp_io_update_iov()) ASSERTION( vio->vui_tot_nrsegs
>= vio->vui_iter->nr_segs ) failed: tot_nrsegs: 1,
nrsegs: 2
2016-07-01T20:09:21.662163-04:00 c6-0c2s1n2 LustreError:
24943:0:(vvp_io.c:573:vvp_io_update_iov()) LBUG
2016-07-01T20:09:21.662205-04:00 c6-0c2s1n2 Pid: 24943, comm:
bowtie2-build-s
2016-07-01T20:09:21.662214-04:00 c6-0c2s1n2 Call Trace:
2016-07-01T20:09:21.662222-04:00 c6-0c2s1n2 [<ffffffff81006651>]
try_stack_unwind+0x161/0x1a0
2016-07-01T20:09:21.662229-04:00 c6-0c2s1n2 [<ffffffff81004eb9>]
dump_trace+0x89/0x430
2016-07-01T20:09:21.662241-04:00 c6-0c2s1n2 [<ffffffffa011aac0>]
lbug_with_loc+0x90/0x1d0 [libcfs]
2016-07-01T20:09:21.692133-04:00 c6-0c2s1n2 [<ffffffffa06cb3f8>]
vvp_io_rw_lock+0x738/0x860 [lustre]
2016-07-01T20:09:21.692165-04:00 c6-0c2s1n2 [<ffffffffa06cb556>]
vvp_io_write_lock+0x36/0x40 [lustre]
2016-07-01T20:09:21.692204-04:00 c6-0c2s1n2 [<ffffffffa030a514>]
cl_io_lock+0x74/0x400 [obdclass]
2016-07-01T20:09:21.692212-04:00 c6-0c2s1n2 [<ffffffffa030be27>]
cl_io_loop+0x2b7/0x710 [obdclass]
2016-07-01T20:09:21.692236-04:00 c6-0c2s1n2 [<ffffffffa0675574>]
ll_file_io_generic+0x364/0xab0 [lustre]
2016-07-01T20:09:21.692257-04:00 c6-0c2s1n2 [<ffffffffa0676290>]
ll_file_aio_write+0x5d0/0x6a0 [lustre]
2016-07-01T20:09:21.742635-04:00 c6-0c2s1n2 [<ffffffff8114097b>]
do_sync_readv_writev+0xdb/0x120
2016-07-01T20:09:21.742668-04:00 c6-0c2s1n2 [<ffffffff81141854>]
do_readv_writev+0xd4/0x1e0
2016-07-01T20:09:21.742678-04:00 c6-0c2s1n2 [<ffffffff8114199e>]
vfs_writev+0x3e/0x60
2016-07-01T20:09:21.742731-04:00 c6-0c2s1n2 [<ffffffff81141ae5>]
sys_writev+0x55/0xc0
2016-07-01T20:09:21.742745-04:00 c6-0c2s1n2 [<ffffffff8133ac2b>]
system_call_fastpath+0x16/0x1b
2016-07-01T20:09:21.742754-04:00 c6-0c2s1n2 [<00002aaaab7581be>]
0x2aaaab7581be
2016-07-01T20:09:21.742800-04:00 c6-0c2s1n2 Kernel panic - not syncing:
LBUG
2016-07-01T20:09:21.742827-04:00 c6-0c2s1n2 Pid: 24943, comm:
bowtie2-build-s Tainted: P
3.0.101-0.46.1_1.0502.8871-cray_gem_c #1
2016-07-01T20:09:21.742849-04:00 c6-0c2s1n2 Call Trace:
2016-07-01T20:09:21.742904-04:00 c6-0c2s1n2 [<ffffffff81006651>]
try_stack_unwind+0x161/0x1a0
2016-07-01T20:09:21.742940-04:00 c6-0c2s1n2 [<ffffffff81004eb9>]
dump_trace+0x89/0x430
2016-07-01T20:09:21.742948-04:00 c6-0c2s1n2 [<ffffffff810060bc>]
show_trace_log_lvl+0x5c/0x80
2016-07-01T20:09:21.772093-04:00 c6-0c2s1n2 [<ffffffff810060f5>]
show_trace+0x15/0x20
2016-07-01T20:09:21.772123-04:00 c6-0c2s1n2 [<ffffffff81336d32>]
dump_stack+0x79/0x84
2016-07-01T20:09:21.772132-04:00 c6-0c2s1n2 [<ffffffff81336dd1>]
panic+0x94/0x1da
2016-07-01T20:09:21.772205-04:00 c6-0c2s1n2 [<ffffffffa011abf1>]
lbug_with_loc+0x1c1/0x1d0 [libcfs]
2016-07-01T20:09:21.772224-04:00 c6-0c2s1n2 [<ffffffffa06cb3f8>]
vvp_io_rw_lock+0x738/0x860 [lustre]
2016-07-01T20:09:21.772241-04:00 c6-0c2s1n2 [<ffffffffa06cb556>]
vvp_io_write_lock+0x36/0x40 [lustre]
2016-07-01T20:09:21.772256-04:00 c6-0c2s1n2 [<ffffffffa030a514>]
cl_io_lock+0x74/0x400 [obdclass]
2016-07-01T20:09:21.802196-04:00 c6-0c2s1n2 [<ffffffffa030be27>]
cl_io_loop+0x2b7/0x710 [obdclass]
2016-07-01T20:09:21.802260-04:00 c6-0c2s1n2 [<ffffffffa0675574>]
ll_file_io_generic+0x364/0xab0 [lustre]
2016-07-01T20:09:21.802270-04:00 c6-0c2s1n2 [<ffffffffa0676290>]
ll_file_aio_write+0x5d0/0x6a0 [lustre]
2016-07-01T20:09:21.802279-04:00 c6-0c2s1n2 [<ffffffff8114097b>]
do_sync_readv_writev+0xdb/0x120
2016-07-01T20:09:21.802299-04:00 c6-0c2s1n2 [<ffffffff81141854>]
do_readv_writev+0xd4/0x1e0
2016-07-01T20:09:21.802317-04:00 c6-0c2s1n2 [<ffffffff8114199e>]
vfs_writev+0x3e/0x60
2016-07-01T20:09:21.832209-04:00 c6-0c2s1n2 [<ffffffff81141ae5>]
sys_writev+0x55/0xc0
2016-07-01T20:09:21.832229-04:00 c6-0c2s1n2 [<ffffffff8133ac2b>]
system_call_fastpath+0x16/0x1b
2016-07-01T20:09:21.832259-04:00 c6-0c2s1n2 [<00002aaaab7581be>]
0x2aaaab7581bd
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17632/
Subject: LU-7382 llite: Fix iovec references accounting in ll_file_aio_read/write
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 57f055f8d0df80e140724b00d1729f454222a83a
I've uploaded a new set of four dumps with Andriy's patch here:
ftp.whamcloud.com:/uploads/LU-7382/151223_dumps.tar.gz
All four nodes which failed had debug enabled. I've included the extracted (and sorted) logs for one of them:
c0-0c1s0n1-1512222203_log.sort
Andriy Skulysh (andriy.skulysh@seagate.com) uploaded a new patch: http://review.whamcloud.com/17632
Subject: LU-7382 llite: vvp_io_update_iov() ASSERTION failure
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8431fec8ba5434b76aa994d93d5fa44b850be689
The patch that landed on b2_8 seems to handle most of the cases, but we have recently found one application on our systems that triggers this problem at random times.
Correction: it looks close to LU-7067.