[LU-7382] (vvp_io.c:573:vvp_io_update_iov()) ASSERTION( vio->vui_tot_nrsegs >= vio->vui_iter->nr_segs ) failed Created: 04/Nov/15 Updated: 08/Sep/17 |
|
| Status: | Reopened |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.8.0, Lustre 2.5.4 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Zhenyu Xu | Assignee: | Zhenyu Xu |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
kernel:LustreError: 29035:0:(vvp_io.c:573:vvp_io_update_iov()) ASSERTION( vio->vui_tot_nrsegs >= vio->vui_iter->nr_segs ) failed: tot_nrsegs: 1, nrsegs: 2
Message from syslogd@test1 at Nov 4 13:01:37 ... |
| Comments |
| Comment by Gerrit Updater [ 04/Nov/15 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/17039 |
| Comment by Ann Koehler (Inactive) [ 11/Dec/15 ] |
|
Cray is seeing this bug regularly during system testing using Lustre 2.7 on a relatively small machine (1 MDT, 8 OSTs, 16 clients). I've uploaded a couple of dumps in case they may be useful in resolving the bug. Location: |
| Comment by Zhenyu Xu [ 14/Dec/15 ] |
|
Thank you Ann for the crash dump. |
| Comment by Gerrit Updater [ 16/Dec/15 ] |
|
Andriy Skulysh (andriy.skulysh@seagate.com) uploaded a new patch: http://review.whamcloud.com/17632 |
| Comment by Patrick Farrell (Inactive) [ 23/Dec/15 ] |
|
I've uploaded a new set of four dumps with Andriy's patch to here: All four nodes which failed had debug enabled. I've included the extracted (and sorted) logs for one of them: |
| Comment by Gerrit Updater [ 14/Jan/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17632/ |
| Comment by Joseph Gmitter (Inactive) [ 14/Jan/16 ] |
|
Landed for 2.8.0 |
| Comment by James A Simmons [ 07/Jul/16 ] |
|
This problem still exists on the b2_8_fe branch. We have hit it twice while running in production.
2016-07-01T20:09:21.362206-04:00 c11-3c1s2n3 LustreError: |
| Comment by Jinshan Xiong (Inactive) [ 07/Jul/16 ] |
|
I have a patch that cleans up this piece of code: commit 1101120d3258509fa74f952cd8664bfdc17bd97d in the master branch. It would be worth porting that patch over to see whether it fixes the problem. |
| Comment by James A Simmons [ 07/Jul/16 ] |
|
I pushed a b2_8_fe patch of |
| Comment by Stanford Research Computing Center [ 24/Feb/17 ] |
|
Just as a "me too", we hit that same LBUG (trace below) with IEEL 3.0:
# cat /proc/fs/lustre/version
lustre: 2.7.15.3
kernel: patchless_client
build: jenkins-arch=x86_64,build_type=client,distro=el6.7,ib_stack=inkernel-3843-ga11db72-PRISTINE-2.6.32-573.12.1.el6.x86_64
2017-02-24 08:29:10 [3183317.732931] LustreError: 50671:0:(vvp_io.c:573:vvp_io_update_iov()) ASSERTION( vio->vui_tot_nrsegs >= vio->vui_iter->nr_segs ) failed: tot_nrsegs: 1, nrsegs: 2
2017-02-24 08:29:10 [3183317.749876] LustreError: 50671:0:(vvp_io.c:573:vvp_io_update_iov()) LBUG
2017-02-24 08:29:10 [3183317.757841] Pid: 50671, comm: jellyfish
2017-02-24 08:29:11 [3183317.762581]
2017-02-24 08:29:11 [3183317.762581] Call Trace:
2017-02-24 08:29:11 [3183317.767941] [<ffffffffa030c895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2017-02-24 08:29:11 [3183317.776222] [<ffffffffa030ce97>] lbug_with_loc+0x47/0xb0 [libcfs]
2017-02-24 08:29:11 [3183317.783662] [<ffffffffa09e1d19>] vvp_io_rw_lock+0x6f9/0x790 [lustre]
2017-02-24 08:29:11 [3183317.791356] [<ffffffffa09e1de5>] vvp_io_write_lock+0x35/0x40 [lustre]
2017-02-24 08:29:11 [3183317.799184] [<ffffffffa0526893>] cl_io_lock+0x63/0x3c0 [obdclass]
2017-02-24 08:29:11 [3183317.806589] [<ffffffffa0526c92>] cl_io_loop+0xa2/0x1b0 [obdclass]
2017-02-24 08:29:11 [3183317.813985] [<ffffffffa097d470>] ll_file_io_generic+0x5d0/0xae0 [lustre]
2017-02-24 08:29:11 [3183317.822061] [<ffffffff8105e173>] ? __wake_up+0x53/0x70
2017-02-24 08:29:11 [3183317.828391] [<ffffffffa0987dbb>] ll_file_aio_write+0x21b/0x9d0 [lustre]
2017-02-24 08:29:11 [3183317.836376] [<ffffffffa0987ba0>] ? ll_file_aio_write+0x0/0x9d0 [lustre]
2017-02-24 08:29:11 [3183317.844351] [<ffffffff811917db>] do_sync_readv_writev+0xfb/0x140
2017-02-24 08:29:11 [3183317.851649] [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
2017-02-24 08:29:11 [3183317.859347] [<ffffffffa051ae0d>] ? cl_env_put+0x16d/0x200 [obdclass]
2017-02-24 08:29:11 [3183317.867019] [<ffffffff81231a56>] ? security_file_permission+0x16/0x20
2017-02-24 08:29:11 [3183317.874791] [<ffffffff81192886>] do_readv_writev+0xd6/0x1f0
2017-02-24 08:29:11 [3183317.881616] [<ffffffffa098a4d3>] ? ll_file_read+0x143/0x260 [lustre]
2017-02-24 08:29:11 [3183317.889288] [<ffffffff811929e6>] vfs_writev+0x46/0x60
2017-02-24 08:29:11 [3183317.895509] [<ffffffff81192b11>] sys_writev+0x51/0xd0
2017-02-24 08:29:11 [3183317.901747] [<ffffffff8153c64e>] ? do_device_not_available+0xe/0x10
2017-02-24 08:29:11 [3183317.909337] [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
2017-02-24 08:29:11 [3183317.916521]
2017-02-24 08:29:11 [3183317.919054] Kernel panic - not syncing: LBUG
2017-02-24 08:29:11 [3183317.924280] Pid: 50671, comm: jellyfish Not tainted 2.6.32-573.12.1.el6.noc0w.x86_64 #1
2017-02-24 08:29:11 [3183317.933932] Call Trace:
2017-02-24 08:29:11 [3183317.937145] [<ffffffff81538271>] ? panic+0xa7/0x16f
2017-02-24 08:29:11 [3183317.943174] [<ffffffffa030ceeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
2017-02-24 08:29:11 [3183317.950774] [<ffffffffa09e1d19>] ? vvp_io_rw_lock+0x6f9/0x790 [lustre]
2017-02-24 08:29:11 [3183317.958640] [<ffffffffa09e1de5>] ? vvp_io_write_lock+0x35/0x40 [lustre]
2017-02-24 08:29:11 [3183317.966634] [<ffffffffa0526893>] ? cl_io_lock+0x63/0x3c0 [obdclass]
2017-02-24 08:29:11 [3183317.974209] [<ffffffffa0526c92>] ? cl_io_loop+0xa2/0x1b0 [obdclass]
2017-02-24 08:29:11 [3183317.981778] [<ffffffffa097d470>] ? ll_file_io_generic+0x5d0/0xae0 [lustre]
2017-02-24 08:29:11 [3183317.990012] [<ffffffff8105e173>] ? __wake_up+0x53/0x70
2017-02-24 08:29:11 [3183317.996318] [<ffffffffa0987dbb>] ? ll_file_aio_write+0x21b/0x9d0 [lustre]
2017-02-24 08:29:11 [3183318.004469] [<ffffffffa0987ba0>] ? ll_file_aio_write+0x0/0x9d0 [lustre]
2017-02-24 08:29:11 [3183318.012424] [<ffffffff811917db>] ? do_sync_readv_writev+0xfb/0x140
2017-02-24 08:29:11 [3183318.019891] [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
2017-02-24 08:29:11 [3183318.027561] [<ffffffffa051ae0d>] ? cl_env_put+0x16d/0x200 [obdclass]
2017-02-24 08:29:11 [3183318.035212] [<ffffffff81231a56>] ? security_file_permission+0x16/0x20
2017-02-24 08:29:11 [3183318.042963] [<ffffffff81192886>] ? do_readv_writev+0xd6/0x1f0
2017-02-24 08:29:11 [3183318.049950] [<ffffffffa098a4d3>] ? ll_file_read+0x143/0x260 [lustre]
2017-02-24 08:29:11 [3183318.057606] [<ffffffff811929e6>] ? vfs_writev+0x46/0x60
2017-02-24 08:29:11 [3183318.063997] [<ffffffff81192b11>] ? sys_writev+0x51/0xd0
2017-02-24 08:29:11 [3183318.070389] [<ffffffff8153c64e>] ? do_device_not_available+0xe/0x10
2017-02-24 08:29:11 [3183318.077955] [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
Cheers, |
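The trace above enters Lustre through sys_writev and the assertion reports nrsegs: 2, so the failing I/O appears to have been a vectored write with two segments. For anyone trying to exercise that path, a minimal sketch of such a call follows. This is not a confirmed reproducer: the failure is intermittent, and the segment sizes, the file name `writev_demo.dat`, and the helper `vectored_write` are assumptions made purely for illustration. The sketch writes to a temporary directory; pointing it at a file on a Lustre client mount would be needed to reach the client I/O path at all.

```python
import os
import tempfile


def vectored_write(path, segments):
    """Write all segments with a single writev(2) call; return bytes written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # os.writev passes the buffers to the kernel as one iovec array,
        # so a two-element list becomes a two-segment write.
        return os.writev(fd, segments)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical target; replace tmpdir with a Lustre mount to test a client.
    path = os.path.join(tmpdir, "writev_demo.dat")
    # Two segments, matching the "nrsegs: 2" seen in the assertion message.
    segments = [b"A" * 4096, b"B" * 4096]
    written = vectored_write(path, segments)
    with open(path, "rb") as f:
        data = f.read()
```

Running this in a loop from several processes may be closer to the production workloads described in this ticket, but that is a guess rather than something the reports here establish.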
| Comment by James A Simmons [ 28/Mar/17 ] |
|
The patch that landed on b2_8 seems to handle most cases, but we recently found one application on our systems that triggers this problem at random times. Correction: It looks close to the |