Lustre / LU-12503

LustreError: 19435:0:(vvp_io.c:1056:vvp_io_write_start()) LBUG

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.4
    • Affects Version/s: Lustre 2.10.6, Lustre 2.12.2
    • Labels: None
    • Environment:
      Server: PowerEdge R640 with 64 GB memory and Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
      OS: CentOS 7.5.1804
      Lustre client: 2.12.2
    • Severity: 3

    Description

      We are running our Lustre file system on 1 MDS and 8 OSS nodes, with Lustre 2.10.6 on both the servers and the clients.

      One of the clients re-exports Lustre via NFSv3 and SMB. This worked fine for more than a year, but recently that client started to crash with the following Lustre bug:

       

      [ 2014.148312] LustreError: 19435:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1209601876 [1209597952, 1210056704)
      [ 2014.148338] LustreError: 19435:0:(vvp_io.c:1056:vvp_io_write_start()) LBUG
      [ 2014.148352] Pid: 19435, comm: nfsd 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019
      [ 2014.148353] Call Trace:
      [ 2014.148376] [<ffffffffc0a0d7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [ 2014.148389] [<ffffffffc0a0d87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [ 2014.148394] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
      [ 2014.148419] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
      [ 2014.148449] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
      [ 2014.148462] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
      [ 2014.148470] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
      [ 2014.148476] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
      [ 2014.148480] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
      [ 2014.148482] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
      [ 2014.148484] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
      [ 2014.148492] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
      [ 2014.148498] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
      [ 2014.148504] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
      [ 2014.148509] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
      [ 2014.148523] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
      [ 2014.148531] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
      [ 2014.148535] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
      [ 2014.148539] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
      [ 2014.148543] [<ffffffffffffffff>] 0xffffffffffffffff
      [ 2014.148551] Kernel panic - not syncing: LBUG
      [ 2014.148561] CPU: 2 PID: 19435 Comm: nfsd Kdump: loaded Tainted: G OE ------------ 3.10.0-957.21.3.el7.x86_64 #1
      [ 2014.148579] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.4.8 05/21/2018
      [ 2014.148592] Call Trace:
      [ 2014.148603] [<ffffffff8d563107>] dump_stack+0x19/0x1b
      [ 2014.148615] [<ffffffff8d55c810>] panic+0xe8/0x21f
      [ 2014.148629] [<ffffffffc0a0d8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [ 2014.148650] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
      [ 2014.148675] [<ffffffffc0cb3357>] ? cl_lock_request+0x67/0x1f0 [obdclass]
      [ 2014.148699] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
      [ 2014.148722] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
      [ 2014.148739] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
      [ 2014.148753] [<ffffffff8ced3250>] ? check_preempt_curr+0x80/0xa0
      [ 2014.148771] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
      [ 2014.148784] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
      [ 2014.148914] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
      [ 2014.149049] [<ffffffffc1017eb0>] ? ll_file_splice_read+0x1e0/0x1e0 [lustre]
      [ 2014.149185] [<ffffffffc1018440>] ? ll_file_aio_write+0x590/0x590 [lustre]
      [ 2014.149318] [<ffffffff8d11e003>] ? ima_get_action+0x23/0x30
      [ 2014.149447] [<ffffffff8d11d51e>] ? process_measurement+0x8e/0x250
      [ 2014.149578] [<ffffffff8d03f087>] ? do_dentry_open+0x1e7/0x2e0
      [ 2014.149708] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
      [ 2014.149841] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
      [ 2014.149975] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
      [ 2014.150109] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
      [ 2014.150243] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
      [ 2014.150381] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
      [ 2014.150489] LustreError: 19462:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1211699028 [1211695104, 1212153856)
      [ 2014.150491] LustreError: 19462:0:(vvp_io.c:1056:vvp_io_write_start()) LBUG
      [ 2014.150492] Pid: 19462, comm: nfsd 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019
      [ 2014.150492] Call Trace:
      [ 2014.150514] [<ffffffffc0a0d7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [ 2014.150519] [<ffffffffc0a0d87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [ 2014.150533] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
      [ 2014.150551] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
      [ 2014.150564] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
      [ 2014.150571] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
      [ 2014.150577] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
      [ 2014.150580] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
      [ 2014.150581] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
      [ 2014.150583] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
      [ 2014.150589] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
      [ 2014.150594] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
      [ 2014.150599] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
      [ 2014.150603] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
      [ 2014.150613] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
      [ 2014.150621] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
      [ 2014.150625] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
      [ 2014.150627] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
      [ 2014.150630] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
      [ 2014.150634] [<ffffffffffffffff>] 0xffffffffffffffff
      [ 2014.152515] LustreError: 19480:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1213796180 [1213792256, 1214251008)
      [ 2014.152517] LustreError: 19480:0:(vvp_io.c:1056:vvp_io_write_start()) LBUG
      [ 2014.152518] Pid: 19480, comm: nfsd 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019
      [ 2014.152519] Call Trace:
      [ 2014.152542] [<ffffffffc0a0d7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [ 2014.152548] [<ffffffffc0a0d87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [ 2014.152569] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
      [ 2014.152593] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
      [ 2014.152610] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
      [ 2014.152620] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
      [ 2014.152630] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
      [ 2014.152632] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
      [ 2014.152634] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
      [ 2014.152635] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
      [ 2014.152643] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
      [ 2014.152649] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
      [ 2014.152655] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
      [ 2014.152661] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
      [ 2014.152671] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
      [ 2014.152679] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
      [ 2014.152685] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
      [ 2014.152687] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
      [ 2014.152689] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
      [ 2014.152693] [<ffffffffffffffff>] 0xffffffffffffffff
      [ 2014.157437] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
      [ 2014.157572] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
      [ 2014.157704] [<ffffffffc0694090>] ? nfsd_destroy+0x80/0x80 [nfsd]
      [ 2014.157835] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
      [ 2014.157963] [<ffffffff8cec1cd0>] ? insert_kthread_work+0x40/0x40
      [ 2014.158094] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
      [ 2014.158224] [<ffffffff8cec1cd0>] ? insert_kthread_work+0x40/0x40

       

      We have updated that client to Lustre 2.12.2, but it did not help.
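
      To make the three assertion messages in the log easier to compare, here is a minimal Python sketch that extracts `ki_pos` and the bracketed range from each line and reports how far `ki_pos` lies from the start of the range. It assumes the bracketed pair is the half-open byte range `[pos, pos + count)` that the write expected. The message format suggests this, but it has not been checked against the Lustre source here. The names `LOG`, `PATTERN`, and `summarize` are illustrative only. The sketch does not identify a root cause.

```python
import re

# Assertion lines copied from the log above (kernel timestamps dropped).
LOG = """\
LustreError: 19435:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1209601876 [1209597952, 1210056704)
LustreError: 19462:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1211699028 [1211695104, 1212153856)
LustreError: 19480:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1213796180 [1213792256, 1214251008)
"""

# Captures: PID, ki_pos, range start, range end (end is exclusive).
PATTERN = re.compile(r"LustreError: (\d+):.*ki_pos (\d+) \[(\d+), (\d+)\)")


def summarize(log):
    """Return one dict per assertion line with the derived offsets."""
    rows = []
    for match in PATTERN.finditer(log):
        pid, ki_pos, start, end = (int(g) for g in match.groups())
        rows.append({
            "pid": pid,
            "ki_pos": ki_pos,
            "start": start,
            "length": end - start,     # assumed size of the expected range
            "delta": ki_pos - start,   # how far ki_pos is past the range start
        })
    return rows


if __name__ == "__main__":
    for row in summarize(LOG):
        print(row["pid"], row["length"], row["delta"])
```

      For all three nfsd threads the range is 458752 bytes (448 KiB) long and `ki_pos` sits 3924 bytes past the range start, so the same mismatch shows up in each failure.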

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: halifu Saerda Halifu (Inactive)
