LustreError: 19435:0:(vvp_io.c:1056:vvp_io_write_start()) LBUG

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.4
    • Affects Version/s: Lustre 2.10.6, Lustre 2.12.2
    • Labels: None
    • Environment: Server: PowerEdge R640 with 64 GB memory and Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
      OS: CentOS 7.5.1804
      Lustre client: 2.12.2
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      We run our Lustre file system on 1 MDS and 8 OSS nodes, with Lustre 2.10.6 on both the servers and the clients.

      One of the clients exports Lustre via NFSv3 and SMB. This worked fine for more than a year, but recently that client has started to crash due to a Lustre bug, as follows:

       

      [ 2014.148312] LustreError: 19435:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1209601876 [1209597952, 1210056704)
      [ 2014.148338] LustreError: 19435:0:(vvp_io.c:1056:vvp_io_write_start()) LBUG
      [ 2014.148352] Pid: 19435, comm: nfsd 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019
      [ 2014.148353] Call Trace:
      [ 2014.148376] [<ffffffffc0a0d7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [ 2014.148389] [<ffffffffc0a0d87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [ 2014.148394] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
      [ 2014.148419] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
      [ 2014.148449] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
      [ 2014.148462] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
      [ 2014.148470] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
      [ 2014.148476] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
      [ 2014.148480] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
      [ 2014.148482] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
      [ 2014.148484] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
      [ 2014.148492] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
      [ 2014.148498] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
      [ 2014.148504] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
      [ 2014.148509] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
      [ 2014.148523] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
      [ 2014.148531] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
      [ 2014.148535] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
      [ 2014.148539] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
      [ 2014.148543] [<ffffffffffffffff>] 0xffffffffffffffff
      [ 2014.148551] Kernel panic - not syncing: LBUG
      [ 2014.148561] CPU: 2 PID: 19435 Comm: nfsd Kdump: loaded Tainted: G OE ------------ 3.10.0-957.21.3.el7.x86_64 #1
      [ 2014.148579] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 1.4.8 05/21/2018
      [ 2014.148592] Call Trace:
      [ 2014.148603] [<ffffffff8d563107>] dump_stack+0x19/0x1b
      [ 2014.148615] [<ffffffff8d55c810>] panic+0xe8/0x21f
      [ 2014.148629] [<ffffffffc0a0d8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [ 2014.148650] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
      [ 2014.148675] [<ffffffffc0cb3357>] ? cl_lock_request+0x67/0x1f0 [obdclass]
      [ 2014.148699] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
      [ 2014.148722] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
      [ 2014.148739] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
      [ 2014.148753] [<ffffffff8ced3250>] ? check_preempt_curr+0x80/0xa0
      [ 2014.148771] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
      [ 2014.148784] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
      [ 2014.148914] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
      [ 2014.149049] [<ffffffffc1017eb0>] ? ll_file_splice_read+0x1e0/0x1e0 [lustre]
      [ 2014.149185] [<ffffffffc1018440>] ? ll_file_aio_write+0x590/0x590 [lustre]
      [ 2014.149318] [<ffffffff8d11e003>] ? ima_get_action+0x23/0x30
      [ 2014.149447] [<ffffffff8d11d51e>] ? process_measurement+0x8e/0x250
      [ 2014.149578] [<ffffffff8d03f087>] ? do_dentry_open+0x1e7/0x2e0
      [ 2014.149708] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
      [ 2014.149841] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
      [ 2014.149975] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
      [ 2014.150109] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
      [ 2014.150243] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
      [ 2014.150381] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
      [ 2014.150489] LustreError: 19462:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1211699028 [1211695104, 1212153856)
      [ 2014.150491] LustreError: 19462:0:(vvp_io.c:1056:vvp_io_write_start()) LBUG
      [ 2014.150492] Pid: 19462, comm: nfsd 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019
      [ 2014.150492] Call Trace:
      [ 2014.150514] [<ffffffffc0a0d7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [ 2014.150519] [<ffffffffc0a0d87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [ 2014.150533] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
      [ 2014.150551] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
      [ 2014.150564] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
      [ 2014.150571] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
      [ 2014.150577] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
      [ 2014.150580] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
      [ 2014.150581] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
      [ 2014.150583] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
      [ 2014.150589] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
      [ 2014.150594] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
      [ 2014.150599] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
      [ 2014.150603] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
      [ 2014.150613] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
      [ 2014.150621] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
      [ 2014.150625] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
      [ 2014.150627] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
      [ 2014.150630] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
      [ 2014.150634] [<ffffffffffffffff>] 0xffffffffffffffff
      [ 2014.152515] LustreError: 19480:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1213796180 [1213792256, 1214251008)
      [ 2014.152517] LustreError: 19480:0:(vvp_io.c:1056:vvp_io_write_start()) LBUG
      [ 2014.152518] Pid: 19480, comm: nfsd 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019
      [ 2014.152519] Call Trace:
      [ 2014.152542] [<ffffffffc0a0d7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [ 2014.152548] [<ffffffffc0a0d87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [ 2014.152569] [<ffffffffc1061270>] vvp_io_write_start+0x790/0x820 [lustre]
      [ 2014.152593] [<ffffffffc0cb5328>] cl_io_start+0x68/0x130 [obdclass]
      [ 2014.152610] [<ffffffffc0cb74fc>] cl_io_loop+0xcc/0x1c0 [obdclass]
      [ 2014.152620] [<ffffffffc101765b>] ll_file_io_generic+0x63b/0xcb0 [lustre]
      [ 2014.152630] [<ffffffffc10182f2>] ll_file_aio_write+0x442/0x590 [lustre]
      [ 2014.152632] [<ffffffff8d040e6b>] do_sync_readv_writev+0x7b/0xd0
      [ 2014.152634] [<ffffffff8d042aae>] do_readv_writev+0xce/0x260
      [ 2014.152635] [<ffffffff8d042cd5>] vfs_writev+0x35/0x60
      [ 2014.152643] [<ffffffffc0699f90>] nfsd_vfs_write+0xc0/0x3a0 [nfsd]
      [ 2014.152649] [<ffffffffc069c962>] nfsd_write+0x112/0x2a0 [nfsd]
      [ 2014.152655] [<ffffffffc06a3070>] nfsd3_proc_write+0xc0/0x160 [nfsd]
      [ 2014.152661] [<ffffffffc0694810>] nfsd_dispatch+0xe0/0x290 [nfsd]
      [ 2014.152671] [<ffffffffc0610cf3>] svc_process_common+0x493/0x760 [sunrpc]
      [ 2014.152679] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
      [ 2014.152685] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
      [ 2014.152687] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
      [ 2014.152689] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
      [ 2014.152693] [<ffffffffffffffff>] 0xffffffffffffffff
      [ 2014.157437] [<ffffffffc06110c3>] svc_process+0x103/0x190 [sunrpc]
      [ 2014.157572] [<ffffffffc069416f>] nfsd+0xdf/0x150 [nfsd]
      [ 2014.157704] [<ffffffffc0694090>] ? nfsd_destroy+0x80/0x80 [nfsd]
      [ 2014.157835] [<ffffffff8cec1da1>] kthread+0xd1/0xe0
      [ 2014.157963] [<ffffffff8cec1cd0>] ? insert_kthread_work+0x40/0x40
      [ 2014.158094] [<ffffffff8d575c1d>] ret_from_fork_nospec_begin+0x7/0x21
      [ 2014.158224] [<ffffffff8cec1cd0>] ? insert_kthread_work+0x40/0x40

       

      We have updated that client to Lustre 2.12.2, but it did not help.
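The three assertion messages in the log can be compared numerically. The following is a minimal Python sketch, not part of Lustre: the helper names `parse_assertion` and `summarize` are hypothetical, and the log lines are copied from this report with the timestamps dropped. It extracts `ki_pos` and the bracketed range from each message and computes how far `ki_pos` sits past the start of that range.

```python
import re

# Matches the tail of the assertion message quoted in this report:
#   "... failed: ki_pos <pos> [<start>, <end>)"
ASSERT_RE = re.compile(r"ki_pos (\d+) \[(\d+), (\d+)\)")


def parse_assertion(line):
    """Return (ki_pos, start, end) for one assertion line, or None if absent."""
    m = ASSERT_RE.search(line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


def summarize(lines):
    """Compute the offset of ki_pos from the range start for each assertion."""
    rows = []
    for line in lines:
        parsed = parse_assertion(line)
        if parsed is None:
            continue  # not an assertion line (e.g. a stack frame)
        ki_pos, start, end = parsed
        rows.append({
            "ki_pos": ki_pos,
            "start": start,
            "end": end,
            "offset": ki_pos - start,   # bytes past the start of the range
            "length": end - start,      # size of the reported range
        })
    return rows


# Assertion lines from the log in this ticket (timestamps removed).
LOG = [
    "LustreError: 19435:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1209601876 [1209597952, 1210056704)",
    "LustreError: 19462:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1211699028 [1211695104, 1212153856)",
    "LustreError: 19480:0:(vvp_io.c:1056:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: ki_pos 1213796180 [1213792256, 1214251008)",
]

if __name__ == "__main__":
    for row in summarize(LOG):
        print(row)
```

For the three messages above, `ki_pos` is 3924 bytes past the start of the reported range each time, and each range is 458752 bytes (448 KiB) long. What the bracketed range represents inside `vvp_io_write_start()` is not stated in this ticket, so reading it as the extent of the write is an assumption.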

          Activity

            [LU-12503] LustreError: 19435:0:(vvp_io.c:1056:vvp_io_write_start()) LBUG

            gerrit Gerrit Updater added a comment -

            Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/37921
            Subject: LU-12503 llite: file write pos mimatch
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: d5a087e31b1e1cf6812640d476faae8774ba0d66

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37035/
            Subject: LU-12503 vvp_dev: increment *pos in .next
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 589ba9b62c0e8b3a93145dd44bbbd92a26d6da8b

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37034/
            Subject: LU-12503 llite: file write pos mimatch
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 322cd140132e821c63b41b7da9ddb9f519b52194

            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37035
            Subject: LU-12503 vvp_dev: increment *pos in .next
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: cb2f58387f30271074dac0fcf021b5db157022e2

            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37034
            Subject: LU-12503 llite: file write pos mimatch
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 7e79fc11ed73b291b8e7a4805b3f1144d71ff83f

            pjones Peter Jones added a comment -

            Landed for 2.14


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36021/
            Subject: LU-12503 llite: file write pos mimatch
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1d2aa1513dc4e65813ad0bea138966a55244dbde

            bobijam Zhenyu Xu added a comment -

            Yes. I hope the fix patch can handle the issue, and I added the debug patch to capture information in case that is not the right fix for this issue.

            pjones Peter Jones added a comment -

            bobijam

            It looks like you have turned your original debug patch into a fix and now have added a new debug patch. Are you hoping for halifu to use both of these?

            Peter


            People

              Assignee: bobijam Zhenyu Xu
              Reporter: halifu Saerda Halifu (Inactive)
              Votes: 0
              Watchers: 8
