LU-17070

sanity-flr test_200b: vvp_vmpage_error() LBUG


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.16.0
    • Labels: None
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      There's a periodic crash in sanity-flr in master that looks like this:

      [26047.097521] Lustre: DEBUG MARKER: == sanity-flr test 200b: racing IO, mirror extend and resync ========================================================== 09:51:50 (1693475510)
      [26047.201126] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre2
      [26047.214264] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-23vm8@tcp:/lustre /mnt/lustre2
      [26047.252017] Lustre: Mounted lustre-client
      [26047.252911] Lustre: Skipped 1 previous similar message
      [26047.286711] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre3
      [26047.298657] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-23vm8@tcp:/lustre /mnt/lustre3
      [26053.066268] LustreError: 11-0: lustre-OST0003-osc-ffff9e711d5ac000: operation ost_fallocate to node 10.240.38.66@tcp failed: rc = -524
      [26098.433176] Lustre: *** cfs_fail_loc=1423, val=0***
      [26098.437521] LustreError: 329308:0:(vvp_page.c:119:vvp_vmpage_error()) LBUG
      [26098.438976] Pid: 329308, comm: ptlrpcd_00_01 4.18.0-477.15.1.el8_8.x86_64 #1 SMP Fri Jun 2 08:27:19 EDT 2023
      [26098.440832] Call Trace TBD:
      [26098.441616] [<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
      [26098.442707] [<0>] lbug_with_loc+0x3f/0x70 [libcfs]
      [26098.443675] [<0>] vvp_page_completion_write+0x2f7/0x400 [lustre]
      [26098.445022] [<0>] cl_page_completion+0x170/0x430 [obdclass]
      [26098.446365] [<0>] osc_ap_completion.isra.34+0x138/0x3e0 [osc]
      [26098.447567] [<0>] osc_extent_finish+0x203/0x9f0 [osc]
      [26098.448577] [<0>] brw_interpret+0x1c3/0xdb0 [osc]
      [26098.449527] [<0>] ptlrpc_check_set+0x53a/0x1e70 [ptlrpc]
      [26098.450855] [<0>] ptlrpcd+0x856/0xa70 [ptlrpc]
      [26098.451782] [<0>] kthread+0x134/0x150
      [26098.452577] [<0>] ret_from_fork+0x35/0x40
      [26098.453420] Kernel panic - not syncing: LBUG
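
      When comparing this crash against other occurrences, it can help to pull the LBUG location and the call-trace frames out of the console log. The following is a minimal sketch, not part of any Lustre tooling: the function name `parse_lbug` and both regular expressions are illustrative assumptions based only on the log format shown above.

```python
import re

# Matches frames such as "[<0>] lbug_with_loc+0x3f/0x70 [libcfs]".
# The trailing module name is absent for core-kernel symbols like kthread.
FRAME_RE = re.compile(
    r"\[<0>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)
# Matches the LBUG source location,
# e.g. "(vvp_page.c:119:vvp_vmpage_error()) LBUG".
LBUG_RE = re.compile(r"\((\S+\.c):(\d+):(\w+)\(\)\) LBUG")


def parse_lbug(log_text):
    """Return (location, frames) extracted from a console log excerpt.

    location is (file, line, function) or None if no LBUG line is found;
    frames is a list of (symbol, module) in the order they were logged.
    """
    location = None
    frames = []
    for line in log_text.splitlines():
        m = LBUG_RE.search(line)
        if m and location is None:
            location = (m.group(1), int(m.group(2)), m.group(3))
        f = FRAME_RE.search(line)
        if f:
            # Frames without a "[module]" suffix belong to the core kernel.
            frames.append((f.group(1), f.group(2) or "kernel"))
    return location, frames


# A few lines copied from the log above.
sample = """\
[26098.437521] LustreError: 329308:0:(vvp_page.c:119:vvp_vmpage_error()) LBUG
[26098.442707] [<0>] lbug_with_loc+0x3f/0x70 [libcfs]
[26098.443675] [<0>] vvp_page_completion_write+0x2f7/0x400 [lustre]
[26098.451782] [<0>] kthread+0x134/0x150
"""
location, frames = parse_lbug(sample)
print(location)
print(frames)
```

      Note that vvp_vmpage_error() is named in the LBUG line but does not appear as its own frame in the trace; the first Lustre frame is vvp_page_completion_write(), so matching on the LBUG location rather than on the top frame is likely the more reliable way to group occurrences.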

      Latest occurrence: https://testing.whamcloud.com/test_sets/1e9bca24-c969-4c92-a668-84678334a22a

      First occurrence in maloo on May 24: https://testing.whamcloud.com/test_sets/0c20e209-9122-4091-8244-e0c7b6ac8b2a

      The first-ever occurrence was in janitor, during special testing of the patch that added this test: https://review.whamcloud.com/c/fs/lustre-release/+/46413

      That patch landed in April 2023, so it is the likely culprit.

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: green Oleg Drokin
              Votes: 0
              Watchers: 7
