Lustre / LU-20261

->readpage LASSERT with ra_pages of 512 when using PCC-RO

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium
    • Severity: 3

    Description

      A customer hit the following LBUG on a Lustre client while using PCC:

      Apr 29 02:34:13 dgx-07 kernel: LustreError: 663126:0:(rw.c:2176:ll_readpage()) sbi ra pages 0, file ra pages 512
      Apr 29 02:34:13 dgx-07 kernel: LustreError: 663126:0:(rw.c:2184:ll_readpage()) bdi io_pages 0
      Apr 29 02:34:13 dgx-07 kernel: LustreError: 663126:0:(rw.c:2190:ll_readpage()) ASSERTION( !ra_assert ) failed:
      Apr 29 02:34:13 dgx-07 kernel: LustreError: 663126:0:(rw.c:2190:ll_readpage()) LBUG
      Apr 29 02:34:13 dgx-07 kernel: CPU: 120 PID: 663126 Comm: VLLM::Worker_TP Tainted: P W OE 5.15.0-1099-nvidia #100-Ubuntu
      Apr 29 02:34:13 dgx-07 kernel: Hardware name: NVIDIA DGXA100 920-23687-2531-001/DGXA100, BIOS 1.18 10/25/2022
      Apr 29 02:34:13 dgx-07 kernel: Call Trace:
      Apr 29 02:34:13 dgx-07 kernel: <TASK>
      Apr 29 02:34:13 dgx-07 kernel: show_stack+0x52/0x5c
      Apr 29 02:34:13 dgx-07 kernel: dump_stack_lvl+0x4a/0x63
      Apr 29 02:34:13 dgx-07 kernel: dump_stack+0x10/0x16
      Apr 29 02:34:13 dgx-07 kernel: lbug_with_loc.cold+0x5/0x43 [libcfs]
      Apr 29 02:34:13 dgx-07 kernel: ll_readpage+0xfb6/0xfc0 [lustre]
      Apr 29 02:34:13 dgx-07 kernel: filemap_read_page+0x38/0x100
      Apr 29 02:34:13 dgx-07 kernel: filemap_fault+0x9a9/0xab0
      Apr 29 02:34:13 dgx-07 kernel: ? from_kgid+0x12/0x20
      Apr 29 02:34:13 dgx-07 kernel: ? cl_object_attr_get+0x70/0x150 [obdclass]
      Apr 29 02:34:13 dgx-07 kernel: ? srso_return_thunk+0x5/0x10
      Apr 29 02:34:13 dgx-07 kernel: LustreError: 663127:0:(rw.c:2190:ll_readpage()) ASSERTION( !ra_assert ) failed:
      Apr 29 02:34:13 dgx-07 kernel: ? ll_inode_size_unlock+0x1d/0x30 [lustre]
      Apr 29 02:34:13 dgx-07 kernel: LustreError: 663127:0:(rw.c:2190:ll_readpage()) LBUG
      Apr 29 02:34:13 dgx-07 kernel: ll_filemap_fault+0x35/0x60 [lustre]
      Apr 29 02:34:13 dgx-07 kernel: vvp_io_fault_start+0x54c/0xe80 [lustre]
      Apr 29 02:34:13 dgx-07 kernel: ? cl_lock_enqueue+0x5e/0x120 [obdclass]
      Apr 29 02:34:13 dgx-07 kernel: ? srso_return_thunk+0x5/0x10
      Apr 29 02:34:13 dgx-07 kernel: ? cl_lock_request+0x69/0x1e0 [obdclass]
      Apr 29 02:34:13 dgx-07 kernel: ? vvp_io_read_start+0x8c0/0x8c0 [lustre]
      Apr 29 02:34:13 dgx-07 kernel: cl_io_start+0x87/0x170 [obdclass]
      Apr 29 02:34:13 dgx-07 kernel: cl_io_loop+0x9c/0x210 [obdclass]
      Apr 29 02:34:13 dgx-07 kernel: ll_fault+0x54b/0x9c0 [lustre]
      Apr 29 02:34:13 dgx-07 kernel: ? page_add_file_rmap+0xa6/0x150
      Apr 29 02:34:13 dgx-07 kernel: __do_fault+0x3c/0x120
      Apr 29 02:34:13 dgx-07 kernel: do_read_fault+0xeb/0x160
      Apr 29 02:34:13 dgx-07 kernel: do_fault+0xa0/0x2e0
      Apr 29 02:34:13 dgx-07 kernel: handle_pte_fault+0x1cd/0x240
      Apr 29 02:34:13 dgx-07 kernel: __handle_mm_fault+0x405/0x6f0
      Apr 29 02:34:13 dgx-07 kernel: handle_mm_fault+0xd8/0x2c0
      Apr 29 02:34:13 dgx-07 kernel: do_user_addr_fault+0x1c9/0x640
      Apr 29 02:34:13 dgx-07 kernel: exc_page_fault+0x77/0x170
      Apr 29 02:34:13 dgx-07 kernel: asm_exc_page_fault+0x27/0x30

      The issue does not occur when the customer does not use PCC.
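
      For triage across several client nodes, the readahead values that ll_readpage() prints just before the assertion can be pulled out of a kernel log with a short script. The following is a minimal sketch, assuming the message formats stay exactly as in the log above; the function name extract_ra_state and the dictionary keys are hypothetical and not part of Lustre.

```python
import re

# Patterns mirror the two diagnostic messages in the log above.
# Assumption: the wording of these messages does not change between builds.
SBI_FILE_RA = re.compile(r"sbi ra pages (\d+), file ra pages (\d+)")
BDI_IO = re.compile(r"bdi io_pages (\d+)")


def extract_ra_state(log_text):
    """Collect the readahead values logged before the LBUG.

    Fields that never appear in log_text are left as None.
    If a message appears more than once, the last occurrence wins.
    """
    state = {"sbi_ra_pages": None, "file_ra_pages": None, "bdi_io_pages": None}
    for line in log_text.splitlines():
        m = SBI_FILE_RA.search(line)
        if m:
            state["sbi_ra_pages"] = int(m.group(1))
            state["file_ra_pages"] = int(m.group(2))
        m = BDI_IO.search(line)
        if m:
            state["bdi_io_pages"] = int(m.group(1))
    return state


# Two lines copied from the log in this ticket.
sample = """\
Apr 29 02:34:13 dgx-07 kernel: LustreError: 663126:0:(rw.c:2176:ll_readpage()) sbi ra pages 0, file ra pages 512
Apr 29 02:34:13 dgx-07 kernel: LustreError: 663126:0:(rw.c:2184:ll_readpage()) bdi io_pages 0
"""

print(extract_ra_state(sample))
# → {'sbi_ra_pages': 0, 'file_ra_pages': 512, 'bdi_io_pages': 0}
```

      For this ticket the result shows a per-file readahead value of 512 pages while the superblock-wide and bdi values are both 0, which is the combination named in the issue title.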

            People

              wc-triage WC Triage
              qian_wc Qian Yingjin