Details
- Bug
- Resolution: Unresolved
- Medium
Description
Issue description:
A customer hit the following LBUG on the Lustre client when using PCC:
Apr 29 02:34:13 dgx-07 kernel: LustreError: 663126:0:(rw.c:2176:ll_readpage()) sbi ra pages 0, file ra pages 512
Apr 29 02:34:13 dgx-07 kernel: LustreError: 663126:0:(rw.c:2184:ll_readpage()) bdi io_pages 0
Apr 29 02:34:13 dgx-07 kernel: LustreError: 663126:0:(rw.c:2190:ll_readpage()) ASSERTION( !ra_assert ) failed:
Apr 29 02:34:13 dgx-07 kernel: LustreError: 663126:0:(rw.c:2190:ll_readpage()) LBUG
Apr 29 02:34:13 dgx-07 kernel: CPU: 120 PID: 663126 Comm: VLLM::Worker_TP Tainted: P W OE 5.15.0-1099-nvidia #100-Ubuntu
Apr 29 02:34:13 dgx-07 kernel: Hardware name: NVIDIA DGXA100 920-23687-2531-001/DGXA100, BIOS 1.18 10/25/2022
Apr 29 02:34:13 dgx-07 kernel: Call Trace:
Apr 29 02:34:13 dgx-07 kernel: <TASK>
Apr 29 02:34:13 dgx-07 kernel: show_stack+0x52/0x5c
Apr 29 02:34:13 dgx-07 kernel: dump_stack_lvl+0x4a/0x63
Apr 29 02:34:13 dgx-07 kernel: dump_stack+0x10/0x16
Apr 29 02:34:13 dgx-07 kernel: lbug_with_loc.cold+0x5/0x43 [libcfs]
Apr 29 02:34:13 dgx-07 kernel: ll_readpage+0xfb6/0xfc0 [lustre]
Apr 29 02:34:13 dgx-07 kernel: filemap_read_page+0x38/0x100
Apr 29 02:34:13 dgx-07 kernel: filemap_fault+0x9a9/0xab0
Apr 29 02:34:13 dgx-07 kernel: ? from_kgid+0x12/0x20
Apr 29 02:34:13 dgx-07 kernel: ? cl_object_attr_get+0x70/0x150 [obdclass]
Apr 29 02:34:13 dgx-07 kernel: ? srso_return_thunk+0x5/0x10
Apr 29 02:34:13 dgx-07 kernel: LustreError: 663127:0:(rw.c:2190:ll_readpage()) ASSERTION( !ra_assert ) failed:
Apr 29 02:34:13 dgx-07 kernel: ? ll_inode_size_unlock+0x1d/0x30 [lustre]
Apr 29 02:34:13 dgx-07 kernel: LustreError: 663127:0:(rw.c:2190:ll_readpage()) LBUG
Apr 29 02:34:13 dgx-07 kernel: ll_filemap_fault+0x35/0x60 [lustre]
Apr 29 02:34:13 dgx-07 kernel: vvp_io_fault_start+0x54c/0xe80 [lustre]
Apr 29 02:34:13 dgx-07 kernel: ? cl_lock_enqueue+0x5e/0x120 [obdclass]
Apr 29 02:34:13 dgx-07 kernel: ? srso_return_thunk+0x5/0x10
Apr 29 02:34:13 dgx-07 kernel: ? cl_lock_request+0x69/0x1e0 [obdclass]
Apr 29 02:34:13 dgx-07 kernel: ? vvp_io_read_start+0x8c0/0x8c0 [lustre]
Apr 29 02:34:13 dgx-07 kernel: cl_io_start+0x87/0x170 [obdclass]
Apr 29 02:34:13 dgx-07 kernel: cl_io_loop+0x9c/0x210 [obdclass]
Apr 29 02:34:13 dgx-07 kernel: ll_fault+0x54b/0x9c0 [lustre]
Apr 29 02:34:13 dgx-07 kernel: ? page_add_file_rmap+0xa6/0x150
Apr 29 02:34:13 dgx-07 kernel: __do_fault+0x3c/0x120
Apr 29 02:34:13 dgx-07 kernel: do_read_fault+0xeb/0x160
Apr 29 02:34:13 dgx-07 kernel: do_fault+0xa0/0x2e0
Apr 29 02:34:13 dgx-07 kernel: handle_pte_fault+0x1cd/0x240
Apr 29 02:34:13 dgx-07 kernel: __handle_mm_fault+0x405/0x6f0
Apr 29 02:34:13 dgx-07 kernel: handle_mm_fault+0xd8/0x2c0
Apr 29 02:34:13 dgx-07 kernel: do_user_addr_fault+0x1c9/0x640
Apr 29 02:34:13 dgx-07 kernel: exc_page_fault+0x77/0x170
Apr 29 02:34:13 dgx-07 kernel: asm_exc_page_fault+0x27/0x30
The issue does not occur when the customer does not use PCC.
Issue Links
- is related to DDN-6812