Details

Type: Bug
Resolution: Not a Bug
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.12.8
Labels: None
Environment: Ubuntu 20.04 and 22.04, Kernel 5.13 and 5.15
Severity: 3
Description
A Lustre client running a kdb+ application, accessing a single Lustre filesystem, crashed while running a read-mostly workload. The console reported the following errors:
LustreError: 2189:0:(osc_object.c:415:osc_req_attr_set()) page@000000006ff324cc[4 000000006e5cd158 3 1 0000000000000000]
LustreError: 2189:0:(osc_object.c:415:osc_req_attr_set()) vvp-page@000000005896944a(0:0) vm@00000000f05ece12 17ffffc0002001 4:0 ffff9052622abe00 262400 lru
LustreError: 2189:0:(osc_object.c:415:osc_req_attr_set()) lov-page@0000000040ca418a, comp index: 10000, gen: 3
LustreError: 2189:0:(osc_object.c:415:osc_req_attr_set()) osc-page@0000000090b283c9 52480: 1< 0x845fed 1 0 + + > 2< 214958080 0 4096 0x7 0x9 | 0000000000000000 00000000c2efe449 0000000047d0bab5 > 3< 1 18 0 > 4< 0 0 8 1879157331 - | - - - + > 5< - - - + | 0 - | 0 - ->
LustreError: 2189:0:(osc_object.c:415:osc_req_attr_set()) end page@000000006ff324cc
LustreError: 2189:0:(osc_object.c:415:osc_req_attr_set()) uncovered page!
LustreError: 2189:0:(ldlm_resource.c:1764:ldlm_resource_dump()) --- Resource: [0x1cedf:0x0:0x0].0x0 (00000000de919111) refcount = 2
LustreError: 2189:0:(ldlm_resource.c:1768:ldlm_resource_dump()) Granted locks (in reverse order):
LustreError: 2189:0:(ldlm_resource.c:1771:ldlm_resource_dump()) ### ### ns: kneitbmv-OST000b-osc-ffff905252cb9000 lock: 0000000040a3d2a7/0x6a51fb9756d1bd42 lrc: 3/1,0 mode: PR/PR res: [0x1cedf:0x0:0x0].0x0 rrc: 3 type: EXT [0->18446744073709551615] (req 214958080->216006655) flags: 0x800020000000000 nid: local remote: 0x9ca45c0e329718ca expref: -99 pid: 2639 timeout: 0 lvb_type: 1
The debug logs say that the page is not protected by an LDLM lock, yet the LDLM resource dump shows that a PR extent lock is granted for the whole object range.
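For reference, a minimal sketch (assuming 4 KiB pages; the byte offsets are copied verbatim from the dump above) that converts the requested lock extent into page indices:

#include <stdio.h>

#define PAGE_SHIFT 12  /* assuming 4 KiB pages */

int main(void)
{
    /* Values copied from the ldlm_resource_dump() output above. */
    unsigned long long req_start = 214958080ULL;  /* req 214958080->216006655 */
    unsigned long long req_end   = 216006655ULL;
    unsigned long long page_idx  = 52480ULL;      /* index from the osc-page line */

    printf("requested extent covers pages %llu..%llu\n",
           req_start >> PAGE_SHIFT, req_end >> PAGE_SHIFT);
    printf("page %llu is %sinside the requested extent\n", page_idx,
           (page_idx >= (req_start >> PAGE_SHIFT) &&
            page_idx <= (req_end >> PAGE_SHIFT)) ? "" : "not ");
    return 0;
}

Compiled and run, this prints pages 52480..52735: by plain byte-range arithmetic the page looks covered, which is exactly the paradox described above.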
crash> bt 2189
PID: 2189  TASK: ffff9052491a5e80  CPU: 9  COMMAND: "ptlrpcd_03_00"
 #0 [ffffbb01027b3900] machine_kexec at ffffffffae68c2a0
 #1 [ffffbb01027b3960] __crash_kexec at ffffffffae798f72
 #2 [ffffbb01027b3a30] panic at ffffffffaf303b8e
 #3 [ffffbb01027b3ac8] osc_req_attr_set at ffffffffc0ca5251 [osc]
 #4 [ffffbb01027b3b08] cl_req_attr_set at ffffffffc0a85670 [obdclass]
 #5 [ffffbb01027b3b38] osc_build_rpc at ffffffffc0c9f666 [osc]
 #6 [ffffbb01027b3bf8] osc_extent_finish at ffffffffc0cbf606 [osc]
 #7 [ffffbb01027b3cf0] brw_queue_work at ffffffffc0c94264 [osc]
 #8 [ffffbb01027b3d10] work_interpreter at ffffffffc0dbc9a3 [ptlrpc]
 #9 [ffffbb01027b3d30] ptlrpc_check_set at ffffffffc0dc4bd2 [ptlrpc]
#10 [ffffbb01027b3dc8] ptlrpcd_check at ffffffffc0df0d79 [ptlrpc]
#11 [ffffbb01027b3e20] ptlrpcd at ffffffffc0df1137 [ptlrpc]
#12 [ffffbb01027b3f10] kthread at ffffffffae6e99b7
#13 [ffffbb01027b3f50] ret_from_fork at ffffffffae604cbf
The LDLM lock dump shows that the lock was taken by PID 2639:
crash> bt 2639
PID: 2639  TASK: ffff9052562bde80  CPU: 7  COMMAND: "q"
 #0 [ffffbb012304b808] __schedule at ffffffffaf3b500d
 #1 [ffffbb012304b890] schedule at ffffffffaf3b53be
 #2 [ffffbb012304b8b0] io_schedule at ffffffffaf3b54a6
 #3 [ffffbb012304b8c8] wait_on_page_bit_common at ffffffffae8af4fc
 #4 [ffffbb012304b968] filemap_update_page at ffffffffae8b2382
 #5 [ffffbb012304b9a8] filemap_get_pages at ffffffffae8b2612
 #6 [ffffbb012304ba48] filemap_read at ffffffffae8b284c
 #7 [ffffbb012304bb50] generic_file_read_iter at ffffffffae8b2c55
 #8 [ffffbb012304bb98] vvp_io_read_start at ffffffffc10cdd04 [lustre]
 #9 [ffffbb012304bc10] cl_io_start at ffffffffc0a85528 [obdclass]
#10 [ffffbb012304bc48] cl_io_loop at ffffffffc0a88a4f [obdclass]
#11 [ffffbb012304bc80] ll_file_io_generic at ffffffffc10882e1 [lustre]
#12 [ffffbb012304bd80] ll_file_read_iter at ffffffffc1089101 [lustre]
#13 [ffffbb012304bdc8] new_sync_read at ffffffffae989aba
#14 [ffffbb012304be58] vfs_read at ffffffffae98a453
#15 [ffffbb012304be98] ksys_read at ffffffffae98cf27
#16 [ffffbb012304bed8] __x64_sys_read at ffffffffae98cfc9
#17 [ffffbb012304bee8] do_syscall_64 at ffffffffaf3a7fa9
    RIP: 00007f87f70e09cc  RSP: 00007ffd6bef11f0  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000006  RCX: 00007f87f70e09cc
    RDX: 00000000001fffff  RSI: 00007f794dc00009  RDI: 0000000000000006
    RBP: 00007f794dc00009   R8: 0000000000000000   R9: 00007ffd6bef12c0
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007ffd6bef1348
    R13: 0000000000000001  R14: 00000000001fffff  R15: 00000000001fffff
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

PID: 2637  TASK: ffff9052562bbf00  CPU: 9  COMMAND: "q"
 #0 [ffffbb01018bf6f0] __schedule at ffffffffaf3b500d
 #1 [ffffbb01018bf778] schedule at ffffffffaf3b53be
 #2 [ffffbb01018bf798] cl_sync_io_wait at ffffffffc0a87cd7 [obdclass]
 #3 [ffffbb01018bf808] ll_io_read_page at ffffffffc10a9727 [lustre]
 #4 [ffffbb01018bf8e8] ll_readpage at ffffffffc10ab1dc [lustre]
 #5 [ffffbb01018bf938] filemap_read_page at ffffffffae8afa58
 #6 [ffffbb01018bf960] filemap_update_page at ffffffffae8b231c
 #7 [ffffbb01018bf9a0] filemap_get_pages at ffffffffae8b2612
 #8 [ffffbb01018bfa40] filemap_read at ffffffffae8b284c
 #9 [ffffbb01018bfb48] generic_file_read_iter at ffffffffae8b2c55
#10 [ffffbb01018bfb90] vvp_io_read_start at ffffffffc10cdd04 [lustre]
#11 [ffffbb01018bfc08] cl_io_start at ffffffffc0a85528 [obdclass]
#12 [ffffbb01018bfc40] cl_io_loop at ffffffffc0a88a4f [obdclass]
#13 [ffffbb01018bfc78] ll_file_io_generic at ffffffffc10882e1 [lustre]
#14 [ffffbb01018bfd78] ll_file_read_iter at ffffffffc1089101 [lustre]
#15 [ffffbb01018bfdc0] new_sync_read at ffffffffae989aba
#16 [ffffbb01018bfe50] vfs_read at ffffffffae98a453
#17 [ffffbb01018bfe90] ksys_read at ffffffffae98cf27
#18 [ffffbb01018bfed0] __x64_sys_read at ffffffffae98cfc9
#19 [ffffbb01018bfee0] do_syscall_64 at ffffffffaf3a7fa9
#20 [ffffbb01018bff50] entry_SYSCALL_64_after_hwframe at ffffffffaf400099
    RIP: 00007fe7e07e59cc  RSP: 00007ffc950e6010  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000006  RCX: 00007fe7e07e59cc
    RDX: 00000000001fffff  RSI: 00007fdbfce00009  RDI: 0000000000000006
    RBP: 00007fdbfce00009   R8: 0000000000000000   R9: 00007ffc950e60e0
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007ffc950e6168
    R13: 0000000000000001  R14: 00000000001fffff  R15: 00000000001fffff
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

crash> ps -l 2189 2637 2639
[668349169196534] [UN]  PID: 2639  TASK: ffff9052562bde80  CPU: 7  COMMAND: "q"
[668349163196875] [RU]  PID: 2189  TASK: ffff9052491a5e80  CPU: 9  COMMAND: "ptlrpcd_03_00"
[668349163179612] [IN]  PID: 2637  TASK: ffff9052562bbf00  CPU: 9  COMMAND: "q"
- Multiple occurrences of this bug demonstrate that the crash always happens while the client is reading the second stripe of the file, whether with standard striping (-c 5) or with PFL striping (-E 1G -c 1 -E EOF -c 5); see the sketch after this list for how a file offset maps to a stripe.
- Also, retrying a few milliseconds later seems to return the right lock.
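For context on "second stripe": a minimal sketch of the round-robin (RAID-0) offset-to-stripe mapping Lustre uses for a plain striped file. The 1 MiB stripe size and the sample offset are assumptions for illustration, not values from this report:

#include <stdio.h>

/* Round-robin mapping of a file offset to a stripe index,
 * as Lustre lays out a plain striped file. */
static unsigned long stripe_index(unsigned long long offset,
                                  unsigned long stripe_size,
                                  unsigned int stripe_count)
{
    return (unsigned long)((offset / stripe_size) % stripe_count);
}

int main(void)
{
    unsigned long stripe_size = 1UL << 20;  /* assumed default 1 MiB */
    unsigned int stripe_count = 5;          /* -c 5, as in the report */

    /* With -c 5 and 1 MiB stripes, the second stripe (index 1) holds
     * bytes [1 MiB, 2 MiB), [6 MiB, 7 MiB), and so on. */
    printf("offset 1.5 MiB -> stripe %lu\n",
           stripe_index(3 * (1ULL << 19), stripe_size, stripe_count));
    return 0;
}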
There is a kernel bug in 5.12+ where filemap_get_pages() adds an additional page to the I/O. So the main cl_io_loop() locks n pages while the I/O ends up with n+1, and as a result one page is left uncovered during the RPC build. We hit it with a small 328-byte read covering the first page, and it failed on the second page; the first page should have been cached. A sketch of the mismatch follows.
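As a rough illustration of the n vs n+1 mismatch (a sketch, assuming 4 KiB pages and the 328-byte read mentioned above; the start offset is a placeholder, not a value from the dump):

#include <stdio.h>

#define PAGE_SHIFT 12  /* assuming 4 KiB pages */

int main(void)
{
    unsigned long long pos = 0;      /* placeholder start offset */
    unsigned long long count = 328;  /* small read size from the report */

    /* Pages cl_io_loop() locks: exactly those overlapping [pos, pos+count). */
    unsigned long long first = pos >> PAGE_SHIFT;
    unsigned long long last  = (pos + count - 1) >> PAGE_SHIFT;
    unsigned long long n = last - first + 1;

    printf("CLIO locks %llu page(s): %llu..%llu\n", n, first, last);
    /* The 5.12+ filemap_get_pages() behaviour described above pulls in one
     * more page past the requested range, which no DLM lock covers. */
    printf("page cache touches %llu page(s): %llu..%llu -> page %llu uncovered\n",
           n + 1, first, last + 1, last + 1);
    return 0;
}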
Here is the kernel fix: