[LU-16156] stale read during IOR test due LU-14541 Created: 14/Sep/22 Updated: 01/Dec/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Alexey Lyashkov | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
It looks 00000080:00010000:3.0:1661448429.922359:0:17969:0:(file.c:5884:ll_layout_lock_set()) ### file [0x200044d2e:0x22f:0x0](000000008945c220) being reconfigured ns: kjcf04-MDT0000-mdc-ffff88879388d000 lock: 00000000265351e4/0x2ea 4f3791074a3c7 lrc: 3/1,0 mode: CR/CR res: [0x200044d2e:0x22f:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT gid 0 flags: 0x0 nid: local remote: 0x76fc642919c90d07 expref: -99 pid: 17969 timeout: 0 lvb_type: 3 00000080:00200000:3.0:1661448429.922913:0:17969:0:(file.c:5779:ll_layout_conf()) [0x200044d2e:0x22f:0x0]: layout version change: 4294967294 -> 15 DoM 00010000:00010000:3.0:1661448429.926955:0:17969:0:(ldlm_lock.c:902:ldlm_lock_decref_internal()) ### add lock into lru list ns: kjcf04-MDT0000-mdc-ffff88879388d000 lock: 00000000ca0cb036/0x2ea4f3791074a3e3 lrc: 3/0,0 mode: P R/PR res: [0x200044d2e:0x22f:0x0].0x0 bits 0x40/0x0 rrc: 3 type: IBT gid 0 flags: 0x800020000000000 nid: local remote: 0x76fc642919c90d3f expref: -99 pid: 17969 timeout: 0 lvb_type: 1 00010000:00010000:3.0:1661448429.926982:0:17969:0:(ldlm_lock.c:766:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(PR) ns: kjcf04-OST0005-osc-ffff88879388d000 lock: 00000000a8693f35/0x2ea4f3791074a3ff lrc: 3/1,0 mode: --/PR res: [0x25c6ad3d:0x0:0x0].0x0 rrc: 2 type: EXT [0->0] (req 0->0) gid 0 flags: 0x10000000000000 nid: local remote: 0x0 expref: -99 pid: 17969 timeout: 0 lvb_type: 1 00010000:00010000:3.0:1661448429.926985:0:17969:0:(ldlm_request.c:1028:ldlm_cli_enqueue()) ### client-side enqueue START, flags 0x0 ns: kjcf04-OST0005-osc-ffff88879388d000 lock: 00000000a8693f35/0x2ea4f3791074a3ff lrc: 3/1,0 mode: --/PR res: [0x25c6ad3d:0x0:0x0].0x0 rrc: 2 type: EXT [1048576->4194303] (req 1048576->4194303) gid 0 flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 17969 timeout: 0 lvb_type: 1 00010000:00010000:3.0:1661448429.926994:0:17969:0:(ldlm_request.c:1115:ldlm_cli_enqueue()) ### sending request ns: kjcf04-OST0005-osc-ffff88879388d000 lock: 00000000a8693f35/0x2ea4f3791074a3ff lrc: 3/1,0 mode: --/PR res: [0x25c6ad3d:0x0:0x0].0x0 rrc: 2 type: EXT [1048576->4194303] (req 1048576->4194303) gid 0 flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 17969 timeout: 0 lvb_type: 1 00010000:00010000:3.0:1661448429.927377:0:17969:0:(ldlm_resource.c:1686:ldlm_resource_add_lock()) ### About to add this lock ns: kjcf04-OST0005-osc-ffff88879388d000 lock: 00000000a8693f35/0x2ea4f3791074a3ff lrc: 4/1,0 mode: --/PR res: [0x25c6ad3d:0x0:0x0].0x0 rrc: 2 type: EXT [1048576->4194303] (req 1048576->4194303) gid 0 flags: 0x10000000020000 nid: local remote: 0x10ff82d82b217a69 expref: -99 pid: 17969 timeout: 0 lvb_type: 1 00000080:00200000:3.0:1661448430.113932:0:17969:0:(vvp_io.c:312:vvp_io_fini()) [0x200044d2e:0x22f:0x0] ignore/verify layout 0/0, layout version 15 need write layout 0, restore needed 0 00000008:00100000:17.0:1661448430.236630:0:10311:0:(osc_request.c:1804:osc_brw_prep_request()) brw rpc 00000000ff43cadf - object 0x0:633777469 offset 104857600<>109056000 00000100:00100000:23.0:1661448430.292928:0:10312:0:(client.c:2220:ptlrpc_check_set()) Completed RPC req@00000000ff43cadf pname:cluuid:pid:xid:nid:opc:job:rc ptlrpcd_00_04:2f7f0639-3f88-4f3f-9faf-e0b16a082f10:10312:1741723660248256:10.17.100.60@o2ib:3::0/4194304 00000080:00200000:3.0:1661448430.310692:0:17969:0:(vvp_io.c:847:vvp_io_read_start()) IORfile_4m.00000011: read [100663296, 104857600) 00000080:00200008:3.0:1661448430.321303:0:17969:0:(file.c:1997:ll_file_read_iter()) file IORfile_4m.00000011:[0x200044d2e:0x22f:0x0], ppos: 104857600, count: 4194304 00010000:00010000:3.0:1661448430.314974:0:17969:0:(ldlm_lock.c:909:ldlm_lock_decref_internal()) ### do not add lock into lru list ns: kjcf04-OST0005-osc-ffff88879388d000 lock: 00000000a8693f35/0x2ea4f3791074a3ff lrc: 5/1,0 mode: PR/PR res: [0x25c6ad3d:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 1048576->4194303) gid 0 flags: 0x810020000020000 nid: local remote: 0x10ff82d82b217a69 expref: -99 pid: 17969 timeout: 0 lvb_type: 1 new ldlm lock requested/granted, but BRW rpc created with single page hole inside. |
| Comments |
| Comment by Alexey Lyashkov [ 01/Dec/22 ] |
|
In fact, problem is simple while fix is not. CLIO broke a kernel MM protocol for the memory reclaim and ll_releasepage checks is racy and buggy. This bug from CLIO begin. CLIO hold a vmpage ref until CL page released (aka vvp_page_fini), it cause a situation when memory reclaim or layout change needs to flush a pages, but blocking AST arrived in same time. osc_gang_lookup holds a CL page reference while memory reclaim start to delete a page, but existent check is racy. So ll_releasepage exit after cl_page_delete with extra vmpage reference. That all. VM page still in the page cache and any read my pickup this data. It's easy check by adds assert at end of ll_releasepages with check refcount and run a dual mount fsx test from sanityN. 00000008:00000001:9.0:1666910059.632700:0:5016:0:(osc_cache.c:3088:osc_page_gang_lookup()) Process entered but cl_page ref was hold where bl ast enter and interrrupted by ll_releasepage aka cache flush. 00000020:00000001:8.0:1666910059.632703:0:11668:0:(cl_page.c:545:cl_vmpage_page()) Process leaving (rc=18446624413482391544 : -119660227160072 : ffff932b6eaa97f8) 00000020:00000001:8.0:1666910059.632708:0:11668:0:(cl_page.c:444:cl_page_state_set0()) page@ffff932b6eaa97f8[3 ffff932b5cd3f2b0 1 1 0000000000000000] 00000020:00000001:8.0:1666910059.632709:0:11668:0:(cl_page.c:445:cl_page_state_set0()) page fffff2cc04e941c0 map ffff932c62810218 index 82632 flags 17ffffc0002015 count 3 priv ffff932b6eaa97f8: 00000020:00000001:8.0:1666910059.633545:0:11668:0:(cl_page.c:489:cl_pagevec_put()) page@ffff932b6eaa97f8[2 ffff932b5cd3f2b0 5 1 0000000000000000] 00000020:00000001:8.0:1666910059.633546:0:11668:0:(cl_page.c:490:cl_pagevec_put()) page fffff2cc04e941c0 map ffff932c62810218 index 82632 flags 17ffffc0000015 count 3 priv 0: 00000080:00008000:8.0:1666910059.633548:0:11668:0:(rw26.c:175:ll_releasepage()) page fffff2cc04e941c0 map ffff932c62810218 index 82632 flags 17ffffc0000015 count 3 priv 0: clpage ffff932b6eaa97f8 : 1 ll_releasepage exit and expect to free a cl_page but ref hold by BL AST thread. and vmpage still with 3 refs while __remove_mapping whats 2. 00000020:00000001:9.0:1666910059.642999:0:5016:0:(cl_page.c:489:cl_pagevec_put()) page@ffff932b6eaa97f8[1 ffff932b5cd3f2b0 5 1 0000000000000000] 00000020:00000001:9.0:1666910059.643000:0:5016:0:(cl_page.c:490:cl_pagevec_put()) page fffff2cc04e941c0 map ffff932c62810218 index 82632 flags 17ffffc0000014 count 2 priv 0: 00000020:00000010:9.0:1666910059.643003:0:5016:0:(cl_page.c:178:__cl_page_free()) slab-freed 'cl_page': 472 at ffff932b6eaa97f8. cl_page freed -> vmpage ref released, vmpage with 2refs and it may removed from pagecache, but none want's to do it and uptodate page still in pagecache. |