LU-16156

Stale read during IOR test due to LU-14541

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker

    Description

      It looks like LU-14541 has returned and the last patch isn't enough.
      Reproduced on cray-2.15-int with the latest LU-14541 patch applied.

      00000080:00010000:3.0:1661448429.922359:0:17969:0:(file.c:5884:ll_layout_lock_set()) ### file [0x200044d2e:0x22f:0x0](000000008945c220) being reconfigured ns: kjcf04-MDT0000-mdc-ffff88879388d000 lock: 00000000265351e4/0x2ea4f3791074a3c7 lrc: 3/1,0 mode: CR/CR res: [0x200044d2e:0x22f:0x0].0x0 bits 0x8/0x0 rrc: 2 type: IBT gid 0 flags: 0x0 nid: local remote: 0x76fc642919c90d07 expref: -99 pid: 17969 timeout: 0 lvb_type: 3
      00000080:00200000:3.0:1661448429.922913:0:17969:0:(file.c:5779:ll_layout_conf()) [0x200044d2e:0x22f:0x0]: layout version change: 4294967294 -> 15
      DoM
      00010000:00010000:3.0:1661448429.926955:0:17969:0:(ldlm_lock.c:902:ldlm_lock_decref_internal()) ### add lock into lru list ns: kjcf04-MDT0000-mdc-ffff88879388d000 lock: 00000000ca0cb036/0x2ea4f3791074a3e3 lrc: 3/0,0 mode: PR/PR res: [0x200044d2e:0x22f:0x0].0x0 bits 0x40/0x0 rrc: 3 type: IBT gid 0 flags: 0x800020000000000 nid: local remote: 0x76fc642919c90d3f expref: -99 pid: 17969 timeout: 0 lvb_type: 1
      00010000:00010000:3.0:1661448429.926982:0:17969:0:(ldlm_lock.c:766:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(PR) ns: kjcf04-OST0005-osc-ffff88879388d000 lock: 00000000a8693f35/0x2ea4f3791074a3ff lrc: 3/1,0 mode: --/PR res: [0x25c6ad3d:0x0:0x0].0x0 rrc: 2 type: EXT [0->0] (req 0->0) gid 0 flags: 0x10000000000000 nid: local remote: 0x0 expref: -99 pid: 17969 timeout: 0 lvb_type: 1
      00010000:00010000:3.0:1661448429.926985:0:17969:0:(ldlm_request.c:1028:ldlm_cli_enqueue()) ### client-side enqueue START, flags 0x0 ns: kjcf04-OST0005-osc-ffff88879388d000 lock: 00000000a8693f35/0x2ea4f3791074a3ff lrc: 3/1,0 mode: --/PR res: [0x25c6ad3d:0x0:0x0].0x0 rrc: 2 type: EXT [1048576->4194303] (req 1048576->4194303) gid 0 flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 17969 timeout: 0 lvb_type: 1
      00010000:00010000:3.0:1661448429.926994:0:17969:0:(ldlm_request.c:1115:ldlm_cli_enqueue()) ### sending request ns: kjcf04-OST0005-osc-ffff88879388d000 lock: 00000000a8693f35/0x2ea4f3791074a3ff lrc: 3/1,0 mode: --/PR res: [0x25c6ad3d:0x0:0x0].0x0 rrc: 2 type: EXT [1048576->4194303] (req 1048576->4194303) gid 0 flags: 0x0 nid: local remote: 0x0 expref: -99 pid: 17969 timeout: 0 lvb_type: 1
      00010000:00010000:3.0:1661448429.927377:0:17969:0:(ldlm_resource.c:1686:ldlm_resource_add_lock()) ### About to add this lock ns: kjcf04-OST0005-osc-ffff88879388d000 lock: 00000000a8693f35/0x2ea4f3791074a3ff lrc: 4/1,0 mode: --/PR res: [0x25c6ad3d:0x0:0x0].0x0 rrc: 2 type: EXT [1048576->4194303] (req 1048576->4194303) gid 0 flags: 0x10000000020000 nid: local remote: 0x10ff82d82b217a69 expref: -99 pid: 17969 timeout: 0 lvb_type: 1
      00000080:00200000:3.0:1661448430.113932:0:17969:0:(vvp_io.c:312:vvp_io_fini()) [0x200044d2e:0x22f:0x0] ignore/verify layout 0/0, layout version 15 need write layout 0, restore needed 0
      00000008:00100000:17.0:1661448430.236630:0:10311:0:(osc_request.c:1804:osc_brw_prep_request()) brw rpc 00000000ff43cadf - object 0x0:633777469 offset 104857600<>109056000
      00000100:00100000:23.0:1661448430.292928:0:10312:0:(client.c:2220:ptlrpc_check_set()) Completed RPC req@00000000ff43cadf pname:cluuid:pid:xid:nid:opc:job:rc ptlrpcd_00_04:2f7f0639-3f88-4f3f-9faf-e0b16a082f10:10312:1741723660248256:10.17.100.60@o2ib:3::0/4194304
      00000080:00200000:3.0:1661448430.310692:0:17969:0:(vvp_io.c:847:vvp_io_read_start()) IORfile_4m.00000011: read [100663296, 104857600)
      00000080:00200008:3.0:1661448430.321303:0:17969:0:(file.c:1997:ll_file_read_iter()) file IORfile_4m.00000011:[0x200044d2e:0x22f:0x0], ppos: 104857600, count: 4194304
      00010000:00010000:3.0:1661448430.314974:0:17969:0:(ldlm_lock.c:909:ldlm_lock_decref_internal()) ### do not add lock into lru list ns: kjcf04-OST0005-osc-ffff88879388d000 lock: 00000000a8693f35/0x2ea4f3791074a3ff lrc: 5/1,0 mode: PR/PR res: [0x25c6ad3d:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 1048576->4194303) gid 0 flags: 0x810020000020000 nid: local remote: 0x10ff82d82b217a69 expref: -99 pid: 17969 timeout: 0 lvb_type: 1
      

      A new LDLM lock was requested and granted, but the BRW RPC was created with a single-page hole inside.
      This caused a stale read. The first patch, which clears the page's uptodate flag, helps: with it, stale data is no longer reported.
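
The single-page hole can be cross-checked against the numbers in the log above. The sketch below is only an illustration under stated assumptions: it treats the two values in "offset 104857600<>109056000" as the start and exclusive end byte offsets of the BRW RPC, treats the trailing 4194304 in "rc ... 0/4194304" as the number of bytes actually transferred, and assumes a 4096-byte page. The helper name brw_hole_pages is hypothetical and is not part of Lustre.

```python
# Assumed client page size in bytes; not stated in the log itself.
PAGE_SIZE = 4096


def brw_hole_pages(start, end, bytes_transferred, page_size=PAGE_SIZE):
    """Return (whole_pages, leftover_bytes) covered by the RPC range but not transferred.

    Assumes `end` is an exclusive byte offset.
    """
    span = end - start                  # bytes covered by the RPC range
    missing = span - bytes_transferred  # bytes inside the range with no data
    return divmod(missing, page_size)


# Values copied from the osc_brw_prep_request() and ptlrpc_check_set() lines.
brw_start, brw_end = 104857600, 109056000
transferred = 4194304

pages, leftover = brw_hole_pages(brw_start, brw_end, transferred)
print(f"range spans {brw_end - brw_start} bytes, transferred {transferred} bytes")
print(f"missing: {pages} page(s) plus {leftover} byte(s)")
```

Under these assumptions the RPC range covers 1025 pages while only 1024 pages of data were transferred, which matches the one-page hole described above.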

            People

              wc-triage WC Triage
              shadow Alexey Lyashkov