Lustre / LU-10585

parallel-scale test_compilebench: LustreError: 17907:0:(osc_cache.c:955:osc_extent_wait())

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.11.0
    • None
    • RHEL 7.4 Server/ZFS - 2.10.57
      RHEL 7.4 Client - 2.10.57
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/64429ee4-fd94-11e7-bd00-52540065bddc

      test_compilebench failed with the following error:

      Timeout occurred after 810 mins, last suite running was parallel-scale, restarting cluster to continue tests
      

      client dmesg:

      ==================================================== 19:52:55 \(1516391575\)
      [34274.382611] Lustre: DEBUG MARKER: == parallel-scale test compilebench: compilebench ==================================================== 19:52:55 (1516391575)
      [34277.064668] Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.compilebench.13986 -i 1         -r 1 --makej
      [34277.241330] Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.compilebench.13986 -i 1 -r 1 --makej
      [34491.976929] LNet: Host 10.2.8.43 reset our connection while we were sending data; it may have rebooted.
      [34491.978174] Lustre: 15167:0:(client.c:2100:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1516391793/real 1516391793]  req@ffff88007a953300 x1590051734496032/t0(0) o400->lustre-OST0001-osc-ffff8800426ca800@10.2.8.43@tcp:28/4 lens 224/224 e 0 to 1 dl 1516391837 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1
      [34491.981073] Lustre: lustre-OST0001-osc-ffff8800426ca800: Connection to lustre-OST0001 (at 10.2.8.43@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [34530.982827] Lustre: 15168:0:(client.c:2100:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1516391788/real 1516391788]  req@ffff88004342cc00 x1590051734495888/t0(0) o400->lustre-OST0001-osc-ffff8800426ca800@10.2.8.43@tcp:28/4 lens 224/224 e 0 to 1 dl 1516391832 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [34530.985840] Lustre: 15168:0:(client.c:2100:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
      [34552.982823] Lustre: 15168:0:(client.c:2100:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1516391778/real 1516391779]  req@ffff880044c48000 x1590051734494352/t0(0) o4->lustre-OST0003-osc-ffff8800426ca800@10.2.8.43@tcp:6/4 lens 608/448 e 0 to 1 dl 1516391854 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
      [34552.985814] Lustre: 15168:0:(client.c:2100:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
      [34555.982817] Lustre: 15167:0:(client.c:2100:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1516391781/real 1516391781]  req@ffff880043aade00 x1590051734495680/t0(0) o4->lustre-OST0003-osc-ffff8800426ca800@10.2.8.43@tcp:6/4 lens 608/448 e 0 to 1 dl 1516391857 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
      [34555.985664] Lustre: 15167:0:(client.c:2100:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
      [38981.808821] LustreError: 17907:0:(osc_cache.c:955:osc_extent_wait()) extent ffff880038aee6e0@{[0 -> 511/1023], [3|0|+|rpc|wiY|ffff8800542718c0], [2752512|310|+|-|ffff88007961f8c0|1024|ffff880069c30000]} lustre-OST0003-osc-ffff8800426ca800: wait ext to 0 timedout, recovery in progress?
      [38981.811368] LustreError: 17907:0:(osc_cache.c:955:osc_extent_wait()) ### extent: ffff880038aee6e0 ns: lustre-OST0003-osc-ffff8800426ca800 lock: ffff88007961f8c0/0x9c1d9679bc44a8a9 lrc: 3/0,0 mode: PW/PW res: [0x3e6f9:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x29400020000 nid: local remote: 0xec2fa022db61f7f expref: -99 pid: 17896 timeout: 0 lvb_type: 1
      [39011.808814] LustreError: 16813:0:(osc_cache.c:955:osc_extent_wait()) extent ffff880038aeedc0@{[0 -> 511/1023], [3|0|+|rpc|wiuY|ffff8800542713c0], [2752512|343|+|-|ffff880015bb4b40|1024|ffff880058079fa0]} lustre-OST0002-osc-ffff8800426ca800: wait ext to 0 timedout, recovery in progress?
      [39011.808821] LustreError: 16812:0:(osc_cache.c:955:osc_extent_wait()) extent ffff880038aee0b0@{[0 -> 255/1023], [3|0|+|rpc|wiuY|ffff880054271000], [1703936|1|+|-|ffff8800609beb40|1024|ffff880058079fa0]} lustre-OST0005-osc-ffff8800426ca800: wait ext to 0 timedout, recovery in progress?
      [39011.808829] LustreError: 16812:0:(osc_cache.c:955:osc_extent_wait()) ### extent: ffff880038aee0b0 ns: lustre-OST0005-osc-ffff8800426ca800 lock: ffff8800609beb40/0x9c1d9679bc44a81d lrc: 3/0,0 mode: PW/PW res: [0x3f4bd:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x29400020000 nid: local remote: 0xec2fa022db61f39 expref: -99 pid: 17896 timeout: 0 lvb_type: 1
      [39011.808837] LustreError: 18173:0:(osc_cache.c:955:osc_extent_wait()) extent ffff880038aeebb0@{[0 -> 511/1023], [3|0|+|rpc|wiuY|ffff880054271280], [2752512|310|+|-|ffff880015bb4000|1024|ffff880058079fa0]} lustre-OST0001-osc-ffff8800426ca800: wait ext to 0 timedout, recovery in progress?
      [39011.808860] LustreError: 18174:0:(osc_cache.c:955:osc_extent_wait()) extent ffff8800573338f0@{[7936 -> 8191/8191], [3|0|+|rpc|wiuY|ffff880054271500], [1703936|43|+|-|ffff8800609bf8c0|1024|ffff880058079fa0]} lustre-OST0006-osc-ffff8800426ca800: wait ext to 0 timedout, recovery in progress?
      [39021.808815] LustreError: 18176:0:(osc_cache.c:955:osc_extent_wait()) extent ffff880038aee160@{[0 -> 255/1023], [3|0|+|rpc|wiuY|ffff880057331b40], [1703936|1|+|-|ffff880059544fc0|1024|ffff880058079fa0]} lustre-OST0005-osc-ffff8800426ca800: wait ext to 0 timedout, recovery in progress?
      [39021.810849] LustreError: 18177:0:(osc_cache.c:955:osc_extent_wait()) extent ffff880038aeec60@{[0 -> 255/1023], [3|0|+|rpc|wiuY|ffff880054271a00], [1703936|9|+|-|ffff88007961efc0|1024|ffff880058079fa0]} lustre-OST0000-osc-ffff8800426ca800: wait ext to 0 timedout, recovery in progress?
      [39021.810857] LustreError: 18177:0:(osc_cache.c:955:osc_extent_wait()) ### extent: ffff880038aeec60 ns: lustre-OST0000-osc-ffff8800426ca800 lock: ffff88007961efc0/0x9c1d9679bc44a88d lrc: 3/0,0 mode: PW/PW res: [0x5c1df:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x29400020000 nid: local remote: 0xec2fa022db61f71 expref: -99 pid: 17896 timeout: 0 lvb_type: 1
      [39021.810858] LustreError: 18177:0:(osc_cache.c:955:osc_extent_wait()) Skipped 3 previous similar messages
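
      When triaging a dump like the one above, it can help to see which OST targets the osc_extent_wait() timeouts hit. The following is a minimal sketch, not part of the original report: the function name count_extent_timeouts and the regular expression are assumptions derived only from the message format visible in this log, and other Lustre versions may format these lines differently. The "### extent:" lock-dump lines are deliberately not counted.

```python
import re
from collections import Counter

# Matches the "wait ext to N timedout" lines as they appear in the dmesg above.
# Assumption: the OSC device name has the form <fsname>-OSTxxxx-osc-<address>.
EXTENT_TIMEOUT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+LustreError:.*osc_extent_wait\(\)\)\s+extent\s+"
    r".*?(?P<target>\S+-OST[0-9a-fA-F]{4})-osc-\S+: wait ext to \d+ timedout"
)


def count_extent_timeouts(lines):
    """Return a Counter mapping OST target name to number of extent-wait timeouts."""
    counts = Counter()
    for line in lines:
        match = EXTENT_TIMEOUT.search(line)
        if match:
            counts[match.group("target")] += 1
    return counts


# Two lines copied verbatim from the client dmesg in this ticket.
SAMPLE = [
    "[38981.808821] LustreError: 17907:0:(osc_cache.c:955:osc_extent_wait()) extent ffff880038aee6e0@{[0 -> 511/1023], [3|0|+|rpc|wiY|ffff8800542718c0], [2752512|310|+|-|ffff88007961f8c0|1024|ffff880069c30000]} lustre-OST0003-osc-ffff8800426ca800: wait ext to 0 timedout, recovery in progress?",
    "[38981.811368] LustreError: 17907:0:(osc_cache.c:955:osc_extent_wait()) ### extent: ffff880038aee6e0 ns: lustre-OST0003-osc-ffff8800426ca800 lock: ffff88007961f8c0/0x9c1d9679bc44a8a9 lrc: 3/0,0 mode: PW/PW res: [0x3e6f9:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x29400020000 nid: local remote: 0xec2fa022db61f7f expref: -99 pid: 17896 timeout: 0 lvb_type: 1",
]

if __name__ == "__main__":
    for target, hits in sorted(count_extent_timeouts(SAMPLE).items()):
        print(target, hits)
```

      To run it against a full capture, pass the lines of a saved dmesg file to count_extent_timeouts() in place of SAMPLE.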
      

            People

              Assignee: wc-triage WC Triage
              Reporter: standan Saurabh Tandan (Inactive)