Lustre / LU-1495

Hyperion - recovery-scale evictions

Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.2
    • Labels: None
    • Environment: Hyperion RHEL6 - 2.1.2-RC2, servers and clients
    • Severity: 3
    • Rank: 4041

    Description

      We are seeing a repeated error in which a single client is evicted during MDS recovery, followed by evictions of a number of other clients.
      Debug logs and other logs have been uploaded to FTP as LLNL.212rc.tar.gz.
      The failover script is failing over the MDS; however, the OSTs are also experiencing disconnect/reconnect events.
      Typical example:
      Jun 7 15:17:43 ehyperion260 kernel: Lustre: 19716:0:(client.c:1780:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1339107463/real 1339107463] req@ffff880144351800 x1404138588851383/t0(0) o8->lustre-OST0002-osc-ffff880226efb000@192.168.127.60@o2ib1:28/4 lens 368/512 e 0 to 1 dl 1339107518 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      Jun 7 15:17:43 ehyperion260 kernel: Lustre: 19716:0:(client.c:1780:ptlrpc_expire_one_request()) Skipped 217 previous similar messages
      Jun 7 15:17:43 ehyperion260 kernel: Lustre: lustre-OST0001-osc-ffff880226efb000: Connection restored to lustre-OST0001 (at 192.168.127.61@o2ib1)
      Jun 7 15:17:43 ehyperion260 kernel: Lustre: Skipped 4 previous similar messages
      Jun 7 15:17:48 ehyperion260 kernel: Lustre: lustre-MDT0000-mdc-ffff880226efb000: Connection restored to lustre-MDT0000 (at 192.168.127.6@o2ib1)
      Jun 7 15:17:48 ehyperion260 kernel: Lustre: Skipped 4 previous similar messages
      Jun 7 15:18:08 ehyperion260 kernel: Lustre: lustre-OST0002-osc-ffff880226efb000: Connection restored to lustre-OST0002 (at 192.168.127.60@o2ib1)
      Jun 7 15:18:33 ehyperion260 kernel: LustreError: 167-0: This client was evicted by lustre-OST0003; in progress operations using this service will fail.
      Jun 7 15:18:33 ehyperion260 kernel: LustreError: 167-0: This client was evicted by lustre-OST0004; in progress operations using this service will fail.
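
      When several clients are evicted in sequence, it can help to pull the eviction events out of each client's syslog and compare timestamps across nodes. The following is a minimal sketch, assuming the message format shown in the excerpt above; the function name find_evictions and the regular expression are illustrative and not part of any Lustre tooling.

```python
import re

# Matches the "client was evicted" message as it appears in the excerpt above.
# The pattern is an assumption based on that excerpt only.
EVICTION_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"LustreError: 167-0: This client was evicted by (?P<target>[^;]+);"
)


def find_evictions(lines):
    """Return (timestamp, client host, evicting target) for each eviction line."""
    hits = []
    for line in lines:
        m = EVICTION_RE.match(line.strip())
        if m:
            hits.append((m.group("stamp"), m.group("host"), m.group("target")))
    return hits


# Lines copied from the excerpt above.
sample = [
    "Jun 7 15:18:08 ehyperion260 kernel: Lustre: lustre-OST0002-osc-ffff880226efb000: "
    "Connection restored to lustre-OST0002 (at 192.168.127.60@o2ib1)",
    "Jun 7 15:18:33 ehyperion260 kernel: LustreError: 167-0: This client was evicted by "
    "lustre-OST0003; in progress operations using this service will fail.",
    "Jun 7 15:18:33 ehyperion260 kernel: LustreError: 167-0: This client was evicted by "
    "lustre-OST0004; in progress operations using this service will fail.",
]

for stamp, host, target in find_evictions(sample):
    print(stamp, host, target)
```

      Running the same filter over the logs of every client would give a per-target timeline of evictions, which could then be lined up against the MDS failover times.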

            People

              Assignee: Hongchao Zhang (hongchao.zhang)
              Reporter: Cliff White (cliffw, Inactive)