Lustre LU-2257: eviction from MDT during recovery

    Description

      The MDS evicted one client during recovery:

      2012-10-30 23:31:31 Lustre: lstest-MDT0000: Recovery over after 1:11, of 448 clients 447 recovered and 1 was evicted.
      

      The client had this to say:

      00000100:00000400:1.0:1351665089.361406:0:4759:0:(client.c:2702:ptlrpc_replay_interpret()) @@@ Version mismatch during replay
        req@ffff8808156d5400 x1417272286137308/t60416147281(60416147281) o101->lstest-MDT0000-mdc-ffff88101c53c800@172.20.5.2@o2ib500:12/10 lens 784/544 e 0 to 0 dl 1351665195 ref 2 fl Interpret:R/4/0 rc -75/-75
      00000100:00000100:1.0:1351665195.391323:0:4759:0:(client.c:1914:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1351665089/real 1351665089]  req@ffff8808037b2000 x1417272286147651/t0(0) o400->lstest-MDT0000-mdc-ffff88101c53c800@172.20.5.2@o2ib500:12/10 lens 224/224 e 0 to 1 dl 1351665195 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1
      00000100:00000400:1.0:1351665195.391332:0:4759:0:(import.c:1207:completed_replay_interpret()) lstest-MDT0000-mdc-ffff88101c53c800: version recovery fails, reconnecting
      00000100:02020000:1.0:1351665195.404955:0:4759:0:(import.c:1325:ptlrpc_import_recovery_state_machine()) 167-0: lstest-MDT0000-mdc-ffff88101c53c800: This client was evicted by lstest-MDT0000; in progress operations using this service will fail.
      00000080:00020000:8.0:1351665195.420248:0:21538:0:(file.c:155:ll_close_inode_openhandle()) inode 144115590443843479 mdc close failed: rc = -5
      00000100:02000000:10.0:1351665195.420776:0:21623:0:(import.c:1403:ptlrpc_import_recovery_state_machine()) lstest-MDT0000-mdc-ffff88101c53c800: Connection restored to lstest-MDT0000 (at 172.20.5.2@o2ib500)
      

      The application (IOR) failed with ENOENT from stat() during the write performance test:

      Commencing write performance test: Tue Oct 30 23:11:39 2012
      ior ERROR: stat() failed, errno 2, No such file or directory (aiori-POSIX.c:323)
      

      We'd like to understand why this client was evicted and what the version recovery error messages mean.
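As a reading aid for the log excerpts above, here is a minimal Python sketch that pulls the timestamp, source location, and return code out of a client debug line and names the errno. The helper names (`summarize`, `errno_name`) and the small errno table are illustrative assumptions, not part of Lustre. The table hardcodes the Linux values for the three codes seen here (2, 5, 75); it only names the codes and does not explain why replay returned them.

```python
import re
from datetime import datetime, timezone

# Linux errno numbers that appear in the logs above. Hardcoded on purpose:
# Python's errno module reflects the platform the script runs on.
LINUX_ERRNO = {2: "ENOENT", 5: "EIO", 75: "EOVERFLOW"}

# Matches the "seconds.microseconds" field and "(file.c:line:function())".
HEADER = re.compile(r":(\d{10})\.(\d{6}):.*?\((\w+\.c):(\d+):(\w+)\(\)\)")
# Matches both "rc = -5" and "rc -75/-75" (first value only).
RC = re.compile(r"\brc(?: =)? (-?\d+)")


def errno_name(rc):
    """Return the Linux errno name for a negative return code, if known."""
    return LINUX_ERRNO.get(abs(rc))


def summarize(line):
    """Extract time, source location and rc from one debug log line."""
    m = HEADER.search(line)
    if m is None:
        return None
    secs, _usec, src, lineno, func = m.groups()
    when = datetime.fromtimestamp(int(secs), tz=timezone.utc)
    rc_match = RC.search(line)
    rc = int(rc_match.group(1)) if rc_match else None
    return {
        "time_utc": when.isoformat(),
        "source": f"{src}:{lineno}",
        "function": func,
        "rc": rc,
        "errno": errno_name(rc) if rc is not None else None,
    }


if __name__ == "__main__":
    sample = (
        "00000080:00020000:8.0:1351665195.420248:0:21538:0:"
        "(file.c:155:ll_close_inode_openhandle()) "
        "inode 144115590443843479 mdc close failed: rc = -5"
    )
    print(summarize(sample))
    print(errno_name(-75))
```

Read this way, the replay request's `rc -75/-75` is EOVERFLOW, the failed close's `rc = -5` is EIO, and IOR's `errno 2` is ENOENT. Timestamps are printed in UTC, so they will differ from the server's local-time console log by the site's UTC offset.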

      LLNL-bug-id: bz1867

          People

            tappro Mikhail Pershin
            nedbass Ned Bass