Lustre / LU-7820

jobs crash with llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5

    Description

      The error occurs during soak testing of build '20160224' (b2_8 RC2) (see:
      https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20150224). DNE is enabled.
      The MDSes were formatted using ldiskfs, the OSTs using zfs. The MDSes are configured in an active-active HA failover configuration.

      Application mdtest (1 file per process) jobs crash with the following errors:

        JOBID          ERROR MESSAGE
      -- 445604 :  2016-02-25 15:08:35 : Process 1(lola-31.lola.whamcloud.com): FAILED in main, Unable to change to test directory: Input/output error
      -- 445605 :  2016-02-25 15:07:42 : Process 3(lola-32.lola.whamcloud.com): FAILED in main, Unable to change to test directory: Input/output error
      -- 445415 :  2016-02-25 11:27:11 : Process 3(lola-34.lola.whamcloud.com): FAILED in main, Unable to change to test directory: Input/output error
      -- 445416 :  2016-02-25 11:28:45 : Process 3(lola-32.lola.whamcloud.com): FAILED in main, Unable to change to test directory: Input/output error
      -- 445270 :  2016-02-25 08:05:01 : Process 4(lola-31.lola.whamcloud.com): FAILED in main, Unable to change to test directory: Input/output error
      -- 445271 :  2016-02-25 08:04:34 : Process 1(lola-29.lola.whamcloud.com): FAILED in main, Unable to change to test directory: Input/output error
      

      On MDS and client nodes the following Lustre errors can be correlated:

      ---- Incident 25 15:08:35 ----
      lola-11.log:Feb 25 15:08:35 lola-11 kernel: Lustre: soaked-MDT0006: Connection restored to 300cd577-7ec5-3892-b093-9d631f897cda (at 192.168.1.131@o2ib100)
      lola-11.log:Feb 25 15:08:35 lola-11 kernel: Lustre: Skipped 254 previous similar messages
      lola-31.log:Feb 25 15:08:35 lola-31 kernel: LustreError: 167-0: soaked-MDT0006-mdc-ffff88086597e800: This client was evicted by soaked-MDT0006; in progress operations using this service will fail.
      lola-31.log:Feb 25 15:08:35 lola-31 kernel: LustreError: 120434:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5
      lola-31.log:Feb 25 15:08:35 lola-31 kernel: Lustre: soaked-MDT0006-mdc-ffff88086597e800: Connection restored to 192.168.1.111@o2ib10 (at 192.168.1.111@o2ib10)
      ---- Incident 25 15:07:42 ----
      lola-32.log:Feb 25 15:07:42 lola-32 kernel: LustreError: 167-0: soaked-MDT0006-mdc-ffff88082f4c4000: This client was evicted by soaked-MDT0006; in progress operations using this service will fail.
      lola-32.log:Feb 25 15:07:42 lola-32 kernel: LustreError: 133347:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5
      lola-32.log:Feb 25 15:07:42 lola-32 kernel: LustreError: 133347:0:(llite_lib.c:2309:ll_prep_inode()) Skipped 2 previous similar messages
      lola-32.log:Feb 25 15:07:42 lola-32 kernel: Lustre: soaked-MDT0006-mdc-ffff88082f4c4000: Connection restored to 192.168.1.111@o2ib10 (at 192.168.1.111@o2ib10)
      ---- Incident 25 11:27:11 ----
      lola-31.log:Feb 25 11:27:11 lola-31 kernel: LustreError: 105033:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -4
      lola-34.log:Feb 25 11:27:11 lola-34 kernel: LustreError: 167-0: soaked-MDT0002-mdc-ffff88102fa38000: This client was evicted by soaked-MDT0002; in progress operations using this service will fail.
      lola-34.log:Feb 25 11:27:11 lola-34 kernel: LustreError: 105947:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5
      lola-34.log:Feb 25 11:27:11 lola-34 kernel: Lustre: soaked-MDT0002-mdc-ffff88102fa38000: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
      ---- Incident 25 11:28:45 ----
      lola-32.log:Feb 25 11:28:45 lola-32 kernel: LustreError: 167-0: soaked-MDT0002-mdc-ffff88082f4c4000: This client was evicted by soaked-MDT0002; in progress operations using this service will fail.
      lola-32.log:Feb 25 11:28:45 lola-32 kernel: LustreError: 117554:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5
      lola-32.log:Feb 25 11:28:45 lola-32 kernel: Lustre: soaked-MDT0002-mdc-ffff88082f4c4000: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
      lola-32.log:Feb 25 11:28:45 lola-32 kernel: LustreError: 117554:0:(llite_lib.c:2309:ll_prep_inode()) Skipped 2 previous similar messages
      ---- Incident 25 08:05:01 ----
      lola-31.log:Feb 25 08:05:01 lola-31 kernel: LustreError: 167-0: soaked-MDT0002-mdc-ffff88086597e800: This client was evicted by soaked-MDT0002; in progress operations using this service will fail.
      lola-31.log:Feb 25 08:05:01 lola-31 kernel: LustreError: 89849:0:(file.c:180:ll_close_inode_openhandle()) soaked-clilmv-ffff88086597e800: inode [0x28000bf82:0x69f4:0x0] mdc close failed: rc = -5
      lola-31.log:Feb 25 08:05:01 lola-31 kernel: LustreError: 91182:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5
      lola-31.log:Feb 25 08:05:01 lola-31 kernel: Lustre: soaked-MDT0002-mdc-ffff88086597e800: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
      ---- Incident 25 08:04:34 ----
      lola-29.log:Feb 25 08:04:34 lola-29 kernel: LustreError: 167-0: soaked-MDT0002-mdc-ffff880871eec800: This client was evicted by soaked-MDT0002; in progress operations using this service will fail.
      lola-29.log:Feb 25 08:04:34 lola-29 kernel: LustreError: 1037:0:(file.c:180:ll_close_inode_openhandle()) soaked-clilmv-ffff880871eec800: inode [0x28000bf82:0x66f3:0x0] mdc close failed: rc = -5
      lola-29.log:Feb 25 08:04:34 lola-29 kernel: LustreError: 1043:0:(vvp_io.c:1519:vvp_io_init()) soaked: refresh file layout [0x28000a816:0x1c0e2:0x0] error -5.
      lola-29.log:Feb 25 08:04:34 lola-29 kernel: Lustre: soaked-MDT0002-mdc-ffff880871eec800: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
      lola-29.log:Feb 25 08:04:34 lola-29 kernel: LustreError: 1037:0:(file.c:180:ll_close_inode_openhandle()) Skipped 3 previous similar messages
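
      The correlation above can be reproduced mechanically. The following is a minimal sketch, not part of the soak framework: it groups kernel log lines by host and timestamp, records which MDT evicted the client, and decodes the `rc` value reported by ll_prep_inode() into its errno name. The regular expressions and the `correlate` helper are hypothetical and only assume the log line layout shown above; the sample lines are copied from the incidents.

```python
import errno
import re
from collections import defaultdict

# Sample lines copied from the incident logs in this ticket.
SAMPLE = """\
lola-31.log:Feb 25 15:08:35 lola-31 kernel: LustreError: 167-0: soaked-MDT0006-mdc-ffff88086597e800: This client was evicted by soaked-MDT0006; in progress operations using this service will fail.
lola-31.log:Feb 25 15:08:35 lola-31 kernel: LustreError: 120434:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5
lola-31.log:Feb 25 11:27:11 lola-31 kernel: LustreError: 105033:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -4
lola-34.log:Feb 25 11:27:11 lola-34 kernel: LustreError: 167-0: soaked-MDT0002-mdc-ffff88102fa38000: This client was evicted by soaked-MDT0002; in progress operations using this service will fail.
lola-34.log:Feb 25 11:27:11 lola-34 kernel: LustreError: 105947:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5
"""

# Hypothetical patterns, assuming the "<file>.log:<syslog line>" layout above.
LINE_RE = re.compile(
    r"^\S+\.log:(?P<stamp>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) kernel: (?P<msg>.*)$"
)
EVICT_RE = re.compile(r"This client was evicted by (?P<target>[\w-]+);")
PREP_RE = re.compile(r"ll_prep_inode\(\)\) new_inode -fatal: rc (?P<rc>-\d+)")


def correlate(text):
    """Group eviction and ll_prep_inode() errors by (host, timestamp)."""
    events = defaultdict(lambda: {"evicted_by": None, "rcs": []})
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        key = (m["host"], m["stamp"])
        evicted = EVICT_RE.search(m["msg"])
        if evicted:
            events[key]["evicted_by"] = evicted["target"]
        prep = PREP_RE.search(m["msg"])
        if prep:
            rc = int(prep["rc"])
            # Kernel return codes are negated errno values: -5 is EIO, -4 is EINTR.
            events[key]["rcs"].append(errno.errorcode.get(-rc, str(rc)))
    return dict(events)


if __name__ == "__main__":
    for (host, stamp), info in sorted(correlate(SAMPLE).items()):
        print(host, stamp, info)
```

      In this sample, `rc -5` decodes to EIO, which matches the "Input/output error" reported by mdtest, and it appears on the same host and in the same second as the eviction message. The `rc -4` (EINTR) on lola-31 at 11:27:11 has no eviction message for that host in the excerpt above.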
      

      The errors happened after the following MDS failovers:

      mds_failover     : 2016-02-25 14:52:36,099 - 2016-02-25 14:59:44,541     lola-11
      mds_failover     : 2016-02-25 11:06:59,431 - 2016-02-25 11:16:18,956     lola-9
      mds_failover     : 2016-02-25 07:45:03,939 - 2016-02-25 07:54:18,970     lola-9
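
      To quantify "after", the delay between the end of each failover and the job failures can be computed. This is a rough sketch under stated assumptions: the failover lines are copied from the list above, the error times are taken from the job table and assumed to fall on 2016-02-25, and the helper names `parse_failovers` and `delay_after_failover` are hypothetical.

```python
from datetime import datetime

# Failover windows copied from this ticket: "start - end   node".
FAILOVERS = """\
mds_failover     : 2016-02-25 14:52:36,099 - 2016-02-25 14:59:44,541     lola-11
mds_failover     : 2016-02-25 11:06:59,431 - 2016-02-25 11:16:18,956     lola-9
mds_failover     : 2016-02-25 07:45:03,939 - 2016-02-25 07:54:18,970     lola-9
"""

# Job failure times from the job table (date assumed to be 2016-02-25).
ERROR_TIMES = ["15:08:35", "15:07:42", "11:27:11", "11:28:45", "08:05:01", "08:04:34"]

FMT = "%Y-%m-%d %H:%M:%S,%f"


def parse_failovers(text):
    """Return a list of (node, start, end) tuples, one per failover line."""
    windows = []
    for line in text.splitlines():
        _, rest = line.split(":", 1)          # drop the "mds_failover" label
        span, node = rest.rsplit(None, 1)     # node name is the last field
        start, end = (datetime.strptime(s.strip(), FMT) for s in span.split(" - "))
        windows.append((node, start, end))
    return windows


def delay_after_failover(windows, error_time):
    """Return (node, timedelta) for the latest failover that ended before error_time."""
    prior = [w for w in windows if w[2] <= error_time]
    if not prior:
        return None
    node, _, end = max(prior, key=lambda w: w[2])
    return node, error_time - end


if __name__ == "__main__":
    windows = parse_failovers(FAILOVERS)
    for t in sorted(ERROR_TIMES):
        when = datetime.strptime("2016-02-25 " + t, "%Y-%m-%d %H:%M:%S")
        node, delta = delay_after_failover(windows, when)
        print(f"{t}: {delta.total_seconds():.0f} s after failover of {node} completed")
```

      With these inputs, every job failure falls roughly 8 to 12.5 minutes after the preceding failover completed, rather than inside a failover window.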
      

      Is the eviction an expected part of the workflow?

          People

            wc-triage WC Triage
            heckes Frank Heckes (Inactive)