Details
-
Bug
-
Resolution: Won't Fix
-
Critical
-
None
-
Lustre 2.8.0
-
lola
build: https://build.hpdd.intel.com/job/lustre-b2_8/8/
-
3
-
9223372036854775807
Description
The error occurs during soak testing of build '20160224' (b2_8 RC2) (see:
https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20150224). DNE is enabled.
MDSes had been formatted using ldiskfs, OSTs using zfs. MDSes are configured in an active-active HA failover configuration.
Application mdtest (1 file per process) jobs crash with the following errors:
JOBID    ERROR MESSAGE
445604 : 201602 25 15:08:35 : Process 1(lola-31.lola.whamcloud.com): FAILED in main, Unable to change to test directory: Input/output error
445605 : 201602 25 15:07:42 : Process 3(lola-32.lola.whamcloud.com): FAILED in main, Unable to change to test directory: Input/output error
445415 : 201602 25 11:27:11 : Process 3(lola-34.lola.whamcloud.com): FAILED in main, Unable to change to test directory: Input/output error
445416 : 201602 25 11:28:45 : Process 3(lola-32.lola.whamcloud.com): FAILED in main, Unable to change to test directory: Input/output error
445270 : 201602 25 08:05:01 : Process 4(lola-31.lola.whamcloud.com): FAILED in main, Unable to change to test directory: Input/output error
445271 : 201602 25 08:04:34 : Process 1(lola-29.lola.whamcloud.com): FAILED in main, Unable to change to test directory: Input/output error
On MDS and client nodes the following Lustre errors can be correlated:
---- Incident 25 15:08:35 ----
lola-11.log:Feb 25 15:08:35 lola-11 kernel: Lustre: soaked-MDT0006: Connection restored to 300cd577-7ec5-3892-b093-9d631f897cda (at 192.168.1.131@o2ib100)
lola-11.log:Feb 25 15:08:35 lola-11 kernel: Lustre: Skipped 254 previous similar messages
lola-31.log:Feb 25 15:08:35 lola-31 kernel: LustreError: 167-0: soaked-MDT0006-mdc-ffff88086597e800: This client was evicted by soaked-MDT0006; in progress operations using this service will fail.
lola-31.log:Feb 25 15:08:35 lola-31 kernel: LustreError: 120434:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5
lola-31.log:Feb 25 15:08:35 lola-31 kernel: Lustre: soaked-MDT0006-mdc-ffff88086597e800: Connection restored to 192.168.1.111@o2ib10 (at 192.168.1.111@o2ib10)

---- Incident 25 15:07:42 ----
lola-32.log:Feb 25 15:07:42 lola-32 kernel: LustreError: 167-0: soaked-MDT0006-mdc-ffff88082f4c4000: This client was evicted by soaked-MDT0006; in progress operations using this service will fail.
lola-32.log:Feb 25 15:07:42 lola-32 kernel: LustreError: 133347:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5
lola-32.log:Feb 25 15:07:42 lola-32 kernel: LustreError: 133347:0:(llite_lib.c:2309:ll_prep_inode()) Skipped 2 previous similar messages
lola-32.log:Feb 25 15:07:42 lola-32 kernel: Lustre: soaked-MDT0006-mdc-ffff88082f4c4000: Connection restored to 192.168.1.111@o2ib10 (at 192.168.1.111@o2ib10)

---- Incident 25 11:27:11 ----
lola-31.log:Feb 25 11:27:11 lola-31 kernel: LustreError: 105033:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -4
lola-34.log:Feb 25 11:27:11 lola-34 kernel: LustreError: 167-0: soaked-MDT0002-mdc-ffff88102fa38000: This client was evicted by soaked-MDT0002; in progress operations using this service will fail.
lola-34.log:Feb 25 11:27:11 lola-34 kernel: LustreError: 105947:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5
lola-34.log:Feb 25 11:27:11 lola-34 kernel: Lustre: soaked-MDT0002-mdc-ffff88102fa38000: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)

---- Incident 25 11:28:45 ----
lola-32.log:Feb 25 11:28:45 lola-32 kernel: LustreError: 167-0: soaked-MDT0002-mdc-ffff88082f4c4000: This client was evicted by soaked-MDT0002; in progress operations using this service will fail.
lola-32.log:Feb 25 11:28:45 lola-32 kernel: LustreError: 117554:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5
lola-32.log:Feb 25 11:28:45 lola-32 kernel: Lustre: soaked-MDT0002-mdc-ffff88082f4c4000: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
lola-32.log:Feb 25 11:28:45 lola-32 kernel: LustreError: 117554:0:(llite_lib.c:2309:ll_prep_inode()) Skipped 2 previous similar messages

---- Incident 25 08:05:01 ----
lola-31.log:Feb 25 08:05:01 lola-31 kernel: LustreError: 167-0: soaked-MDT0002-mdc-ffff88086597e800: This client was evicted by soaked-MDT0002; in progress operations using this service will fail.
lola-31.log:Feb 25 08:05:01 lola-31 kernel: LustreError: 89849:0:(file.c:180:ll_close_inode_openhandle()) soaked-clilmv-ffff88086597e800: inode [0x28000bf82:0x69f4:0x0] mdc close failed: rc = -5
lola-31.log:Feb 25 08:05:01 lola-31 kernel: LustreError: 91182:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5
lola-31.log:Feb 25 08:05:01 lola-31 kernel: Lustre: soaked-MDT0002-mdc-ffff88086597e800: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)

---- Incident 25 08:04:34 ----
lola-29.log:Feb 25 08:04:34 lola-29 kernel: LustreError: 167-0: soaked-MDT0002-mdc-ffff880871eec800: This client was evicted by soaked-MDT0002; in progress operations using this service will fail.
lola-29.log:Feb 25 08:04:34 lola-29 kernel: LustreError: 1037:0:(file.c:180:ll_close_inode_openhandle()) soaked-clilmv-ffff880871eec800: inode [0x28000bf82:0x66f3:0x0] mdc close failed: rc = -5
lola-29.log:Feb 25 08:04:34 lola-29 kernel: LustreError: 1043:0:(vvp_io.c:1519:vvp_io_init()) soaked: refresh file layout [0x28000a816:0x1c0e2:0x0] error -5.
lola-29.log:Feb 25 08:04:34 lola-29 kernel: Lustre: soaked-MDT0002-mdc-ffff880871eec800: Connection restored to 192.168.1.109@o2ib10 (at 192.168.1.109@o2ib10)
lola-29.log:Feb 25 08:04:34 lola-29 kernel: LustreError: 1037:0:(file.c:180:ll_close_inode_openhandle()) Skipped 3 previous similar messages
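The eviction messages in the log excerpt above can be pulled out and tallied per MDT with a small script. The following is only a sketch: the regular expression assumes the grep-style `<file>.log:` prefix and the `167-0` eviction message format seen in this report, the helper names (`find_evictions`, `count_by_target`) are hypothetical, and `SAMPLE_LINES` holds just a few lines copied from the excerpt.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt in this report:
# <logfile>.log:<Mon DD HH:MM:SS> <host> kernel: LustreError: 167-0: <mdc device>: This client was evicted by <target>; ...
EVICTION_RE = re.compile(
    r"^(?P<logfile>\S+?\.log):"
    r"(?P<timestamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) kernel: LustreError: 167-0: \S+: "
    r"This client was evicted by (?P<target>[\w-]+);"
)

# A few lines copied from the report; the second one is not an eviction.
SAMPLE_LINES = [
    "lola-31.log:Feb 25 15:08:35 lola-31 kernel: LustreError: 167-0: soaked-MDT0006-mdc-ffff88086597e800: This client was evicted by soaked-MDT0006; in progress operations using this service will fail.",
    "lola-31.log:Feb 25 15:08:35 lola-31 kernel: LustreError: 120434:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5",
    "lola-34.log:Feb 25 11:27:11 lola-34 kernel: LustreError: 167-0: soaked-MDT0002-mdc-ffff88102fa38000: This client was evicted by soaked-MDT0002; in progress operations using this service will fail.",
    "lola-29.log:Feb 25 08:04:34 lola-29 kernel: LustreError: 167-0: soaked-MDT0002-mdc-ffff880871eec800: This client was evicted by soaked-MDT0002; in progress operations using this service will fail.",
]


def find_evictions(lines):
    """Return (timestamp, client host, evicting target) for each eviction line."""
    events = []
    for line in lines:
        match = EVICTION_RE.match(line.strip())
        if match:
            events.append((match["timestamp"], match["host"], match["target"]))
    return events


def count_by_target(events):
    """Count eviction events per evicting target (MDT)."""
    return Counter(target for _, _, target in events)


if __name__ == "__main__":
    evictions = find_evictions(SAMPLE_LINES)
    for timestamp, host, target in evictions:
        print(f"{timestamp}  {host} evicted by {target}")
    print(dict(count_by_target(evictions)))
```

Run against the full client logs, the same tally would show which MDTs evict clients in each incident.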
The errors happened after the following MDS failovers:
mds_failover : 2016-02-25 14:52:36,099 - 2016-02-25 14:59:44,541 lola-11
mds_failover : 2016-02-25 11:06:59,431 - 2016-02-25 11:16:18,956 lola-9
mds_failover : 2016-02-25 07:45:03,939 - 2016-02-25 07:54:18,970 lola-9
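To see how long after each failover the job errors occurred, the two sets of timestamps can be compared directly. This is a minimal sketch, not part of the soak framework: the function name `delays_after_failover` is hypothetical, the job error timestamps are assumed to read as 2016-02-25 (matching the `Feb 25` kernel log lines), and each error is attributed to the most recently completed failover, which is an assumption about causality rather than something the logs prove.

```python
from datetime import datetime

# (node, start, end) of each mds_failover window, as listed in this report.
FAILOVERS = [
    ("lola-9", "2016-02-25 07:45:03,939", "2016-02-25 07:54:18,970"),
    ("lola-9", "2016-02-25 11:06:59,431", "2016-02-25 11:16:18,956"),
    ("lola-11", "2016-02-25 14:52:36,099", "2016-02-25 14:59:44,541"),
]

# (job id, error time) from the mdtest failures; date assumed to be 2016-02-25.
JOB_ERRORS = [
    (445604, "2016-02-25 15:08:35"),
    (445605, "2016-02-25 15:07:42"),
    (445415, "2016-02-25 11:27:11"),
    (445416, "2016-02-25 11:28:45"),
    (445270, "2016-02-25 08:05:01"),
    (445271, "2016-02-25 08:04:34"),
]


def parse_failover_time(text):
    # Failover timestamps carry milliseconds after a comma.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def parse_error_time(text):
    # Job error timestamps have one-second resolution.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def delays_after_failover(job_errors, failovers):
    """Return (job id, failover node, seconds since that failover ended)."""
    ends = sorted((parse_failover_time(end), node) for node, _start, end in failovers)
    results = []
    for job_id, stamp in job_errors:
        when = parse_error_time(stamp)
        earlier = [(end, node) for end, node in ends if end <= when]
        if not earlier:
            # No failover completed before this error.
            results.append((job_id, None, None))
            continue
        end, node = earlier[-1]
        results.append((job_id, node, (when - end).total_seconds()))
    return results


if __name__ == "__main__":
    for job_id, node, seconds in delays_after_failover(JOB_ERRORS, FAILOVERS):
        print(f"job {job_id}: {seconds:.0f} s after failover of {node} completed")
```

With the timestamps above, every error falls between roughly 8 and 12.5 minutes after a failover completed.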
Is the eviction an expected part of the workflow?