Details
-
Bug
-
Resolution: Not a Bug
-
Critical
-
None
-
None
-
lola
build: master commit 71d2ea0fde17ecde0bf237f486d4bafb5d54fe3f + patches
-
3
Description
The error happens during soak testing of build '20160427' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160427). DNE is enabled. The OSTs are formatted with ZFS; the MDTs use ldiskfs as the storage backend. OSS and MDS nodes are configured in an HA active-active failover configuration. For debugging purposes, the parameter dump_on_eviction=1 was set.
The configuration, especially the mapping of nodes to roles, can be found here: https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-Configuration
After every MDS restart or failover, a large number of Lustre nodes (very often the majority) are evicted.
The following sequence of events is 100% reproducible:
- 2016-04-28 10:07:15,956 mds_failover lola-8 ---> lola-9 started
- 2016-04-28 10:16:23,738:fsmgmt.fsmgmt:INFO Node lola-9: 'soaked-MDT0000' recovery completed
- 2016-04-28 10:16:23,739:fsmgmt.fsmgmt:INFO Unmounting soaked-MDT0000 on lola-9 ...
- 2016-04-28 10:16:48,995:fsmgmt.fsmgmt:INFO ... soaked-MDT0000 mounted successfully on lola-8
- 2016-04-28 10:16:48,996 mds_failover (failback completed; lola-8 runs its own resource mdt-0 again)
- 2016-04-28 10:17:32 recovery of mdt-0 finished on lola-8:
Apr 28 10:17:32 lola-8 kernel: Lustre: soaked-MDT0000: Recovery over after 0:43, of 21 clients 21 recovered and 0 were evicted.
- 2016-04-28 10:17:* most clients are evicted, although the Lustre message above states otherwise:
lola-10.log:Apr 28 10:17:24 lola-10 kernel: LustreError: 48860:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-11.log:Apr 28 10:17:05 lola-11 kernel: LustreError: 28261:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-13.log:Apr 28 10:17:06 lola-13 kernel: LustreError: 81063:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-16.log:Apr 28 10:17:10 lola-16 kernel: LustreError: 229277:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-18.log:Apr 28 10:17:08 lola-18 kernel: LustreError: 110914:0:(import.c:1405:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-19.log:Apr 28 10:17:25 lola-19 kernel: LustreError: 233525:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-20.log:Apr 28 10:17:14 lola-20 kernel: LustreError: 182741:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-21.log:Apr 28 10:17:14 lola-21 kernel: LustreError: 155091:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-22.log:Apr 28 10:17:05 lola-22 kernel: LustreError: 171992:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-23.log:Apr 28 10:17:34 lola-23 kernel: LustreError: 158263:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-24.log:Apr 28 10:17:21 lola-24 kernel: LustreError: 160657:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-25.log:Apr 28 10:17:11 lola-25 kernel: LustreError: 196242:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-26.log:Apr 28 10:17:07 lola-26 kernel: LustreError: 153478:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-27.log:Apr 28 10:17:20 lola-27 kernel: LustreError: 158888:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-29.log:Apr 28 10:17:25 lola-29 kernel: LustreError: 29326:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-2.log:Apr 28 10:17:10 lola-2 kernel: LustreError: 16891:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-2.log:Apr 28 10:17:17 lola-2 kernel: LustreError: 16899:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-2.log:Apr 28 10:17:42 lola-2 kernel: LustreError: 16907:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-30.log:Apr 28 10:17:14 lola-30 kernel: LustreError: 34608:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-31.log:Apr 28 10:17:21 lola-31 kernel: LustreError: 17749:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-32.log:Apr 28 10:17:02 lola-32 kernel: LustreError: 152914:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-33.log:Apr 28 10:17:14 lola-33 kernel: LustreError: 165946:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-34.log:Apr 28 10:17:16 lola-34 kernel: LustreError: 152469:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-3.log:Apr 28 10:17:18 lola-3 kernel: LustreError: 75334:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-4.log:Apr 28 10:17:08 lola-4 kernel: LustreError: 34658:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-5.log:Apr 28 10:17:07 lola-5 kernel: LustreError: 32477:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-6.log:Apr 28 10:17:02 lola-6 kernel: LustreError: 75888:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-7.log:Apr 28 10:17:24 lola-7 kernel: LustreError: 20063:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
lola-9.log:Apr 28 10:17:31 lola-9 kernel: LustreError: 11783:0:(import.c:1406:ptlrpc_invalidate_import_thread()) dump the log upon eviction
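To make the mismatch between the server-side recovery summary and the client-side eviction messages easier to check on future runs, the two kinds of lines quoted above could be tallied with a small script. This is only a sketch: the function names parse_recovery and count_evictions are hypothetical, and the regular expressions are derived solely from the log lines quoted in this ticket, so they may need adjusting for other Lustre versions or syslog formats.

```python
import re
from collections import Counter

# Server-side recovery summary, modelled on the lola-8 line quoted in this ticket.
RECOVERY_RE = re.compile(
    r"(?P<target>\S+): Recovery over after (?P<duration>\d+:\d+), "
    r"of (?P<total>\d+) clients (?P<recovered>\d+) recovered "
    r"and (?P<evicted>\d+) (?:was|were) evicted"
)

# Eviction marker as it appears in the per-node logs quoted above
# (optionally prefixed with "<file>.log:" as produced by grep).
EVICTION_RE = re.compile(
    r"^(?:\S+\.log:)?\w{3}\s+\d+ (?P<time>\d\d:\d\d:\d\d) (?P<node>\S+) kernel: "
    r"LustreError: .*dump the log upon eviction"
)


def parse_recovery(line):
    """Return the counts from a 'Recovery over' line, or None if it does not match."""
    match = RECOVERY_RE.search(line)
    if match is None:
        return None
    return {
        "target": match.group("target"),
        "duration": match.group("duration"),
        "total": int(match.group("total")),
        "recovered": int(match.group("recovered")),
        "evicted": int(match.group("evicted")),
    }


def count_evictions(lines):
    """Count eviction markers per node name across the given log lines."""
    counts = Counter()
    for line in lines:
        match = EVICTION_RE.match(line.strip())
        if match:
            counts[match.group("node")] += 1
    return counts
```

Feeding the grep output above into count_evictions gives the number of nodes that logged an eviction, which can then be compared with the "evicted" field returned by parse_recovery for the MDS line.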
Attached files: messages, console, and debug logs for each Lustre node type:
OSS : lola-3
MDS : lola-11
client : lola-20
As stated above, the effect can be reproduced with certainty if additional information is needed.
The IB fabric and LNet routers showed no errors or malfunctions during any of the time intervals in which the error occurred, nor before or after.