[LU-8086] client eviction after MDT restart or failover Created: 29/Apr/16 Updated: 13/Oct/21 Resolved: 13/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Frank Heckes (Inactive) | Assignee: | Lai Siyao |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | soak | ||
| Environment: |
lola |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The error happens during soak testing of build '20160427' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160427). DNE is enabled. OSTs were formatted with ZFS; MDTs use ldiskfs as the storage backend. OSS and MDT nodes are configured in an HA active-active failover configuration. For debugging purposes, the parameter dump_on_eviction=1 was set. The configuration, especially the mapping of nodes to roles, can be found here: https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-Configuration. After every MDS restart or failover, a large number of Lustre nodes (very often the majority) are evicted. The following sequence of events is 100% reproducible:
2016-04-28 10:16:48,996 mds_failover (failback completed; lola-8 runs its own resource mdt-0 again)
2016-04-28 10:17:32 recovery of mdt-0 finished on lola-8:
Apr 28 10:17:32 lola-8 kernel: Lustre: soaked-MDT0000: Recovery over after 0:43, of 21 clients 21 recovered and 0 were evicted.
Attached are the messages, console, and debug log files for each Lustre node type. As stated above, the effect can be reproduced with certainty in case additional information is needed. The IB fabric and LNet routers did not indicate any errors or malfunctions during any of the time intervals in which the error occurred, nor earlier or later. |
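When checking many restarts or failovers, the recovery summary line quoted in the description can be pulled out of the messages files automatically. The following is a minimal sketch, not part of the soak framework: the function name `parse_recovery_line` and the regular expression are illustrative assumptions modeled only on the single kernel line shown above, and the duration is assumed to be in minutes:seconds.

```python
import re

# Assumption: the pattern mirrors the one recovery summary line quoted in
# this ticket; other Lustre versions may word the message differently.
RECOVERY_RE = re.compile(
    r"Lustre: (?P<target>\S+): Recovery over after (?P<min>\d+):(?P<sec>\d+), "
    r"of (?P<total>\d+) clients (?P<recovered>\d+) recovered "
    r"and (?P<evicted>\d+) (?:was|were) evicted"
)


def parse_recovery_line(line):
    """Return the recovery summary as a dict, or None if the line does not match."""
    m = RECOVERY_RE.search(line)
    if m is None:
        return None
    return {
        "target": m.group("target"),
        # Assumption: the "0:43" field is minutes:seconds.
        "duration_s": int(m.group("min")) * 60 + int(m.group("sec")),
        "total": int(m.group("total")),
        "recovered": int(m.group("recovered")),
        "evicted": int(m.group("evicted")),
    }


# The kernel line from the description.
line = (
    "Apr 28 10:17:32 lola-8 kernel: Lustre: soaked-MDT0000: Recovery over "
    "after 0:43, of 21 clients 21 recovered and 0 were evicted."
)
summary = parse_recovery_line(line)
print(summary)
```

Note that this line only reports evictions counted by the target during its recovery window; evictions that occur afterwards would have to be found in other log messages.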
| Comments |
| Comment by Di Wang [ 02/May/16 ] |
|
It seems most of the evictions happened between the MGC and the MGS in lola-20-lustre-log.1461863720.182550, which is normal in this test.
ptlrpc_invalidate_import_thread: dump the log upon eviction
ptlrpc_invalidate_import_thread: ffff880821fd8000 MGS: changing import state from EVICTED to RECOVER |
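To check which targets the evictions in a debug log refer to, the import state messages can be tallied per target. This is a rough sketch under stated assumptions: the helper `count_evicted_targets` and its regular expression are hypothetical, modeled only on the message text quoted in the comment above, and the second sample line is made up for illustration.

```python
import re
from collections import Counter

# Assumption: the target name is the token directly before
# ": changing import state", as in the message quoted in this ticket.
STATE_RE = re.compile(
    r"(?P<target>\S+): changing import state from (?P<old>\w+) to (?P<new>\w+)"
)


def count_evicted_targets(lines):
    """Count, per target, the transitions that leave the EVICTED state."""
    counts = Counter()
    for line in lines:
        m = STATE_RE.search(line)
        if m and m.group("old") == "EVICTED":
            counts[m.group("target")] += 1
    return counts


sample = [
    # Message text taken from the comment above.
    "ffff880821fd8000 MGS: changing import state from EVICTED to RECOVER",
    # Hypothetical line, included only to show a non-eviction transition.
    "ffff880000000000 soaked-MDT0000_UUID: changing import state from FULL to DISCONN",
]
counts = count_evicted_targets(sample)
print(counts)
```

If nearly all counted targets are MGS, that would be consistent with the observation that the evictions are between the MGC and the MGS rather than between clients and the MDTs.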