[LU-12066] recovery-small test 26b fails with “Client was not evicted by ost rc=1” Created: 13/Mar/19 Updated: 07/Feb/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0, Lustre 2.10.7, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.15.3 |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Hongchao Zhang |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | failover | ||
| Environment: |
failover test session |
||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||
| Description |
|
recovery-small test_26b fails with “Client was not evicted by ost rc=1”. We only see this issue with failover testing.

Looking at a recent failure at https://testing.whamcloud.com/test_sets/e21fbf5e-4500-11e9-9720-52540065bddc , in the suite_log, we see an error at the beginning of the test:

== recovery-small test 26b: evict dead exports ======================================================= 09:09:58 (1552381798)
CMD: trevis-42vm12 lctl get_param -n timeout
trevis-42vm1: error: invalid path '/mnt/lustre': Input/output error
Starting client: trevis-42vm1.trevis.whamcloud.com: -o user_xattr,flock trevis-42vm11:trevis-42vm12:/lustre /mnt/lustre2
CMD: trevis-42vm1.trevis.whamcloud.com mkdir -p /mnt/lustre2
CMD: trevis-42vm1.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-42vm11:trevis-42vm12:/lustre /mnt/lustre2
CMD: trevis-42vm12 lctl get_param -n mdt.lustre-MDT0000.num_exports
CMD: trevis-42vm5 lctl get_param -n obdfilter.lustre-OST0000.num_exports
starting with 4 OST and 12 MDS exports
…
CMD: trevis-42vm5 lctl get_param -n *.lustre-OST0000.num_exports | cut -d' ' -f2
Update not seen after 60s: wanted '3' got '4'
recovery-small test_26b: @@@@@@ FAIL: Client was not evicted by ost rc=1

On client 1 (vm1) we see an error in the console logs:

======================================================= 09:09:58 \(1552381798\)
[261204.673102] Lustre: DEBUG MARKER: == recovery-small test 26b: evict dead exports ======================================================= 09:09:58 (1552381798)
[261205.856521] Lustre: Evicted from MGS (at 10.9.3.160@tcp) after server handle changed from 0x2ef44212d062fce8 to 0x2ef44212d0630260
[261210.854799] LustreError: 10265:0:(file.c:3644:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -5
…

There’s nothing obviously wrong looking at the rest of the console logs.
Logs for more failures are at:

We see a similar failure with master failover testing, but those logs have neither the ll_inode_revalidate_fini() error in the client console log nor the ‘invalid path’ error. In some cases, recovery-small test 26a fails before the 26b failure. |
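For triaging further occurrences, a minimal sketch of a signature check over suite_log or console log text might look like the following. The function name classify and the result keys are hypothetical and are not part of the Lustre test framework; the matched strings are copied from the log excerpts in this ticket. It only reports which signatures are present, which may help separate failures that show the ‘invalid path’ and ll_inode_revalidate_fini() errors from those that do not.

```python
import re

# Pattern for the export-count wait message quoted in this ticket, e.g.
# "Update not seen after 60s: wanted '3' got '4'".
UPDATE_RE = re.compile(r"Update not seen after (\d+)s: wanted '(\d+)' got '(\d+)'")

# Substrings copied from the log excerpts in this ticket.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "invalid_path": "error: invalid path",
    "revalidate_error": "ll_inode_revalidate_fini()",
    "mgs_eviction": "Evicted from MGS",
    "not_evicted": "Client was not evicted by ost",
}


def classify(log_text):
    """Report which known signatures occur in log_text.

    Returns a dict mapping each label to True/False, plus an
    "export_wait" entry holding the parsed timeout and export counts,
    or None when the wait message is absent.
    """
    found = {name: needle in log_text for name, needle in SIGNATURES.items()}
    match = UPDATE_RE.search(log_text)
    if match:
        found["export_wait"] = {
            "timeout_s": int(match.group(1)),
            "wanted": int(match.group(2)),
            "got": int(match.group(3)),
        }
    else:
        found["export_wait"] = None
    return found
```

Calling classify() on the text of a downloaded suite_log and on the client console log separately keeps the two sets of signatures apart, since the ‘invalid path’ error is reported in the former and the revalidate error in the latter.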
| Comments |
| Comment by Chris Horn [ 24/Apr/20 ] |
|
+1 on master https://testing.whamcloud.com/test_sessions/b2965d82-a459-4188-a035-72180920afb6 |
| Comment by James Nunez (Inactive) [ 12/Jan/21 ] |
|
Although this ticket is for failover test group failures, there is a recent interop failure for a full test session that has a similar failure at https://testing.whamcloud.com/test_sets/85c1940b-6a24-4b96-9e02-5e5e976474bb for 2.13.57.36 clients and 2.12.6 servers. |
| Comment by Peter Jones [ 26/Nov/21 ] |
|
Hongchao, could you please advise? Thanks, Peter |
| Comment by Gerrit Updater [ 25/Mar/22 ] |
|
"Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46934 |