[LU-12340] replay-dual test 0b timeouts Created: 25/May/19  Updated: 10/Oct/19  Resolved: 10/Oct/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12769 replay-dual test 0b hangs in client m... Resolved
is related to LU-11762 replay-single test 0d fails with 'po... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

It seems replay-dual test 0b has started to time out a lot for me (not sure if it is 100% of runs).

Server-side logs show the following, which certainly does not look normal:

[  243.748011] Lustre: lustre-MDT0000: Denying connection for new client d11934b8-d154-4 (at 192.168.10.167@tcp), waiting for 2 known clients (0 recovered, 1 in progress, and 0 evicted) to recover in 0:19
[  243.749658] Lustre: Skipped 1 previous similar message
[  263.267987] Lustre: lustre-MDT0000: recovery is timed out, evict stale exports
[  263.269233] Lustre: lustre-MDT0000: disconnecting 1 stale clients
[  263.780003] Lustre: lustre-MDT0000: Denying connection for new client d11934b8-d154-4 (at 192.168.10.167@tcp), waiting for 2 known clients (0 recovered, 1 in progress, and 1 evicted) to recover in 0:34
[  263.782005] Lustre: Skipped 3 previous similar messages
[  298.835999] Lustre: lustre-MDT0000: Denying connection for new client d11934b8-d154-4 (at 192.168.10.167@tcp), waiting for 2 known clients (0 recovered, 1 in progress, and 1 evicted) already passed deadline 0:00
[  298.838164] Lustre: Skipped 6 previous similar messages
[  363.940436] Lustre: lustre-MDT0000: Denying connection for new client d11934b8-d154-4 (at 192.168.10.167@tcp), waiting for 2 known clients (0 recovered, 1 in progress, and 1 evicted) already passed deadline 1:05
[  363.941848] Lustre: Skipped 12 previous similar messages
[  494.164193] Lustre: lustre-MDT0000: Denying connection for new client d11934b8-d154-4 (at 192.168.10.167@tcp), waiting for 2 known clients (0 recovered, 1 in progress, and 1 evicted) already passed deadline 3:15
[  494.166791] Lustre: Skipped 25 previous similar messages
[  754.579831] Lustre: lustre-MDT0000: Denying connection for new client d11934b8-d154-4 (at 192.168.10.167@tcp), waiting for 2 known clients (0 recovered, 1 in progress, and 1 evicted) already passed deadline 7:36
[  754.581284] Lustre: Skipped 51 previous similar messages
[ 1270.403818] Lustre: lustre-MDT0000: Denying connection for new client d11934b8-d154-4 (at 192.168.10.167@tcp), waiting for 2 known clients (0 recovered, 1 in progress, and 1 evicted) already passed deadline 16:11
[ 1270.405106] Lustre: Skipped 102 previous similar messages
[ 1871.363761] Lustre: lustre-MDT0000: Denying connection for new client d11934b8-d154-4 (at 192.168.10.167@tcp), waiting for 2 known clients (0 recovered, 1 in progress, and 1 evicted) already passed deadline 26:12
[ 1871.364853] Lustre: Skipped 119 previous similar messages
[ 2472.323982] Lustre: lustre-MDT0000: Denying connection for new client d11934b8-d154-4 (at 192.168.10.167@tcp), waiting for 2 known clients (0 recovered, 1 in progress, and 1 evicted) already passed deadline 36:13
[ 2472.325633] Lustre: Skipped 119 previous similar messages
[ 3073.284066] Lustre: lustre-MDT0000: Denying connection for new client d11934b8-d154-4 (at 192.168.10.167@tcp), waiting for 2 known clients (0 recovered, 1 in progress, and 1 evicted) already passed deadline 46:14
[ 3073.285972] Lustre: Skipped 119 previous similar messages
[ 3674.244100] Lustre: lustre-MDT0000: Denying connection for new client d11934b8-d154-4 (at 192.168.10.167@tcp), waiting for 2 known clients (0 recovered, 1 in progress, and 1 evicted) already passed deadline 56:15
[ 3674.246309] Lustre: Skipped 119 previous similar messages
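
To make the pattern in the log above easier to spot across many test runs, here is a minimal Python sketch that parses the "Denying connection" console messages and flags entries where the recovery deadline has passed while a known client is still neither recovered nor evicted. The regular expression is written only against the message format visible in this ticket; the function names `parse_deny_line` and `looks_stuck` are hypothetical helpers, and the "stuck" criterion is an assumed heuristic, not something taken from the Lustre source.

```python
import re

# Matches the "Denying connection" message as it appears in this ticket's log.
# Assumption: other Lustre versions may word this message differently.
DENY_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Lustre: (?P<target>\S+): "
    r"Denying connection for new client (?P<client>\S+) "
    r"\(at (?P<nid>[^)]+)\), waiting for (?P<known>\d+) known clients "
    r"\((?P<recovered>\d+) recovered, (?P<in_progress>\d+) in progress, "
    r"and (?P<evicted>\d+) evicted\) "
    r"(?P<state>to recover in|already passed deadline) "
    r"(?P<mins>\d+):(?P<secs>\d+)"
)


def parse_deny_line(line):
    """Return a dict of fields from one console line, or None if it does not match."""
    m = DENY_RE.search(line)
    if m is None:
        return None
    return {
        "timestamp": float(m.group("ts")),
        "target": m.group("target"),
        "client": m.group("client"),
        "nid": m.group("nid"),
        "known": int(m.group("known")),
        "recovered": int(m.group("recovered")),
        "in_progress": int(m.group("in_progress")),
        "evicted": int(m.group("evicted")),
        # True when the message says "already passed deadline".
        "past_deadline": m.group("state") == "already passed deadline",
        # Seconds remaining until the deadline, or seconds past it.
        "seconds": int(m.group("mins")) * 60 + int(m.group("secs")),
    }


def looks_stuck(rec):
    """Assumed heuristic: deadline passed, yet some known client is still unaccounted for."""
    return rec["past_deadline"] and rec["recovered"] + rec["evicted"] < rec["known"]
```

Feeding each line of a console log through `parse_deny_line` and filtering with `looks_stuck` would flag every message from timestamp 298.835999 onward in the log above, where one client stays "in progress" long after the deadline.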

sample reports:
http://testing.linuxhacker.ru:3333/lustre-reports/279/testresults/replay-dual-zfs-centos7_x86_64-centos7_x86_64/
http://testing.linuxhacker.ru:3333/lustre-reports/278/testresults/replay-dual-ldiskfs-DNE-centos7_x86_64-centos7_x86_64/



 Comments   
Comment by Jian Yu [ 15/Aug/19 ]

+1 on master branch: https://testing.whamcloud.com/test_sets/a0bb8926-bf1c-11e9-98c8-52540065bddc

Comment by James A Simmons [ 15/Aug/19 ]

This is a duplicate of LU-11762. It is really hard to reproduce, except for Oleg.

Comment by James A Simmons [ 23/Sep/19 ]

Potential fix from LU-12769

Comment by James A Simmons [ 10/Oct/19 ]

This should be resolved by https://review.whamcloud.com/#/c/36274/ . If not, we can reopen.
