[LU-15144] conf-sanity test_47: timeout Created: 22/Oct/21  Updated: 11/Apr/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for S Buisson <sbuisson@ddn.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/5ed8a54c-ac57-4971-8dc1-bf185dc4d6d0

test_47 failed with the following error:

Timeout occurred after 314 mins, last suite running was conf-sanity

Last test log output is:

Starting client: onyx-37vm3.onyx.whamcloud.com:  -o user_xattr,flock onyx-22vm5@tcp:/lustre /mnt/lustre
CMD: onyx-37vm3.onyx.whamcloud.com mkdir -p /mnt/lustre
CMD: onyx-37vm3.onyx.whamcloud.com mount -t lustre -o user_xattr,flock onyx-22vm5@tcp:/lustre /mnt/lustre

The client fails to mount, and dmesg shows the following:

[Thu Oct 21 17:10:23 2021] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock onyx-22vm5@tcp:/lustre /mnt/lustre
[Thu Oct 21 17:10:29 2021] LustreError: 450440:0:(mgc_request.c:253:do_config_log_add()) MGC10.2.4.25@tcp: failed processing log, type 1: rc = -5
[Thu Oct 21 17:10:36 2021] LustreError: 450445:0:(mgc_request.c:612:do_requeue()) failed processing log: -5
[Thu Oct 21 17:11:00 2021] LustreError: 15c-8: MGC10.2.4.25@tcp: Confguration from log lustre-client failed from MGS -5. Communication error between node & MGS, a bad configuration, or other errors. See syslog for more info
[Thu Oct 21 17:11:00 2021] Lustre: Unmounted lustre-client
[Thu Oct 21 17:12:44 2021] LustreError: 450440:0:(import.c:354:ptlrpc_invalidate_import()) MGS: timeout waiting for callback (1 != 0)
[Thu Oct 21 17:12:44 2021] LustreError: 450440:0:(import.c:377:ptlrpc_invalidate_import()) @@@ still on sending list  req@00000000b8ecf6b4 x1714249859596352/t0(0) o250->MGC10.2.4.25@tcp@10.2.4.25@tcp:26/25 lens 520/544 e 0 to 1 dl 1634836230 ref 2 fl UnregRPC:EXNU/0/ffffffff rc -5/-1 job:''
[Thu Oct 21 17:12:44 2021] LustreError: 450440:0:(import.c:388:ptlrpc_invalidate_import()) MGS: Unregistering RPCs found (1). Network is sluggish? Waiting for them to error out.
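
For triage, the LustreError lines above can be split into fields mechanically. The following is a minimal Python sketch, not part of Lustre or Maloo. The regular expressions and the `parse_lustre_error` name are assumptions, derived only from the line layout visible in this ticket (`pid:n:(file:line:function()) message`). Console messages with a different layout, such as the `15c-8` line, are deliberately left unmatched.

```python
import re

# Layout assumed from the lines quoted in this ticket:
#   [timestamp] LustreError: <pid>:<n>:(<file>:<line>:<func>()) <message>
LUSTRE_ERROR_RE = re.compile(
    r"^\[\s*(?P<ts>[^\]]+)\]\s+LustreError:\s+"
    r"(?P<pid>\d+):\d+:\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"(?P<msg>.*)$"
)
# A trailing return code, written either as "rc = -5" or as ": -5".
RC_RE = re.compile(r"(?:rc = |: )(-\d+)\s*$")


def parse_lustre_error(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LUSTRE_ERROR_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["line"] = int(fields["line"])
    rc = RC_RE.search(fields["msg"])
    fields["rc"] = int(rc.group(1)) if rc else None
    return fields
```

Feeding it the `do_config_log_add()` line from the log above yields the pid, source file, line number, function name, and `rc` as separate fields, which makes it easier to group failures across test runs.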

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
conf-sanity test_47 - Timeout occurred after 314 mins, last suite running was conf-sanity



 Comments   
Comment by Sergey Cheremencev [ 06/Dec/21 ]

+1 on master conf-sanity test_109b - https://testing.whamcloud.com/test_sets/3d2d674d-6b89-4441-8bb3-b86c3fcff202

Comment by Etienne Aujames [ 11/Apr/22 ]

+1 on master conf-sanity test_50f: https://testing.whamcloud.com/test_sets/a4b40535-8de6-4a57-a2d7-2658ff988e5d

OST:

[ 2610.879750] LustreError: 166-1: MGC10.240.29.151@tcp: Connection to MGS (at 10.240.29.151@tcp) was lost; in progress operations using this service will fail
...
[ 3110.002828] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-ost1; mount -t lustre -o localrecov  /dev/mapper/ost1_flakey /mnt/lustre-ost1
[ 3110.394926] LDISKFS-fs (dm-11): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
[ 3126.822501] LustreError: 160912:0:(mgc_request.c:253:do_config_log_add()) MGC10.240.29.151@tcp: failed processing log, type 1: rc = -5
[ 3138.086037] LustreError: 160912:0:(mgc_request.c:253:do_config_log_add()) MGC10.240.29.151@tcp: failed processing log, type 4: rc = -110
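
The OST log above shows two different return codes from `do_config_log_add()`: -5 and -110. These follow the kernel convention of negative Linux errno values, so they correspond to EIO and ETIMEDOUT. A small Python sketch for decoding them follows. The table is hard-coded with Linux values because Python's own `errno` module reflects whatever platform the script runs on, and `decode_rc` is a hypothetical helper name, not a Lustre or Maloo function.

```python
# Linux errno values, hard-coded so the result does not depend on the
# platform running this script. Only the codes seen in this ticket.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def decode_rc(rc):
    """Format a negative kernel-style return code with its errno name."""
    name, text = LINUX_ERRNO.get(-rc, ("UNKNOWN", "not in this table"))
    return f"rc = {rc}: {name} ({text})"


for rc in (-5, -110):
    print(decode_rc(rc))
```

Read this way, the type 1 log failed with an I/O error and the type 4 log failed with a timeout.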

Client:

[ 2215.590905] LustreError: 166-1: MGC10.240.29.151@tcp: Connection to MGS (at 10.240.29.151@tcp) was lost; in progress operations using this service will fail
...
[ 2338.363151] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock onyx-117vm4@tcp:/lustre /mnt/lustre
[ 2344.540440] LustreError: 93429:0:(mgc_request.c:253:do_config_log_add()) MGC10.240.29.151@tcp: failed processing log, type 1: rc = -5
[ 2354.332045] LustreError: 93435:0:(mgc_request.c:612:do_requeue()) failed processing log: -5
[ 2375.259261] LustreError: 15c-8: MGC10.240.29.151@tcp: Confguration from log lustre-client failed from MGS -5. Communication error between node & MGS, a bad configuration, or other errors. See syslog for more info
[ 2375.263041] Lustre: Unmounted lustre-client
[ 2375.264264] LustreError: 93429:0:(super25.c:176:lustre_fill_super()) llite: Unable to mount <unknown>: rc = -5