[LU-15741] replay-dual test_0b: mgc_request.c:253:do_config_log_add()) MGC10.240.41.18@tcp: failed processing log
Created: 14/Apr/22  Updated: 12/Jun/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-15375 Interop: replay-dual test_0b: test_0b... Open
is related to LU-16874 replay-dual test_0b: FAIL: mount1 fais Open
Severity: 3

 Description   

This issue was created by maloo for Cliff White <cwhite@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/a6e98d16-acd5-4901-b6a3-45addaa864f8

The failure is for replay-dual test 0b.

The mount fails on the client; this may be an issue with the test systems.

[ 8922.500173] Lustre: Unmounted lustre-client
[ 8955.589567] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1994-05.com.redhat:de2392d78ab4,i,0x00023d000001,iqn.2018-06.com.trevis-59vm1:target,t,0x01
[ 8963.157715] iSCSI/iqn.1994-05.com.redhat:de2392d78ab4: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[ 8979.897072] Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
[ 8979.899008] Lustre: Skipped 7 previous similar messages
[ 8990.171882] Lustre: Unmounted lustre-client
[ 8990.181417] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre
[ 8990.193085] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-59vm7:trevis-59vm8:/lustre /mnt/lustre
[ 8996.355848] LustreError: 124531:0:(mgc_request.c:253:do_config_log_add()) MGC10.240.41.18@tcp: failed processing log, type 1: rc = -5
[ 9007.619313] LustreError: 15c-8: MGC10.240.41.18@tcp: Confguration from log lustre-client failed from MGS -5. Communication error between node & MGS, a bad configuration, or other errors. See syslog for more info
[ 9007.623006] Lustre: Unmounted lustre-client
[ 9007.624191] LustreError: 124531:0:(super25.c:176:lustre_fill_super()) llite: Unable to mount <unknown>: rc = -5
[ 9008.020069] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-dual test_0b: @@@@@@ FAIL: mount1 fais 
[ 9008.486992] Lustre: DEBUG MARKER: replay-dual test_0b: @@@@@@ FAIL: mount1 fais
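
The rc = -5 above is -EIO: the MGC on the client failed to fetch the lustre-client configuration log from the MGS, so lustre_fill_super() gives up and the mount is aborted. A minimal triage sketch for this kind of failure (not part of the test suite; the NID and mount spec are copied from the log above):

lctl list_nids                       # confirm the client's own NIDs are configured
lctl ping 10.240.41.18@tcp           # confirm LNet can reach the MGS NID from the error
mount -v -t lustre -o user_xattr,flock trevis-59vm7:trevis-59vm8:/lustre /mnt/lustre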


 Comments   
Comment by Sarah Liu [ 08/Jun/22 ]

Similar, but this one seems to be caused by a network issue:
https://testing.whamcloud.com/test_sets/38dfa3a3-fc22-4e35-909d-d07ef37a5178

Command completed successfully
waiting ping -c 1 -w 3 trevis-33vm8, 900 secs left ...
waiting ping -c 1 -w 3 trevis-33vm8, 895 secs left ...
waiting ping -c 1 -w 3 trevis-33vm8, 890 secs left ...
CMD: trevis-33vm8 hostname
trevis-33vm8: ssh: connect to host trevis-33vm8 port 22: Connection refused
pdsh@trevis-33vm1: trevis-33vm8: ssh exited with exit code 255
CMD: trevis-33vm8 hostname
Failover mds1 to trevis-33vm7
CMD: trevis-33vm7 hostname
mount facets: mds1
CMD: trevis-33vm7 lsmod | grep zfs >&/dev/null || modprobe zfs;
			zpool list -H lustre-mdt1 >/dev/null 2>&1 ||
			zpool import -f -o cachefile=none -o failmode=panic -d /dev/lvm-Role_MDS lustre-mdt1
CMD: trevis-33vm7 zfs get -H -o value 						lustre:svname lustre-mdt1/mdt1
Starting mds1: -o localrecov  lustre-mdt1/mdt1 /mnt/lustre-mds1
CMD: trevis-33vm7 mkdir -p /mnt/lustre-mds1; mount -t lustre -o localrecov  lustre-mdt1/mdt1 /mnt/lustre-mds1
CMD: trevis-33vm7 /usr/sbin/lctl get_param -n health_check
CMD: trevis-33vm7 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config bash rpc.sh set_default_debug \"-1\" \"all\" 4 
trevis-33vm7: CMD: trevis-33vm7 /usr/sbin/lctl get_param -n version 2>/dev/null
trevis-33vm7: CMD: trevis-33vm7 /usr/sbin/lctl get_param -n version 2>/dev/null
trevis-33vm7: CMD: trevis-33vm5 /usr/sbin/lctl get_param -n version 2>/dev/null
trevis-33vm7: CMD: trevis-33vm7.trevis.whamcloud.com /usr/sbin/lctl get_param -n version 2>/dev/null
trevis-33vm7: trevis-33vm7.trevis.whamcloud.com: executing set_default_debug -1 all 4
CMD: trevis-33vm7 zfs get -H -o value 				lustre:svname lustre-mdt1/mdt1 2>/dev/null | 				grep -E ':[a-zA-Z]{3}[0-9]{4}'
pdsh@trevis-33vm1: trevis-33vm7: ssh exited with exit code 1
CMD: trevis-33vm7 zfs get -H -o value lustre:svname 		                           lustre-mdt1/mdt1 2>/dev/null
Started lustre-MDT0000
Starting client: trevis-33vm1.trevis.whamcloud.com:  -o user_xattr,flock trevis-33vm7:trevis-33vm8:/lustre /mnt/lustre
CMD: trevis-33vm1.trevis.whamcloud.com mkdir -p /mnt/lustre
CMD: trevis-33vm1.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-33vm7:trevis-33vm8:/lustre /mnt/lustre
mount.lustre: mount trevis-33vm7:trevis-33vm8:/lustre at /mnt/lustre failed: Input/output error
Is the MGS running?
 replay-dual test_0b: @@@@@@ FAIL: mount1 fais 
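
The "Is the MGS running?" hint, together with the earlier "Connection refused" from trevis-33vm8, suggests the client mount raced against the mds1 failover (the mount spec trevis-33vm7:trevis-33vm8:/lustre implies the MGS is co-located with MDT0000). A hedged sketch of how one might confirm the MGS side before retrying the mount (hostnames and pool name copied from the log above):

ssh trevis-33vm7 'zpool list -H lustre-mdt1; lctl dl'    # pool imported and MGS/MDT devices listed?
lctl ping trevis-33vm7@tcp                               # can the client's LNet reach either MGS NID?
lctl ping trevis-33vm8@tcp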