[LU-16753] replay-single: test_135 timeout Created: 20/Apr/23  Updated: 25/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-16536 "MDS umount can get stuck due to LDLM ..." (Resolved)
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Lai Siyao <lai.siyao@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/92be109d-b483-4d74-9042-171b3c44c8d3

Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/94117 - 4.18.0-425.10.1.el8_7.x86_64
servers: https://build.whamcloud.com/job/lustre-reviews/94117 - 4.18.0-425.10.1.el8_lustre.x86_64

== replay-single test 135: Server failure in lock replay phase ========================================================== 17:17:59 (1681924679)
...
Failover ost1 to onyx-78vm3
CMD: onyx-78vm3 hostname
CMD: onyx-78vm3 /usr/sbin/lctl set_param fail_val=20
fail_val=20
CMD: onyx-78vm3 /usr/sbin/lctl set_param fail_loc=0x32d
fail_loc=0x32d
CMD: onyx-78vm3 dmsetup status /dev/mapper/ost1_flakey >/dev/null 2>&1
CMD: onyx-78vm3 dmsetup status /dev/mapper/ost1_flakey 2>&1
CMD: onyx-78vm3 dmsetup table /dev/mapper/ost1_flakey
CMD: onyx-78vm3 dmsetup suspend --nolockfs --noflush /dev/mapper/ost1_flakey
CMD: onyx-78vm3 dmsetup load /dev/mapper/ost1_flakey --table \"0 20111360 linear 252:0 0\"
CMD: onyx-78vm3 dmsetup resume /dev/mapper/ost1_flakey
CMD: onyx-78vm3 test -b /dev/mapper/ost1_flakey
CMD: onyx-78vm3 e2label /dev/mapper/ost1_flakey
Starting ost1: -o localrecov  /dev/mapper/ost1_flakey /mnt/lustre-ost1
CMD: onyx-78vm3 mkdir -p /mnt/lustre-ost1; mount -t lustre -o localrecov  /dev/mapper/ost1_flakey /mnt/lustre-ost1
CMD: onyx-78vm3 e2label /dev/mapper/ost1_flakey 2>/dev/null
CMD: onyx-78vm3 /usr/sbin/lctl set_param 				seq.cli-lustre-OST0000-super.width=0x3fff
seq.cli-lustre-OST0000-super.width=0x3fff
CMD: onyx-78vm3 /usr/sbin/lctl get_param -n health_check
CMD: onyx-78vm3 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config 		TESTLOG_PREFIX=/autotest/autotest-1/2023-04-19/lustre-reviews_review-dne-part-6_94117_8_7aa47c47-d979-4023-84f3-0d3eb4670464//replay-single TESTNAME=test_135 		bash rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all\" 4 
onyx-78vm3: onyx-78vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
CMD: onyx-78vm3 e2label /dev/mapper/ost1_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
pdsh@onyx-78vm1: onyx-78vm3: ssh exited with exit code 1
CMD: onyx-78vm3 e2label /dev/mapper/ost1_flakey 2>/dev/null
Started lustre-OST0000
CMD: onyx-78vm3 grep -c /mnt/lustre-ost1' ' /proc/mounts || true
Stopping /mnt/lustre-ost1 (opts:) on onyx-78vm3
CMD: onyx-78vm3 umount -d /mnt/lustre-ost1
CMD: onyx-78vm3 lsmod | grep lnet > /dev/null &&
lctl dl | grep ' ST ' || true
Failover ost1 to onyx-78vm3
CMD: onyx-78vm3 hostname
CMD: onyx-78vm3 /usr/sbin/lctl set_param fail_loc=0
fail_loc=0
CMD: onyx-78vm3 dmsetup status /dev/mapper/ost1_flakey >/dev/null 2>&1
CMD: onyx-78vm3 dmsetup status /dev/mapper/ost1_flakey 2>&1
CMD: onyx-78vm3 test -b /dev/mapper/ost1_flakey
CMD: onyx-78vm3 e2label /dev/mapper/ost1_flakey
Starting ost1: -o localrecov  /dev/mapper/ost1_flakey /mnt/lustre-ost1
CMD: onyx-78vm3 mkdir -p /mnt/lustre-ost1; mount -t lustre -o localrecov  /dev/mapper/ost1_flakey /mnt/lustre-ost1
CMD: onyx-78vm3 e2label /dev/mapper/ost1_flakey 2>/dev/null
CMD: onyx-78vm3 /usr/sbin/lctl set_param 				seq.cli-lustre-OST0000-super.width=0x3fff
seq.cli-lustre-OST0000-super.width=0x3fff
CMD: onyx-78vm3 /usr/sbin/lctl get_param -n health_check
CMD: onyx-78vm3 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/openmpi/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/sbin:/bin::/sbin:/bin:/usr/sbin: NAME=autotest_config 		TESTLOG_PREFIX=/autotest/autotest-1/2023-04-19/lustre-reviews_review-dne-part-6_94117_8_7aa47c47-d979-4023-84f3-0d3eb4670464//replay-single TESTNAME=test_135 		bash rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all\" 4 
onyx-78vm3: onyx-78vm3.onyx.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
CMD: onyx-78vm3 e2label /dev/mapper/ost1_flakey 				2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
pdsh@onyx-78vm1: onyx-78vm3: ssh exited with exit code 1
CMD: onyx-78vm3 e2label /dev/mapper/ost1_flakey 2>/dev/null
Started lustre-OST0000
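
For context, the log above shows the test driving two back-to-back ost1 restarts while a fail point is armed during the lock replay phase. A minimal shell sketch of that sequence, run on the OSS node, is below; the meanings of fail_loc=0x32d (a fail point hit during lock replay) and fail_val=20 (a delay in seconds), as well as the device and mount paths, are assumptions read off the log, not taken from the replay-single source.

#!/bin/bash
# Hedged sketch of the sequence in the log above, not the verbatim test_135 code.
OST_DEV=/dev/mapper/ost1_flakey      # flakey device-mapper target seen in the log
OST_MNT=/mnt/lustre-ost1             # OST mount point seen in the log

# Arm the fail point before the first restart (assumed: 0x32d stalls the target
# in the lock replay phase, fail_val=20 is the stall length in seconds).
lctl set_param fail_val=20
lctl set_param fail_loc=0x32d

# First failover: start the OST so clients go through recovery and lock replay.
mount -t lustre -o localrecov "$OST_DEV" "$OST_MNT"

# Second failover while lock replay is still in progress: clear the fail point,
# then stop and restart the OST, matching the second half of the log.
umount -d "$OST_MNT"
lctl set_param fail_loc=0
mount -t lustre -o localrecov "$OST_DEV" "$OST_MNT"

In the actual run the test framework also suspends and reloads the dm-flakey table around the first restart (the dmsetup lines in the log); that device handling is omitted from the sketch.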


 Comments   
Comment by Andreas Dilger [ 29/Jun/23 ]

Possibly also related to LU-16536.

Comment by Arshad Hussain [ 29/Nov/23 ]

+1 on master (https://testing.whamcloud.com/test_sets/12362470-f71f-41f4-be55-ea42efb827b0)
