[LU-16275] replay-dual test 25 fails when tests are run without RPM install Created: 28/Oct/22  Updated: 19/Jul/23  Resolved: 19/Jul/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Kevin Zhao Assignee: Kevin Zhao
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
== replay-dual test 25: replay|resend ==================== 02:48:05 (1663728485)
1+0 records in
1+0 records out
512 bytes copied, 0.0019952 s, 257 kB/s
fail_loc=0x304
fail_loc=0x304
fail_loc=0x304
CMD: lustre-xwcomty2-01 multiop /mnt/lustre2/f25.replay-dual Ow512
CMD: lustre-xwcomty2-05 lctl set_param fail_loc=0x80000325
fail_loc=0x80000325
Failing ost1 on lustre-xwcomty2-05
CMD: lustre-xwcomty2-05 grep -c /mnt/lustre-ost1' ' /proc/mounts || true
Stopping /mnt/lustre-ost1 (opts:) on lustre-xwcomty2-05
CMD: lustre-xwcomty2-05 umount -d /mnt/lustre-ost1
CMD: lustre-xwcomty2-05 lsmod | grep lnet > /dev/null &&
lctl dl | grep ' ST ' || true
reboot facets: ost1
Failover ost1 to lustre-xwcomty2-05
mount facets: ost1
CMD: lustre-xwcomty2-05 dmsetup status /dev/mapper/ost1_flakey >/dev/null 2>&1
CMD: lustre-xwcomty2-05 dmsetup status /dev/mapper/ost1_flakey 2>&1
CMD: lustre-xwcomty2-05 test -b /dev/mapper/ost1_flakey
CMD: lustre-xwcomty2-05 e2label /dev/mapper/ost1_flakey
Starting ost1: -o localrecov  /dev/mapper/ost1_flakey /mnt/lustre-ost1
CMD: lustre-xwcomty2-05 mkdir -p /mnt/lustre-ost1; mount -t lustre -o localrecov  /dev/mapper/ost1_flakey /mnt/lustre-ost1
CMD: lustre-xwcomty2-05 /root/test/build-06244k/lustre/lustre/utils/lctl get_param -n health_check
lustre-xwcomty2-05: CMD: lustre-xwcomty2-03 /root/test/build-06244k/lustre/lustre/utils/lctl get_param -n version 2>/dev/null
lustre-xwcomty2-05: CMD: lustre-xwcomty2-03 /root/test/build-06244k/lustre/lustre/utils/lctl get_param -n version 2>/dev/null
lustre-xwcomty2-05: CMD: lustre-xwcomty2-05 /root/test/build-06244k/lustre/lustre/utils/lctl get_param -n version 2>/dev/null
lustre-xwcomty2-05: CMD: lustre-xwcomty2-05 /root/test/build-06244k/lustre/lustre/utils/lctl get_param -n version 2>/dev/null
lustre-xwcomty2-05: lustre-xwcomty2-05: executing set_default_debug -1 all 16
CMD: lustre-xwcomty2-05 e2label /dev/mapper/ost1_flakey                                 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
pdsh@lustre-xwcomty2-01: lustre-xwcomty2-05: ssh exited with exit code 1
CMD: lustre-xwcomty2-05 e2label /dev/mapper/ost1_flakey 2>/dev/null

Started lustre-OST0000
lustre-xwcomty2-01: CMD: lustre-xwcomty2-03 /root/test/build-06244k/lustre/lustre/utils/lctl get_param -n version 2>/dev/null
lustre-xwcomty2-02: CMD: lustre-xwcomty2-03 /root/test/build-06244k/lustre/lustre/utils/lctl get_param -n version 2>/dev/null
lustre-xwcomty2-02: CMD: lustre-xwcomty2-03 /root/test/build-06244k/lustre/lustre/utils/lctl get_param -n version 2>/dev/null
lustre-xwcomty2-01: CMD: lustre-xwcomty2-03 /root/test/build-06244k/lustre/lustre/utils/lctl get_param -n version 2>/dev/null
lustre-xwcomty2-02: CMD: lustre-xwcomty2-05 /root/test/build-06244k/lustre/lustre/utils/lctl get_param -n version 2>/dev/null
lustre-xwcomty2-01: CMD: lustre-xwcomty2-05 /root/test/build-06244k/lustre/lustre/utils/lctl get_param -n version 2>/dev/null
lustre-xwcomty2-02: CMD: lustre-xwcomty2-02 /root/test/build-06244k/lustre/lustre/utils/lctl get_param -n version 2>/dev/null
lustre-xwcomty2-01: CMD: lustre-xwcomty2-01 /root/test/build-06244k/lustre/lustre/utils/lctl get_param -n version 2>/dev/null
lustre-xwcomty2-01: lustre-xwcomty2-01: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
lustre-xwcomty2-02: lustre-xwcomty2-02: executing wait_import_state_mount (FULL|IDLE) osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid
lustre-xwcomty2-01: CMD: lustre-xwcomty2-01 lctl get_param -n at_max
lustre-xwcomty2-01: osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL state after 0 sec
lustre-xwcomty2-02: CMD: lustre-xwcomty2-02 lctl get_param -n at_max
lustre-xwcomty2-02: osc.lustre-OST0000-osc-[-0-9a-f]*.ost_server_uuid in FULL
lustre-xwcomty2-02: IDLE state after 0 sec
multiop: no process found   

The error is caused by `killall`.

`killall` cannot find a process named "multiop", so it reports "no process found" and the test hangs. When the tests are run from a source build rather than from installed RPMs, the multiop process is named "lt-multiop" (it carries an "lt-" prefix), so `killall multiop` does not match it.
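
The name mismatch described above can be handled by matching both names at once. The following is a minimal sketch only, not necessarily what the landed patch does. It assumes procps `pkill` is available on the test nodes; the helper names `matches_multiop` and `kill_multiop` are hypothetical and do not come from the Lustre test framework.

```shell
# Extended regex that accepts both the installed name ("multiop")
# and the name seen in a source build ("lt-multiop").
multiop_pattern='^(lt-)?multiop$'

# Hypothetical helper: succeeds if the given process name is either variant.
matches_multiop() {
    printf '%s\n' "$1" | grep -Eq "$multiop_pattern"
}

# Hypothetical helper: signal multiop regardless of the "lt-" prefix.
# pkill -x requires the whole process name to match the pattern,
# so unrelated processes such as "multiop2" are left alone.
kill_multiop() {
    pkill -x '(lt-)?multiop' || true
}
```

Calling `kill_multiop` in place of `killall multiop` would cover both the RPM-installed and the source-build case; the trailing `|| true` keeps a missing process from being treated as a failure.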

 Comments   
Comment by Gerrit Updater [ 28/Oct/22 ]

"Kevin Zhao <kevin.zhao@linaro.org>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48977
Subject: LU-16275 tests: Modify killall of replay-dual 25
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6adcbd2d92e97049a009c2d1090de9dd17dc206f

Comment by Gerrit Updater [ 19/Jul/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48977/
Subject: LU-16275 tests: Modify killall of replay-dual 25
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 90d44ad5b00fc18ae4702e11250a422cf165e2ae

Comment by Peter Jones [ 19/Jul/23 ]

Landed for 2.16

Generated at Sat Feb 10 03:25:33 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.