[LU-14153] sanity test_280: "mount client failed" on review-dne-ssk review-dne-selinux-ssk Created: 27/Nov/20 Updated: 25/Jan/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | SSK | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/4a0c244b-cc64-455a-9d41-a8cf9790407c

test_280 failed with the following error:

== sanity test 280: Race between MGS umount and client llog processing =====
10.9.6.22@tcp:/lustre /mnt/lustre lustre rw,seclabel,flock,user_xattr,lazystatfs,noencrypt 0 0
CMD: trevis-45vm1.trevis.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
Stopping client trevis-45vm1.trevis.whamcloud.com /mnt/lustre (opts:)
CMD: trevis-45vm1.trevis.whamcloud.com lsof -t /mnt/lustre
CMD: trevis-45vm1.trevis.whamcloud.com umount /mnt/lustre 2>&1
CMD: trevis-25vm1.trevis.whamcloud.com mount -t lustre -o user_xattr,flock,skpath=/tmp/test-framework-keys trevis-25vm4@tcp:/lustre /mnt/lustre
:
Starting client: trevis-25vm1.trevis.whamcloud.com: -o user_xattr,flock,skpath=/tmp/test-framework-keys trevis-25vm4@tcp:/lustre /mnt/lustre
CMD: trevis-25vm1.trevis.whamcloud.com mkdir -p /mnt/lustre
CMD: trevis-25vm1.trevis.whamcloud.com mount -t lustre -o user_xattr,flock,skpath=/tmp/test-framework-keys trevis-25vm4@tcp:/lustre /mnt/lustre
mount.lustre: according to /etc/mtab trevis-25vm4@tcp:/lustre is already mounted on /mnt/lustre
sanity test_280: @@@@@@ FAIL: mount client failed

It looks like this is failing intermittently since 2020-07-30 (about 35 times over 4 months) for review-dne-ssk and review-dne-selinux-ssk sessions. Note that it does not happen for review-dne-selinux sessions, and the few failures on other test sessions look like they were related to many previous tests failing also.

It may just be a test script issue (e.g. SSK is causing the client mount to be slower, and the race that the subtest is trying to trigger is happening differently as a result).

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
| Comments |
| Comment by Andreas Dilger [ 27/Nov/20 ] |
|
Comparing a passing and a failing test run, the failing run has an additional "Starting client:" step, which leads to the "already mounted" error. In the passing run the client already has a mounted filesystem when mount_client() is called, so it does not even try to mount it again:

diff -u /tmp/passed /tmp/failed
--- /tmp/passed 2020-11-26 22:38:25.000000000 -0700
+++ /tmp/failed 2020-11-26 22:38:26.000000000 -0700
@@ -33,6 +33,9 @@
 pdsh@trevis-10vm1: trevis-10vm4: ssh exited with exit code 1
 CMD: trevis-10vm4 e2label /dev/mapper/mds1_flakey 2>/dev/null
 Started lustre-MDT0000
-client@tcp:/lustre /mnt/lustre lustre rw,seclabel,flock,user_xattr,lazystatfs,noencrypt 0 0
-Resetting fail_loc on all nodes...CMD: trevis-10vm1.trevis.whamcloud.com,trevis-10vm2,trevis-10vm3,trevis-10vm4,trevis-10vm5 lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null
-done.
+Starting client: trevis-10vm1.trevis.whamcloud.com: -o user_xattr,flock,skpath=/tmp/test-framework-keys trevis-10vm4@tcp:/lustre /mnt/lustre
+CMD: trevis-10vm1.trevis.whamcloud.com mkdir -p /mnt/lustre
+CMD: trevis-10vm1.trevis.whamcloud.com mount -t lustre -o user_xattr,flock,skpath=/tmp/test-framework-keys trevis-10vm4@tcp:/lustre /mnt/lustre
+mount.lustre: according to /etc/mtab trevis-10vm4@tcp:/lustre is already mounted on /mnt/lustre
+ sanity test_280: @@@@@@ FAIL: mount client failed

This definitely seems like a race in the test, since the failed test doesn't detect the mount in "mount_client()" but finds it later when zconf_mount() tries to mount. |
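For illustration only, the check-then-mount window described in the comment above could be tolerated roughly as follows. This is a minimal sketch and not a proposed patch: is_mounted_in(), ensure_mounted() and racy_mount() are hypothetical names that do not exist in test-framework.sh, and a temporary file stands in for /proc/mounts so the example runs without a Lustre setup. The idea is simply to look at the mount table a second time before reporting "mount client failed".

```shell
#!/bin/bash
# Sketch only: these helpers are hypothetical and are NOT part of
# test-framework.sh.

# Succeed if MOUNTPOINT is the second field of any line in MOUNTS_FILE
# (/proc/mounts layout: device mountpoint fstype options dump pass).
is_mounted_in() {
    local mounts_file=$1 mnt=$2
    awk -v m="$mnt" '$2 == m { found = 1 } END { exit !found }' "$mounts_file"
}

# Mount only if needed. If the mount command fails, check the mount table
# again before reporting failure, because the entry may have appeared
# between the first check and the mount attempt.
# MOUNT_CMD is any command or function name; it is called with MOUNTPOINT.
ensure_mounted() {
    local mounts_file=$1 mnt=$2 mount_cmd=$3

    is_mounted_in "$mounts_file" "$mnt" && return 0
    "$mount_cmd" "$mnt" && return 0
    is_mounted_in "$mounts_file" "$mnt"
}

# Demo: a temporary file stands in for /proc/mounts.
mounts=$(mktemp)

# Simulates the race: the entry shows up late and the mount command itself
# reports failure, as with the "already mounted" error.
racy_mount() {
    echo "trevis-10vm4@tcp:/lustre $1 lustre rw 0 0" >> "$mounts"
    return 1
}

if ensure_mounted "$mounts" /mnt/lustre racy_mount; then
    echo "mounted"
else
    echo "mount client failed"
fi
rm -f "$mounts"
```

With the simulated late entry the demo prints "mounted". Whether hiding the mount failure like this is appropriate for test_280 is a separate question, since the subtest is deliberately exercising a race and a re-check could mask a real problem.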
| Comment by Artem Blagodarenko (Inactive) [ 11/Dec/20 ] |
|
+1 https://testing.whamcloud.com/test_sets/2f812d63-e312-49fe-a7e7-67df1e801949 |
| Comment by Emoly Liu [ 20/Jan/21 ] |
|
+1 on master: https://testing.whamcloud.com/test_sets/32b28fef-bddf-422b-9c73-e677bce9cc50 |