[LU-14336] recovery-small test 140a fails with 'no clients with recovery disabled' Created: 15/Jan/21  Updated: 06/Mar/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: failover

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

recovery-small test_140a fails with 'no clients with recovery disabled' in the failover test group. The test first failed with this error, reporting zero clients with recovery disabled, on 08 Aug 2020 with Lustre 2.13.55.16: https://testing.whamcloud.com/test_sets/8fe3865b-6896-4b31-ac32-4358e509d2fc

In the suite_log for a recent failure at https://testing.whamcloud.com/test_sets/d8b53f30-dbb3-4e3b-a3a2-01cf525d597f, the only sign of trouble is the failed rmdir of the second Lustre mount point, /mnt/lustre2:

CMD: trevis-53vm8 /usr/sbin/lctl get_param mdt.*.exports.*.export
0 clients with recovery disabled
CMD: trevis-53vm7 grep -c /mnt/lustre2' ' /proc/mounts
Stopping client trevis-53vm7 /mnt/lustre2 (opts:)
CMD: trevis-53vm7 lsof -t /mnt/lustre2
pdsh@trevis-53vm1: trevis-53vm7: ssh exited with exit code 1
CMD: trevis-53vm7 umount  /mnt/lustre2 2>&1
CMD: trevis-53vm8 rmdir /mnt/lustre2
trevis-53vm8: rmdir: failed to remove '/mnt/lustre2': No such file or directory
pdsh@trevis-53vm1: trevis-53vm8: ssh exited with exit code 1
 recovery-small test_140a: @@@@@@ FAIL: no clients with recovery disabled 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
  = /usr/lib64/lustre/tests/recovery-small.sh:2935:test_140a()

Note that the test found zero clients with recovery disabled. There should be (at least?) one: after mdt.*.local_recovery=0 is set, the local client mounted by mount_mds_client should be marked with the no_recovery flag.

The failing code in recovery-small test 140a is:

2925         # disable recovery for local clients
2926         # so local clients should be marked with no_recovery flag
2927         do_facet mds1 $LCTL set_param mdt.*.local_recovery=0
2928         mount_mds_client
2929 
2930         local cnt
2931         cnt=$(do_facet mds1 $LCTL get_param "mdt.*.exports.*.export" |
2932                 grep export_flags.*no_recovery | wc -l)
2933         echo "$cnt clients with recovery disabled"
2934         umount_mds_client
2935         [ $cnt -eq 0 ] && error "no clients with recovery disabled"
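The sketch below shows the counting step on lines 2931-2932 factored into a small helper. It is not part of recovery-small.sh, and the function name count_no_recovery_exports is hypothetical.

Like the test's own grep, it assumes that export_flags and no_recovery appear on the same line of the get_param output. The helper quotes the regex so the shell cannot glob-expand it, and it uses grep -c instead of piping to wc -l. Neither change is offered as a fix for this failure.

```shell
# Hypothetical helper (not in recovery-small.sh): count the exports whose
# export_flags line mentions no_recovery. Reads the output of
#   lctl get_param "mdt.*.exports.*.export"
# on stdin. Assumes, like the test's grep, that export_flags and
# no_recovery appear on the same line.
count_no_recovery_exports() {
	# Quote the pattern so the shell never glob-expands it against files
	# in the current directory. grep -c prints the number of matching
	# lines; it prints 0 but exits 1 when nothing matches, so "|| true"
	# keeps a zero count from looking like a command failure.
	grep -c 'export_flags.*no_recovery' || true
}
```

In the test, the existing pipeline would become cnt=$(do_facet mds1 $LCTL get_param "mdt.*.exports.*.export" | count_no_recovery_exports). The umount_mds_client and error check would stay unchanged.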

Generated at Sat Feb 10 03:08:51 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.