Details
Bug
Resolution: Unresolved
Minor
None
Lustre 2.12.0, Lustre 2.13.0, Lustre 2.10.6, Lustre 2.10.7
ppc64 clients
3
9223372036854775807
Description
sanity-scrub test_10a fails for ppc64 with “Fail to cleanup the env!”
Looking at a recent failure, https://testing.whamcloud.com/test_sets/2b0db972-4859-11e9-b98a-52540065bddc, we see that we can’t remove directories left on the Lustre file system by a previous sanity-scrub test. From the suite_log, we see:
rm: cannot remove '/mnt/lustre/d9.sanity-scrub/mds1': Directory not empty
sanity-scrub test_10a: @@@@@@ FAIL: Fail to cleanup the env!
Looking at the OSS (vm1) console log, we see a Lustre error during test 9:
[ 1095.831214] Lustre: DEBUG MARKER: trevis-26vm2.trevis.whamcloud.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 1104.386561] LustreError: 11824:0:(ldlm_resource.c:1146:ldlm_resource_complain()) lustre-MDT0000-lwp-OST0000: namespace resource [0x200000006:0x1020000:0x0].0x0 (ffff8a8765dfe600) refcount nonzero (1) after lock cleanup; forcing cleanup.
[ 1104.388585] LustreError: 11824:0:(ldlm_resource.c:1146:ldlm_resource_complain()) Skipped 1 previous similar message
[ 1131.369609] Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null
In the console log for client 2 (vm9), we see some messages:
[ 1168.406302] Lustre: DEBUG MARKER: lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
[ 1174.948324] Lustre: 3119:0:(mdc_request.c:1504:mdc_read_page()) Page-wide hash collision: 0xfeffffffffffffff
[ 1174.948439] Lustre: 3119:0:(mdc_request.c:1504:mdc_read_page()) Skipped 54 previous similar messages
[ 1176.178907] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-scrub test_10a: @@@@@@ FAIL: Fail to cleanup the env!
We see this issue only in ppc64 client testing. Note: although this test has failed with the same message for non-ppc64 clients, in those cases several or most of the tests before 10a also fail because they cannot clean up the environment.
In some cases, we don’t see any of the above error messages. For example, for a recent 2.10.7 RC1 failure at https://testing.whamcloud.com/test_sets/3f1ccaa6-4332-11e9-92fe-52540065bddc, we don’t see any of these error messages in test 9 or test 10.
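Since the error messages are present in some failures and absent in others, it may help to check each console log mechanically. The following is only a rough sketch, not part of the test framework: the function names `split_entries` and `scan` are hypothetical, the signature strings are copied from the logs quoted in this ticket, and the script assumes every console entry starts with a "[ seconds.micros]" timestamp.

```python
import re

# Signature substrings copied from the console logs quoted in this ticket.
SIGNATURES = {
    "ldlm_refcount": "refcount nonzero",
    "hash_collision": "Page-wide hash collision",
    "cleanup_fail": "Fail to cleanup the env!",
}

# Assumption: each console entry begins with a "[ seconds.micros]" timestamp.
ENTRY_START = re.compile(r"\[\s*\d+\.\d+\]")


def split_entries(text):
    """Split console text into one string per timestamped entry.

    Any text before the first timestamp is dropped.
    """
    starts = [m.start() for m in ENTRY_START.finditer(text)]
    if not starts:
        return []
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def scan(text):
    """Return {signature_name: [matching entries]} for the known signatures."""
    hits = {name: [] for name in SIGNATURES}
    for entry in split_entries(text):
        for name, needle in SIGNATURES.items():
            if needle in entry:
                hits[name].append(entry)
    return hits
```

Reading the log file is left to the caller, for example `scan(open(path).read())` on a downloaded console log. A failure where all three lists come back empty would be one of the cases described above in which none of the messages appear.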
Other failures for sanity-scrub test 10a are at

https://testing.whamcloud.com/test_sets/4e833ba2-b72c-11e8-a7de-52540065bddc
https://testing.whamcloud.com/test_sets/d22cb2d0-e288-11e8-bfe1-52540065bddc
https://testing.whamcloud.com/test_sets/660c8ec2-2734-11e9-b97f-52540065bddc