Details
Bug
Resolution: Duplicate
Minor
None
Lustre 2.11.0, Lustre 2.10.2, Lustre 2.12.0
Ubuntu Lustre clients
3
Description
sanity test 60a hangs while unmounting the MDS, for Ubuntu clients only. The last output seen in the client test log is:
NOW reload debugging syms..
CMD: trevis-18vm4 /usr/sbin/lctl dk
CMD: trevis-18vm4 which llog_reader 2> /dev/null
CMD: trevis-18vm4 grep -c /mnt/lustre-mds1' ' /proc/mounts
Stopping /mnt/lustre-mds1 (opts:) on trevis-18vm4
CMD: trevis-18vm4 umount -d /mnt/lustre-mds1
Looking at the dmesg log on the MDS (vm4), all llog_test.c tests ran and completed, but we experienced issues when cleaning up/unmounting the MDS:
[ 3162.697626] Lustre: DEBUG MARKER: ls -d /sbin/llog_reader
[ 3163.062999] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts
[ 3163.339465] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1
[ 3163.485523] LustreError: 25441:0:(ldlm_resource.c:1094:ldlm_resource_complain()) lustre-MDT0000-lwp-MDT0000: namespace resource [0x200000006:0x1010000:0x0].0x0 (ffff88005b72b6c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
[ 3163.489978] LustreError: 25441:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x200000006:0x1010000:0x0].0x0 (ffff88005b72b6c0) refcount = 2
[ 3163.493845] LustreError: 25441:0:(ldlm_resource.c:1679:ldlm_resource_dump()) Granted locks (in reverse order):
[ 3163.496226] LustreError: 25441:0:(ldlm_resource.c:1682:ldlm_resource_dump()) ### ### ns: lustre-MDT0000-lwp-MDT0000 lock: ffff88005b6a6d80/0xa7b5899cf22cc13d lrc: 2/1,0 mode: CR/CR res: [0x200000006:0x1010000:0x0].0x0 rrc: 3 type: PLN flags: 0x1106400000000 nid: local remote: 0xa7b5899cf22cc1bb expref: -99 pid: 12740 timeout: 0 lvb_type: 2
[ 3163.502995] LustreError: 25441:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x200000006:0x10000:0x0].0x0 (ffff880000047e40) refcount = 2
[ 3163.507253] LustreError: 25441:0:(ldlm_resource.c:1679:ldlm_resource_dump()) Granted locks (in reverse order):
[ 3163.509922] Lustre: Failing over lustre-MDT0000
[ 3165.457763] Lustre: lustre-MDT0000: Not available for connect from 10.9.4.212@tcp (stopping)
[ 3165.572163] LustreError: 25441:0:(genops.c:436:class_free_dev()) Cleanup lustre-QMT0000 returned -95
[ 3165.574521] LustreError: 25441:0:(genops.c:436:class_free_dev()) Skipped 1 previous similar message
[ 3167.767456] Lustre: lustre-MDT0000: Not available for connect from 10.9.4.213@tcp (stopping)
[ 3167.770166] Lustre: Skipped 6 previous similar messages
[ 3170.455396] Lustre: lustre-MDT0000: Not available for connect from 10.9.4.212@tcp (stopping)
[ 3172.756491] Lustre: lustre-MDT0000: Not available for connect from 10.9.4.213@tcp (stopping)
[ 3172.759177] Lustre: Skipped 7 previous similar messages
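When comparing this dmesg output against other runs, it may help to tally the LustreError entries by source file and function. The following is only a rough sketch, not part of the Lustre test framework: the function name summarize_lustre_errors and the regular expression are assumptions made for illustration, based solely on the entry layout visible in the log above, and the sample messages are shortened copies of those entries.

```python
import re
from collections import Counter

# Assumed entry layout, taken from the dmesg excerpt in this ticket:
# [ <timestamp>] LustreError: <pid>:<n>:(<file>:<line>:<func>()) <message>
ENTRY_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+LustreError:\s+"
    r"(?P<pid>\d+):\d+:\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)"
)


def summarize_lustre_errors(dmesg_text):
    """Count LustreError entries per (source file, function).

    Hypothetical helper: it scans the whole text with finditer, so it
    works whether entries are one per line or run together.
    """
    counts = Counter()
    for match in ENTRY_RE.finditer(dmesg_text):
        counts[(match.group("file"), match.group("func"))] += 1
    return counts


# Shortened copies of entries from the log above (messages truncated).
sample = (
    "[ 3163.339465] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1\n"
    "[ 3163.485523] LustreError: 25441:0:(ldlm_resource.c:1094:"
    "ldlm_resource_complain()) lustre-MDT0000-lwp-MDT0000: namespace resource\n"
    "[ 3163.489978] LustreError: 25441:0:(ldlm_resource.c:1676:"
    "ldlm_resource_dump()) --- Resource:\n"
    "[ 3165.572163] LustreError: 25441:0:(genops.c:436:class_free_dev()) "
    "Cleanup lustre-QMT0000 returned -95\n"
)

summary = summarize_lustre_errors(sample)
for (src, func), n in sorted(summary.items()):
    print(f"{src}:{func} {n}")
```

Plain "Lustre:" entries such as the DEBUG MARKER lines are deliberately skipped, since only the error entries are of interest when looking at the cleanup failure.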
This failure started on October 27, 2017 (2.10.54) for the master branch and on November 27, 2017 (2.10.2 RC1) for b2_10.
Logs for this failure are at:
master branch
https://testing.hpdd.intel.com/test_sets/7f763f4e-d76b-11e7-a066-52540065bddc
https://testing.hpdd.intel.com/test_sets/6860bbbe-d021-11e7-a066-52540065bddc (interop with 2.9.0)
https://testing.hpdd.intel.com/test_sets/591a4292-ca59-11e7-9840-52540065bddc
https://testing.hpdd.intel.com/test_sets/54b95766-bb6c-11e7-84a9-52540065bddc
b2_10
https://testing.hpdd.intel.com/test_sets/bbeaa6be-d459-11e7-9c63-52540065bddc