[LU-10326] sanity test 60a times out on ‘umount -d /mnt/lustre-mds1’
Created: 04/Dec/17  Updated: 19/Mar/19  Resolved: 04/Jan/18

| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0, Lustre 2.10.2, Lustre 2.12.0 |
| Fix Version/s: | None |
| Type: | Bug |
| Priority: | Minor |
| Reporter: | James Nunez (Inactive) |
| Assignee: | WC Triage |
| Resolution: | Duplicate |
| Votes: | 0 |
| Labels: | ubuntu |
| Environment: | Ubuntu Lustre clients |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
sanity test 60a hangs on unmount of the MDS for Ubuntu clients only. The last thing seen in the client test log is:

```
NOW reload debugging syms..
CMD: trevis-18vm4 /usr/sbin/lctl dk
CMD: trevis-18vm4 which llog_reader 2> /dev/null
CMD: trevis-18vm4 grep -c /mnt/lustre-mds1' ' /proc/mounts
Stopping /mnt/lustre-mds1 (opts:) on trevis-18vm4
CMD: trevis-18vm4 umount -d /mnt/lustre-mds1
```

Looking at the dmesg log on the MDS (vm4), all llog_test.c tests ran and completed, but there were problems when cleaning up/unmounting the MDS:

```
[ 3162.697626] Lustre: DEBUG MARKER: ls -d /sbin/llog_reader
[ 3163.062999] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts
[ 3163.339465] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1
[ 3163.485523] LustreError: 25441:0:(ldlm_resource.c:1094:ldlm_resource_complain()) lustre-MDT0000-lwp-MDT0000: namespace resource [0x200000006:0x1010000:0x0].0x0 (ffff88005b72b6c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
[ 3163.489978] LustreError: 25441:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x200000006:0x1010000:0x0].0x0 (ffff88005b72b6c0) refcount = 2
[ 3163.493845] LustreError: 25441:0:(ldlm_resource.c:1679:ldlm_resource_dump()) Granted locks (in reverse order):
[ 3163.496226] LustreError: 25441:0:(ldlm_resource.c:1682:ldlm_resource_dump()) ### ### ns: lustre-MDT0000-lwp-MDT0000 lock: ffff88005b6a6d80/0xa7b5899cf22cc13d lrc: 2/1,0 mode: CR/CR res: [0x200000006:0x1010000:0x0].0x0 rrc: 3 type: PLN flags: 0x1106400000000 nid: local remote: 0xa7b5899cf22cc1bb expref: -99 pid: 12740 timeout: 0 lvb_type: 2
[ 3163.502995] LustreError: 25441:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x200000006:0x10000:0x0].0x0 (ffff880000047e40) refcount = 2
[ 3163.507253] LustreError: 25441:0:(ldlm_resource.c:1679:ldlm_resource_dump()) Granted locks (in reverse order):
[ 3163.509922] Lustre: Failing over lustre-MDT0000
[ 3165.457763] Lustre: lustre-MDT0000: Not available for connect from 10.9.4.212@tcp (stopping)
[ 3165.572163] LustreError: 25441:0:(genops.c:436:class_free_dev()) Cleanup lustre-QMT0000 returned -95
[ 3165.574521] LustreError: 25441:0:(genops.c:436:class_free_dev()) Skipped 1 previous similar message
[ 3167.767456] Lustre: lustre-MDT0000: Not available for connect from 10.9.4.213@tcp (stopping)
[ 3167.770166] Lustre: Skipped 6 previous similar messages
[ 3170.455396] Lustre: lustre-MDT0000: Not available for connect from 10.9.4.212@tcp (stopping)
[ 3172.756491] Lustre: lustre-MDT0000: Not available for connect from 10.9.4.213@tcp (stopping)
[ 3172.759177] Lustre: Skipped 7 previous similar messages
```

This failure started on October 27, 2017 with 2.10.54 on the master branch, and on November 27, 2017 with 2.10.2 RC1 on b2_10.

Logs for this failure are at: b2_10
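For reproducing the hang by hand, the stop sequence visible in the logs above can be approximated with a short shell script. This is a minimal sketch, not the test framework's actual stop code: the host name and mount point are copied from the logs, and the 300-second bound is an arbitrary illustration value so a hang surfaces as a timeout instead of blocking the run.

```bash
#!/bin/bash
# Sketch: approximate the "Stopping /mnt/lustre-mds1" step from the logs
# above. The real test framework drives this remotely through its own
# helpers; here plain ssh stands in for that.
MDS=trevis-18vm4          # MDS host from the logs (assumes ssh access)
MNT=/mnt/lustre-mds1

# Same check the framework logs: is the MDT still in /proc/mounts?
ssh "$MDS" "grep -c '$MNT ' /proc/mounts"

# The unmount that hangs in this ticket, bounded with timeout(1).
# 300 s is an arbitrary bound chosen for illustration.
if ! ssh "$MDS" "timeout 300 umount -d $MNT"; then
    echo "umount hung or failed; dumping the Lustre debug log"
    ssh "$MDS" "/usr/sbin/lctl dk > /tmp/lu-10326-debug.log"
fi
```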
| Comments |
| Comment by John Hammond [ 05/Dec/17 ] |
Likely due to the same cause as
| Comment by Sarah Liu [ 17/May/18 ] |
+1 on b2_10: https://testing.hpdd.intel.com/test_sets/051774a8-5956-11e8-abc3-52540065bddc
| Comment by Sarah Liu [ 30/May/18 ] |
In tag 2.11.52 SLES12sp3 server/client testing, sanity test 60a failed for a similar reason. sanity test_17g passed in the same session: https://testing.hpdd.intel.com/test_sets/652db46e-5a74-11e8-abc3-52540065bddc
| Comment by Sarah Liu [ 19/Mar/19 ] |
A similar issue was hit in interop testing of 2.10.7.