LU-10326

sanity test 60a times out on 'umount -d /mnt/lustre-mds1'


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.11.0, Lustre 2.10.2, Lustre 2.12.0
    • Environment: Ubuntu Lustre clients
    • Severity: 3
    • 9223372036854775807

    Description

      sanity test 60a hangs on unmount of the MDS for Ubuntu clients only. The last output seen in the client test log is:

      NOW reload debugging syms..
      CMD: trevis-18vm4 /usr/sbin/lctl dk
      CMD: trevis-18vm4 which llog_reader 2> /dev/null
      CMD: trevis-18vm4 grep -c /mnt/lustre-mds1' ' /proc/mounts
      Stopping /mnt/lustre-mds1 (opts:) on trevis-18vm4
      CMD: trevis-18vm4 umount -d /mnt/lustre-mds1
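
      For anyone trying to reproduce this outside autotest, a minimal sketch
      (the tests path and the manual checks below are assumptions, not taken
      from this ticket; ONLY= is the standard way to restrict sanity.sh to a
      single test):

      # run only sanity test_60a from a standard test-framework install
      cd /usr/lib64/lustre/tests
      ONLY=60a sh sanity.sh

      # if the run hangs at "Stopping /mnt/lustre-mds1", check on the MDS
      # whether the mount is still present and where umount is blocked
      grep ' /mnt/lustre-mds1 ' /proc/mounts
      ps axo pid,stat,wchan:30,cmd | grep '[u]mount'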
      

      Looking at the dmesg log on the MDS (trevis-18vm4), all llog_test.c tests ran and completed, but the subsequent cleanup/unmount of the MDS ran into trouble:

      [ 3162.697626] Lustre: DEBUG MARKER: ls -d /sbin/llog_reader
      [ 3163.062999] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts
      [ 3163.339465] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1
      [ 3163.485523] LustreError: 25441:0:(ldlm_resource.c:1094:ldlm_resource_complain()) lustre-MDT0000-lwp-MDT0000: namespace resource [0x200000006:0x1010000:0x0].0x0 (ffff88005b72b6c0) refcount nonzero (1) after lock cleanup; forcing cleanup.
      [ 3163.489978] LustreError: 25441:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x200000006:0x1010000:0x0].0x0 (ffff88005b72b6c0) refcount = 2
      [ 3163.493845] LustreError: 25441:0:(ldlm_resource.c:1679:ldlm_resource_dump()) Granted locks (in reverse order):
      [ 3163.496226] LustreError: 25441:0:(ldlm_resource.c:1682:ldlm_resource_dump()) ### ### ns: lustre-MDT0000-lwp-MDT0000 lock: ffff88005b6a6d80/0xa7b5899cf22cc13d lrc: 2/1,0 mode: CR/CR res: [0x200000006:0x1010000:0x0].0x0 rrc: 3 type: PLN flags: 0x1106400000000 nid: local remote: 0xa7b5899cf22cc1bb expref: -99 pid: 12740 timeout: 0 lvb_type: 2
      [ 3163.502995] LustreError: 25441:0:(ldlm_resource.c:1676:ldlm_resource_dump()) --- Resource: [0x200000006:0x10000:0x0].0x0 (ffff880000047e40) refcount = 2
      [ 3163.507253] LustreError: 25441:0:(ldlm_resource.c:1679:ldlm_resource_dump()) Granted locks (in reverse order):
      [ 3163.509922] Lustre: Failing over lustre-MDT0000
      [ 3165.457763] Lustre: lustre-MDT0000: Not available for connect from 10.9.4.212@tcp (stopping)
      [ 3165.572163] LustreError: 25441:0:(genops.c:436:class_free_dev()) Cleanup lustre-QMT0000 returned -95
      [ 3165.574521] LustreError: 25441:0:(genops.c:436:class_free_dev()) Skipped 1 previous similar message
      [ 3167.767456] Lustre: lustre-MDT0000: Not available for connect from 10.9.4.213@tcp (stopping)
      [ 3167.770166] Lustre: Skipped 6 previous similar messages
      [ 3170.455396] Lustre: lustre-MDT0000: Not available for connect from 10.9.4.212@tcp (stopping)
      [ 3172.756491] Lustre: lustre-MDT0000: Not available for connect from 10.9.4.213@tcp (stopping)
      [ 3172.759177] Lustre: Skipped 7 previous similar messages
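
      The ldlm_resource_dump output above shows a granted CR lock on the
      lustre-MDT0000-lwp-MDT0000 namespace still holding a resource reference
      during cleanup ("refcount nonzero (1) after lock cleanup"), which
      matches an unmount that cannot complete. If this reproduces, the stuck
      task's stack and the Lustre debug buffer are the most useful follow-up
      data; a sketch using standard sysrq/lctl tooling (not commands captured
      from this run):

      # on the MDS, while umount -d is hung
      echo 1 > /proc/sys/kernel/sysrq        # make sure sysrq is enabled
      echo t > /proc/sysrq-trigger           # dump all task stacks to dmesg
      dmesg | grep -B 2 -A 20 umount         # locate where umount is blocked
      lctl dk /tmp/lustre-umount-debug.log   # save the Lustre debug buffer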
      

      This failure first appeared on October 27, 2017 with 2.10.54 on the master branch, and on November 27, 2017 with 2.10.2 RC1 on b2_10.
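
      Since the first bad build is known on both branches, the offending
      landing could in principle be found by bisecting between the last good
      and first bad revisions; a hypothetical sketch (<good-rev> and
      <bad-rev> are placeholders, not real tags from this ticket):

      git bisect start <bad-rev> <good-rev>
      # at each step: rebuild, reinstall on the test nodes, then rerun the
      # failing test with a timeout, since the failure mode is a hang
      timeout 1800 env ONLY=60a sh sanity.sh && git bisect good || git bisect bad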

      Logs for this failure are at:
      master branch
      https://testing.hpdd.intel.com/test_sets/7f763f4e-d76b-11e7-a066-52540065bddc
      https://testing.hpdd.intel.com/test_sets/6860bbbe-d021-11e7-a066-52540065bddc (interop with 2.9.0)
      https://testing.hpdd.intel.com/test_sets/591a4292-ca59-11e7-9840-52540065bddc
      https://testing.hpdd.intel.com/test_sets/54b95766-bb6c-11e7-84a9-52540065bddc

      b2_10
      https://testing.hpdd.intel.com/test_sets/bbeaa6be-d459-11e7-9c63-52540065bddc

People

    Assignee: wc-triage WC Triage
    Reporter: jamesanunez James Nunez (Inactive)
