  1. Lustre
  2. LU-8990

Failback LBUG lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.4
    • Affects Version/s: Lustre 2.10.0, Lustre 2.11.0, Lustre 2.10.2, Lustre 2.10.3
    • Environment: Soak cluster, lustre: 2.9.51_4_g39af202 (tip of master on 12/30)
    • Severity: 3

    Description

      The system completed an MDS failover from lola-8 to lola-9.
      Attempting to fail back then triggered the assertion below.

      <4>Lustre: Failing over soaked-MDT0000
      <3>LustreError: 34433:0:(osp_precreate.c:912:osp_precreate_cleanup_orphans()) soaked-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -5
      <3>LustreError: 34433:0:(osp_precreate.c:912:osp_precreate_cleanup_orphans()) Skipped 10 previous similar messages
      <6>Lustre: soaked-MDT0000: Not available for connect from 192.168.1.126@o2ib100 (stopping)
      <6>Lustre: Skipped 3 previous similar messages
      <3>LustreError: 34664:0:(lod_qos.c:208:lod_statfs_and_check()) soaked-MDT0000-mdtlov: statfs: rc = -108
      <3>LustreError: 34664:0:(lod_qos.c:208:lod_statfs_and_check()) Skipped 43 previous similar messages
      <3>LustreError: 34700:0:(ldlm_resource.c:882:ldlm_resource_complain()) mdt-soaked-MDT0000_UUID: namespace resource [0x20002ac2a:0x1fbe:0x0].0xe8519c29 (ffff8803f859fec0) refcount nonzero (1) after lock cleanup; forcing cleanup.
      <3>LustreError: 34700:0:(ldlm_resource.c:882:ldlm_resource_complain()) Skipped 5 previous similar messages
      <3>LustreError: 34700:0:(ldlm_resource.c:1463:ldlm_resource_dump()) --- Resource: [0x20002ac2a:0x1fbe:0x0].0xe8519c29 (ffff8803f859fec0) refcount = 2
      <3>LustreError: 34700:0:(ldlm_resource.c:1466:ldlm_resource_dump()) Granted locks (in reverse order):
      <3>LustreError: 34700:0:(ldlm_resource.c:1469:ldlm_resource_dump()) ### ### ns: mdt-soaked-MDT0000_UUID lock: ffff8804001e4b40/0xda913c453295768d lrc: 2/0,1 mode: PW/PW res: [0x20002ac2a:0x1fbe:0x0].0xe8519c29 bits 0x2 rrc: 2 type: IBT flags: 0x40316400000000 nid: local remote: 0x0 expref: -99 pid: 34549 timeout: 0 lvb_type: 0
      <3>LustreError: 34700:0:(ldlm_resource.c:1469:ldlm_resource_dump()) Skipped 4 previous similar messages
      <3>LustreError: 34700:0:(ldlm_resource.c:1463:ldlm_resource_dump()) --- Resource: [0x20002ac14:0xd2e4:0x0].0x786cbc9c (ffff8807f5e87b00) refcount = 5
      <3>LustreError: 34700:0:(ldlm_resource.c:1466:ldlm_resource_dump()) Granted locks (in reverse order):
      <3>LustreError: 34700:0:(ldlm_resource.c:1484:ldlm_resource_dump()) Waiting locks:
      <3>LustreError: 34700:0:(ldlm_resource.c:1486:ldlm_resource_dump()) ### ### ns: mdt-soaked-MDT0000_UUID lock: ffff880836be2b80/0xda913c4532957639 lrc: 2/0,1 mode: --/PW res: [0x20002ac14:0xd2e4:0x0].0x786cbc9c bits 0x2 rrc: 5 type: IBT flags: 0x40316400000020 nid: local remote: 0x0 expref: -99 pid: 34463 timeout: 0 lvb_type: 0
      <3>LustreError: 34700:0:(ldlm_resource.c:1463:ldlm_resource_dump()) --- Resource: [0x20002ac2a:0x1fbe:0x0].0xc7d60cfe (ffff8807f5e87a40) refcount = 2
      <3>LustreError: 34700:0:(ldlm_resource.c:1466:ldlm_resource_dump()) Granted locks (in reverse order):
      <3>LustreError: 34700:0:(ldlm_resource.c:1463:ldlm_resource_dump()) --- Resource: [0x20002ac2a:0x1fbe:0x0].0x0 (ffff88041165ac00) refcount = 9
      <3>LustreError: 34700:0:(ldlm_resource.c:1466:ldlm_resource_dump()) Granted locks (in reverse order):
      <3>LustreError: 34700:0:(ldlm_resource.c:1463:ldlm_resource_dump()) --- Resource: [0x20002ac2a:0x1fbe:0x0].0x78a9f709 (ffff880401cd9b40) refcount = 2
      <3>LustreError: 34700:0:(ldlm_resource.c:1466:ldlm_resource_dump()) Granted locks (in reverse order):
      <3>LustreError: 34700:0:(ldlm_resource.c:1463:ldlm_resource_dump()) --- Resource: [0x20002ac2a:0x1fbe:0x0].0xd5a94b89 (ffff8803f859fbc0) refcount = 2
      <3>LustreError: 34700:0:(ldlm_resource.c:1466:ldlm_resource_dump()) Granted locks (in reverse order):
      <3>LustreError: 34700:0:(ldlm_resource.c:1463:ldlm_resource_dump()) --- Resource: [0x20002ac14:0xd2e4:0x0].0x0 (ffff8807fbdccf00) refcount = 9
      <3>LustreError: 34700:0:(ldlm_resource.c:1466:ldlm_resource_dump()) Granted locks (in reverse order):
      <3>LustreError: 9135:0:(client.c:1166:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff8803f0bb0c80 x1555552176176736/t0(0) o13->soaked-OST000f-osc-MDT0000@192.168.1.105@o2ib10:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
      <3>LustreError: 9135:0:(client.c:1166:ptlrpc_import_delay_req()) Skipped 6 previous similar messages
      <6>Lustre: soaked-MDT0000: Not available for connect from 192.168.1.107@o2ib10 (stopping)
      <6>Lustre: Skipped 7 previous similar messages
      <6>Lustre: soaked-MDT0000: Not available for connect from 192.168.1.117@o2ib100 (stopping)
      <6>Lustre: Skipped 15 previous similar messages
      <6>Lustre: soaked-MDT0000: Not available for connect from 192.168.1.130@o2ib100 (stopping)
      <6>Lustre: Skipped 6 previous similar messages
      <3>LustreError: 0-0: Forced cleanup waiting for mdt-soaked-MDT0000_UUID namespace with 6 resources in use, (rc=-110)
      <3>LustreError: 0-0: Forced cleanup waiting for mdt-soaked-MDT0000_UUID namespace with 6 resources in use, (rc=-110)
      <6>Lustre: soaked-MDT0000: Not available for connect from 192.168.1.133@o2ib100 (stopping)
      <6>Lustre: Skipped 9 previous similar messages
      <6>Lustre: soaked-MDT0000: Not available for connect from 192.168.1.106@o2ib10 (stopping)
      <6>Lustre: Skipped 23 previous similar messages
      <3>LustreError: 0-0: Forced cleanup waiting for mdt-soaked-MDT0000_UUID namespace with 6 resources in use, (rc=-110)
      <3>LustreError: 34661:0:(lod_qos.c:208:lod_statfs_and_check()) soaked-MDT0000-mdtlov: statfs: rc = -108
      <3>LustreError: 34661:0:(lod_qos.c:208:lod_statfs_and_check()) Skipped 201 previous similar messages
      <0>LustreError: 34700:0:(mdt_handler.c:4565:mdt_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed:
      <0>LustreError: 9095:0:(lod_dev.c:1654:lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 ) failed: lu is ffff8803ffd54000
      <0>LustreError: 9095:0:(lod_dev.c:1654:lod_device_free()) LBUG
      <4>Pid: 9095, comm: obd_zombid
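
When triaging an excerpt like the one above, it can help to split each located message into its fields and to translate the `rc` values into errno names. The following is a minimal sketch, not a Lustre tool: `parse_line`, `LOCATED`, `RC`, and `LINUX_ERRNO` are hypothetical names, the regular expression only covers the `pid:N:(file:line:function())` shape seen in this log, and the errno table is hardcoded with the Linux values for the three codes that appear here.

```python
import re

# Linux errno names for the return codes seen in the excerpt above.
# Hardcoded on purpose: Python's errno module is platform-dependent.
LINUX_ERRNO = {5: "EIO", 108: "ESHUTDOWN", 110: "ETIMEDOUT"}

# Matches lines of the form:
#   <level>Tag: pid:N:(file:line:function()) message
LOCATED = re.compile(
    r"^<(?P<level>\d)>(?P<tag>Lustre(?:Error)?): "
    r"(?P<pid>\d+):\d+:\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<msg>.*)$"
)

# Matches both "rc = -5" and "rc=-110".
RC = re.compile(r"rc\s*=\s*(-?\d+)")


def parse_line(line):
    """Return a dict of fields for a located message, or None otherwise."""
    m = LOCATED.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    for key in ("level", "pid", "line"):
        rec[key] = int(rec[key])
    rc = RC.search(rec["msg"])
    rec["rc"] = int(rc.group(1)) if rc else None
    # Kernel return codes are negated errno values.
    rec["errno"] = LINUX_ERRNO.get(-rec["rc"]) if rec["rc"] is not None else None
    return rec


SAMPLE = [
    "<4>Lustre: Failing over soaked-MDT0000",
    "<3>LustreError: 34433:0:(osp_precreate.c:912:osp_precreate_cleanup_orphans()) "
    "soaked-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -5",
    "<3>LustreError: 34664:0:(lod_qos.c:208:lod_statfs_and_check()) "
    "soaked-MDT0000-mdtlov: statfs: rc = -108",
    "<0>LustreError: 9095:0:(lod_dev.c:1654:lod_device_free()) LBUG",
]

if __name__ == "__main__":
    for raw in SAMPLE:
        rec = parse_line(raw)
        if rec is not None:
            print(rec["pid"], rec["file"], rec["line"], rec["func"], rec["rc"], rec["errno"])
```

Read this way, the `rc = -5` from `osp_precreate_cleanup_orphans()` is `EIO`, the repeated `rc = -108` from `lod_statfs_and_check()` is `ESHUTDOWN`, and the `rc=-110` in the forced-cleanup messages is `ETIMEDOUT`.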
      

      A crash dump is available on lola-9 at /var/crash/127.0.0.1-2017-01-05-05:48:01
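
Both assertions in the log (`mdt_fini()` and `lod_device_free()`) check that a device reference count (`ld_ref`) is zero at teardown. The toy model below is not Lustre code and does not attempt to identify which reference was still held in this crash. It only illustrates, with hypothetical names (`Device`, `RefcountError`), how a `get` without a matching `put` makes a free-time check of this shape fail.

```python
class RefcountError(Exception):
    """Raised when a reference-count invariant is violated (hypothetical)."""


class Device:
    """Toy stand-in for an object guarded by a reference count."""

    def __init__(self, name):
        self.name = name
        self.ref = 0

    def get(self):
        # Take a reference.
        self.ref += 1

    def put(self):
        # Drop a reference; dropping more than were taken is a bug.
        if self.ref <= 0:
            raise RefcountError(f"{self.name}: put without matching get")
        self.ref -= 1

    def free(self):
        # Same shape as the check in the log: no references may remain.
        if self.ref != 0:
            raise RefcountError(
                f"ASSERTION( ref == 0 ) failed: {self.name} ref={self.ref}"
            )
        return True


if __name__ == "__main__":
    balanced = Device("balanced")
    balanced.get()
    balanced.put()
    balanced.free()  # every get has a matching put, so this succeeds

    leaked = Device("leaked")
    leaked.get()  # reference is never dropped
    try:
        leaked.free()
    except RefcountError as exc:
        print(exc)
```

In the kernel the equivalent failure is fatal (LBUG) rather than a catchable exception, which is why the crash dump is the main artifact for finding the holder of the outstanding reference.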

      Attachments

        1. lu-8990.txt
          20 kB
          Cliff White
        2. vmcore-dmesg.txt
          177 kB
          Cliff White

            People

              Assignee: laisiyao Lai Siyao
              Reporter: cliffw Cliff White (Inactive)