Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.9.0
    • Lustre 2.9.0
    • 3
    • 9223372036854775807

    Description

      During an MDS failover test, a slow MDS unmount on failback resulted in STONITH.

      00010000:00020000:1.0:1466436921.009410:0:33744:0:(ldlm_resource.c:764:ldlm_resource_complain()) mdt-snx11155-MDT0000_UUID: namespace resource [0x200034af6:0xfed1:0x0].ac32b32 (ffff880e2ba1f980) refcount nonzero (1) after lock cleanup; forcing cleanup.
      00010000:00020000:1.0:1466436921.009411:0:33744:0:(ldlm_resource.c:1341:ldlm_resource_dump()) --- Resource: [0x200034af6:0xfed1:0x0].ac32b32 (ffff880e2ba1f980) refcount = 2
      00010000:00020000:1.0:1466436921.009413:0:33744:0:(ldlm_resource.c:1344:ldlm_resource_dump()) Granted locks (in reverse order):
      00010000:00020000:1.0:1466436921.009415:0:33744:0:(ldlm_resource.c:1347:ldlm_resource_dump()) ### ### ns: mdt-snx11155-MDT0000_UUID lock: ffff880f1f9ee0c0/0xda49ab87f04aa1d9 lrc: 2/0,1 mode: PW/PW res: [0x200034af6:0xfed1:0x0].ac32b32 bits 0x2 rrc: 3 type: IBT flags: 0x40316400000000 nid: local remote: 0x0 cl:  expref: -99 pid: 42438 timeout: 0 lvb_type: 0
      00010000:02020000:1.0:1466436995.998826:0:33744:0:(ldlm_resource.c:822:__ldlm_namespace_free()) 0-0: Forced cleanup waiting for mdt-snx11155-MDT0000_UUID namespace with 44 resources in use, (rc=-110)
      00010000:02020000:1.0:1466437071.011840:0:33744:0:(ldlm_resource.c:822:__ldlm_namespace_free()) 0-0: Forced cleanup waiting for mdt-snx11155-MDT0000_UUID namespace with 44 resources in use, (rc=-110)
      00010000:02020000:1.0:1466437146.024812:0:33744:0:(ldlm_resource.c:822:__ldlm_namespace_free()) 0-0: Forced cleanup waiting for mdt-snx11155-MDT0000_UUID namespace with 44 resources in use, (rc=-110)
      00000020:01000000:3.0:1466437225.618852:0:33744:0:(lprocfs_status_server.c:118:lprocfs_free_client_stats()) stat ffff88076f7fe640 - data ffff880747890740/ffff8807950caee0
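The spacing of the "Forced cleanup waiting" messages can be checked directly from the excerpt above. The following is only a minimal sketch: it assumes the timestamp is the fourth colon-separated field, as it appears in these lines, and the helper names `timestamp` and `gaps` are hypothetical. The lines are trimmed to their prefix for brevity. The `rc=-110` in those messages matches `-ETIMEDOUT` on Linux.

```python
from decimal import Decimal

# Prefixes of the three "Forced cleanup waiting" lines from the excerpt
# (message text trimmed; only the leading fields are needed here).
FORCED_CLEANUP_LINES = [
    "00010000:02020000:1.0:1466436995.998826:0:33744:0:(ldlm_resource.c:822:__ldlm_namespace_free())",
    "00010000:02020000:1.0:1466437071.011840:0:33744:0:(ldlm_resource.c:822:__ldlm_namespace_free())",
    "00010000:02020000:1.0:1466437146.024812:0:33744:0:(ldlm_resource.c:822:__ldlm_namespace_free())",
]


def timestamp(line):
    """Return the fourth colon-separated field as an exact decimal."""
    return Decimal(line.split(":", 4)[3])


def gaps(lines):
    """Return the time differences between consecutive log lines."""
    stamps = [timestamp(line) for line in lines]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for gap in gaps(FORCED_CLEANUP_LINES):
        print(gap)
```

For these three lines the gaps come out at roughly 75 seconds each, so every repeated message adds about that much to the unmount time.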
      

      umount slept in mdt_fini()->ldlm_namespace_free_prior(). ldlm_namespace_free_prior() waited for local locks to be released before it could free the namespace. Regular MDT threads held these locks while sleeping in osp_precreate_reserve(). However, the osp_precreate thread is stopped only later, after the LDLM namespace wait, so the lock holders stayed asleep while umount waited for them.
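The ordering problem described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not Lustre code and not the actual patch: `ToyTarget`, `unmount`, and `run_scenario` are hypothetical names, the lock is modelled with plain events, and the late stop is assumed to happen after a single timed-out wait. It compares stopping the precreate side after the namespace wait with stopping it first.

```python
import threading


class ToyTarget:
    """Hypothetical stand-in for a server target; not Lustre code."""

    def __init__(self):
        self.precreate_stopped = threading.Event()  # set when precreate is stopped
        self.lock_held = threading.Event()          # set once the worker holds its lock
        self.lock_released = threading.Event()      # set when the worker drops its lock

    def worker(self):
        # Take a lock, then sleep waiting on precreate while still holding it.
        self.lock_held.set()
        self.precreate_stopped.wait()
        # Woken because precreate was stopped: release the lock.
        self.lock_released.set()


def unmount(target, stop_precreate_first, wait_timeout=1.0):
    """Return how many namespace-cleanup waits timed out during unmount."""
    timed_out = 0
    if stop_precreate_first:
        target.precreate_stopped.set()
    # Namespace cleanup: wait for outstanding locks to be released.
    if not target.lock_released.wait(wait_timeout):
        timed_out += 1
        # Simplification: the late stop of precreate wakes the worker now.
        target.precreate_stopped.set()
        target.lock_released.wait()
    return timed_out


def run_scenario(stop_precreate_first):
    """Run one unmount with a single lock-holding worker thread."""
    target = ToyTarget()
    thread = threading.Thread(target=target.worker)
    thread.start()
    target.lock_held.wait()  # make sure the lock is held before unmounting
    result = unmount(target, stop_precreate_first)
    thread.join()
    return result


if __name__ == "__main__":
    print("stop after namespace wait:", run_scenario(False), "timed-out wait(s)")
    print("stop before namespace wait:", run_scenario(True), "timed-out wait(s)")
```

In this model, the late-stop ordering always produces a timed-out wait, because nothing can wake the lock holder before the timeout. Stopping the precreate side first lets the worker release its lock before the namespace wait expires.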

          People

            wc-triage WC Triage
            aboyko Alexander Boyko
