Details
Bug
Resolution: Fixed
Major
Lustre 2.9.0
3
Description
During an MDS failover test, a slow MDS unmount on failback resulted in STONITH.
00010000:00020000:1.0:1466436921.009410:0:33744:0:(ldlm_resource.c:764:ldlm_resource_complain()) mdt-snx11155-MDT0000_UUID: namespace resource [0x200034af6:0xfed1:0x0].ac32b32 (ffff880e2ba1f980) refcount nonzero (1) after lock cleanup; forcing cleanup.
00010000:00020000:1.0:1466436921.009411:0:33744:0:(ldlm_resource.c:1341:ldlm_resource_dump()) --- Resource: [0x200034af6:0xfed1:0x0].ac32b32 (ffff880e2ba1f980) refcount = 2
00010000:00020000:1.0:1466436921.009413:0:33744:0:(ldlm_resource.c:1344:ldlm_resource_dump()) Granted locks (in reverse order):
00010000:00020000:1.0:1466436921.009415:0:33744:0:(ldlm_resource.c:1347:ldlm_resource_dump()) ### ### ns: mdt-snx11155-MDT0000_UUID lock: ffff880f1f9ee0c0/0xda49ab87f04aa1d9 lrc: 2/0,1 mode: PW/PW res: [0x200034af6:0xfed1:0x0].ac32b32 bits 0x2 rrc: 3 type: IBT flags: 0x40316400000000 nid: local remote: 0x0 cl: expref: -99 pid: 42438 timeout: 0 lvb_type: 0
00010000:02020000:1.0:1466436995.998826:0:33744:0:(ldlm_resource.c:822:__ldlm_namespace_free()) 0-0: Forced cleanup waiting for mdt-snx11155-MDT0000_UUID namespace with 44 resources in use, (rc=-110)
00010000:02020000:1.0:1466437071.011840:0:33744:0:(ldlm_resource.c:822:__ldlm_namespace_free()) 0-0: Forced cleanup waiting for mdt-snx11155-MDT0000_UUID namespace with 44 resources in use, (rc=-110)
00010000:02020000:1.0:1466437146.024812:0:33744:0:(ldlm_resource.c:822:__ldlm_namespace_free()) 0-0: Forced cleanup waiting for mdt-snx11155-MDT0000_UUID namespace with 44 resources in use, (rc=-110)
00000020:01000000:3.0:1466437225.618852:0:33744:0:(lprocfs_status_server.c:118:lprocfs_free_client_stats()) stat ffff88076f7fe640 - data ffff880747890740/ffff8807950caee0
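The gaps between these messages show how long the unmount thread (pid 33744) stalled. The small Python sketch below pulls out the timestamps. It assumes each header is colon-separated as subsystem, mask, cpu.type, seconds.microseconds, stack, pid and external pid, followed by (file:line:function()). The regex and the `parse` helper are illustrative only and are not part of Lustre's tooling. With the headers copied from this log, the "Forced cleanup waiting" messages come about 75 s apart, and roughly 305 s pass between the first complaint and lprocfs_free_client_stats().

```python
import re

# Assumed header layout of a Lustre debug-log line (as seen in the excerpt):
# subsystem:mask:cpu.type:seconds.usec:stack:pid:ext_pid:(file:line:func())
HEADER = re.compile(
    r"^(?P<subsys>[0-9a-f]{8}):(?P<mask>[0-9a-f]{8}):(?P<cpu>[^:]+):"
    r"(?P<ts>\d+\.\d+):(?P<stack>\d+):(?P<pid>\d+):(?P<ext_pid>\d+):"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)"
)

# Headers copied from the log; message text omitted since only headers are parsed.
LOG = """\
00010000:00020000:1.0:1466436921.009410:0:33744:0:(ldlm_resource.c:764:ldlm_resource_complain())
00010000:02020000:1.0:1466436995.998826:0:33744:0:(ldlm_resource.c:822:__ldlm_namespace_free())
00010000:02020000:1.0:1466437071.011840:0:33744:0:(ldlm_resource.c:822:__ldlm_namespace_free())
00010000:02020000:1.0:1466437146.024812:0:33744:0:(ldlm_resource.c:822:__ldlm_namespace_free())
00000020:01000000:3.0:1466437225.618852:0:33744:0:(lprocfs_status_server.c:118:lprocfs_free_client_stats())
"""


def parse(text):
    """Return (timestamp, pid, function) for every line whose header matches."""
    entries = []
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            entries.append((float(m["ts"]), int(m["pid"]), m["func"]))
    return entries


entries = parse(LOG)
start = entries[0][0]
for ts, pid, func in entries:
    # Offset of each message relative to the first one, in seconds.
    print(f"+{ts - start:7.1f}s pid {pid} {func}")
print(f"total: {entries[-1][0] - start:.1f}s")
```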
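umount slept in mdt_fini()->ldlm_namespace_free_prior(). ldlm_namespace_free_prior() waits for local locks to be released before it frees the namespace. Ordinary MDT service threads hold these locks while sleeping in osp_precreate_reserve(), but the osp_precreate thread is stopped only later, after the LDLM namespace wait. The namespace cleanup therefore keeps timing out (rc=-110, i.e. -ETIMEDOUT), as the repeated "Forced cleanup waiting" messages show, and the unmount stalls.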
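The ordering problem can be modelled in a few lines. This is a hypothetical Python sketch, not Lustre code. `Namespace`, `Precreate`, `service_thread` and `umount` are illustrative stand-ins for the LDLM namespace, the OSP precreate thread, an MDT service thread and the unmount path. A service thread takes a lock and then blocks waiting for precreated objects, while the unmount path waits, with a timeout, for all locks to be released. If the namespace wait runs before precreation is stopped, it times out. If precreation is stopped first, the blocked thread fails out and drops its lock, and the wait completes.

```python
import threading


class Namespace:
    """Hypothetical stand-in for an LDLM namespace: counts locally held locks."""

    def __init__(self):
        self.cond = threading.Condition()
        self.held = 0

    def lock(self):
        with self.cond:
            self.held += 1

    def unlock(self):
        with self.cond:
            self.held -= 1
            self.cond.notify_all()

    def wait_free(self, timeout):
        """Return True once no locks are held, False if the timeout expires first."""
        with self.cond:
            return self.cond.wait_for(lambda: self.held == 0, timeout)


class Precreate:
    """Hypothetical stand-in for the precreate thread; no objects ever arrive here."""

    def __init__(self):
        self.cond = threading.Condition()
        self.stopped = False

    def reserve(self):
        """Block until precreation is stopped, then report failure."""
        with self.cond:
            self.cond.wait_for(lambda: self.stopped)
            return False

    def stop(self):
        with self.cond:
            self.stopped = True
            self.cond.notify_all()


def service_thread(ns, pre, started):
    ns.lock()          # take a local lock on the object...
    started.set()
    try:
        pre.reserve()  # ...then sleep waiting for precreated objects
    finally:
        ns.unlock()


def umount(stop_precreate_first, wait=0.5):
    ns, pre, started = Namespace(), Precreate(), threading.Event()
    t = threading.Thread(target=service_thread, args=(ns, pre, started))
    t.start()
    started.wait()                 # make sure the lock is held before shutdown
    if stop_precreate_first:
        pre.stop()                 # wake reserve() waiters before the namespace wait
    freed = ns.wait_free(wait)     # analogous to the namespace cleanup wait
    if not stop_precreate_first:
        pre.stop()                 # reported order: precreate stopped only afterwards
    t.join()
    return freed


print("namespace wait first:", umount(False))
print("precreate stop first:", umount(True))
```

Stopping or waking the precreate waiters before the namespace wait is only one way to break this dependency. The sketch does not claim to reflect the change that resolved this ticket.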