[LU-8356] umount hang/stonith on mds failback Created: 30/Jun/16  Updated: 17/Aug/16  Resolved: 17/Aug/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Major
Reporter: Alexander Boyko Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Duplicate
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

During an MDS failover test, a slow MDS unmount on failback resulted in a STONITH.

00010000:00020000:1.0:1466436921.009410:0:33744:0:(ldlm_resource.c:764:ldlm_resource_complain()) mdt-snx11155-MDT0000_UUID: namespace resource [0x200034af6:0xfed1:0x0].ac32b32 (ffff880e2ba1f980) refcount nonzero (1) after lock cleanup; forcing cleanup.
00010000:00020000:1.0:1466436921.009411:0:33744:0:(ldlm_resource.c:1341:ldlm_resource_dump()) --- Resource: [0x200034af6:0xfed1:0x0].ac32b32 (ffff880e2ba1f980) refcount = 2
00010000:00020000:1.0:1466436921.009413:0:33744:0:(ldlm_resource.c:1344:ldlm_resource_dump()) Granted locks (in reverse order):
00010000:00020000:1.0:1466436921.009415:0:33744:0:(ldlm_resource.c:1347:ldlm_resource_dump()) ### ### ns: mdt-snx11155-MDT0000_UUID lock: ffff880f1f9ee0c0/0xda49ab87f04aa1d9 lrc: 2/0,1 mode: PW/PW res: [0x200034af6:0xfed1:0x0].ac32b32 bits 0x2 rrc: 3 type: IBT flags: 0x40316400000000 nid: local remote: 0x0 cl:  expref: -99 pid: 42438 timeout: 0 lvb_type: 0
00010000:02020000:1.0:1466436995.998826:0:33744:0:(ldlm_resource.c:822:__ldlm_namespace_free()) 0-0: Forced cleanup waiting for mdt-snx11155-MDT0000_UUID namespace with 44 resources in use, (rc=-110)
00010000:02020000:1.0:1466437071.011840:0:33744:0:(ldlm_resource.c:822:__ldlm_namespace_free()) 0-0: Forced cleanup waiting for mdt-snx11155-MDT0000_UUID namespace with 44 resources in use, (rc=-110)
00010000:02020000:1.0:1466437146.024812:0:33744:0:(ldlm_resource.c:822:__ldlm_namespace_free()) 0-0: Forced cleanup waiting for mdt-snx11155-MDT0000_UUID namespace with 44 resources in use, (rc=-110)
00000020:01000000:3.0:1466437225.618852:0:33744:0:(lprocfs_status_server.c:118:lprocfs_free_client_stats()) stat ffff88076f7fe640 - data ffff880747890740/ffff8807950caee0

umount slept in mdt_fini() -> ldlm_namespace_free_prior(). ldlm_namespace_free_prior() waits for local locks to be released before the namespace can be freed. Regular MDT service threads hold these locks while sleeping in osp_precreate_reserve(). However, the osp_precreate thread is stopped only later, after the LDLM namespace wait, so nothing wakes those threads and the unmount stalls.
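The lock-ordering problem can be sketched as a small pthreads model. This is a minimal illustration under stated assumptions, not Lustre code: `struct model`, `service_thread()`, `simulate_umount()` and the `stopping` flag are hypothetical names. `locks_held` stands in for the local LDLM locks, the condition wait in `service_thread()` stands in for `osp_precreate_reserve()`, and the timed wait in `simulate_umount()` stands in for `ldlm_namespace_free_prior()`. With `wake_on_umount` false, the precreate side is stopped only after the namespace wait, as described in the report. With it true, sleeping reservers are woken first, loosely in the spirit of the patch subject "wakeup osp_precreate_reserve on umount".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model state; names are illustrative, not Lustre's. */
struct model {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    bool stopping;    /* set when the precreate side is stopped */
    bool precreated;  /* objects available for reservation (never, here) */
    int  locks_held;  /* stand-in for local LDLM locks held by service threads */
};

/* Service thread: takes a "local lock", then sleeps like osp_precreate_reserve(). */
static void *service_thread(void *arg)
{
    struct model *m = arg;

    pthread_mutex_lock(&m->mtx);
    m->locks_held++;
    pthread_cond_broadcast(&m->cond);
    while (!m->precreated && !m->stopping)  /* only a stop can wake us */
        pthread_cond_wait(&m->cond, &m->mtx);
    m->locks_held--;                        /* release the lock on wakeup */
    pthread_cond_broadcast(&m->cond);
    pthread_mutex_unlock(&m->mtx);
    return NULL;
}

/* Returns 0 if the "namespace" was freed in time, -ETIMEDOUT otherwise. */
int simulate_umount(bool wake_on_umount, int timeout_ms)
{
    struct model m = { .stopping = false, .precreated = false, .locks_held = 0 };
    struct timespec deadline;
    pthread_t tid;
    int rc = 0;

    pthread_mutex_init(&m.mtx, NULL);
    pthread_cond_init(&m.cond, NULL);
    int err = pthread_create(&tid, NULL, service_thread, &m);
    assert(err == 0);

    pthread_mutex_lock(&m.mtx);
    while (m.locks_held == 0)  /* make sure the lock is taken before umount */
        pthread_cond_wait(&m.cond, &m.mtx);

    if (wake_on_umount) {      /* fixed order: wake reservers first */
        m.stopping = true;
        pthread_cond_broadcast(&m.cond);
    }

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    /* ldlm_namespace_free_prior()-like wait for local locks to drain. */
    while (m.locks_held > 0 && rc == 0)
        rc = pthread_cond_timedwait(&m.cond, &m.mtx, &deadline);
    int result = (m.locks_held == 0) ? 0 : -ETIMEDOUT;

    /* Stop the precreate side (in the buggy order, only now) so the thread exits. */
    m.stopping = true;
    pthread_cond_broadcast(&m.cond);
    pthread_mutex_unlock(&m.mtx);
    pthread_join(tid, NULL);

    pthread_mutex_destroy(&m.mtx);
    pthread_cond_destroy(&m.cond);
    return result;
}
```

Under these assumptions, `simulate_umount(false, 100)` should return `-ETIMEDOUT` (-110 on Linux, matching the `rc=-110` in the log above), while `simulate_umount(true, 2000)` should return 0. Build with `-pthread`.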



 Comments   
Comment by Gerrit Updater [ 30/Jun/16 ]

Alexander Boyko (alexander.boyko@seagate.com) uploaded a new patch: http://review.whamcloud.com/21103
Subject: LU-8356 osp: wakeup osp_precreate_reserve on umount
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f0d51e3b43e7d54042b6141f71a5f927762745e0

Comment by Gerrit Updater [ 27/Jul/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/21103/
Subject: LU-8356 osp: wakeup osp_precreate_reserve on umount
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 098a19e8b3ce7595e3c5a9d671a8bf6928b12393

Comment by Peter Jones [ 17/Aug/16 ]

Landed for 2.9

Generated at Sat Feb 10 02:16:50 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.