[LU-13988] mdt_lock_handle_fini()) ASSERTION( !lustre_handle_is_used(&lh->mlh_reg_lh) ) failed Created: 25/Sep/20  Updated: 13/Jul/22  Resolved: 19/Nov/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug
Priority: Minor
Reporter: Andriy Skulysh
Assignee: Andriy Skulysh
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
00000100:00100000:1.0:1586690876.387325:0:13960:0:(service.c:2228:ptlrpc_server_handle_request()) Handling RPC req@ffff886d97baf180 pname:cluuid+ref:pid:xid:nid:opc:job mdt00_008:fd032a03-ac98-3bd0-9cda-ed98f0173f88+12716:20834:x1663761626790848:12345-192.168.103.15@tcp:101:truncate.0
00010000:00010000:1.0:1586690876.387331:0:13960:0:(ldlm_lockd.c:1186:ldlm_handle_enqueue0()) ### server-side enqueue handler START
00010000:00010000:1.0:1586690876.387333:0:13960:0:(ldlm_lockd.c:1229:ldlm_handle_enqueue0()) @@@ found existing lock cookie 0x9f8ceb1ed5a613fb  req@ffff886d97baf180 x1663761626790848/t0(0) o101->fd032a03-ac98-3bd0-9cda-ed98f0173f88@192.168.103.15@tcp:254/0 lens 480/0 e 0 to 0 dl 1586690889 ref 1 fl Interpret:/2/ffffffff rc 0/-1 job:'truncate.0'
00000004:00010000:1.0:1586690876.387353:0:13960:0:(mdt_handler.c:3879:mdt_intent_fixup_resent()) ### Restoring lock cookie ns: mdt-lustre-MDT0000_UUID lock: ffff886d74800240/0x9f8ceb1ed5a613fb lrc: 3/0,0 mode: PW/PW res: [0x200000404:0x16ae6:0x0].0x0 bits 0x40/0x0 rrc: 6 type: IBT flags: 0x40200000000000 nid: 192.168.103.15@tcp remote: 0x806313bfb96a7bd expref: 12716 pid: 13961 timeout: 0 lvb_type: 0
00000004:00010000:1.0:1586690876.387357:0:13960:0:(mdt_handler.c:3881:mdt_intent_fixup_resent()) @@@ restoring lock cookie 0x9f8ceb1ed5a613fb  req@ffff886d97baf180 x1663761626790848/t0(0) o101->fd032a03-ac98-3bd0-9cda-ed98f0173f88@192.168.103.15@tcp:254/0 lens 480/0 e 0 to 0 dl 1586690889 ref 1 fl Interpret:/2/ffffffff rc 0/-1 job:'truncate.0'
00000004:00040000:1.0:1586690876.387385:0:13960:0:(mdt_handler.c:3650:mdt_lock_handle_fini()) ASSERTION( !lustre_handle_is_used(&lh->mlh_reg_lh) ) failed:
00000004:00040000:1.0:1586690876.387395:0:13960:0:(mdt_handler.c:3650:mdt_lock_handle_fini()) LBUG
 #5 [ffff886db2d3bb10] mdt_thread_info_fini at ffffffffc1443206 [mdt]
    ffff886db2d3bb18: [ffff886e2ab5d000:kmalloc-4096] 00000000fffffffe
    ffff886db2d3bb28: ffff886e352ac510 ffff886db2d3bb70
    ffff886db2d3bb38: mdt_intent_policy+218
crash> p (int)0xfffffffe
$7 = -2 // ENOENT


 Comments   
Comment by Gerrit Updater [ 25/Sep/20 ]

Andriy Skulysh (c17819@cray.com) uploaded a new patch: https://review.whamcloud.com/40045
Subject: LU-13988 mdt: ASSERTION(!lustre_handle_is_used(&lh->mlh_reg_lh)) failed
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d76286107678bc5eeb1833ff1c25c54a13fbe943

Comment by Gerrit Updater [ 19/Nov/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40045/
Subject: LU-13988 mdt: ASSERTION(!lustre_handle_is_used(&lh->mlh_reg_lh)) failed
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 804a837775f5f3527546de92f57f57fc5be5bdf2

Comment by Peter Jones [ 19/Nov/20 ]

Landed for 2.14

Comment by Gerrit Updater [ 01/Jul/22 ]

"DELBARY Gael <gael.delbary@cea.fr>" uploaded a new patch: https://review.whamcloud.com/47854
Subject: LU-13988 mdt: ASSERTION(!lustre_handle_is_used(&lh->mlh_reg_lh)) failed
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: a74be3f45f1257873ec575b35956cdbc2d3670a5

Comment by DELBARY Gael [ 01/Jul/22 ]

We hit this LBUG in production. Root cause analysis (of the user's source code) is in progress.

Comment by DELBARY Gael [ 04/Jul/22 ]

The workload (for one user) runs under a directory whose ldlm_lock was held at the time of the crash:

  • mkdir temp1 (the lock involved in the crash is on temp1's parent)
  • create many files under temp1
  • create a zip file from the contents of temp1 (i.e. rename from the temporary zip file to the final zip name, zip internals...)
  • unlink the files inside temp1
  • rmdir temp1
  • in temp1's parent dir, some noise: many truncates on some files
  • on one file, 18000 closes in 2 minutes
  • on 5 FIDs in the same directory, 10 truncates + close in a loop until the crash

Hopefully this is not a valid workload for us...

Comment by DELBARY Gael [ 13/Jul/22 ]

In the end, on our side we are not in the DoM path, but we hit this because of the patch https://review.whamcloud.com/47487.

It is fixed now. In any case, the fix is pretty useful for DoM workloads.

 

 
