[LU-1429] LBUG while unmounting client Created: 21/May/12  Updated: 27/Sep/12  Resolved: 18/Jun/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Roger Spellman (Inactive)
Assignee: Lai Siyao
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

Lustre servers are running kernel 2.6.32-220.el6 with Lustre 2.1.1.rc4.
Lustre clients are running kernel 2.6.38.2 with special code created for this release, which includes http://review.whamcloud.com/#change,2170 (patch 8).


Severity: 3
Rank (Obsolete): 6392

 Description   

The customer reports:

I was unmounting all of the Lustre clients this morning in preparation for upgrading the system to test a bug fix when one of the clients panicked:

2012-05-21 14:19:56 +0000 [3286924.778603] LustreError: 21993:0:(ldlm_request.c:1170:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
2012-05-21 14:19:56 +0000 [3286924.836572] LustreError: 21993:0:(ldlm_request.c:1796:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
2012-05-21 14:19:56 +0000 [3286924.885902] LustreError: 21993:0:(ldlm_request.c:1170:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
2012-05-21 14:19:56 +0000 [3286924.942180] LustreError: 21993:0:(ldlm_request.c:1796:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
2012-05-21 14:19:56 +0000 [3286925.029419] LustreError: 21993:0:(ldlm_lock.c:1749:ldlm_lock_cancel()) ### lock still has references ns: xxxxxxx-MDT0000-mdc-ffff880baa687800 lock: ffff88179ce22ac0/0xf44cc339ed65dcfe lrc: 4/1,0 mode: PR/PR res: 8593723616/109284 bits 0x3 rrc: 2 type: IBT flags: 0x22002890 remote: 0x45adc59ebc8ce531 expref: -99 pid: 22254 timeout: 0
2012-05-21 14:19:56 +0000 [3286925.178583] LustreError: 21993:0:(ldlm_lock.c:1750:ldlm_lock_cancel()) LBUG
2012-05-21 14:19:56 +0000 [3286925.214528] Pid: 21993, comm: umount
2012-05-21 14:19:56 +0000 [3286925.233175]
2012-05-21 14:19:56 +0000 [3286925.233176] Call Trace:
2012-05-21 14:19:56 +0000 [3286925.254280] [<ffffffffa04da875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2012-05-21 14:19:56 +0000 [3286925.290628] [<ffffffffa04dada7>] lbug_with_loc+0x47/0xc0 [libcfs]
2012-05-21 14:19:56 +0000 [3286925.322003] [<ffffffffa06ff50a>] ldlm_lock_cancel+0x13a/0x140 [ptlrpc]
2012-05-21 14:19:56 +0000 [3286925.356112] [<ffffffffa07158ff>] ldlm_cli_cancel_local+0xbf/0x310 [ptlrpc]
2012-05-21 14:19:56 +0000 [3286925.392166] [<ffffffffa0719542>] ldlm_cli_cancel+0x62/0x340 [ptlrpc]
2012-05-21 14:19:56 +0000 [3286925.424934] [<ffffffffa0706455>] cleanup_resource+0x175/0x2d0 [ptlrpc]
2012-05-21 14:19:56 +0000 [3286925.460116] [<ffffffff8112b8de>] ? __free_slab+0xce/0x180
2012-05-21 14:19:57 +0000 [3286925.488137] [<ffffffffa07065da>] ldlm_resource_clean+0x2a/0x50 [ptlrpc]
2012-05-21 14:19:57 +0000 [3286925.524533] [<ffffffffa04e8cce>] cfs_hash_for_each_relax+0x18e/0x350 [libcfs]
2012-05-21 14:19:57 +0000 [3286925.562352] [<ffffffffa07065b0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
2012-05-21 14:19:57 +0000 [3286925.597063] [<ffffffffa07065b0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
2012-05-21 14:19:57 +0000 [3286925.632590] [<ffffffffa04eb6a8>] cfs_hash_for_each_nolock+0x88/0x180 [libcfs]
2012-05-21 14:19:57 +0000 [3286925.671295] [<ffffffffa0703a39>] ldlm_namespace_cleanup+0x29/0x80 [ptlrpc]
2012-05-21 14:19:57 +0000 [3286925.706770] [<ffffffffa07048c7>] __ldlm_namespace_free+0x57/0x470 [ptlrpc]
2012-05-21 14:19:57 +0000 [3286925.743179] [<ffffffffa07188d0>] ? ldlm_cli_hash_cancel_unused+0x0/0x70 [ptlrpc]
2012-05-21 14:19:57 +0000 [3286925.782004] [<ffffffffa07188d0>] ? ldlm_cli_hash_cancel_unused+0x0/0x70 [ptlrpc]
2012-05-21 14:19:57 +0000 [3286925.819882] [<ffffffffa07188d0>] ? ldlm_cli_hash_cancel_unused+0x0/0x70 [ptlrpc]
2012-05-21 14:19:57 +0000 [3286925.858343] [<ffffffffa04eb6b0>] ? cfs_hash_for_each_nolock+0x90/0x180 [libcfs]
2012-05-21 14:19:57 +0000 [3286925.896777] [<ffffffffa070542a>] ldlm_namespace_free_prior+0x7a/0x190 [ptlrpc]
2012-05-21 14:19:57 +0000 [3286925.933682] [<ffffffffa070d8b5>] client_disconnect_export+0x1d5/0x350 [ptlrpc]
2012-05-21 14:19:57 +0000 [3286925.972642] [<ffffffffa0c4e615>] lmv_disconnect+0x505/0x9a0 [lmv]
2012-05-21 14:19:57 +0000 [3286926.004722] [<ffffffffa0b8651e>] client_common_put_super+0x4fe/0xb30 [lustre]
2012-05-21 14:19:57 +0000 [3286926.041470] [<ffffffffa0b86c33>] ll_put_super+0xe3/0x280 [lustre]
2012-05-21 14:19:57 +0000 [3286926.073663] [<ffffffff81151ab6>] ? destroy_inode+0x36/0x60
2012-05-21 14:19:57 +0000 [3286926.102214] [<ffffffff811524fb>] ? dispose_list+0xdb/0x100
2012-05-21 14:19:57 +0000 [3286926.131705] [<ffffffff8113b1c2>] generic_shutdown_super+0x72/0x100
2012-05-21 14:19:57 +0000 [3286926.163750] [<ffffffff8113b2e6>] kill_anon_super+0x16/0x60
2012-05-21 14:19:57 +0000 [3286926.193020] [<ffffffffa05d8ef6>] lustre_kill_super+0x36/0x50 [obdclass]
2012-05-21 14:19:57 +0000 [3286926.227717] [<ffffffff8113ba45>] deactivate_locked_super+0x45/0x60
2012-05-21 14:19:57 +0000 [3286926.259521] [<ffffffff8113c9aa>] deactivate_super+0x4a/0x70
2012-05-21 14:19:57 +0000 [3286926.289284] [<ffffffff811567af>] mntput_no_expire+0x11f/0x190
2012-05-21 14:19:57 +0000 [3286926.318995] [<ffffffff81157cc8>] sys_umount+0x78/0x3c0
2012-05-21 14:19:57 +0000 [3286926.346323] [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
2012-05-21 14:19:57 +0000 [3286926.376895]
2012-05-21 14:19:57 +0000 [3286926.385435] Kernel panic - not syncing: LBUG
2012-05-21 14:19:57 +0000 [3286926.408223] Pid: 21993, comm: umount Tainted: G W 2.6.38.2-ts4 #11
2012-05-21 14:19:57 +0000 [3286926.443832] Call Trace:
2012-05-21 14:19:57 +0000 [3286926.458387] [<ffffffff8145a373>] ? panic+0x91/0x19c
2012-05-21 14:19:58 +0000 [3286926.483917] [<ffffffffa04dae0b>] ? lbug_with_loc+0xab/0xc0 [libcfs]
2012-05-21 14:19:58 +0000 [3286926.517230] [<ffffffffa06ff50a>] ? ldlm_lock_cancel+0x13a/0x140 [ptlrpc]
2012-05-21 14:19:58 +0000 [3286926.551815] [<ffffffffa07158ff>] ? ldlm_cli_cancel_local+0xbf/0x310 [ptlrpc]
2012-05-21 14:19:58 +0000 [3286926.589084] [<ffffffffa0719542>] ? ldlm_cli_cancel+0x62/0x340 [ptlrpc]
2012-05-21 14:19:58 +0000 [3286926.623771] [<ffffffffa0706455>] ? cleanup_resource+0x175/0x2d0 [ptlrpc]
2012-05-21 14:19:58 +0000 [3286926.658412] [<ffffffff8112b8de>] ? __free_slab+0xce/0x180
2012-05-21 14:19:58 +0000 [3286926.687756] [<ffffffffa07065da>] ? ldlm_resource_clean+0x2a/0x50 [ptlrpc]
2012-05-21 14:19:58 +0000 [3286926.723131] [<ffffffffa04e8cce>] ? cfs_hash_for_each_relax+0x18e/0x350 [libcfs]
2012-05-21 14:19:58 +0000 [3286926.762757] [<ffffffffa07065b0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
2012-05-21 14:19:58 +0000 [3286926.798172] [<ffffffffa07065b0>] ? ldlm_resource_clean+0x0/0x50 [ptlrpc]
2012-05-21 14:19:58 +0000 [3286926.832517] [<ffffffffa04eb6a8>] ? cfs_hash_for_each_nolock+0x88/0x180 [libcfs]
2012-05-21 14:19:58 +0000 [3286926.870484] [<ffffffffa0703a39>] ? ldlm_namespace_cleanup+0x29/0x80 [ptlrpc]
2012-05-21 14:19:58 +0000 [3286926.907478] [<ffffffffa07048c7>] ? __ldlm_namespace_free+0x57/0x470 [ptlrpc]
2012-05-21 14:19:58 +0000 [3286926.943632] [<ffffffffa07188d0>] ? ldlm_cli_hash_cancel_unused+0x0/0x70 [ptlrpc]
2012-05-21 14:19:58 +0000 [3286926.982120] [<ffffffffa07188d0>] ? ldlm_cli_hash_cancel_unused+0x0/0x70 [ptlrpc]
2012-05-21 14:19:58 +0000 [3286927.020929] [<ffffffffa07188d0>] ? ldlm_cli_hash_cancel_unused+0x0/0x70 [ptlrpc]
2012-05-21 14:19:58 +0000 [3286927.058688] [<ffffffffa04eb6b0>] ? cfs_hash_for_each_nolock+0x90/0x180 [libcfs]
2012-05-21 14:19:58 +0000 [3286927.097335] [<ffffffffa070542a>] ? ldlm_namespace_free_prior+0x7a/0x190 [ptlrpc]
2012-05-21 14:19:58 +0000 [3286927.136255] [<ffffffffa070d8b5>] ? client_disconnect_export+0x1d5/0x350 [ptlrpc]
2012-05-21 14:19:58 +0000 [3286927.174598] [<ffffffffa0c4e615>] ? lmv_disconnect+0x505/0x9a0 [lmv]
2012-05-21 14:19:58 +0000 [3286927.207396] [<ffffffffa0b8651e>] ? client_common_put_super+0x4fe/0xb30 [lustre]
2012-05-21 14:19:58 +0000 [3286927.245652] [<ffffffffa0b86c33>] ? ll_put_super+0xe3/0x280 [lustre]
2012-05-21 14:19:58 +0000 [3286927.278089] [<ffffffff81151ab6>] ? destroy_inode+0x36/0x60
2012-05-21 14:19:58 +0000 [3286927.306992] [<ffffffff811524fb>] ? dispose_list+0xdb/0x100
2012-05-21 14:19:58 +0000 [3286927.335308] [<ffffffff8113b1c2>] ? generic_shutdown_super+0x72/0x100
2012-05-21 14:19:58 +0000 [3286927.368598] [<ffffffff8113b2e6>] ? kill_anon_super+0x16/0x60
2012-05-21 14:19:58 +0000 [3286927.398226] [<ffffffffa05d8ef6>] ? lustre_kill_super+0x36/0x50 [obdclass]
2012-05-21 14:19:58 +0000 [3286927.433461] [<ffffffff8113ba45>] ? deactivate_locked_super+0x45/0x60
2012-05-21 14:19:59 +0000 [3286927.467365] [<ffffffff8113c9aa>] ? deactivate_super+0x4a/0x70
2012-05-21 14:19:59 +0000 [3286927.497891] [<ffffffff811567af>] ? mntput_no_expire+0x11f/0x190
2012-05-21 14:19:59 +0000 [3286927.529535] [<ffffffff81157cc8>] ? sys_umount+0x78/0x3c0
2012-05-21 14:19:59 +0000 [3286927.557320] [<ffffffff8100bfc2>] ? system_call_fastpath+0x16/0x1b
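
For anyone triaging a similar trace, the "lock still has references" line can be picked apart with a short script. The following is a minimal Python sketch, not part of Lustre. It assumes that the `lrc:` field reads refcount/readers,writers and that `mode:` reads granted/requested, which is my reading of the LDLM debug format rather than something stated in this ticket. The function name `parse_lock_dump` and the dictionary keys are hypothetical.

```python
import re

# Lock dump line copied from the log above, with the timestamp,
# PID and source-location prefix dropped.
LOCK_LINE = (
    "### lock still has references ns: xxxxxxx-MDT0000-mdc-ffff880baa687800 "
    "lock: ffff88179ce22ac0/0xf44cc339ed65dcfe lrc: 4/1,0 mode: PR/PR "
    "res: 8593723616/109284 bits 0x3 rrc: 2 type: IBT flags: 0x22002890 "
    "remote: 0x45adc59ebc8ce531 expref: -99 pid: 22254 timeout: 0"
)

# Assumed field layout: "lrc: <refcount>/<readers>,<writers>".
LRC_RE = re.compile(r"lrc: (\d+)/(\d+),(\d+)")
# Assumed field layout: "mode: <granted>/<requested>".
MODE_RE = re.compile(r"mode: (\w+)/(\w+)")
PID_RE = re.compile(r"pid: (\d+)")


def parse_lock_dump(line):
    """Extract the reference counters and lock mode from one dump line."""
    lrc = LRC_RE.search(line)
    mode = MODE_RE.search(line)
    pid = PID_RE.search(line)
    if lrc is None or mode is None or pid is None:
        raise ValueError("not a recognizable lock dump line")
    refcount, readers, writers = (int(g) for g in lrc.groups())
    return {
        "refcount": refcount,
        "readers": readers,
        "writers": writers,
        "granted_mode": mode.group(1),
        "requested_mode": mode.group(2),
        "pid": int(pid.group(1)),
        # A lock being cancelled with readers or writers still counted
        # is what the "lock still has references" message reports.
        "still_referenced": readers > 0 or writers > 0,
    }


if __name__ == "__main__":
    for key, value in parse_lock_dump(LOCK_LINE).items():
        print(f"{key}: {value}")
```

Under those assumptions, the line shows one reader still counted on a PR lock at cancel time, and the `pid:` field (22254) differs from the umount process (21993) that hit the LBUG.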



 Comments   
Comment by Peter Jones [ 22/May/12 ]

Lai

Could you please look into this one?

Thanks

Peter

Comment by Lai Siyao [ 23/May/12 ]

This looks like a lock leak, and I may have seen it before. Do you remember what tests you ran on this client?

Comment by Roger Spellman (Inactive) [ 18/Jun/12 ]

Lai, do you think this bug was fixed along with the fixes for LU-1328? The customer has not reported this problem recently, so you can probably close this one.

Comment by Lai Siyao [ 18/Jun/12 ]

Yes, it should be the same one. I'll close it.

Comment by Lai Siyao [ 18/Jun/12 ]

Duplicate of LU-1328.
