[LU-7530] upcall_cache_flush()) ASSERTION( !atomic_read(&entry->ue_refcount) ) failed Created: 09/Dec/15 Updated: 14/Dec/15 Resolved: 14/Dec/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | Oleg Drokin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
Just about a week ago I started to see these assertions on master:
<4>[90142.530259] Lustre: DEBUG MARKER: == recovery-small test 27: fail LOV while using OSC's == 14:16:08 (1449602168)
<4>[90144.006796] Lustre: Failing over lustre-MDT0000
<3>[90144.016902] LustreError: 528:0:(mdt_lib.c:493:old_init_ucred_common()) lustre-MDT0000: cli 7f023601-9100-9039-6ea4-5561717d207a/ffff880092a417f0 nodemap not set.
<6>[90144.018588] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
<0>[90144.116494] LustreError: 1400:0:(upcall_cache.c:378:upcall_cache_flush()) ASSERTION( !atomic_read(&entry->ue_refcount) ) failed:
<0>[90144.117605] LustreError: 1400:0:(upcall_cache.c:378:upcall_cache_flush()) LBUG
<4>[90144.118625] Pid: 1400, comm: umount
<4>[90144.123256]
<4>[90144.123257] Call Trace:
<4>[90144.124529] [<ffffffffa0ada885>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[90144.130691] [<ffffffffa0adae87>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[90144.132956] [<ffffffffa0e13a64>] upcall_cache_flush+0x164/0x1c0 [obdclass]
<4>[90144.134555] [<ffffffffa0e13ae0>] upcall_cache_cleanup+0x20/0xc0 [obdclass]
<4>[90144.135671] [<ffffffffa08a6627>] mdt_device_fini+0x8c7/0x1450 [mdt]
<4>[90144.136490] [<ffffffffa0dcfa1c>] ? class_disconnect_exports+0x17c/0x2f0 [obdclass]
<4>[90144.138119] [<ffffffffa0de8d22>] class_cleanup+0x572/0xd20 [obdclass]
<4>[90144.139591] [<ffffffffa0dcac8c>] ? class_name2dev+0x7c/0xe0 [obdclass]
<4>[90144.140493] [<ffffffffa0deb246>] class_process_config+0x1d76/0x26d0 [obdclass]
<4>[90144.142081] [<ffffffffa0ae6701>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
<4>[90144.143093] [<ffffffffa0dec05f>] class_manual_cleanup+0x4bf/0xe10 [obdclass]
<4>[90144.144273] [<ffffffffa0dcac8c>] ? class_name2dev+0x7c/0xe0 [obdclass]
<4>[90144.145183] [<ffffffffa0e2099c>] server_put_super+0x9bc/0xe80 [obdclass]
<4>[90144.146170] [<ffffffff811b141a>] ? invalidate_inodes+0xfa/0x180
<4>[90144.147277] [<ffffffff8119564b>] generic_shutdown_super+0x5b/0xe0
<4>[90144.148106] [<ffffffff81195736>] kill_anon_super+0x16/0x60
<4>[90144.150251] [<ffffffffa0def756>] lustre_kill_super+0x36/0x60 [obdclass]
<4>[90144.156662] [<ffffffff81195ed7>] deactivate_super+0x57/0x80
<4>[90144.158768] [<ffffffff811b5e2f>] mntput_no_expire+0xbf/0x110
<4>[90144.162013] [<ffffffff811b699b>] sys_umount+0x7b/0x3a0
<4>[90144.164183] [<ffffffff8100b112>] system_call_fastpath+0x16/0x1b
<4>[90144.166668]
I think this is introduced by http://review.whamcloud.com/16802 :
+ identity = mdt_identity_get(mdt->mdt_identity_cache,
...
+ uc->uc_identity = identity;
+ if (nodemap == NULL) {
+ CERROR("%s: cli %s/%p nodemap not set.\n",
+ mdt2obd_dev(mdt)->obd_name,
+ info->mti_exp->exp_client_uuid.uuid, info->mti_exp);
+ RETURN(-EACCES);
So we are leaking the identity refcount in this case, whereas we did not before. |
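To make the leak pattern concrete, here is a minimal userspace sketch of the same shape. It is not the Lustre code and not the patch that landed: `struct entry`, `struct cred`, `entry_get()`, `entry_put()`, `init_cred_leaky()` and `init_cred_balanced()` are hypothetical stand-ins, and the sketch assumes the caller does not release the reference when the function returns an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a cache entry; only the refcount matters here. */
struct entry {
	int refcount;
};

/* Hypothetical stand-in for the credential that stores the reference. */
struct cred {
	struct entry *identity;
};

/* Take a reference (plays the role of a cache "get"). */
static struct entry *entry_get(struct entry *e)
{
	e->refcount++;
	return e;
}

/* Drop a reference (plays the role of the matching "put"). */
static void entry_put(struct entry *e)
{
	e->refcount--;
}

/* Leaky shape: the early return leaves the reference held. */
static int init_cred_leaky(struct cred *uc, struct entry *cached,
			   const void *nodemap)
{
	uc->identity = entry_get(cached);
	if (nodemap == NULL)
		return -EACCES;	/* reference taken above is never dropped */
	return 0;
}

/* Balanced shape: drop the reference before bailing out. */
static int init_cred_balanced(struct cred *uc, struct entry *cached,
			      const void *nodemap)
{
	uc->identity = entry_get(cached);
	if (nodemap == NULL) {
		entry_put(uc->identity);
		uc->identity = NULL;
		return -EACCES;
	}
	return 0;
}
```

With the leaky variant, a later flush that asserts a zero refcount on every entry would trip, which matches the shape of the LBUG above. An equally valid alternative under the same assumptions is to check the nodemap before taking the reference at all.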
| Comments |
| Comment by Gerrit Updater [ 09/Dec/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/17519 |
| Comment by Kit Westneat [ 09/Dec/15 ] |
|
Interesting, I wonder how the nodemap became null. I guess it must be that the client was disconnected by mdt_obd_disconnect while old_init_ucred was run in a different thread? |
| Comment by Oleg Drokin [ 09/Dec/15 ] |
|
Yes, that's plausible. All the tests it usually fails on involve doing work while disconnections/reconnections are being initiated. |
| Comment by Gerrit Updater [ 13/Dec/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17519/ |
| Comment by Peter Jones [ 14/Dec/15 ] |
|
Landed for 2.8 |