Details
- Bug
- Resolution: Fixed
- Major
Description
Just about a week ago I started to see these assertions on master:
<4>[90142.530259] Lustre: DEBUG MARKER: == recovery-small test 27: fail LOV while using OSC's == 14:16:08 (1449602168)
<4>[90144.006796] Lustre: Failing over lustre-MDT0000
<3>[90144.016902] LustreError: 528:0:(mdt_lib.c:493:old_init_ucred_common()) lustre-MDT0000: cli 7f023601-9100-9039-6ea4-5561717d207a/ffff880092a417f0 nodemap not set.
<6>[90144.018588] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
<0>[90144.116494] LustreError: 1400:0:(upcall_cache.c:378:upcall_cache_flush()) ASSERTION( !atomic_read(&entry->ue_refcount) ) failed:
<0>[90144.117605] LustreError: 1400:0:(upcall_cache.c:378:upcall_cache_flush()) LBUG
<4>[90144.118625] Pid: 1400, comm: umount
<4>[90144.123256]
<4>[90144.123257] Call Trace:
<4>[90144.124529] [<ffffffffa0ada885>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[90144.130691] [<ffffffffa0adae87>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[90144.132956] [<ffffffffa0e13a64>] upcall_cache_flush+0x164/0x1c0 [obdclass]
<4>[90144.134555] [<ffffffffa0e13ae0>] upcall_cache_cleanup+0x20/0xc0 [obdclass]
<4>[90144.135671] [<ffffffffa08a6627>] mdt_device_fini+0x8c7/0x1450 [mdt]
<4>[90144.136490] [<ffffffffa0dcfa1c>] ? class_disconnect_exports+0x17c/0x2f0 [obdclass]
<4>[90144.138119] [<ffffffffa0de8d22>] class_cleanup+0x572/0xd20 [obdclass]
<4>[90144.139591] [<ffffffffa0dcac8c>] ? class_name2dev+0x7c/0xe0 [obdclass]
<4>[90144.140493] [<ffffffffa0deb246>] class_process_config+0x1d76/0x26d0 [obdclass]
<4>[90144.142081] [<ffffffffa0ae6701>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
<4>[90144.143093] [<ffffffffa0dec05f>] class_manual_cleanup+0x4bf/0xe10 [obdclass]
<4>[90144.144273] [<ffffffffa0dcac8c>] ? class_name2dev+0x7c/0xe0 [obdclass]
<4>[90144.145183] [<ffffffffa0e2099c>] server_put_super+0x9bc/0xe80 [obdclass]
<4>[90144.146170] [<ffffffff811b141a>] ? invalidate_inodes+0xfa/0x180
<4>[90144.147277] [<ffffffff8119564b>] generic_shutdown_super+0x5b/0xe0
<4>[90144.148106] [<ffffffff81195736>] kill_anon_super+0x16/0x60
<4>[90144.150251] [<ffffffffa0def756>] lustre_kill_super+0x36/0x60 [obdclass]
<4>[90144.156662] [<ffffffff81195ed7>] deactivate_super+0x57/0x80
<4>[90144.158768] [<ffffffff811b5e2f>] mntput_no_expire+0xbf/0x110
<4>[90144.162013] [<ffffffff811b699b>] sys_umount+0x7b/0x3a0
<4>[90144.164183] [<ffffffff8100b112>] system_call_fastpath+0x16/0x1b
<4>[90144.166668]
I think this is introduced by http://review.whamcloud.com/16802 :
+ identity = mdt_identity_get(mdt->mdt_identity_cache, ...
+ uc->uc_identity = identity;
+ if (nodemap == NULL) {
+         CERROR("%s: cli %s/%p nodemap not set.\n",
+                mdt2obd_dev(mdt)->obd_name,
+                info->mti_exp->exp_client_uuid.uuid, info->mti_exp);
+         RETURN(-EACCES);
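So we leak the identity refcount in this case, whereas we did not before: the early return on nodemap == NULL happens after mdt_identity_get() has already taken a reference, and nothing on that path drops it. The leaked reference is what later trips the ue_refcount assertion in upcall_cache_flush() at umount.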
Probably we just need to move the if (nodemap == NULL) check to the start of the function, before the mdt_identity_get() call?
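To illustrate the ordering problem outside of Lustre, here is a minimal standalone sketch. Everything in it (struct identity, identity_get(), identity_put(), init_ucred_leaky(), init_ucred_checked()) is a hypothetical stand-in, not the actual mdt code; it only models "take a reference, then bail out early" versus "check first, then take the reference".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached identity entry with a refcount. */
struct identity {
	int refcount;
};

/* Hypothetical stand-in for the credential structure holding the entry. */
struct ucred {
	struct identity *uc_identity;
};

/* Hypothetical get/put pair; these are not the Lustre functions. */
static struct identity *identity_get(struct identity *id)
{
	id->refcount++;
	return id;
}

static void identity_put(struct identity *id)
{
	assert(id->refcount > 0);
	id->refcount--;
}

/* Leaky ordering: the reference is taken before the nodemap check,
 * so the early return leaves it held with nobody to drop it. */
static int init_ucred_leaky(struct ucred *uc, struct identity *id,
			    const void *nodemap)
{
	uc->uc_identity = identity_get(id);
	if (nodemap == NULL)
		return -EACCES;
	return 0;
}

/* Suggested ordering: validate nodemap first, then take the reference. */
static int init_ucred_checked(struct ucred *uc, struct identity *id,
			      const void *nodemap)
{
	if (nodemap == NULL)
		return -EACCES;
	uc->uc_identity = identity_get(id);
	return 0;
}
```

With a NULL nodemap, the leaky variant returns -EACCES with the count still at 1, while the checked variant returns -EACCES with the count untouched. An alternative with the same effect would be to keep the original order and call the put function on the error path; moving the check first just avoids taking the reference at all.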
Issue Links
- is related to: LU-7199 Null pointer dereference in old_init_ucred (Resolved)