LU-7530: upcall_cache_flush() ASSERTION( !atomic_read(&entry->ue_refcount) ) failed


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.8.0

    Description

      Just about a week ago I started to see these assertions on master:

      <4>[90142.530259] Lustre: DEBUG MARKER: == recovery-small test 27: fail LOV while using OSC's == 14:16:08 (1449602168)
      <4>[90144.006796] Lustre: Failing over lustre-MDT0000
      <3>[90144.016902] LustreError: 528:0:(mdt_lib.c:493:old_init_ucred_common()) lustre-MDT0000: cli 7f023601-9100-9039-6ea4-5561717d207a/ffff880092a417f0 nodemap not set.
      <6>[90144.018588] Lustre: lustre-MDT0000: Not available for connect from 0@lo (stopping)
      <0>[90144.116494] LustreError: 1400:0:(upcall_cache.c:378:upcall_cache_flush()) ASSERTION( !atomic_read(&entry->ue_refcount) ) failed: 
      <0>[90144.117605] LustreError: 1400:0:(upcall_cache.c:378:upcall_cache_flush()) LBUG
      <4>[90144.118625] Pid: 1400, comm: umount
      <4>[90144.123256] 
      <4>[90144.123257] Call Trace:
      <4>[90144.124529]  [<ffffffffa0ada885>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      <4>[90144.130691]  [<ffffffffa0adae87>] lbug_with_loc+0x47/0xb0 [libcfs]
      <4>[90144.132956]  [<ffffffffa0e13a64>] upcall_cache_flush+0x164/0x1c0 [obdclass]
      <4>[90144.134555]  [<ffffffffa0e13ae0>] upcall_cache_cleanup+0x20/0xc0 [obdclass]
      <4>[90144.135671]  [<ffffffffa08a6627>] mdt_device_fini+0x8c7/0x1450 [mdt]
      <4>[90144.136490]  [<ffffffffa0dcfa1c>] ? class_disconnect_exports+0x17c/0x2f0 [obdclass]
      <4>[90144.138119]  [<ffffffffa0de8d22>] class_cleanup+0x572/0xd20 [obdclass]
      <4>[90144.139591]  [<ffffffffa0dcac8c>] ? class_name2dev+0x7c/0xe0 [obdclass]
      <4>[90144.140493]  [<ffffffffa0deb246>] class_process_config+0x1d76/0x26d0 [obdclass]
      <4>[90144.142081]  [<ffffffffa0ae6701>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      <4>[90144.143093]  [<ffffffffa0dec05f>] class_manual_cleanup+0x4bf/0xe10 [obdclass]
      <4>[90144.144273]  [<ffffffffa0dcac8c>] ? class_name2dev+0x7c/0xe0 [obdclass]
      <4>[90144.145183]  [<ffffffffa0e2099c>] server_put_super+0x9bc/0xe80 [obdclass]
      <4>[90144.146170]  [<ffffffff811b141a>] ? invalidate_inodes+0xfa/0x180
      <4>[90144.147277]  [<ffffffff8119564b>] generic_shutdown_super+0x5b/0xe0
      <4>[90144.148106]  [<ffffffff81195736>] kill_anon_super+0x16/0x60
      <4>[90144.150251]  [<ffffffffa0def756>] lustre_kill_super+0x36/0x60 [obdclass]
      <4>[90144.156662]  [<ffffffff81195ed7>] deactivate_super+0x57/0x80
      <4>[90144.158768]  [<ffffffff811b5e2f>] mntput_no_expire+0xbf/0x110
      <4>[90144.162013]  [<ffffffff811b699b>] sys_umount+0x7b/0x3a0
      <4>[90144.164183]  [<ffffffff8100b112>] system_call_fastpath+0x16/0x1b
      <4>[90144.166668] 
      

      I think this was introduced by http://review.whamcloud.com/16802:

      +               identity = mdt_identity_get(mdt->mdt_identity_cache,
      ...
      +       uc->uc_identity = identity;
       
      +       if (nodemap == NULL) {
      +               CERROR("%s: cli %s/%p nodemap not set.\n",
      +                     mdt2obd_dev(mdt)->obd_name,
      +                     info->mti_exp->exp_client_uuid.uuid, info->mti_exp);
      +               RETURN(-EACCES);
      

      So we are leaking the identity refcount in this case, whereas we did not before.
      Probably we just need to move the if (nodemap == NULL) check to the start of the function, before the mdt_identity_get() call?
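
      The leak pattern and the proposed fix can be sketched in a minimal, self-contained way. This is an illustration only: identity_get()/identity_put() and init_ucred_*() below are stand-ins modeling mdt_identity_get()/mdt_identity_put() and old_init_ucred_common(), not the actual Lustre code.

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* Stand-in for a refcounted upcall-cache entry (illustrative only). */
      struct entry {
              int refcount;
      };

      static struct entry cache_entry = { .refcount = 0 };

      /* Models mdt_identity_get(): returns the entry with a reference held. */
      static struct entry *identity_get(void)
      {
              cache_entry.refcount++;
              return &cache_entry;
      }

      /* Models mdt_identity_put(): drops the reference. */
      static void identity_put(struct entry *e)
      {
              e->refcount--;
      }

      /* Ordering as in the patch under discussion: the identity reference
       * is taken first, then the nodemap check returns early without
       * dropping it, so the error path leaks one refcount. At unmount,
       * upcall_cache_flush() then trips the ue_refcount assertion. */
      static int init_ucred_buggy(int have_nodemap)
      {
              struct entry *identity = identity_get();

              if (!have_nodemap)
                      return -13;     /* -EACCES: reference never released */

              identity_put(identity);
              return 0;
      }

      /* Proposed fix: do the nodemap check before taking the reference,
       * so the error path has nothing to release. */
      static int init_ucred_fixed(int have_nodemap)
      {
              struct entry *identity;

              if (!have_nodemap)
                      return -13;     /* -EACCES: nothing acquired, nothing leaked */

              identity = identity_get();
              identity_put(identity);
              return 0;
      }

      int main(void)
      {
              cache_entry.refcount = 0;
              init_ucred_buggy(0);
              assert(cache_entry.refcount == 1);
              printf("buggy refcount after error path: %d\n", cache_entry.refcount);

              cache_entry.refcount = 0;
              init_ucred_fixed(0);
              assert(cache_entry.refcount == 0);
              printf("fixed refcount after error path: %d\n", cache_entry.refcount);
              return 0;
      }
      ```

      The same effect could be had by calling the put before the early RETURN, but hoisting the check keeps the acquire/release pairing unconditional, which is harder to get wrong on future error paths.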

People

    Assignee: Oleg Drokin (green)
    Reporter: Oleg Drokin (green)
