Lustre / LU-8165

(tgt_lastrcvd.c:656:tgt_client_del()) lustre-OST0000: client 4294967295: bit already clear in bitmap!!

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.9.0

    Description

      With a Lustre 2.5-based version, while trying to reproduce a leaked cl_object reference (a situation likely to occur upon Client eviction from an OST), I triggered the following LBUG by running the loop "while true; do echo <Client-UUID> > /proc/fs/lustre/obdfilter/<OST>/evict_client; done" on the OSS concerned:

      Lustre: 702:0:(genops.c:1521:obd_export_evict_by_uuid()) lustre-OST0000: evicting c12f1e59-5f4d-9f75-bd66-7fad18ddd33f at adminstrative request
      LustreError: 702:0:(genops.c:1518:obd_export_evict_by_uuid()) lustre-OST0000: can't disconnect c12f1e59-5f4d-9f75-bd66-7fad18ddd33f: no exports found
      LustreError: 702:0:(tgt_lastrcvd.c:656:tgt_client_del()) lustre-OST0000: client 4294967295: bit already clear in bitmap!!
      LustreError: 702:0:(tgt_lastrcvd.c:657:tgt_client_del()) LBUG
      Pid: 702, comm: bash
      
      Call Trace:
       [<ffffffffa0531895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa0531e97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa12e30ef>] tgt_client_del+0x50f/0x510 [ptlrpc]
       [<ffffffffa1995ace>] ? ofd_grant_discard+0x3e/0x1c0 [ofd]
       [<ffffffffa197b87d>] ofd_obd_disconnect+0xfd/0x1f0 [ofd]
       [<ffffffffa1088a2d>] class_fail_export+0x23d/0x540 [obdclass]
       [<ffffffffa1088e72>] obd_export_evict_by_uuid+0x142/0x240 [obdclass]
       [<ffffffffa0541a31>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa12b7613>] lprocfs_wr_evict_client+0x2d3/0x3b0 [ptlrpc]
       [<ffffffffa10919eb>] lprocfs_fops_write+0x7b/0xa0 [obdclass]
       [<ffffffff811fa65e>] proc_reg_write+0x7e/0xc0
       [<ffffffff8118e7f8>] vfs_write+0xb8/0x1a0
       [<ffffffff8118f1c1>] sys_write+0x51/0x90
       [<ffffffff810e608e>] ? __audit_syscall_exit+0x25e/0x290
       [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
      
      Kernel panic - not syncing: LBUG
      Pid: 702, comm: bash Not tainted 2.6.32.504.30.3.el6_lustre #1
      Call Trace:
       [<ffffffff8152a81c>] ? panic+0xa7/0x16f
       [<ffffffffa0531eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
       [<ffffffffa12e30ef>] ? tgt_client_del+0x50f/0x510 [ptlrpc]
       [<ffffffffa1995ace>] ? ofd_grant_discard+0x3e/0x1c0 [ofd]
       [<ffffffffa197b87d>] ? ofd_obd_disconnect+0xfd/0x1f0 [ofd]
       [<ffffffffa1088a2d>] ? class_fail_export+0x23d/0x540 [obdclass]
       [<ffffffffa1088e72>] ? obd_export_evict_by_uuid+0x142/0x240 [obdclass]
       [<ffffffffa0541a31>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
       [<ffffffffa12b7613>] ? lprocfs_wr_evict_client+0x2d3/0x3b0 [ptlrpc]
       [<ffffffffa10919eb>] ? lprocfs_fops_write+0x7b/0xa0 [obdclass]
       [<ffffffff811fa65e>] ? proc_reg_write+0x7e/0xc0
       [<ffffffff8118e7f8>] ? vfs_write+0xb8/0x1a0
       [<ffffffff8118f1c1>] ? sys_write+0x51/0x90
       [<ffffffff810e608e>] ? __audit_syscall_exit+0x25e/0x290
       [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
      

      The problem seems to be caused by a race between Client reconnection and forced eviction: the new export can already be found by Client UUID, but the Client's last_rcvd index has not yet been assigned (it is still -1, logged as 4294967295). The full Lustre debug trace from the crash dump seems to confirm this:

      00000001:00000001:13.0:1463584027.765801:0:702:0:(tgt_lastrcvd.c:638:tgt_client_del()) Process entered
      00000001:00000040:13.0:1463584027.765802:0:702:0:(tgt_lastrcvd.c:650:tgt_client_del()) lustre-OST0000: del client at idx 4294967295, off 0, UUID 'c12f1e59-5f4d-9f75-bd66-7fad18ddd33f'
      00000001:00000001:20.0:1463584027.765802:0:27360:0:(tgt_lastrcvd.c:536:tgt_client_new()) Process entered
      00000001:00000040:20.0:1463584027.765803:0:27360:0:(tgt_lastrcvd.c:565:tgt_client_new()) lustre-OST0000: client at idx 3 with UUID 'c12f1e59-5f4d-9f75-bd66-7fad18ddd33f' added
      00000001:00020000:13.0:1463584027.765804:0:702:0:(tgt_lastrcvd.c:656:tgt_client_del()) lustre-OST0000: client 4294967295: bit already clear in bitmap!!
      00000001:00000040:20.0:1463584027.765804:0:27360:0:(tgt_lastrcvd.c:575:tgt_client_new()) lustre-OST0000: new client at index 3 (8576) with UUID 'c12f1e59-5f4d-9f75-bd66-7fad18ddd33f'
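
The interleaving in the trace above can be illustrated in user space. This is only a minimal sketch under stated assumptions, not the Lustre code and not the fix that was landed: `struct export_sketch`, `client_new()`, `client_del()`, `idx_as_logged()`, `MAX_CLIENTS` and the byte-array bitmap are all hypothetical stand-ins. It shows why an unassigned index of -1 appears as 4294967295 in the log (assuming a 32-bit unsigned int), and how a delete path could skip the bitmap when no slot has been assigned yet.

```c
#include <assert.h>

#define MAX_CLIENTS    64    /* hypothetical bitmap size, in bits */
#define IDX_UNASSIGNED (-1)  /* slot index before the reconnect path assigns one */

/* Hypothetical stand-in for the per-client export state. */
struct export_sketch {
    int lr_idx;              /* last_rcvd slot index, or IDX_UNASSIGNED */
};

/* How a negative index looks when printed with an unsigned format. */
static unsigned int idx_as_logged(int idx)
{
    return (unsigned int)idx;
}

/* Clear bit idx and report whether it was set beforehand. */
static int test_and_clear(unsigned char *map, int idx)
{
    unsigned char mask;
    int was_set;

    assert(idx >= 0 && idx < MAX_CLIENTS);  /* callers must pass a valid slot */
    mask = (unsigned char)(1u << (idx % 8));
    was_set = (map[idx / 8] & mask) != 0;
    map[idx / 8] &= (unsigned char)~mask;
    return was_set;
}

/* Reconnect path: reserve a slot, then publish it in the export. */
static void client_new(struct export_sketch *exp, unsigned char *map, int idx)
{
    map[idx / 8] |= (unsigned char)(1u << (idx % 8));
    exp->lr_idx = idx;
}

/*
 * Eviction path. Returns 0 when the slot was released or when there was
 * nothing to release, and -1 when the bitmap is inconsistent (the case
 * reported as "bit already clear in bitmap").
 */
static int client_del(struct export_sketch *exp, unsigned char *map)
{
    /* Assumed guard: the export exists but has no slot yet, so skip. */
    if (exp->lr_idx < 0 || exp->lr_idx >= MAX_CLIENTS)
        return 0;

    if (!test_and_clear(map, exp->lr_idx))
        return -1;

    exp->lr_idx = IDX_UNASSIGNED;
    return 0;
}
```

Calling `client_del()` on an export whose `lr_idx` is still `IDX_UNASSIGNED` returns without touching the bitmap, which corresponds to the window in which the eviction thread (pid 702) ran before the reconnecting thread (pid 27360) had recorded index 3. A real fix would also have to consider locking between the two paths, which this sketch does not model.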
      

      According to the source code, the problem is also present in current master.

          People

            Assignee: bfaccini Bruno Faccini (Inactive)
            Reporter: bfaccini Bruno Faccini (Inactive)