Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2167

Race in ptlrpcd_stop/fini vs ptlrpcd leaving to use after free

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.4.0
    • Lustre 2.4.0
    • None
    • 3
    • 5198

    Description

      I met this oops on my testing system several times and thought it did not make any sense initially, but after looking into the code I see that it does make sense after all.

      [11898.846141] BUG: unable to handle kernel paging request at ffff88008fb39800
      [11898.846670] IP: [<ffffffffa1368c93>] ptlrpcd+0x2c3/0x3a0 [ptlrpc]
      [11898.847291] PGD 1a26063 PUD 501067 PMD 57f067 PTE 800000008fb39160
      [11898.847789] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
      [11898.848237] last sysfs file: /sys/devices/system/cpu/possible
      [11898.848678] CPU 2 
      [11898.848743] Modules linked in: ptlrpc(-) obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache nfs_acl auth_rpcgss sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: fld]
      [11898.850099] 
      [11898.850099] Pid: 30737, comm: ptlrpcd_rcv Not tainted 2.6.32-debug #6 Bochs Bochs
      [11898.850099] RIP: 0010:[<ffffffffa1368c93>]  [<ffffffffa1368c93>] ptlrpcd+0x2c3/0x3a0 [ptlrpc]
      [11898.850099] RSP: 0018:ffff880088529e70  EFLAGS: 00010286
      [11898.850099] RAX: 0000000000000002 RBX: ffff88008fb39800 RCX: 0000000000000001
      [11898.850099] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000282
      [11898.850099] RBP: ffff880088529f40 R08: 0000000000000000 R09: 0000000000000001
      [11898.850099] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880088529ea0
      [11898.850099] R13: ffff880083785f30 R14: 0000000000000002 R15: ffff880088529ee0
      [11898.850099] FS:  00007f449616c700(0000) GS:ffff880006300000(0000) knlGS:0000000000000000
      [11898.850099] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [11898.850099] CR2: ffff88008fb39800 CR3: 00000000b3178000 CR4: 00000000000006e0
      [11898.850099] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [11898.850099] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [11898.850099] Process ptlrpcd_rcv (pid: 30737, threadinfo ffff880088528000, task ffff880070208500)
      [11898.850099] Stack:
      [11898.850099]  ffff880070208500 ffffffffffffffff ffff880083785f40 0000000000000000
      [11898.850099] <d> ffff880070208500 0000000000000000 00000004b0000008 0000000000000000
      [11898.850099] <d> 0000000000000000 ffff880088529eb8 ffff880088529eb8 0000000000000054
      [11898.850099] Call Trace:
      [11898.850099]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
      [11898.850099]  [<ffffffffa13689d0>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
      [11898.850099]  [<ffffffff8100c14a>] child_rip+0xa/0x20
      [11898.850099]  [<ffffffffa13689d0>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
      [11898.850099]  [<ffffffffa13689d0>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
      [11898.850099]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      [11898.850099] Code: e9 ba fe ff ff 0f 1f 00 49 8d 45 40 49 39 45 40 74 08 4c 89 ef e8 8e f6 fc ff 4c 89 e7 e8 46 1f ea ff 48 8d 7b 50 e8 8d 91 ce df <f0> 80 23 fd f0 80 23 fb f0 80 23 ef f0 80 63 02 fe e9 17 fe ff 
      [11898.850099] RIP  [<ffffffffa1368c93>] ptlrpcd+0x2c3/0x3a0 [ptlrpc]
      [11898.850099]  RSP <ffff880088529e70>
      [11898.850099] CR2: ffff88008fb39800
      

      Looking at the code, in ptlrpcd() we see:

              if (!cfs_list_empty(&set->set_requests))
                      ptlrpc_set_wait(set);
              lu_context_fini(&env.le_ctx);
              cfs_complete(&pc->pc_finishing);
      
              cfs_clear_bit(LIOD_START, &pc->pc_flags);  <=== Crash Here
              cfs_clear_bit(LIOD_STOP, &pc->pc_flags);
              cfs_clear_bit(LIOD_FORCE, &pc->pc_flags);
      

      And in ptlrpcd_stop/fini():

      {
      ...
      #ifdef __KERNEL__
              cfs_wait_for_completion(&pc->pc_finishing);
      #else
              liblustre_deregister_wait_callback(pc->pc_wait_callback);
              liblustre_deregister_idle_callback(pc->pc_idle_callback);
      #endif
              lu_context_fini(&pc->pc_env.le_ctx);
      ...
              EXIT;
      }
      static void ptlrpcd_fini(void)
      {
              int i;
              ENTRY;
      
              if (ptlrpcds != NULL) {
                      for (i = 0; i < ptlrpcds->pd_nthreads; i++)
                              ptlrpcd_stop(&ptlrpcds->pd_threads[i], 0);
                      ptlrpcd_stop(&ptlrpcds->pd_thread_rcv, 0);
      lost race =>>   OBD_FREE(ptlrpcds, ptlrpcds->pd_size);
                      ptlrpcds = NULL;
              }
      }
      

      I think the proper solution to avoid the race should be to move cfs_clear_bit()s to before signalling the completion.

      Attachments

        Activity

          People

            wc-triage WC Triage
            green Oleg Drokin
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: