Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-20074

NRS TBF Use-After-Free and Fatal LBUG - refcount_inc() in nrs_tbf_hop_get()

    XMLWordPrintable

Details

    • Bug
    • Resolution: Duplicate
    • Medium
    • None
    • None
    • None
    • master (2.17)
      Servers and Clients: Rocky 9.7 - 5.14.0-611.27.1.el9_7.x86_64
    • 3
    • 9223372036854775807

    Description

      On master, I was able to reliably trigger an LBUG when doing the following with NRS TBF configuration,

      [SERVER]# lctl set_param ost.OSS.ost_io.nrs_policies='tbf'
      
      [CLIENT]# lfs setstripe -i 0 /lustre/boom0 && lfs setstripe -i 0 /lustre/boom1 && dd if=/dev/zero of=/lustre/boom0 bs=1MB count=1 oflag=sync && sleep 2 && dd if=/dev/zero of=/lustre/boom1 bs=1MB count=1 oflag=sync
      1+0 records in
      1+0 records out
      1000000 bytes (1.0 MB, 977 KiB) copied, 0.0835055 s, 12.0 MB/s
      1+0 records in
      1+0 records out
      1000000 bytes (1.0 MB, 977 KiB) copied, 0.0579082 s, 17.3 MB/s
      

      This triggers the following kernel warning trace on the server due to a use-after-free interception

      [ 1762.253344] ------------[ cut here ]------------
      [ 1762.254325] refcount_t: saturated; leaking memory.
      [ 1762.255205] WARNING: CPU: 0 PID: 3790 at lib/refcount.c:22 refcount_warn_saturate+0x51/0x110
      [ 1762.255222] Modules linked in: ofd(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) mbcache jbd2 lustre(OE) mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE
      ) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) 8021q garp mrp stp llc rfkill vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_class intel_vse
      c kvm_intel kvm iTCO_wdt bochs iTCO_vendor_support drm_client_lib drm_shmem_helper rapl drm_kms_helper virtio_balloon pcspkr i2c_i801 lpc_ich i2c_smbus joydev drm fuse xfs libcrc32c ahci libahci libata crct10d
      if_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_net net_failover failover virtio_console virtio_blk serio_raw sunrpc dm_mirror dm_region_hash dm_log dm_mod
      [ 1762.255318] CPU: 0 PID: 3790 Comm: ll_ost_io00_001 Kdump: loaded Tainted: G        W  OE      ------  ---  5.14.0-611.27.1.el9_7.x86_64 #1
      [ 1762.255326] Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
      [ 1762.255328] RIP: 0010:refcount_warn_saturate+0x51/0x110
      [ 1762.255334] Code: 84 bc 00 00 00 c3 cc cc cc cc 85 f6 74 46 80 3d 1e db cc 01 00 75 ee 48 c7 c7 60 18 72 8c c6 05 0e db cc 01 01 e8 4f 51 a9 ff <0f> 0b c3 cc cc cc cc 80 3d fa da cc 01 00 75 cb 48 c7 c7 88
      18 72
      [ 1762.255339] RSP: 0018:ffffa12b40a2bc88 EFLAGS: 00010282
      [ 1762.255345] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [ 1762.255348] RDX: ffff913fbfc2da40 RSI: ffff913fbfc20900 RDI: ffff913fbfc20900
      [ 1762.255351] RBP: ffff913f8614b180 R08: 0000000000000000 R09: ffffa12b40a2bb38
      [ 1762.255355] R10: ffffa12b40a2bb30 R11: ffffffff8d1e2348 R12: ffff913f7da7fe18
      [ 1762.255357] R13: ffffa12b40a2bcdc R14: 0000000000000002 R15: ffffa12b40a2bcc8
      [ 1762.255360] FS:  0000000000000000(0000) GS:ffff913fbfc00000(0000) knlGS:0000000000000000
      [ 1762.255365] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1762.255368] CR2: 00007f57d86aa000 CR3: 00000000061c0004 CR4: 0000000000772ef0
      [ 1762.255379] PKRU: 55555554
      [ 1762.255382] Call Trace:
      [ 1762.255386]  <TASK>
      [ 1762.255392]  ? show_trace_log_lvl+0x1c4/0x2df
      [ 1762.255403]  ? show_trace_log_lvl+0x1c4/0x2df
      [ 1762.255415]  ? cfs_hash_bd_lookup_intent+0x6f/0xd0 [obdclass]
      [ 1762.255601]  ? refcount_warn_saturate+0x51/0x110
      [ 1762.255606]  ? __warn+0x7d/0xd0
      [ 1762.255614]  ? refcount_warn_saturate+0x51/0x110
      [ 1762.255618]  ? report_bug+0x102/0x140
      [ 1762.255625]  ? handle_bug+0x3c/0x70
      [ 1762.255634]  ? exc_invalid_op+0x14/0x70
      [ 1762.255642]  ? asm_exc_invalid_op+0x16/0x20
      [ 1762.255653]  ? refcount_warn_saturate+0x51/0x110
      [ 1762.255657]  ? refcount_warn_saturate+0x51/0x110
      [ 1762.255661]  cfs_hash_bd_lookup_intent+0x6f/0xd0 [obdclass]
      [ 1762.255832]  nrs_tbf_cli_find+0xd4/0x200 [ptlrpc]
      [ 1762.284215]  nrs_tbf_res_get+0x42/0x5e0 [ptlrpc]
      [ 1762.284582]  ? ofd_punch_hpreq_check+0x128/0x360 [ofd]
      [ 1762.284626]  nrs_resource_get+0x6b/0xf0 [ptlrpc]
      [ 1762.286019]  nrs_resource_get_safe+0xb0/0x170 [ptlrpc]
      [ 1762.286116]  ptlrpc_nrs_req_initialize+0x58/0xb0 [ptlrpc]
      [ 1762.286200]  ptlrpc_server_request_add+0xfc/0x620 [ptlrpc]
      [ 1762.286270]  ptlrpc_server_handle_req_in.isra.0+0x561/0xa10 [ptlrpc]
      [ 1762.286339]  ptlrpc_main+0x976/0xea0 [ptlrpc]
      [ 1762.286409]  ? __pfx_ptlrpc_main+0x10/0x10 [ptlrpc]
      [ 1762.286477]  kthread+0x101/0x110
      [ 1762.286480]  ? __pfx_kthread+0x10/0x10
      [ 1762.286482]  ret_from_fork+0x28/0x50
      [ 1762.286485]  </TASK>
      [ 1762.286486] ---[ end trace 0000000000000000 ]---
      

      Finally, the LBUG is triggered by disabling the TBF policy,

      [SERVER]# lctl set_param ost.OSS.ost_io.nrs_policies='fifo'
      
      [ 1971.798726] LustreError: 5765:0:(nrs_tbf.c:245:nrs_tbf_cli_fini()) ASSERTION( refcount_read(&cli->tc_ref) == 0 ) failed:
      [ 1971.799583] LustreError: 5765:0:(nrs_tbf.c:245:nrs_tbf_cli_fini()) LBUG
      [ 1971.800808] CPU: 1 PID: 5765 Comm: lctl Kdump: loaded Tainted: G        W  OE      ------  ---  5.14.0-611.27.1.el9_7.x86_64 #1
      [ 1971.802306] Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-4.el9 04/01/2014
      [ 1971.804074] Call Trace:
      [ 1971.805666]  <TASK>
      [ 1971.807391]  dump_stack_lvl+0x34/0x48
      [ 1971.808937]  lbug_with_loc.cold+0x5/0x50 [libcfs]
      [ 1971.810466]  nrs_tbf_cli_fini+0x1fe/0x210 [ptlrpc]
      [ 1971.812486]  cfs_hash_destroy+0x1c5/0x480 [obdclass]
      [ 1971.814296]  nrs_tbf_stop+0x5b/0x330 [ptlrpc]
      [ 1971.816294]  nrs_policy_stop0+0x35/0x1d0 [ptlrpc]
      [ 1971.817607]  nrs_policy_stop_primary+0x5b/0x1c0 [ptlrpc]
      [ 1971.818958]  nrs_policy_start_locked+0x4f2/0x6c0 [ptlrpc]
      [ 1971.820109]  nrs_policy_ctl+0x2af/0x370 [ptlrpc]
      [ 1971.821123]  ptlrpc_nrs_policy_control+0xef/0x2c0 [ptlrpc]
      [ 1971.821827]  ptlrpc_lprocfs_nrs_policies_seq_write+0x48a/0x5d0 [ptlrpc]
      [ 1971.822457]  full_proxy_write+0x5e/0xa0
      [ 1971.822911]  vfs_write+0xeb/0x470
      [ 1971.823384]  ? do_filp_open+0xb0/0x160
      [ 1971.823743]  ksys_write+0x5f/0xe0
      [ 1971.824076]  do_syscall_64+0x5c/0xe0
      [ 1971.824428]  ? __check_object_size.part.0+0x47/0xd0
      [ 1971.825049]  ? strncpy_from_user+0x20/0x100
      [ 1971.825999]  ? do_sys_openat2+0x81/0xd0
      [ 1971.826957]  ? syscall_exit_work+0xff/0x130
      [ 1971.827866]  ? syscall_exit_to_user_mode+0x19/0x40
      [ 1971.828203]  ? do_syscall_64+0x6b/0xe0
      [ 1971.828481]  ? __handle_mm_fault+0x2fe/0x650
      [ 1971.828736]  ? syscall_exit_work+0xff/0x130
      [ 1971.829001]  ? __count_memcg_events+0x4f/0xb0
      [ 1971.829282]  ? mm_account_fault+0x6c/0x100
      [ 1971.829667]  ? handle_mm_fault+0x138/0x270
      [ 1971.830367]  ? do_user_addr_fault+0x35d/0x620
      [ 1971.831039]  ? exc_page_fault+0x61/0x150
      [ 1971.831723]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
      [ 1971.831989] RIP: 0033:0x7fe7068ff117
      [ 1971.832262] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89
      74 24
      [ 1971.832849] RSP: 002b:00007ffd4240d478 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ 1971.833304] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fe7068ff117
      [ 1971.833913] RDX: 0000000000000004 RSI: 00007ffd424107dd RDI: 0000000000000003
      [ 1971.834487] RBP: 0000000000000003 R08: f8f0000000040000 R09: 0000000000000000
      [ 1971.835071] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd424107dd
      [ 1971.835404] R13: 000055ac0b8dcfc0 R14: 00007ffd4240f770 R15: 000055ac0b8dca50
      [ 1971.835715]  </TASK>
      [ 1971.836023] Kernel panic - not syncing: LBUG
      

      This bug appears to have been introduced in LU-16796 when the reference counting was moved from atomic_t to refcount_t, however it replaced atomic_inc() with refcount_inc() inside nrs_tbf_hop_get():

      static void nrs_tbf_hop_get(struct cfs_hash *hs, struct hlist_node *hnode)
      {
        struct nrs_tbf_client *cli = hlist_entry(hnode,
                   struct nrs_tbf_client,
                   tc_hnode);
      
        refcount_inc(&cli->tc_ref);
      }
      

      When a TBF client hits 0 references, it moves to an LRU list for deferred destruction. If a new RPC arrives before garbage collection, nrs_tbf_hop_get() finds the object with a 0-count. Calling refcount_inc() on a 0-count object violates kernel hardening rules, triggering the above warning and then the LBUG when garbage collection is run.

      I have a short patch to fix this with refcount_inc_not_zero() that resolves the problem that I'll submit shortly.

      Attachments

        Issue Links

          Activity

            People

              mrasobarnett Matt Rásó-Barnett
              mrasobarnett Matt Rásó-Barnett
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: