  Lustre / LU-19091

GPF in lprocfs_stats_collect() on Client during umount

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version: Lustre 2.17.0

    Description

      While trying to reproduce the crash from LU-18788 by running the following script/loop against a recent master build:

      while true ; do
          mount <Lustre FS with KRB flavor> <mount point>
          timeout 20 dd if=<Lustre big file, wide striped> of=/dev/null
          umount <mount point>
          lustre_rmmod
      done
      

      Instead, I triggered the following, different crash (GPF):

      [4391638.976039] general protection fault, probably for non-canonical address 0x14f4d7cf9ff71fc7: 0000 [#1] PREEMPT SMP NOPTI
      [4391638.988979] CPU: 108 PID: 1586 Comm: kworker/108:1 Kdump: loaded Tainted: G           OE      6.8.0-51-generic #52-Ubuntu
      [4391639.002029] Hardware name: <Platform Name>, BIOS 1.3.6 01/10/2024
      [4391639.010269] Workqueue: events sec_gc_main [ptlrpc]
      [4391639.016265] RIP: 0010:lprocfs_stats_collect+0x90/0x150 [obdclass]
      [4391639.023757] Code: b7 6f 14 45 85 ed 74 67 48 63 f6 45 89 ed 4c 89 e2 48 8d 34 b6 4f 8d 04 ec 48 c1 e6 03 48 8b 42 38 48 85 c0 74 38 48 8d 0c 30 <48> 8b 39 48 01 3b 48 8b 79 18 48 01 7b 18 48 8b 49 08 48 3b 4b 08
      [4391639.045441] RSP: 0018:ff5e71981e2efbc8 EFLAGS: 00010202
      [4391639.051790] RAX: 14f4d7cf9ff71fc7 RBX: ff5e71981e2efc28 RCX: 14f4d7cf9ff71fc7
      [4391639.060252] RDX: ff3cc127be0b83c8 RSI: 0000000000000000 RDI: 0000000000000000
      [4391639.068714] RBP: ff5e71981e2efbe8 R08: ff3cc127be0b83e0 R09: 0000000000000000
      [4391639.077170] R10: 0000000000000000 R11: 0000000000000000 R12: ff3cc127be0b8000
      [4391639.085625] R13: 000000000000007c R14: ff3cc231dfd89058 R15: ff3cc231dfd89040
      [4391639.094072] FS:  0000000000000000(0000) GS:ff3cc3213d200000(0000) knlGS:0000000000000000
      [4391639.103593] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [4391639.110474] CR2: 00005ed672f4e038 CR3: 0000010b05f12004 CR4: 0000000000f71ef0
      [4391639.118932] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [4391639.127395] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
      [4391639.135851] PKRU: 55555554
      [4391639.139335] Call Trace:
      [4391639.142534]  <TASK>
      [4391639.145332]  ? show_regs+0x6d/0x80
      [4391639.149602]  ? die_addr+0x37/0xa0
      [4391639.153774]  ? exc_general_protection+0x1db/0x480
      [4391639.159511]  ? kiblnd_send+0x2aa/0xa70 [ko2iblnd]
      [4391639.165263]  ? asm_exc_general_protection+0x27/0x30
      [4391639.171197]  ? lprocfs_stats_collect+0x90/0x150 [obdclass]
      [4391639.177882]  ? ptl_send_buf+0x138/0x5b0 [ptlrpc]
      [4391639.183594]  kick_cpu_latency+0xca/0x6c0 [ptlrpc]
      [4391639.189397]  ? __pfx_ptlrpc_master_callback+0x10/0x10 [ptlrpc]
      [4391639.196468]  ptl_send_rpc+0x529/0x14a0 [ptlrpc]
      [4391639.202068]  gss_do_ctx_fini_rpc+0x230/0x510 [ptlrpc_gss]
      [4391639.208609]  gss_cli_ctx_fini_common+0x5d/0x2f0 [ptlrpc_gss]
      [4391639.215438]  ctx_destroy_kr+0x7a/0x2f0 [ptlrpc_gss]
      [4391639.221391]  gss_sec_release_ctx_kr+0x2d/0xd0 [ptlrpc_gss]
      [4391639.228026]  sptlrpc_cli_ctx_put+0x41/0xd0 [ptlrpc]
      [4391639.234029]  sec_process_ctx_list+0xbf/0x1b0 [ptlrpc]
      [4391639.240226]  sec_gc_main+0x1d/0x290 [ptlrpc]
      [4391639.245543]  process_one_work+0x175/0x350
      [4391639.250526]  worker_thread+0x306/0x440
      [4391639.255213]  ? _raw_spin_lock_irqsave+0xe/0x20
      [4391639.260670]  ? __pfx_worker_thread+0x10/0x10
      [4391639.265936]  kthread+0xef/0x120
      [4391639.269923]  ? __pfx_kthread+0x10/0x10
      [4391639.274606]  ret_from_fork+0x44/0x70
      [4391639.279094]  ? __pfx_kthread+0x10/0x10
      [4391639.283771]  ret_from_fork_asm+0x1b/0x30


      Crash-dump analysis has shown that the GPF occurs in the recent PM-QoS-related code introduced by LU-18446, where OBD device stats are used to evaluate the best time period during which to keep CPU latency low. By the time a late KRB-related kthread tries to send an RPC to the server/target, obd->obd_svc_stats has already been freed (and its memory reused, hence the garbage that triggers the crash); the pointer was even found to be NULL at the time of the crash dump, very likely as a consequence of the following umount thread, also present in the crash dump:

      PID: 3715482  TASK: ff3cc22e6001af80  CPU: 220  COMMAND: "umount"
       #0 [ff5e7198732e7a28] __schedule at ffffffff8504095c
       #1 [ff5e7198732e7ad0] schedule at ffffffff85040dd3
       #2 [ff5e7198732e7af0] synchronize_rcu_expedited at ffffffff83fdd03c
       #3 [ff5e7198732e7ba0] bdi_unregister at ffffffff841f8770
       #4 [ff5e7198732e7bf8] generic_shutdown_super at ffffffff842e7c0d
       #5 [ff5e7198732e7c20] kill_anon_super at ffffffff842e82d8
       #6 [ff5e7198732e7c40] lustre_kill_super at ffffffffc32a8cf4 [lustre]
       #7 [ff5e7198732e7c58] deactivate_locked_super at ffffffff842e91d2
       #8 [ff5e7198732e7c78] deactivate_super at ffffffff842e92b6
       #9 [ff5e7198732e7c90] cleanup_mnt at ffffffff843197d3
      #10 [ff5e7198732e7cc0] __cleanup_mnt at ffffffff84319902
      #11 [ff5e7198732e7cd0] task_work_run at ffffffff83f33cde
      #12 [ff5e7198732e7cf8] syscall_exit_to_user_mode at ffffffff85030d27
      #13 [ff5e7198732e7d20] do_syscall_64 at ffffffff85028fec
          RIP: 00007b2b3352a9fb  RSP: 00007fffc31c3248  RFLAGS: 00000246
          RAX: 0000000000000000  RBX: 00005ed672f464c0  RCX: 00007b2b3352a9fb
          RDX: 0000000000000000  RSI: 0000000000000000  RDI: 00005ed672f4d990
          RBP: 00007fffc31c3320   R8: 00007b2b33603b20   R9: 0000000000000020
          R10: 0000000000000001  R11: 0000000000000246  R12: 00005ed672f465c0
          R13: 0000000000000000  R14: 00005ed672f4d990  R15: 00005ed672f4e040
          ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b 

      After looking further into the Lustre source code to understand how access to obd->obd_svc_stats is normally protected, I found that before the PM-QoS changes it could only be reached through external tunables under /sys or /debug, so protection against concurrent access was achieved simply by removing those external tunables first. Now that there is an internal consumer, some internal protection mechanism must be put in place to prevent such a race; a sketch of one possible scheme follows.
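
      To make the needed protection concrete, below is a minimal userspace model of one possible internal guarding scheme, written in plain C with pthreads. All names here (svc_stats, stats_guard, reader(), teardown()) are hypothetical stand-ins for illustration and do not reflect the actual Lustre structures or the eventual fix. The invariant it demonstrates is the one the PM-QoS path currently lacks: the umount side detaches the pointer under a lock before freeing it, and the consumer side re-checks the pointer for NULL under that same lock.

      /*
       * Minimal userspace model -- hypothetical names, NOT the actual
       * Lustre fix. Illustrates "detach under lock, free afterwards"
       * paired with a reader that re-checks the pointer under the lock.
       */
      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>

      struct stats {                  /* stand-in for the svc stats object */
              unsigned long count;
              unsigned long sum;
      };

      static struct stats *svc_stats; /* stand-in for obd->obd_svc_stats */
      static pthread_mutex_t stats_guard = PTHREAD_MUTEX_INITIALIZER;

      /* Models the PM-QoS consumer (kick_cpu_latency reading the stats). */
      static void *reader(void *arg)
      {
              (void)arg;
              for (int i = 0; i < 1000000; i++) {
                      pthread_mutex_lock(&stats_guard);
                      if (svc_stats) {        /* re-check under the lock */
                              svc_stats->count++;
                              svc_stats->sum += 42;
                      }
                      pthread_mutex_unlock(&stats_guard);
              }
              return NULL;
      }

      /* Models the umount/cleanup path: detach under lock, free after. */
      static void *teardown(void *arg)
      {
              struct stats *old;

              (void)arg;
              usleep(1000);                   /* let the reader run a bit */
              pthread_mutex_lock(&stats_guard);
              old = svc_stats;
              svc_stats = NULL;               /* readers now see NULL ... */
              pthread_mutex_unlock(&stats_guard);
              free(old);                      /* ... so freeing is safe */
              return NULL;
      }

      int main(void)
      {
              pthread_t r, t;

              svc_stats = calloc(1, sizeof(*svc_stats));
              pthread_create(&r, NULL, reader, NULL);
              pthread_create(&t, NULL, teardown, NULL);
              pthread_join(r, NULL);
              pthread_join(t, NULL);
              puts("no use-after-free: reader only dereferences under the guard");
              return 0;
      }

      In the kernel the guard would more likely be a spinlock or RCU rather than a mutex, but the ordering requirement is the same: make obd->obd_svc_stats unreachable to late consumers before the stats memory is actually freed.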

          People

            Assignee: Bruno Faccini (bfaccini-nvda)
            Reporter: Bruno Faccini (bfaccini-nvda)
            Votes: 0
            Watchers: 4
