[LU-19089] crash in obd_set_max_rpcs_in_flight on Lustre 2.15

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.15.6
    • Environment: lustre-2.15.6_8.llnl-1.t4.x86_64, kernel 4.18.0-553.53.1.1toss.t4.x86_64
    • Severity: 3

    Description

      We saw a crash whose stack trace looks like the one in LU-14441:

      2025-06-04 01:52:00 [571507.318908] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
      2025-06-04 01:52:00 [571507.324531] PGD 0 P4D 0
      2025-06-04 01:52:00 [571507.326567] Oops: 0000 [#1] SMP NOPTI
      2025-06-04 01:52:00 [571507.329363] CPU: 18 PID: 2569314 Comm: lctl Kdump: loaded Tainted: P           OE  X  -------- -  - 4.18.0-553.53.1.1toss.t4.x86_64 #1
      2025-06-04 01:52:00 [571507.338484] Hardware name: HPE HPE_Cray_EXNNF/HPE Cray EXNNF, BIOS 1.4.0_SBIOS-3762_BootTimeMinimal_DisableCassiniOprom 01-23-2025
      2025-06-04 01:52:00 [571507.346598] RIP: 0010:obd_set_max_rpcs_in_flight+0x27/0x2c0 [obdclass]
      2025-06-04 01:52:00 [571507.350890] Code: 0f 1f 00 0f 1f 44 00 00 8d 46 ff 3d ff 01 00 00 0f 87 3b 02 00 00 41 57 41 56 41 55 41 89 f5 41 54 55 48 89 fd 53 48 8b 47 50 <48> 8b 90 d0 00 00 00 f6 05 f7 54 81 ff 40 0f 85 45 01 00 00 48 8b
      2025-06-04 01:52:00 [571507.363429] RSP: 0018:ffffaac4d6a33e40 EFLAGS: 00010283
      2025-06-04 01:52:00 [571507.367034] RAX: 0000000000000000 RBX: ffff97bedf432310 RCX: 0000000000000000
      2025-06-04 01:52:00 [571507.372428] RDX: 0000000000000080 RSI: 0000000000000080 RDI: ffff97bedf4317e8
      2025-06-04 01:52:00 [571507.376879] RBP: ffff97bedf4317e8 R08: 0000000000000008 R09: 0000000000000003
      2025-06-04 01:52:00 [571507.381896] R10: 000000000000000a R11: f000000000000000 R12: 0000000000000003
      2025-06-04 01:52:00 [571507.387506] R13: 0000000000000080 R14: ffffaac4d6a33f08 R15: ffff97ce3dee84a0
      2025-06-04 01:52:00 [571507.393143] FS:  00007ffff7fc3740(0000) GS:ffff97dd4f680000(0000) knlGS:0000000000000000
      2025-06-04 01:52:00 [571507.400388] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      2025-06-04 01:52:00 [571507.404812] CR2: 00000000000000d0 CR3: 00000011e7fa2004 CR4: 0000000000770ee0
      2025-06-04 01:52:00 [571507.410406] PKRU: 55555554
      2025-06-04 01:52:00 [571507.412758] Call Trace:
      2025-06-04 01:52:00 [571507.414747]  ? __die_body+0x1a/0x60
      2025-06-04 01:52:00 [571507.418160]  ? no_context+0x1c0/0x3f0
      2025-06-04 01:52:00 [571507.420680]  ? __bad_area_nosemaphore+0x157/0x180
      2025-06-04 01:52:00 [571507.424482]  ? do_page_fault+0x37/0x13f
      2025-06-04 01:52:00 [571507.427236]  ? page_fault+0x1e/0x30
      2025-06-04 01:52:00 [571507.429523]  ? obd_set_max_rpcs_in_flight+0x27/0x2c0 [obdclass]
      2025-06-04 01:52:00 [571507.433895]  max_rpcs_in_flight_store+0x64/0x80 [mdc]
      2025-06-04 01:52:00 [571507.438588]  kernfs_fop_write+0x11e/0x1a0
      2025-06-04 01:52:00 [571507.441880]  vfs_write+0xb7/0x1c0
      2025-06-04 01:52:00 [571507.445208]  ksys_write+0x4f/0xb0
      2025-06-04 01:52:00 [571507.448038]  do_syscall_64+0x5b/0x1a0
      2025-06-04 01:52:00 [571507.452091]  entry_SYSCALL_64_after_hwframe+0x66/0xcb
      2025-06-04 01:52:00 [571507.455957] RIP: 0033:0x7ffff6f9c685
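
      As a rough cross-check of the oops above, the sketch below decodes the instruction at the faulting RIP from the Code: line (with the console line wraps joined) and compares its effective address with CR2. It is a minimal illustration, not a general disassembler: the helper names faulting_bytes and decode_mov_load are hypothetical, and only the single "REX.W 8B /r with disp32" form is handled. Which source-level pointer the NULL value in RAX corresponds to is not established here.

```python
# Values copied from the oops in this report (Code: line with wraps joined).
CODE_LINE = (
    "0f 1f 00 0f 1f 44 00 00 8d 46 ff 3d ff 01 00 00 0f 87 3b 02 "
    "00 00 41 57 41 56 41 55 41 89 f5 41 54 55 48 89 fd 53 48 8b "
    "47 50 <48> 8b 90 d0 00 00 00 f6 05 f7 54 81 ff 40 0f 85 45 01 "
    "00 00 48 8b"
)
RAX = 0x0000000000000000
CR2 = 0x00000000000000D0

# Register names for ModRM fields when no REX.R/REX.B extension bit is set.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the bytes starting at the <xx> marker, i.e. at the faulting RIP."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:])


def decode_mov_load(insn):
    """Decode only 'REX.W 8B /r' with mod=10 (disp32) and no SIB byte."""
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a plain 64-bit MOV r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only the [reg + disp32] form is handled")
    disp = int.from_bytes(insn[3:7], "little", signed=True)
    return REG64[reg], REG64[rm], disp


dest, base, disp = decode_mov_load(faulting_bytes(CODE_LINE))
print(f"mov {disp:#x}(%{base}),%{dest}")
print(f"fault address = RAX + disp = {RAX + disp:#x}, CR2 = {CR2:#x}")
```

      Under these assumptions the faulting instruction is a load from offset 0xd0 of the pointer held in RAX. With RAX equal to zero, that address matches the CR2 value reported in the oops.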

            People

              Assignee: Alex Zhuravlev (bzzz)
              Reporter: Gian-Carlo Defazio (defazio)