Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-17076

ptlrpc_nrs_req_stop_nolock() use after free with ORR NRS policy

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.16.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      sanityn.sh test 77c crashed as below:

      [13898.279218] Lustre: DEBUG MARKER: == sanityn test 77c: check ORR NRS policy ================ 11:42:47 (1693482167)
      [13898.512652] Lustre: DEBUG MARKER: lctl set_param ost.OSS.ost_io.nrs_policies=orr ost.OSS.*.nrs_orr_quantum=1 ost.OSS.*.nrs_orr_offset_type=physical ost.OSS.*.nrs_orr_supported=reads
      [13900.571898] Lustre: DEBUG MARKER: lctl set_param ost.OSS.*.nrs_orr_supported=writes ost.OSS.*.nrs_orr_quantum=64
      [13903.067814] Lustre: DEBUG MARKER: lctl set_param ost.OSS.*.nrs_orr_supported=reads_and_writes ost.OSS.*.nrs_orr_offset_type=logical
      [13905.203004] general protection fault, probably for non-canonical address 0x5a5a5a5a5a5a5a5a: 0000 [#1] SMP PTI
      [13905.205028] CPU: 1 PID: 250894 Comm: ll_ost_io00_027 Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-477.15.1.el8_lustre.x86_64 #1
      [13905.207489] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [13905.208620] RIP: 0010:ptlrpc_nrs_req_stop_nolock+0x34/0x140 [ptlrpc]
      [13905.210443] Code: 00 00 a8 04 75 05 e9 0b 08 f0 c1 53 a8 01 74 65 a8 08 0f 85 8d 00 00 00 8b 87 08 01 00 00 48 8b 84 c7 f8 00 00 00 48 8b 58 08 <48> 8b 43 68 48 8b 40 20 48 8b 40 50 48 85 c0 74 0f 48 8d b7 f8 00
      [13905.213924] RSP: 0018:ffff9c0ec3513da8 EFLAGS: 00010246
      [13905.214961] RAX: ffff8ecb454357d0 RBX: 5a5a5a5a5a5a5a5a RCX: 0000000000000000
      [13905.216321] RDX: 0000000000000001 RSI: ffff8ecb309ca400 RDI: ffff8ecb309ca400
      [13905.217685] RBP: ffff8ecb05c4d8e8 R08: 00000000000000b8 R09: abcc77118461cefd
      [13905.219044] R10: ffff9c0ec3513d88 R11: ffff8ecb34982646 R12: ffff8ecb05c4d800
      [13905.220422] R13: ffff8ecb43bfb200 R14: 00000000000089b6 R15: ffff8ecb309ca400
      [13905.221866] FS:  0000000000000000(0000) GS:ffff8ecbbfd00000(0000) knlGS:0000000000000000
      [13905.223781] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [13905.224902] CR2: 00007ffc748ef221 CR3: 0000000076810004 CR4: 00000000001706e0
      [13905.226268] Call Trace:
      [13905.226846]  ptlrpc_server_finish_active_request+0x26/0x120 [ptlrpc]
      [13905.228184]  ptlrpc_server_handle_request+0x4a2/0xbc0 [ptlrpc]
      [13905.229415]  ptlrpc_main+0xc91/0x15a0 [ptlrpc]
      [13905.230399]  ? __schedule+0x2d9/0x870
      [13905.231251]  ? ptlrpc_wait_event+0x590/0x590 [ptlrpc]
      [13905.232347]  kthread+0x134/0x150
      [13905.233064]  ? set_kthread_struct+0x50/0x50
      [13905.233920]  ret_from_fork+0x35/0x40

      report link: https://testing.whamcloud.com/test_sets/79713f31-a8a9-4056-8478-4fb529a14077

      It can be related to recent changes in ORR/TRR code in 'LU-8130 nrs: convert NRS ORR/TRR to rhashtable' landed two weeks  ago

      Attachments

        Issue Links

          Activity

            People

              bzzz Alex Zhuravlev
              tappro Mikhail Pershin
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: