Lustre / LU-5284

GPF in radix_tree_lookup_slot on OSS


Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.1.6
    • bullx supercomputer suite
    • 3
    • 14747

    Description

      OSS hits a general protection fault with the following trace:

      PID: 22785  TASK: ffff880612f90830  CPU: 1   COMMAND: "ll_ost_io_1011"
       #0 [ffff880612fd7620] machine_kexec at ffffffff8102902b
       #1 [ffff880612fd7680] crash_kexec at ffffffff810a5292
       #2 [ffff880612fd7750] oops_end at ffffffff8149a050
       #3 [ffff880612fd7780] die at ffffffff8100714b
       #4 [ffff880612fd77b0] do_general_protection at ffffffff81499be2
       #5 [ffff880612fd77e0] general_protection at ffffffff814993b5
          [exception RIP: radix_tree_lookup_slot+5]
          RIP: ffffffff81261465  RSP: ffff880612fd7890  RFLAGS: 00010286
          RAX: e940201000000010  RBX: e940201000000008  RCX: 0000000000000000
          RDX: 00000000000200d2  RSI: 0000000000000000  RDI: e940201000000008
          RBP: ffff880612fd78b0   R8: ffff880612fdc140   R9: 0000000000000008
          R10: 0000000000001000  R11: 0000000000000001  R12: 0000000000000000
          R13: 0000000000000000  R14: e940201000000000  R15: 20105fa000080221
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #6 [ffff880612fd7898] find_get_page at ffffffff810ffe8e
       #7 [ffff880612fd78b8] find_lock_page at ffffffff8110112a
       #8 [ffff880612fd78e8] find_or_create_page at ffffffff8110129f
       #9 [ffff880612fd7938] filter_get_page at ffffffffa0c4b065 [obdfilter]
      #10 [ffff880612fd7968] filter_preprw_read at ffffffffa0c4d64d [obdfilter]
      #11 [ffff880612fd7a98] filter_preprw at ffffffffa0c4dedc [obdfilter]
      #12 [ffff880612fd7ad8] obd_preprw at ffffffffa0c09051 [ost]
      #13 [ffff880612fd7b48] ost_brw_read at ffffffffa0c10091 [ost]
      #14 [ffff880612fd7c88] ost_handle at ffffffffa0c16423 [ost]
      #15 [ffff880612fd7da8] ptlrpc_main at ffffffffa07fd4e6 [ptlrpc]
      #16 [ffff880612fd7f48] kernel_thread at ffffffff8100412a
      

      Analysis of the crash dump suggests a race that corrupts the struct inode passed from filter_preprw_read to filter_get_page.

      I have attached my complete dump analysis log.

      I can also upload the crash dump on request.


          People

            Bruno Faccini (Inactive)
            Sebastien Piechurski