Lustre / LU-5284

GPF in radix_tree_lookup_slot on OSS

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.6
    • Environment: bullx supercomputer suite
    • Severity: 3
    • 14747

    Description

      OSS hits a general protection fault with the following trace:

      PID: 22785  TASK: ffff880612f90830  CPU: 1   COMMAND: "ll_ost_io_1011"
       #0 [ffff880612fd7620] machine_kexec at ffffffff8102902b
       #1 [ffff880612fd7680] crash_kexec at ffffffff810a5292
       #2 [ffff880612fd7750] oops_end at ffffffff8149a050
       #3 [ffff880612fd7780] die at ffffffff8100714b
       #4 [ffff880612fd77b0] do_general_protection at ffffffff81499be2
       #5 [ffff880612fd77e0] general_protection at ffffffff814993b5
          [exception RIP: radix_tree_lookup_slot+5]
          RIP: ffffffff81261465  RSP: ffff880612fd7890  RFLAGS: 00010286
          RAX: e940201000000010  RBX: e940201000000008  RCX: 0000000000000000
          RDX: 00000000000200d2  RSI: 0000000000000000  RDI: e940201000000008
          RBP: ffff880612fd78b0   R8: ffff880612fdc140   R9: 0000000000000008
          R10: 0000000000001000  R11: 0000000000000001  R12: 0000000000000000
          R13: 0000000000000000  R14: e940201000000000  R15: 20105fa000080221
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #6 [ffff880612fd7898] find_get_page at ffffffff810ffe8e
       #7 [ffff880612fd78b8] find_lock_page at ffffffff8110112a
       #8 [ffff880612fd78e8] find_or_create_page at ffffffff8110129f
       #9 [ffff880612fd7938] filter_get_page at ffffffffa0c4b065 [obdfilter]
      #10 [ffff880612fd7968] filter_preprw_read at ffffffffa0c4d64d [obdfilter]
      #11 [ffff880612fd7a98] filter_preprw at ffffffffa0c4dedc [obdfilter]
      #12 [ffff880612fd7ad8] obd_preprw at ffffffffa0c09051 [ost]
      #13 [ffff880612fd7b48] ost_brw_read at ffffffffa0c10091 [ost]
      #14 [ffff880612fd7c88] ost_handle at ffffffffa0c16423 [ost]
      #15 [ffff880612fd7da8] ptlrpc_main at ffffffffa07fd4e6 [ptlrpc]
      #16 [ffff880612fd7f48] kernel_thread at ffffffff8100412a
      

      Looking at the dump, there seems to be a race leading to corruption of the struct inode that is passed from filter_preprw_read() to filter_get_page().
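A detail worth noting from the registers above: all three faulting threads die with %rdi = e940201000000008, which is not a canonical x86-64 address, and dereferencing a non-canonical pointer raises #GP rather than a page fault. That is why a corrupted inode pointer surfaces here as a general protection fault. A minimal sketch of the canonical-address rule (the helper name is mine, not from the dump):

```python
def is_canonical(addr: int) -> bool:
    """x86-64: bits 63..47 must all equal bit 47 (sign extension)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF

# %rdi at the faulting RIP (radix_tree_lookup_slot+5): garbage from the
# corrupted inode, non-canonical, so the CPU raises #GP, not a page fault.
print(is_canonical(0xE940201000000008))
# A valid kernel address from the same trace, for comparison.
print(is_canonical(0xFFFF880612F90830))
```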

      I attached my complete dump analysis log.

      I can also upload the dump if you request it.

      Attachments

        Activity

          [LU-5284] GPF in radix_tree_lookup_slot on OSS
          pjones Peter Jones added a comment -

          OK, thanks

          spiechurski Sebastien Piechurski added a comment -

          Given that we have not seen this issue for almost 3 years, and all our customers have migrated away from 2.1, I think you can close this ticket.

          bfaccini Bruno Faccini (Inactive) added a comment -

          Seb, any news or new occurrences?

          bfaccini Bruno Faccini (Inactive) added a comment -

          Even if it will impact crash-dump time and on-disk space, would it be possible to set the kdump configuration so that full memory (all pages) is dumped next time?
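For reference, the amount of page filtering kdump applies is set by the makedumpfile dump level; a hedged sketch of an /etc/kdump.conf line that would capture all pages (the exact flags are an assumption to verify against makedumpfile(8) for the distribution in use):

```shell
# /etc/kdump.conf -- hypothetical example, verify flags for your distro.
# -d 0 excludes no page types (zero, cache, user, and free pages all kept);
# -c compresses pages to limit the on-disk cost of a full dump.
core_collector makedumpfile -c --message-level 1 -d 0
```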

          bfaccini Bruno Faccini (Inactive) added a comment -

          Crash dump analysis findings:

          - There are 2 additional threads with references to the same page/slab/inode address range; both triggered the same GPF, with the same stack!

          crash> bt 21929 22191
          PID: 21929  TASK: ffff880c12206830  CPU: 2   COMMAND: "ll_ost_io_155"
           #0 [ffff880c12277780] die at ffffffff8100711e
           #1 [ffff880c122777b0] do_general_protection at ffffffff81499be2
           #2 [ffff880c122777e0] general_protection at ffffffff814993b5
              [exception RIP: radix_tree_lookup_slot+5]
              RIP: ffffffff81261465  RSP: ffff880c12277890  RFLAGS: 00010286
              RAX: e940201000000010  RBX: e940201000000008  RCX: 0000000000000000
              RDX: 00000000000200d2  RSI: 0000000000000008  RDI: e940201000000008
              RBP: ffff880c122778b0   R8: ffff880614e926c0   R9: 00000000000000f8
              R10: 0000000000001000  R11: 0000000000000001  R12: 0000000000000008
              R13: 0000000000000008  R14: e940201000000000  R15: 20105fa000080221
              ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
           #3 [ffff880c12277898] find_get_page at ffffffff810ffe8e
           #4 [ffff880c122778b8] find_lock_page at ffffffff8110112a
           #5 [ffff880c122778e8] find_or_create_page at ffffffff8110129f
           #6 [ffff880c12277938] filter_get_page at ffffffffa0c4b065 [obdfilter]
           #7 [ffff880c12277968] filter_preprw_read at ffffffffa0c4d64d [obdfilter]
           #8 [ffff880c12277a98] filter_preprw at ffffffffa0c4dedc [obdfilter]
           #9 [ffff880c12277ad8] obd_preprw at ffffffffa0c09051 [ost]
          #10 [ffff880c12277b48] ost_brw_read at ffffffffa0c10091 [ost]
          #11 [ffff880c12277c88] ost_handle at ffffffffa0c16423 [ost]
          #12 [ffff880c12277da8] ptlrpc_main at ffffffffa07fd4e6 [ptlrpc]
          #13 [ffff880c12277f48] kernel_thread at ffffffff8100412a
          
          PID: 22191  TASK: ffff8806144fd0c0  CPU: 0   COMMAND: "ll_ost_io_417"
           #0 [ffff880614617780] die at ffffffff8100711e
           #1 [ffff8806146177b0] do_general_protection at ffffffff81499be2
           #2 [ffff8806146177e0] general_protection at ffffffff814993b5
              [exception RIP: radix_tree_lookup_slot+5]
              RIP: ffffffff81261465  RSP: ffff880614617890  RFLAGS: 00010286
              RAX: e940201000000010  RBX: e940201000000008  RCX: 0000000000000000
              RDX: 00000000000200d2  RSI: 0000000000000100  RDI: e940201000000008
              RBP: ffff8806146178b0   R8: ffff88061461c140   R9: 0000000000000008
              R10: 0000000000001000  R11: 0000000000000001  R12: 0000000000000100
              R13: 0000000000000100  R14: e940201000000000  R15: 20105fa000080221
              ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
           #3 [ffff880614617898] find_get_page at ffffffff810ffe8e
           #4 [ffff8806146178b8] find_lock_page at ffffffff8110112a
           #5 [ffff8806146178e8] find_or_create_page at ffffffff8110129f
           #6 [ffff880614617938] filter_get_page at ffffffffa0c4b065 [obdfilter]
           #7 [ffff880614617968] filter_preprw_read at ffffffffa0c4d64d [obdfilter]
           #8 [ffff880614617a98] filter_preprw at ffffffffa0c4dedc [obdfilter]
           #9 [ffff880614617ad8] obd_preprw at ffffffffa0c09051 [ost]
          #10 [ffff880614617b48] ost_brw_read at ffffffffa0c10091 [ost]
          #11 [ffff880614617c88] ost_handle at ffffffffa0c16423 [ost]
          #12 [ffff880614617da8] ptlrpc_main at ffffffffa07fd4e6 [ptlrpc]
          #13 [ffff880614617f48] kernel_thread at ffffffff8100412a
          crash> 
          

          - The ldiskfs_inode slab page corruption looks like:

          crash> rd 0xffff880040aa5000 512 | grep -v "0000000000000000 0000000000000000" | head
          ffff880040aa5600:  0000041100000401 20025bc900000421   ........!....[. 
          ffff880040aa5610:  0000000000040002 d514200200000000   ............. ..
          ffff880040aa5620:  0000041200000402 2010176b00000622   ........"...k.. 
          ffff880040aa5630:  0000000000050000 3e44201000000000   ............. D>
          ffff880040aa5640:  0000041300000403 20103b4500000823   ........#...E;. 
          ffff880040aa5650:  0000000000050000 ec37201000000000   ............. 7.
          ffff880040aa5660:  0000041400000404 20102f7700000a24   ........$...w/. 
          ffff880040aa5670:  0000000000050000 3928201000000000   ............. (9
          ffff880040aa5680:  0000041500000405 2010299500000c25   ........%....). 
          ffff880040aa5690:  0000000000050000 d013201000000000   ............. ..
          grep: write error
          crash> 
          crash> rd 0xffff880040aa5000 512 | grep -v "0000000000000000 0000000000000000" | tail
          ffff880040aa5f60:  0020001b0020000b 201080000020162b   .. ... .+. .... 
          ffff880040aa5f70:  0000000000070000 2a74201000000000   ............. t*
          ffff880040aa5f80:  0020001c0020000c 201080000020182c   .. ... .,. .... 
          ffff880040aa5f90:  0000000000070000 6e46201000000000   ............. Fn
          ffff880040aa5fa0:  0020001d0020000d 2010800000201a2d   .. ... .-. .... 
          ffff880040aa5fb0:  0000000000070000 024d201000000000   ............. M.
          ffff880040aa5fc0:  0020001e0020000e 2010800000201c2e   .. ... ... .... 
          ffff880040aa5fd0:  0000000000070000 b650201000000000   ............. P.
          ffff880040aa5fe0:  0020001f0020000f 2010800000201e2f   .. ... ./. .... 
          ffff880040aa5ff0:  0000000000070000 da5b201000000000   ............. [.
          crash> 
          

          - The dump is missing the 2 previous/next 4K virtual/real pages (inside the same 2M page!) because they are unmapped/free.
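The neighbouring-page observation is plain alignment arithmetic; a small sketch (addresses taken from the rd output above) showing that the missing 4K pages around the corrupted slab page indeed fall inside the same 2M kernel mapping:

```python
PAGE_4K = 0x1000
PAGE_2M = 0x200000

slab_page = 0xFFFF880040AA5000          # corrupted ldiskfs_inode slab page
huge_base = slab_page & ~(PAGE_2M - 1)  # base of the containing 2M mapping

prev_page = slab_page - PAGE_4K         # unmapped/free, absent from the dump
next_page = slab_page + PAGE_4K         # unmapped/free, absent from the dump

# Both neighbours live in the same 2M page as the corrupted slab page.
same_2m = all(huge_base <= p < huge_base + PAGE_2M
              for p in (prev_page, next_page))
print(hex(huge_base), same_2m)
```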


          bfaccini Bruno Faccini (Inactive) added a comment -

          Seb, yes, you are right: I should have written 0xffff880612fd7960 for the location where the %rbp value is saved at the time of the filter_get_page() call in filter_preprw_read(). In fact, since that address was not available in your crash extracts for cut-and-paste, I simply used 0xffff880612fd7970 and then forgot to modify it manually!

          And by the way, the "kmem" sub-command confirms the validity of the inode address, and its errors confirm the slab corruption.

          Thanks for uploading the crash-dump.

          spiechurski Sebastien Piechurski added a comment -

          As seen in the stack of filter_get_page(), the base pointer of filter_preprw_read() is really 0xffff880612fd7a90 (I think you swapped this address with that of the inode struct in your comment):

          #9 [ffff880612fd7938] filter_get_page at ffffffffa0c4b065 [obdfilter]
              ffff880612fd7940: 00000000531db489 000000001c4be826 
              ffff880612fd7950: ffff880612fd7c4c ffff880610f480b8 
              ffff880612fd7960: ffff880612fd7a90 ffffffffa0c4d64d 
          #10 [ffff880612fd7968] filter_preprw_read at ffffffffa0c4d64d [obdfilter]
          [...]
              ffff880612fd7a80: ffff880091cab1c8 ffff8805235be548 
              ffff880612fd7a90: ffff880612fd7ad0 ffffffffa0c4dedc 
          #11 [ffff880612fd7a98] filter_preprw at ffffffffa0c4dedc [obdfilter]
          

          Looking up the struct address in the slab, it would seem the ldiskfs_inode_cache slab itself is corrupted:

          crash> kmem -s ffff880040aa57c0
          kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad next pointer: 0
          kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad prev pointer: 0
          kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad inuse counter: 0
          kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad s_mem pointer: 0
          kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5280  bad next pointer: 0
          kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5280  bad s_mem pointer: 0
          kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad next pointer: 0
          kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad s_mem pointer: 0
          kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5280  bad next pointer: 0
          kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5280  bad s_mem pointer: 0
          kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5280  bad next pointer: 0
          kmem: ldiskfs_inode_cache: partial list: slab: ffff880040aa5280  bad s_mem pointer: 0
          kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad next pointer: 0
          kmem: ldiskfs_inode_cache: full list: slab: ffff880040aa5280  bad s_mem pointer: 0
          kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5280  bad next pointer: 0
          kmem: ldiskfs_inode_cache: free list: slab: ffff880040aa5280  bad s_mem pointer: 0
          kmem: ldiskfs_inode_cache: address not found in cache: ffff880040aa57c0
          kmem: 16 errors encountered
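Masking the rejected object address down to its page shows the two views line up: the inode 0xffff880040aa57c0 that kmem cannot find sits in exactly the page dumped with rd above. A trivial check (pure arithmetic, not a crash command):

```python
obj = 0xFFFF880040AA57C0   # inode address kmem could not find in the cache
page = obj & ~0xFFF        # containing 4K page
print(hex(page))           # the same page examined with `rd` earlier
```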
          

          I am also uploading the dump to the FTP site. It should be available tomorrow.


          bfaccini Bruno Faccini (Inactive) added a comment -

          Hello Seb,
          Thanks for your crash-dump analysis log! But can you check for me whether the %rbp value at the time of the filter_get_page() call in filter_preprw_read() is really 0xffff880040aa57c0? It should be saved at the 0xffff880612fd7970 location on the stack, but it is better to confirm this by disassembling the first instructions of filter_get_page().
          Also, can you check whether the inode address you found (0xffff880040aa57c0) is part of one of the inode slab areas/caches?

          Or maybe you can upload the crash-dump if you want?
          pjones Peter Jones added a comment -

          Bruno

          Could you please look into this one?

          Thanks

          Peter


          People

            bfaccini Bruno Faccini (Inactive)
            spiechurski Sebastien Piechurski
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved: