Lustre / LU-4330

LustreError: 46336:0:(events.c:433:ptlrpc_master_callback()) ASSERTION( callback == request_out_callback || callback == reply_in_callback || callback == client_bulk_callback || callback == request_in_callback || callback == reply_out_callback ... ) failed

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.6
    • Labels: None
    • Severity: 3
    • Rank: 11850

    Description

      Hi,

      More and more compute nodes across several different customer clusters are hitting an LBUG with this assertion failure:

      2013-11-21 14:06:54 LustreError: 46336:0:(events.c:433:ptlrpc_master_callback()) ASSERTION( callback == request_out_callback || callback == reply_in_callback || callback == client_bulk_callback || callback == request_in_callback || callback == reply_out_callback || callback == server_bulk_callback ) failed:
      2013-11-21 14:06:54 LustreError: 46336:0:(events.c:433:ptlrpc_master_callback()) LBUG
      2013-11-21 14:06:54 Pid: 46336, comm: kiblnd_sd_00
      2013-11-21 14:06:54 Call Trace:
      2013-11-21 14:06:54 [<ffffffffa041c7f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      2013-11-21 14:06:54 [<ffffffffa041ce07>] lbug_with_loc+0x47/0xb0 [libcfs]
      2013-11-21 14:06:54 [<ffffffffa06a426c>] ptlrpc_master_callback+0xcc/0xd0 [ptlrpc]
      2013-11-21 14:06:54 [<ffffffffa048ebd2>] lnet_enq_event_locked+0x62/0xd0 [lnet]
      2013-11-21 14:06:54 [<ffffffffa048ecdb>] lnet_finalize+0x9b/0x2f0 [lnet]
      2013-11-21 14:06:54 [<ffffffffa083d073>] kiblnd_recv+0x103/0x570 [ko2iblnd]
      2013-11-21 14:06:54 [<ffffffffa04928dd>] lnet_ni_recv+0xad/0x2f0 [lnet]
      2013-11-21 14:06:54 [<ffffffffa0492c06>] lnet_recv_put+0xe6/0x120 [lnet]
      2013-11-21 14:06:54 [<ffffffffa0499c33>] lnet_parse+0x1273/0x1b80 [lnet]
      2013-11-21 14:06:54 [<ffffffff81042ca3>] ? enqueue_task+0x43/0x90
      2013-11-21 14:06:54 [<ffffffffa083d7ab>] kiblnd_handle_rx+0x2cb/0x680 [ko2iblnd]
      2013-11-21 14:06:54 [<ffffffffa083e590>] kiblnd_rx_complete+0x2d0/0x440 [ko2iblnd]
      2013-11-21 14:06:54 [<ffffffff81042a63>] ? __wake_up+0x53/0x70
      2013-11-21 14:06:54 [<ffffffffa083e762>] kiblnd_complete+0x62/0xe0 [ko2iblnd]
      2013-11-21 14:06:54 [<ffffffffa083eb19>] kiblnd_scheduler+0x339/0x7a0 [ko2iblnd]
      2013-11-21 14:06:54 [<ffffffff8104a320>] ? default_wake_function+0x0/0x20
      2013-11-21 14:06:54 [<ffffffffa083e7e0>] ? kiblnd_scheduler+0x0/0x7a0 [ko2iblnd]
      2013-11-21 14:06:54 [<ffffffff8100412a>] child_rip+0xa/0x20
      2013-11-21 14:06:54 [<ffffffffa083e7e0>] ? kiblnd_scheduler+0x0/0x7a0 [ko2iblnd]
      2013-11-21 14:06:54 [<ffffffffa083e7e0>] ? kiblnd_scheduler+0x0/0x7a0 [ko2iblnd]
      2013-11-21 14:06:54 [<ffffffff81004120>] ? child_rip+0x0/0x20
      2013-11-21 14:06:54 Kernel panic - not syncing: LBUG
      

      For your information, the systems are running with the kernel boot parameter 'tolerant=1' set.

      We have a crash dump that we will upload to ftp.

      Sebastien.
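
      For context on what the assertion guards, here is a hedged paraphrase of the check in ptlrpc/events.c from the Lustre 2.1 era (not a verbatim copy; details may differ). The LNet event carries a pointer to a ptlrpc callback descriptor in md.user_ptr, and the master callback refuses to jump through it unless it matches one of the six known ptlrpc event handlers. A corrupted MD, as later found in the crash dumps, makes this check fire:

      /* Paraphrased from ptlrpc/events.c (Lustre 2.1-era); not verbatim. */
      static void ptlrpc_master_callback(lnet_event_t *ev)
      {
              struct ptlrpc_cb_id *cbid = ev->md.user_ptr;
              void (*callback)(lnet_event_t *ev) = cbid->cbid_fn;

              /* If the MD's user_ptr was corrupted, callback points into
               * garbage and this assertion (events.c:433 above) LBUGs. */
              LASSERT(callback == request_out_callback ||
                      callback == reply_in_callback ||
                      callback == client_bulk_callback ||
                      callback == request_in_callback ||
                      callback == reply_out_callback ||
                      callback == server_bulk_callback);

              callback(ev);
      }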

      Attachments

      Issue Links

      Activity

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18586/
            Subject: LU-4330 lnet: Allocate MEs and small MDs in own kmem_caches
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 9d9bb678d6b3707623845e0ce67dd7fd07a12fe9
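
            As background on why per-type caches help here, the following is a hedged sketch of the idea behind the merged patch, not the patch itself: carving LNet MEs and small MDs out of the shared size-128 slab means a stray write from an unrelated size-128 user can no longer land on an ME/MD. The struct and cache names below are illustrative, not the real LNet types.

            #include <linux/slab.h>
            #include <linux/list.h>
            #include <linux/types.h>
            #include <linux/errno.h>

            /* Illustrative stand-in for the real LNet ME type. */
            struct lnet_me_demo {
                    struct list_head me_list;
                    void            *me_md;
                    __u64            me_match_bits;
            };

            static struct kmem_cache *lnet_me_cache;

            static int lnet_me_cache_init(void)
            {
                    /* A dedicated cache keeps MEs on their own slab pages,
                     * away from every other size-128 allocation in the kernel. */
                    lnet_me_cache = kmem_cache_create("lnet_MEs_demo",
                                                      sizeof(struct lnet_me_demo),
                                                      0, 0, NULL);
                    return lnet_me_cache != NULL ? 0 : -ENOMEM;
            }

            static void lnet_me_cache_fini(void)
            {
                    kmem_cache_destroy(lnet_me_cache);
            }

            Isolation does not stop the rogue writer, but it confines the blast radius and makes slab poisoning/debugging point at the real culprit rather than at LNet.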

            bfaccini Bruno Faccini (Inactive) added a comment -

            James,
            Thanks for your input.
            I cannot tell for this old case/crash, but on my side I have seen recent <size-128> slab corruption cases that are ext4/ldiskfs related (see LU-7980).

            simmonsja James A Simmons added a comment -

            I think I might know what the problem is. A recent patch for ko2iblnd landed in the upstream kernel that exposed a serious memory corruption.
            The commit is 3d1477309806459d39e13d8c3206ba35d183c34a "Replace sg++ with sg = sg_next(sg)". The scatter-gather list comes from tx->tx_frags, which is IBLND_MAX_RDMA_FRAGS entries in size. Since the code writes at an offset into tx_frags, that means you really need IBLND_MAX_RDMA_FRAGS + 1 entries for the frags. Currently the upstream clients will crash when they attempt to access scatter list entry IBLND_MAX_RDMA_FRAGS + 1.
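
            To make the off-by-one concrete, here is a minimal userspace sketch (not the ko2iblnd code; MAX_FRAGS and struct frag are made up for illustration): if the mapping path starts filling the fragment array at index 1, an array sized at MAX_FRAGS overruns by one element, and the overflowing write corrupts whatever the allocator placed next to it.

            #include <stdio.h>

            #define MAX_FRAGS 4   /* stand-in for IBLND_MAX_RDMA_FRAGS */

            struct frag { unsigned long addr; unsigned int len; };

            int main(void)
            {
                    /* Correct sizing: one extra slot because filling starts at index 1. */
                    struct frag tx_frags[MAX_FRAGS + 1] = { { 0, 0 } };

                    for (int i = 0; i < MAX_FRAGS; i++) {
                            tx_frags[1 + i].addr = 0x1000ul * (unsigned long)i;  /* fake page */
                            tx_frags[1 + i].len  = 4096;
                    }

                    /* Had the array been declared tx_frags[MAX_FRAGS], the final
                     * store above (index MAX_FRAGS) would have landed one element
                     * past the end of the array. */
                    printf("highest index written: %d of %d slots\n",
                           MAX_FRAGS, MAX_FRAGS + 1);
                    return 0;
            }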

            adilger Andreas Dilger added a comment -

            Bruno, if you update those patches for newer kernels, please submit the new patches into lustre/kernel_patches/patches so they are available for use in the future, since bugzilla may disappear at some point.

            adilger Andreas Dilger added a comment (edited) -

            The scary thing is that there would continue to be random memory corruptions in the size-128 slab, but they will just be corrupting some other part of memory.

            If this problem can be reproduced in a relatively short amount of testing time, then there are debugging patches available that could be applied to the kernel to make all kmalloc() calls actually map to vmalloc() internally, and to have vmalloc() always use a new memory address; when the memory is freed the page is unmapped and the address never used again. If another thread incorrectly accesses an unmapped address (use after free) it will fault, and the source of the corruption may then be found. Unfortunately, this impacts performance and can only be used for debugging, not in production.

            Patches are available in https://bugzilla.lustre.org/show_bug.cgi?id=22471 but they would likely need to be updated for newer kernels. They can definitely help find memory corruption problems that are otherwise very difficult to find.
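
            A hedged userspace analogue of that debugging technique may help illustrate it (the real patches in bugzilla 22471 hook the kernel allocator, which this does not): give every allocation its own fresh mapping and unmap it on free, so any use-after-free access faults immediately instead of silently scribbling on reused memory.

            #include <stdio.h>
            #include <string.h>
            #include <sys/mman.h>

            /* Each allocation gets a brand-new anonymous mapping; mmap picks a
             * fresh address, standing in for "vmalloc always returns a new one". */
            static void *debug_alloc(size_t size)
            {
                    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                    return p == MAP_FAILED ? NULL : p;
            }

            /* Unmapping (rather than recycling) means a later touch of this
             * address is a hard fault that points straight at the culprit. */
            static void debug_free(void *p, size_t size)
            {
                    munmap(p, size);
            }

            int main(void)
            {
                    char *buf = debug_alloc(128);
                    if (buf == NULL)
                            return 1;
                    strcpy(buf, "live data");
                    printf("%s\n", buf);
                    debug_free(buf, 128);
                    /* buf[0] = 'X';   <- would now SIGSEGV instead of corrupting
                     *                    whatever reused the 128-byte slot */
                    return 0;
            }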

            gerrit Gerrit Updater added a comment -

            Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/18586
            Subject: LU-4330 lnet: Allocate MEs and small MDs in own kmem_caches
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 11e4b34d75505476f363c1cf4400755a7f30f766
            hugo_meiland Hugo Meiland added a comment -

            Hi Bruno, no dysfunction seen; the fix seems to solve the issue completely. Thanks! Hugo

            bfaccini Bruno Faccini (Inactive) added a comment -

            Hello Hugo, thanks for your feedback!
            The fact that both asserts, for this ticket and LU-3848, disappeared seems to confirm what I suspected from the beginning: LNET/Lustre were not causing the corruptions but were only victims. BTW, did you notice any other sub-system (networking, ...) dysfunction since?

            On the other hand, I am now working on a more generic patch version (not based on a hard-coded 128-byte length) to address Liang's last input, to answer Isaac's comments, and also to push a master version as Isaac requested.
            hugo_meiland Hugo Meiland added a comment -

            As an update: the fix has now been installed for a couple of weeks and no more of these LBUGs have been seen, so the fix looks good.

            bfaccini Bruno Faccini (Inactive) added a comment -

            I need to indicate that, after successful auto-tests, I did extensive local testing of http://review.whamcloud.com/8819/ and I can confirm it works as expected, with no ME/MD leaks in the slabs.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Hello Hugo,
            Thanks again for your help on this!
            I just had a look into the recent crash dump you uploaded in LU4330-tcn82-vmcore-vmlinux-systemmap-weak-updates.zip, and I can confirm it is the same scenario, with the same -3 corruption of the MD's md_user_ptr field/pointer.

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: sebastien.buisson Sebastien Buisson (Inactive)
              Votes: 0
              Watchers: 13

              Dates

                Created:
                Updated: