[LU-2981] sanity.sh test_17m test_77i: oops in ptlrpc_server_hpreq_fini

Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • None
    • Lustre 2.4.0
    • 3
    • 7264

    Description

      This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/1000ccce-8f3b-11e2-aa82-52540035b04c.

      The sub-test test_77i failed with the following error:

      22:42:38:BUG: unable to handle kernel NULL pointer dereference at 0000000000000228
      22:42:39:IP: [<ffffffffa075dc77>] ptlrpc_server_hpreq_fini+0x27/0x160 [ptlrpc]
      22:42:41:Pid: 13574, comm: obd_zombid Not tainted 2.6.32-279.19.1.el6.x86_64 #1 Red Hat KVM
      22:42:44:Call Trace:
      22:42:44: [<ffffffffa0760dc9>] ptlrpc_unregister_service+0x4a9/0x10b0 [ptlrpc]
      22:42:44: [<ffffffffa04402e1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      22:42:44: [<ffffffffa07367b2>] ldlm_cleanup+0x262/0x4f0 [ptlrpc]
      22:42:44: [<ffffffffa0736b65>] ldlm_put_ref+0x125/0x1a0 [ptlrpc]
      22:42:45: [<ffffffffa072a67d>] client_obd_cleanup+0x4d/0x120 [ptlrpc]
      22:42:45: [<ffffffffa0aa7133>] mgc_cleanup+0x53/0x130 [mgc]
      22:42:45: [<ffffffffa05ce012>] class_decref+0x212/0x580 [obdclass]
      22:42:45: [<ffffffffa04402e1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      22:42:47: [<ffffffffa05abff9>] obd_zombie_impexp_cull+0x309/0x5d0 [obdclass]
      22:42:47: [<ffffffffa05ac385>] obd_zombie_impexp_thread+0xc5/0x1c0 [obdclass]
      22:42:47: [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
      22:42:47: [<ffffffffa05ac2c0>] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
      22:42:48: [<ffffffff8100c0ca>] child_rip+0xa/0x20
      22:42:48: [<ffffffffa05ac2c0>] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
      22:42:48: [<ffffffffa05ac2c0>] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
      22:42:48: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

      Info required for matching: sanity 77i
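When comparing an oops like the one above against other failures, it can help to reduce the trace to a list of function, offset and module entries. The following is a minimal sketch, not part of Lustre or Maloo tooling: the names `FRAME_RE` and `parse_frames` are hypothetical, and the regular expression assumes the frame layout shown in this ticket. Frames marked with `?` are ones the kernel unwinder considers unreliable, so they are skipped by default.

```python
import re

# Matches frames such as:
#   22:42:44: [<ffffffffa0760dc9>] ptlrpc_unregister_service+0x4a9/0x10b0 [ptlrpc]
# Assumption: this layout is taken from the trace in this ticket only.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text, include_unreliable=False):
    """Return one dict per stack frame found in log_text."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue  # not a frame line (e.g. "Call Trace:")
        if m.group("unreliable") and not include_unreliable:
            continue  # '?' frames may be stale stack contents
        frames.append({
            "func": m.group("func"),
            "offset": int(m.group("offset"), 16),
            "size": int(m.group("size"), 16),
            # Frames without a [module] suffix belong to the core kernel.
            "module": m.group("module") or "kernel",
        })
    return frames


# A few lines copied from the trace in this ticket.
SAMPLE = """\
22:42:39:IP: [<ffffffffa075dc77>] ptlrpc_server_hpreq_fini+0x27/0x160 [ptlrpc]
22:42:44: [<ffffffffa0760dc9>] ptlrpc_unregister_service+0x4a9/0x10b0 [ptlrpc]
22:42:44: [<ffffffffa04402e1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
22:42:44: [<ffffffffa07367b2>] ldlm_cleanup+0x262/0x4f0 [ptlrpc]
22:42:48: [<ffffffff8100c0ca>] child_rip+0xa/0x20
"""

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        print(f"{frame['module']}:{frame['func']}+{frame['offset']:#x}")
```

Passing `include_unreliable=True` keeps the `?` frames as well, which may be useful when the reliable frames alone do not explain the call path.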

          Activity

            pjones Peter Jones added a comment -

            ok so the extra tidy up from LU-298 that Nikitas believes should resolve this issue has now landed so I am closing this ticket. We can reopen it if there are still problems with this change in place.


            aboyko Alexander Boyko added a comment - Xyratex probably has a patch for this issue; I will submit it to master in a few days.
            pjones Peter Jones added a comment -

            Nikitas

            Ah great - that timeframe works fine to get testing completed over the weekend. Obviously we want the version with the latest changes. I guess that I underestimated the end of your work day.

            Peter


            nangelinas Nikitas Angelinas added a comment - Hi Peter, please proceed to rebase the patch if you want to get a clean test run, though I was planning to refresh the patch at end of day today (so in 5-7 hours from now) after I included some additional changes.
            pjones Peter Jones added a comment -

            Thanks Nikitas! As it seems that you are not around to do so (I appreciate it is late on a Friday in the UK) Bob is going to rebase this patch to avoid the LU-2910 failure in the hope that we can get a clean test run over the weekend


            nangelinas Nikitas Angelinas added a comment - I think this must be due to the NRS framework follow-up patch itself, as the version that fired those bugs had some important parts missing. I have just updated that patch and this new version should address this ticket.

            adilger Andreas Dilger added a comment - Could this relate to NRS? The patch http://review.whamcloud.com/5665 was landed on the 12th.

            adilger Andreas Dilger added a comment - sanity.sh test_77i has failed 10 times and test_17m once in the past 4 weeks, but only starting 2013-03-12.
            adilger Andreas Dilger added a comment - - edited

            Another hit in ptlrpc_unregister_service(), though this time from an unmount:

            https://maloo.whamcloud.com/test_sets/018d9cec-8d58-11e2-bb99-52540035b04c

            19:19:04:RIP: 0010:[<ffffffffa07edc27>]  [<ffffffffa07edc27>] ptlrpc_server_hpreq_fini+0x27/0x160 [ptlrpc]
            19:19:05:Process umount (pid: 11123, threadinfo ffff880069806000, task ffff88005
            19:19:05:Call Trace:
            19:19:05: [<ffffffffa07f0d79>] ptlrpc_unregister_service+0x4a9/0x10b0 [ptlrpc]
            19:19:05: [<ffffffff81052223>] ? __wake_up+0x53/0x70
            19:19:05: [<ffffffffa0de49fe>] mgs_device_fini+0xee/0x5a0 [mgs]
            19:19:06: [<ffffffffa06489c7>] class_cleanup+0x577/0xda0 [obdclass]
            19:19:06: [<ffffffffa061dd36>] ? class_name2dev+0x56/0xe0 [obdclass]
            19:19:06: [<ffffffffa064a2ac>] class_process_config+0x10bc/0x1c80 [obdclass]
            19:19:06: [<ffffffffa0643ad3>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
            19:19:06: [<ffffffffa064afe9>] class_manual_cleanup+0x179/0x6f0 [obdclass]
            19:19:06: [<ffffffffa061dd36>] ? class_name2dev+0x56/0xe0 [obdclass]
            19:19:06: [<ffffffffa0657a3d>] server_put_super+0x46d/0xf00 [obdclass]
            19:19:06: [<ffffffff811785ab>] generic_shutdown_super+0x5b/0xe0
            19:19:06: [<ffffffff81178696>] kill_anon_super+0x16/0x60
            19:19:07: [<ffffffffa064ce46>] lustre_kill_super+0x36/0x60 [obdclass]
            19:19:07: [<ffffffff81179670>] deactivate_super+0x70/0x90
            19:19:07: [<ffffffff811955cf>] mntput_no_expire+0xbf/0x110
            19:19:07: [<ffffffff81195f2b>] sys_umount+0x7b/0x3a0
            19:19:07: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
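Both oopses in this ticket fault at the same symbol and offset even though the absolute addresses differ, which is what suggests a single underlying bug reached from two call paths (obd_zombid and umount). A minimal sketch of that comparison follows; `fault_site` and `same_fault` are hypothetical helper names, and the two input lines are copied from the traces in this ticket.

```python
import re

# Assumption: the faulting location is printed as
#   [<address>] function+0xoffset/0xsize [module]
# as in the IP:/RIP: lines quoted in this ticket.
SITE_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)\s+\[(\w+)\]"
)


def fault_site(line):
    """Extract the faulting symbol, offset and module from an IP/RIP line."""
    m = SITE_RE.search(line)
    if m is None:
        raise ValueError("no faulting location found in line")
    addr, func, offset, _size, module = m.groups()
    return {
        "module": module,
        "func": func,
        "offset": int(offset, 16),
        # Start address of the function in this particular boot/build.
        "func_base": int(addr, 16) - int(offset, 16),
    }


def same_fault(a, b):
    """Compare by module, symbol and offset, ignoring absolute addresses."""
    return (a["module"], a["func"], a["offset"]) == (
        b["module"], b["func"], b["offset"])


ZOMBIE_IP = ("22:42:39:IP: [<ffffffffa075dc77>] "
             "ptlrpc_server_hpreq_fini+0x27/0x160 [ptlrpc]")
UMOUNT_RIP = ("19:19:04:RIP: 0010:[<ffffffffa07edc27>]  [<ffffffffa07edc27>] "
              "ptlrpc_server_hpreq_fini+0x27/0x160 [ptlrpc]")

if __name__ == "__main__":
    first = fault_site(ZOMBIE_IP)
    second = fault_site(UMOUNT_RIP)
    print("same fault site:", same_fault(first, second))
    print("function bases:", hex(first["func_base"]), hex(second["func_base"]))
```

Comparing on the symbol and offset rather than the raw address avoids false mismatches when the module is loaded at a different address on another node or run.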
            

            People

              bogl Bob Glossman (Inactive)
              maloo Maloo