[LU-11122] NULL pointer dereference in fld_local_lookup()

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Blocker
    • Environment: lustre-2.8.2_2.chaos-1.ch6.x86_64
      4 MDTs used in DNE-1 fashion (remote dirs, no striped dirs)
      RHEL 7.5
    • Severity: 1

    Description

      MDS nodes were power cycled during hardware maintenance. After they came back up, we got the oops below (some material redacted; see the comments below for the full console log contents):

      BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      IP: [<ffffffffc0cdc392>] fld_local_lookup+0x52/0x270 [fld]
      CPU: 17 PID: 180501 Comm: orph_cleanup_ls Kdump: loaded Tainted: P OE ------------ 3.10.0-862.3.2.1chaos.ch6.x86_64 #1
      Call Trace:
      [<ffffffffc06e8f6c>] ? dmu_tx_hold_object_impl+0x6c/0xc0 [zfs]
      [<ffffffffc109ff28>] osd_fld_lookup+0x48/0xd0 [osd_zfs]
      [<ffffffffc10a008a>] fid_is_on_ost+0xda/0x2f0 [osd_zfs]
      [<ffffffffc10a02e9>] osd_get_name_n_idx+0x49/0xd00 [osd_zfs]
      [<ffffffffc109902c>] ? osd_declare_attr_set+0x14c/0x730 [osd_zfs]
      [<ffffffffc0753b7e>] ? zap_lookup_by_dnode+0x2e/0x30 [zfs]
      [<ffffffffc1097510>] osd_declare_object_destroy+0xe0/0x3e0 [osd_zfs]
      [<ffffffffc1139ffe>] lod_sub_object_declare_destroy+0xce/0x2d0 [lod]
      [<ffffffffc1129700>] lod_declare_object_destroy+0x170/0x4a0 [lod]
      [<ffffffffc1513689>] ? orph_declare_index_delete+0x179/0x460 [mdd]
      [<ffffffffc1513f66>] orph_key_test_and_del+0x5f6/0xd30 [mdd]
      [<ffffffffc1514c57>] __mdd_orphan_cleanup+0x5b7/0x840 [mdd]
      [<ffffffffc15146a0>] ? orph_key_test_and_del+0xd30/0xd30 [mdd]
      [<ffffffffbb2c05f1>] kthread+0xd1/0xe0
      [<ffffffffbb2c0520>] ? insert_kthread_work+0x40/0x40
      [<ffffffffbb9438b7>] ret_from_fork_nospec_begin+0x21/0x21
      [<ffffffffbb2c0520>] ? insert_kthread_work+0x40/0x40
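
      The fault address 0000000000000018 is consistent with reading a member at a small offset through a NULL struct pointer inside fld_local_lookup(), i.e. the server-side FLD handle (or something reached through it) was already gone when the orphan-cleanup thread called into it; that is an inference, not confirmed from the code. A minimal standalone C sketch of that failure mode follows; the struct and member names are hypothetical, not Lustre's actual definitions:

      /*
       * Standalone illustration (hypothetical types, not Lustre code):
       * dereferencing a member of a NULL struct pointer faults at the
       * member's offset, which is why the oops reports 0x18 rather than 0x0.
       */
      #include <stddef.h>
      #include <stdio.h>

      struct server_fld {             /* stand-in for the real FLD handle   */
          long pad[3];                /* 3 * 8 bytes on x86_64              */
          void *cache;                /* hypothetical member at offset 0x18 */
      };

      int main(void)
      {
          struct server_fld *fld = NULL;

          printf("offsetof(cache) = %#zx\n",
                 offsetof(struct server_fld, cache));
          /*
           * The crashing pattern: reading fld->cache with fld == NULL
           * touches address 0x18, matching "NULL pointer dereference at
           * 0000000000000018". Left commented out so this sketch runs
           * without faulting.
           */
          /* void *c = fld->cache; */
          (void)fld;
          return 0;
      }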

      Attachments

        Activity


          bzzz Alex Zhuravlev added a comment -

          Was it possible that an umount was initiated at that time? I see that 2.8.2_2.chaos stops the orphan thread in mdd_shutdown() (the LCFG_CLEANUP path), while the master branch stops the orphan thread in the preceding phase (the LCFG_PRE_CLEANUP path).
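
          For illustration only, a minimal standalone model (pthreads, hypothetical names, not Lustre code) of the ordering Alex describes: if the orphan-cleanup worker is stopped only in the later cleanup phase, it can race with FLD teardown and dereference a handle that is already NULL; stopping it in the preceding phase, as master does, closes the window.

          /*
           * Minimal model of the shutdown ordering (not Lustre code):
           * stop the orphan-cleanup worker before tearing down the FLD
           * handle it dereferences. All names here are hypothetical.
           */
          #include <pthread.h>
          #include <stdatomic.h>
          #include <stdbool.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <unistd.h>

          struct fld_handle { int dummy; };

          static struct fld_handle *server_fld;  /* stands in for the server FLD */
          static atomic_bool stop_orphan_cleanup;

          static void *orphan_cleanup_thread(void *arg)
          {
              (void)arg;
              while (!atomic_load(&stop_orphan_cleanup)) {
                  struct fld_handle *fld = server_fld;

                  if (fld == NULL) {
                      /* In the kernel this is the NULL dereference;
                       * here we just report and bail out. */
                      fprintf(stderr, "lookup raced with FLD teardown\n");
                      break;
                  }
                  /* ... fld_local_lookup()-style work would go here ... */
                  usleep(1000);
              }
              return NULL;
          }

          int main(void)
          {
              pthread_t tid;

              server_fld = calloc(1, sizeof(*server_fld));
              pthread_create(&tid, NULL, orphan_cleanup_thread, NULL);

              /* Pre-cleanup phase (LCFG_PRE_CLEANUP analogue):
               * stop the worker FIRST. */
              atomic_store(&stop_orphan_cleanup, true);
              pthread_join(tid, NULL);

              /* Cleanup phase (LCFG_CLEANUP analogue):
               * only now tear down the FLD handle. */
              free(server_fld);
              server_fld = NULL;

              puts("clean shutdown: worker stopped before FLD teardown");
              return 0;
          }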
          pjones Peter Jones added a comment -

          Alex

          What do you recommend here?

          Peter

          pjones Peter Jones added a comment -

          No problem. We'll discuss tomorrow whether further work would be useful to avoid this kind of scenario.

          ofaaland Olaf Faaland added a comment -

          Thank you for responding so quickly.

          ofaaland Olaf Faaland added a comment -

          John, thanks for the suggestion.  I'll try it if this comes up again.

          ofaaland Olaf Faaland added a comment -

          The file system is back up.  This is no longer an emergency.

          In the course of rebooting the nodes and mounting manually to get more information, the messages about orphan cleanup failing stopped appearing after recovery.  It's been about 6 minutes now since the MDTs completed recovery.

          ofaaland Olaf Faaland added a comment -

          Lustre only, no IML.

          wcjohnso Will Johnson added a comment -

          Hi ofaaland,

          Is this a Lustre only system or do you also have IML installed?

          Regards,

          Will

          jhammond John Hammond added a comment -

          > Is there a way we can tell the MDT to skip orphan cleanup once, so it forgets about these orphaned objects? We don't really care if some unused objects are orphaned.

          You could mount the backing FS and remove the files from the PENDING directory.

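
          As a sketch of John's suggestion only (not a tested procedure): with the MDT stopped and its backing dataset mounted somewhere writable, the entries under PENDING can simply be unlinked. The mount point below is an assumption, and in practice plain shell tools would do the same job; this is just a self-contained C version of the idea.

          /*
           * Illustrative only: unlink every entry under the PENDING
           * directory of a stopped MDT's backing filesystem. The path is
           * an assumption; adjust to wherever the dataset is mounted.
           */
          #include <dirent.h>
          #include <errno.h>
          #include <stdio.h>
          #include <string.h>
          #include <unistd.h>

          #define PENDING_DIR "/mnt/mdt0-backfs/PENDING"  /* assumed mount point */

          int main(void)
          {
              DIR *dir = opendir(PENDING_DIR);
              struct dirent *de;

              if (dir == NULL) {
                  perror("opendir " PENDING_DIR);
                  return 1;
              }
              while ((de = readdir(dir)) != NULL) {
                  if (strcmp(de->d_name, ".") == 0 ||
                      strcmp(de->d_name, "..") == 0)
                      continue;
                  /* Remove the orphan entry; report and continue on error. */
                  if (unlinkat(dirfd(dir), de->d_name, 0) != 0)
                      fprintf(stderr, "%s: %s\n", de->d_name, strerror(errno));
              }
              closedir(dir);
              return 0;
          }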
          ofaaland Olaf Faaland added a comment -

          Is there a way we can tell the MDT to skip orphan cleanup once, so it forgets about these orphaned objects?  We don't really care if some unused objects are orphaned.

          ofaaland Olaf Faaland added a comment -

          For our code, including the tags we build from, see the lustre-release-fe-llnl project in gerrit.

          People

            Assignee: bzzz Alex Zhuravlev
            Reporter: ofaaland Olaf Faaland
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved: