[LU-5859] Running lfs changelog with no registered user results in LBUG

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.7.0
    • Affects Version/s: Lustre 2.7.0
    • Labels: None
    • Environment: Seen on master b6a3222; Robert confirmed he saw it on a master build from yesterday.
    • Severity: 3
    • Rank: 16400

    Description

      See the attached reproducer script for details.

      In short, running lfs changelog after deregistering the last registered user results in an LBUG on umount of the client. While it is obviously a user error to run lfs changelog with no registered users, it shouldn't result in an LBUG.
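
      The sequence described above can be sketched as a small dry-run script. This is not the attached reproducer, only an illustration of the steps named in the description. The MDT name `lustre-MDT0000`, the mount point `/mnt/lustre`, and the changelog user ID `cl1` are assumed values, and the sketch assumes a single-node test setup where the MDS and the client are the same host.

```python
import shlex
import subprocess


def build_repro_commands(mdt="lustre-MDT0000", mountpoint="/mnt/lustre", user_id="cl1"):
    """Return the command sequence described in the ticket, as argv lists.

    All three default arguments are assumptions made for illustration.
    """
    return [
        # MDS side: register a changelog user, then deregister it again,
        # so that the MDT is left with no registered changelog users.
        ["lctl", "--device", mdt, "changelog_register"],
        ["lctl", "--device", mdt, "changelog_deregister", user_id],
        # Client side: read the changelog although no user is registered.
        ["lfs", "changelog", mdt],
        # Client side: unmount; the ticket reports the LBUG at this step.
        ["umount", mountpoint],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("+ " + shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    # Dry run by default: only prints the commands.
    run(build_repro_commands())
```

      The script only prints the commands unless `dry_run=False` is passed, since on an affected build the final step is expected to crash the node.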

          Activity

            pjones Peter Jones added a comment - Landed for 2.7

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13414/
            Subject: LU-5859 llog: do not cleanup orphans in remote catalogs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 97df2f4cae374130c057cbf1168ad1427c96cbc5

            gerrit Gerrit Updater added a comment -

            Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/13414
            Subject: LU-5859 llog: do not cleanup orphans in remote catalogs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c418baa20e78da556916f48d5dc407fa6e4aab50

            bzzz Alex Zhuravlev added a comment - Michael, looking at that now, thanks for reminding.

            mjmac Michael MacDonald (Inactive) added a comment - Hi. Just looking for a status update on this ticket. Seems like it should be a blocker... Any input, bzzz?

            adilger Andreas Dilger added a comment - Alex, could you please offer some advice on how to fix this problem? It was apparently introduced by your patch http://review.whamcloud.com/10308

            mjmac Michael MacDonald (Inactive) added a comment - I confirm that LU-5038 seems to be the culprit. I reverted the commit in a local build and my reproducer script no longer causes an LBUG.
            rread Robert Read added a comment - I'm also seeing an LBUG when unmounting the client after testing changelogs.

            hdoreau Henri Doreau (Inactive) added a comment -

            I was bisecting it to determine whether LU-1996 was guilty or not. Seems to have been introduced by LU-5038 instead.

            I have a stack here:
            crash> bt
            PID: 3610 TASK: ffff8801fea80ae0 CPU: 5 COMMAND: "mdc_clg_send_th"
            #0 [ffff88020be279f8] machine_kexec at ffffffff8103900b
            #1 [ffff88020be27a58] crash_kexec at ffffffff810c62c2
            #2 [ffff88020be27b28] panic at ffffffff8152896e
            #3 [ffff88020be27ba8] lbug_with_loc at ffffffffa0297eeb [libcfs]
            #4 [ffff88020be27bc8] llog_write at ffffffffa03d6ae4 [obdclass]
            #5 [ffff88020be27c18] llog_cancel_rec at ffffffffa03d6c8f [obdclass]
            #6 [ffff88020be27c68] llog_cat_cleanup at ffffffffa03db89c [obdclass]
            #7 [ffff88020be27c98] llog_cat_process_cb at ffffffffa03dc7fd [obdclass]
            #8 [ffff88020be27cf8] llog_process_thread at ffffffffa03d7b1f [obdclass]
            #9 [ffff88020be27da8] llog_process_or_fork at ffffffffa03d9817 [obdclass]
            #10 [ffff88020be27df8] llog_cat_process_or_fork at ffffffffa03dab7d [obdclass]
            #11 [ffff88020be27e88] llog_cat_process at ffffffffa03dace9 [obdclass]
            #12 [ffff88020be27ea8] mdc_changelog_send_thread at ffffffffa08db54b [mdc]
            #13 [ffff88020be27ee8] kthread at ffffffff8109af86
            #14 [ffff88020be27f48] kernel_thread at ffffffff8100c20a

            adilger Andreas Dilger added a comment - Mike, can you please paste the whole stack into the bug?

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: mjmac Michael MacDonald (Inactive)