Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-5202

LFSCK 5: LFSCK needs to log all changes and errors found

Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.6.0, Lustre 2.5.5
    • 14526

    Description

      LFSCK needs to log with D_LFSCK all fixes that it makes and any inconsistencies that it finds that it does not repair (e.g. unknown LOV magic layouts). Otherwise it will be making secret changes to the filesystem and when there are problems they will be impossible to debug.

      There should be a mechanism for logging D_LFSCK messages to a separate log file for administrators to review, so that the kernel debug messages are not lost. A simple mechanism would be debug_daemon to log all messages to a file, then "lctl filter all; lctl show lfsck" (or similar) to filter all except lfsck messages into a text log file.

      The problem with this approach is that debug_daemon will consume all debug messages while it is running, and it will log a lot more than just D_LFSCK messages to disk. We may want to consider some other logging mechanism to capture just the D_LFSCK messages.

      Attachments

        Issue Links

          Activity

            [LU-5202] LFSCK 5: LFSCK needs to log all changes and errors found

            What you've got there is belt and suspenders. The printk setting will itself dump all lfsck messages to the console (dmesg & /var/log/messages). If you don't want that, then debug_daemon is the alternative. The filter setting is indeed irrelevant since all the subsystems are already enabled by default.

            adilger Andreas Dilger added a comment - What you've got there is belt and suspenders. The printk setting will itself dump all lfsck messages to the console (dmesg & /var/log/messages). If you don't want that, then debug_daemon is the alternative. The filter setting is indeed irrelevant since all the subsystems are already enabled by default.
            ndauchy Nathan Dauchy (Inactive) added a comment - - edited

            While we are waiting for a revised patch and new feature... can you please clarify the procedure for gathering the logs and perhaps deciphering them?

            What I have been able to come up with so far is...

             lctl set_param printk=+lfsck
             lctl filter all_subs
             lctl show lfsck
             lctl clear
             lctl debug_daemon start /var/log/lfsck.debug
             
             lctl lfsck_start -A -M nbptest-MDT0000 -c on -C on -o
             lctl get_param -n osd-ldiskfs.*.oi_scrub | grep status
             
             lctl debug_daemon stop
             lctl set_param printk=-lfsck
             lctl debug_file /var/log/lfsck.debug | grep lfsck | egrep -v "(namespace LFSCK add flag| RPC pname)" > /var/log/lfsck.log
             lctl show all_subs
            

            I'm not sure that the "filter" commands are actually doing anything useful though. And I would like to know if that method leaves out anything useful from the lfsck.log file.

            ndauchy Nathan Dauchy (Inactive) added a comment - - edited While we are waiting for a revised patch and new feature... can you please clarify the procedure for gathering the logs and perhaps deciphering them? What I have been able to come up with so far is... lctl set_param printk=+lfsck lctl filter all_subs lctl show lfsck lctl clear lctl debug_daemon start /var/log/lfsck.debug lctl lfsck_start -A -M nbptest-MDT0000 -c on -C on -o lctl get_param -n osd-ldiskfs.*.oi_scrub | grep status lctl debug_daemon stop lctl set_param printk=-lfsck lctl debug_file /var/log/lfsck.debug | grep lfsck | egrep -v "(namespace LFSCK add flag| RPC pname)" > /var/log/lfsck.log lctl show all_subs I'm not sure that the "filter" commands are actually doing anything useful though. And I would like to know if that method leaves out anything useful from the lfsck.log file.

            The patch current has two negative reviews and has failed Maloo - we cannot use this on soak test in this state.
            Is there a proper fix that is going to land somewhere?

            Sorry, there is not suitable patch yet, we need more consideration for that.

            yong.fan nasf (Inactive) added a comment - The patch current has two negative reviews and has failed Maloo - we cannot use this on soak test in this state. Is there a proper fix that is going to land somewhere? Sorry, there is not suitable patch yet, we need more consideration for that.

            The patch current has two negative reviews and has failed Maloo - we cannot use this on soak test in this state.
            Is there a proper fix that is going to land somewhere?

            cliffw Cliff White (Inactive) added a comment - The patch current has two negative reviews and has failed Maloo - we cannot use this on soak test in this state. Is there a proper fix that is going to land somewhere?

            The patch 13864 is used as temporary solution for lola test.

            yong.fan nasf (Inactive) added a comment - The patch 13864 is used as temporary solution for lola test.

            I've just been using lctl set_param printk=+lfsck to have lfsck messages go to the console. That is ok with master, but b2_5 needs to be fixed so all the OI scrub status messages are not using D_LFSCK.

            adilger Andreas Dilger added a comment - I've just been using lctl set_param printk=+lfsck to have lfsck messages go to the console. That is ok with master, but b2_5 needs to be fixed so all the OI scrub status messages are not using D_LFSCK.

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13864
            Subject: LU-5202 lfsck: dump LFSCK debug log automatically
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 06777610a8e5b16cfb66cd0a54bc616e965c28d2

            gerrit Gerrit Updater added a comment - Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13864 Subject: LU-5202 lfsck: dump LFSCK debug log automatically Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 06777610a8e5b16cfb66cd0a54bc616e965c28d2

            In the meantime, it would be pretty useful to have a procfs tunable to dump lustre debug logs when an inconsistency is found by lfsck (similar to dump_on_eviction). It would help us debugging on lola.

            johann Johann Lombardi (Inactive) added a comment - In the meantime, it would be pretty useful to have a procfs tunable to dump lustre debug logs when an inconsistency is found by lfsck (similar to dump_on_eviction). It would help us debugging on lola.

            On 2.5.3-ge835226 I ran lctl lfsck_start -M myth-MDT0000 -t namespace and enabled D_LFSCK printing to the console via lctl set_param printk=lfsck. This generated a lot of console messages that weren't very useful:

            Jan  4 21:36:33 mookie kernel: Lustre: 3402:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 2621438, rc = 0
            Jan  4 21:36:33 mookie kernel: Lustre: 3402:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 2621439, rc = 0
            Jan  4 21:36:33 mookie kernel: Lustre: 3402:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 2621440, rc = 0
            Jan  4 21:36:33 mookie kernel: Lustre: 3402:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 2621441, rc = 1
            Jan  4 21:36:33 mookie kernel: Lustre: 3403:0:(osd_scrub.c:1301:osd_scrub_main()) OI scrub: stop, rc = 1, pos = 2621441
            Jan  4 21:36:33 mookie kernel: Lustre: 3402:0:(lfsck_engine.c:340:lfsck_master_engine()) LFSCK exit: oit_flags = 0xc0002, dir_flags = 0xc004, oit_cookie = 2130251, dir_cookie = 3397299287892376453, parent = [0x20814b:0x17d4596b:0x0], pid 
            = 3402, rc = 1
            

            This should be fixed for b2_5.

            For 2.7, we discussed adding lctl lfsck_start -v and -q options to enable the D_LFSCK messages to print to the console or not. Also, major messages like start/stop, urgent scrub, and first message of non-upgrade error should always be printed to the console via D_CONSOLE.

            adilger Andreas Dilger added a comment - On 2.5.3-ge835226 I ran lctl lfsck_start -M myth-MDT0000 -t namespace and enabled D_LFSCK printing to the console via lctl set_param printk=lfsck . This generated a lot of console messages that weren't very useful: Jan 4 21:36:33 mookie kernel: Lustre: 3402:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 2621438, rc = 0 Jan 4 21:36:33 mookie kernel: Lustre: 3402:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 2621439, rc = 0 Jan 4 21:36:33 mookie kernel: Lustre: 3402:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 2621440, rc = 0 Jan 4 21:36:33 mookie kernel: Lustre: 3402:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 2621441, rc = 1 Jan 4 21:36:33 mookie kernel: Lustre: 3403:0:(osd_scrub.c:1301:osd_scrub_main()) OI scrub: stop, rc = 1, pos = 2621441 Jan 4 21:36:33 mookie kernel: Lustre: 3402:0:(lfsck_engine.c:340:lfsck_master_engine()) LFSCK exit: oit_flags = 0xc0002, dir_flags = 0xc004, oit_cookie = 2130251, dir_cookie = 3397299287892376453, parent = [0x20814b:0x17d4596b:0x0], pid = 3402, rc = 1 This should be fixed for b2_5. For 2.7, we discussed adding lctl lfsck_start -v and -q options to enable the D_LFSCK messages to print to the console or not. Also, major messages like start/stop, urgent scrub, and first message of non-upgrade error should always be printed to the console via D_CONSOLE.

            Because LU-4610 needs to be closed for 2.6.0 to be released, and it isn't likely that this code will be landed for 2.6.0.

            adilger Andreas Dilger added a comment - Because LU-4610 needs to be closed for 2.6.0 to be released, and it isn't likely that this code will be landed for 2.6.0.

            People

              wc-triage WC Triage
              adilger Andreas Dilger
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated: