LFSCK 5: LFSCK needs to log all changes and errors found

Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.6.0, Lustre 2.5.5
    • 14526

    Description

      LFSCK needs to log with D_LFSCK all fixes that it makes and any inconsistencies that it finds that it does not repair (e.g. unknown LOV magic layouts). Otherwise it will be making secret changes to the filesystem and when there are problems they will be impossible to debug.

      There should be a mechanism for logging D_LFSCK messages to a separate log file for administrators to review, so that the kernel debug messages are not lost. A simple mechanism would be debug_daemon to log all messages to a file, then "lctl filter all; lctl show lfsck" (or similar) to filter all except lfsck messages into a text log file.

      The problem with this approach is that debug_daemon will consume all debug messages while it is running, and it will log a lot more than just D_LFSCK messages to disk. We may want to consider some other logging mechanism to capture just the D_LFSCK messages.
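One way to picture the "capture just the D_LFSCK messages" idea is a post-processing filter over a debug log that has already been decoded to text. The sketch below is hypothetical and not part of any patch on this ticket. It assumes each decoded line starts with `subsystem:mask:` as two 8-digit hex fields, and it assumes D_LFSCK is bit 0x10000000; check both against your own Lustre tree before relying on it.

```python
import re

# Assumed layout of a decoded debug-log line (an assumption, not from this ticket):
#   subsystem:mask:cpu:timestamp:stack:pid:extpid:(file:line:func()) message
# Only the first two hex fields are needed here.
LINE_RE = re.compile(r"^(?P<subsys>[0-9a-fA-F]{8}):(?P<mask>[0-9a-fA-F]{8}):")

# Assumed bit value for D_LFSCK; verify it in your source tree.
D_LFSCK = 0x10000000


def filter_lfsck(lines, mask_bit=D_LFSCK):
    """Yield only the lines whose debug mask has mask_bit set."""
    for line in lines:
        match = LINE_RE.match(line)
        if match and int(match.group("mask"), 16) & mask_bit:
            yield line
```

This only addresses the review step. It does not remove the cost noted above, since debug_daemon still writes every enabled message to disk before the filter runs.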

          Activity

            [LU-5202] LFSCK 5: LFSCK needs to log all changes and errors found

            yong.fan nasf (Inactive) added a comment - Patch 13864 is used as a temporary solution for the lola test.

            adilger Andreas Dilger added a comment - I've just been using lctl set_param printk=+lfsck to have lfsck messages go to the console. That is ok with master, but b2_5 needs to be fixed so all the OI scrub status messages are not using D_LFSCK.

            gerrit Gerrit Updater added a comment - Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13864
            Subject: LU-5202 lfsck: dump LFSCK debug log automatically
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 06777610a8e5b16cfb66cd0a54bc616e965c28d2

            johann Johann Lombardi (Inactive) added a comment - In the meantime, it would be pretty useful to have a procfs tunable to dump lustre debug logs when an inconsistency is found by lfsck (similar to dump_on_eviction). It would help us debugging on lola.

            adilger Andreas Dilger added a comment - On 2.5.3-ge835226 I ran lctl lfsck_start -M myth-MDT0000 -t namespace and enabled D_LFSCK printing to the console via lctl set_param printk=lfsck. This generated a lot of console messages that weren't very useful:

            Jan  4 21:36:33 mookie kernel: Lustre: 3402:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 2621438, rc = 0
            Jan  4 21:36:33 mookie kernel: Lustre: 3402:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 2621439, rc = 0
            Jan  4 21:36:33 mookie kernel: Lustre: 3402:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 2621440, rc = 0
            Jan  4 21:36:33 mookie kernel: Lustre: 3402:0:(osd_scrub.c:1240:osd_otable_it_preload()) OSD pre-loaded: max = 2621440, preload = 2621441, rc = 1
            Jan  4 21:36:33 mookie kernel: Lustre: 3403:0:(osd_scrub.c:1301:osd_scrub_main()) OI scrub: stop, rc = 1, pos = 2621441
            Jan  4 21:36:33 mookie kernel: Lustre: 3402:0:(lfsck_engine.c:340:lfsck_master_engine()) LFSCK exit: oit_flags = 0xc0002, dir_flags = 0xc004, oit_cookie = 2130251, dir_cookie = 3397299287892376453, parent = [0x20814b:0x17d4596b:0x0], pid = 3402, rc = 1

            This should be fixed for b2_5.

            For 2.7, we discussed adding lctl lfsck_start -v and -q options to control whether the D_LFSCK messages are printed to the console. Also, major messages like start/stop, urgent scrub, and the first message of a non-upgrade error should always be printed to the console via D_CONSOLE.
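To see which call sites dominate console output like the excerpt in this comment, a small tally by source file and function can help. This is a hypothetical helper, not a Lustre tool. It assumes every message carries a `(file:line:func())` tag in the form shown in the console lines quoted here.

```python
import re
from collections import Counter

# Matches the "(file.c:123:function_name())" tag seen in the console lines above.
SITE_RE = re.compile(r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)")


def count_call_sites(lines):
    """Count console messages per (source file, function) call site."""
    counts = Counter()
    for line in lines:
        match = SITE_RE.search(line)
        if match:
            counts[(match.group("file"), match.group("func"))] += 1
    return counts
```

Running it over a saved console log and printing `count_call_sites(lines).most_common(5)` gives a quick list of the noisiest call sites, which are the candidates for moving off D_LFSCK.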

            adilger Andreas Dilger added a comment - Because LU-4610 needs to be closed for 2.6.0 to be released, and this code is unlikely to land for 2.6.0.

            yong.fan nasf (Inactive) added a comment - Why not use LU-4610 ?

            People

              wc-triage WC Triage
              adilger Andreas Dilger
              Votes: 0
              Watchers: 13
