Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-13304

add testing for LFSCK DRYRUN

    XMLWordPrintable

Details

    • Improvement
    • Resolution: Unresolved
    • Minor
    • None
    • None
    • None
    • 9223372036854775807

    Description

      I noticed in a patch review that there are places in lfsck_layout.c (and probably others) where lb_param & LPF_DRYRUN is not being checked before modifying the filesystem. For example, lfsck_layout_ins_dangling_rec(), lfsck_layout_del_dangling_rec(), lfsck_layout_refill_lovea(), __lfsck_layout_update_pfid(), lfsck_layout_recreate_parent(), etc. While it is possible that those functions are skipped at a higher level when DRYRUN is set, it would be prudent to add checks and skip any filesystem-modifying code at the level where those changes are being done, maybe with a CDEBUG(D_LFSCK, ...) message so that the potential fix appears in the debug log (LU-9548). That makes it very clear to the reader/developer that there is no chance those modifications would be done.

      We should do a lot to improve the testing of DRYRUN during LFSCK to make sure that this is being handled correctly, and not change the filesystem when it is asked to only do a check.

      In addition to adding checks for DRYRUN in all of the modifying functions (even if this is redundant), some suggestions to detect problems during testing would include:

      • run each test with "lctl lfsck_start --dryrun" first, then verify the problem was not fixed until the second (non-DRYRUN) run
      • set "lctl printk=+lfsck" during sanity-lfsck.sh and grab the D_LFSCK output from all nodes during each subtest, use "sed" to drop the run-unique output like timestamps, PID, FID, then compare the output with "expect" to ensure that the right thing is being done for each subtest
      • add a lfsck_declare_trans_start() wrapper for dt_declare_trans_start() (and/or lfsck_trans_start() for dt_trans_start()) that prints and returns an error if called when DRYRUN is set. That might be too much, if there are "empty" transactions created when DRYRUN is set, but it wouldn't be terrible to avoid that completely.
      • if the above is too broken, add lfsck_* wrappers for all of the dt_* modifying operations to return an error if called with DRYRUN set. That would handle the case of dt_trans_start() being called when --dryrun is set, but is not as good a solution as avoiding it completely, since there is still overhead from calling dt_trans_start() for empty transactions, and more danger for the filesystem if something is missed.

      Attachments

        Activity

          People

            wc-triage WC Triage
            adilger Andreas Dilger
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: