[LU-3420] OI scrubbing could not automatically engage after restoring a secondary MDT from a (file-level) backup

    Description

      When adapting sanity-scrub test 4 to exercise not only MDT 0 but also the secondary MDTs, I found that, after restoring a secondary MDT from its file-level backup, looking up the corresponding "remote" directory returned ENOENT on clients:

      [root@linux tests]# ls /mnt/lustre/d0.sanity-scrub/d4/mdt1
      ls: cannot access /mnt/lustre/d0.sanity-scrub/d4/mdt1: No such file or directory

      "mdt1" was created with "lfs mkdir -i 1". In addition, OI scrubbing did not engage automatically:

      [root@linux tests]# cat /proc/fs/lustre/osd-ldiskfs/lustre-MDT0001/oi_scrub
      name: OI_scrub
      magic: 0x4c5fd252
      oi_files: 64
      status: init
      flags: inconsistent
      param:
      time_since_last_completed: N/A
      time_since_latest_start: N/A
      time_since_last_checkpoint: N/A
      latest_start_position: N/A
      last_checkpoint_position: N/A
      first_failure_position: N/A
      checked: 0
      updated: 0
      failed: 0
      prior_updated: 0
      noscrub: 0
      igif: 0
      success_count: 0
      run_time: 0 seconds
      average_speed: 0 objects/sec
      real-time_speed: N/A
      current_position: N/A
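
The status dump above can be checked mechanically for the condition this report describes: the OI is flagged inconsistent, yet the scrub is still in its initial state. The following is a minimal sketch, assuming the oi_scrub file keeps the "key: value" layout shown above; the function names and the abbreviated SAMPLE string are hypothetical and not part of any Lustre tooling.

```python
# Abbreviated copy of the oi_scrub output quoted in this report.
SAMPLE = """\
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: init
flags: inconsistent
param:
time_since_last_completed: N/A
checked: 0
"""


def parse_oi_scrub(text):
    """Parse 'key: value' lines of an oi_scrub dump into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def scrub_pending(fields):
    """Return True if the OI is flagged inconsistent but no scrub has started."""
    flags = [f.strip() for f in fields.get("flags", "").split(",")]
    return fields.get("status") == "init" and "inconsistent" in flags


if __name__ == "__main__":
    print(scrub_pending(parse_oi_scrub(SAMPLE)))
```

On a live server the text would presumably be read from the /proc path shown above rather than from a string.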

      The debug log shows that MDT 0 sent an UPDATE_OBJ OBJ_ATTR_GET RPC to MDT 1. The FID was found in the OI but the ino was (naturally) stale:

      00000004:00000002:0.0:1369882480.737242:0:7229:0:(osd_handler.c:226:osd_iget()) unmatched inode: ino = 102, gen0 = 2698313523, gen1 = 294820613
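
When scanning a larger debug log for this symptom, the "unmatched inode" lines can be pulled out with a regular expression. This is a rough sketch that assumes the message format is exactly as in the line quoted above; the pattern and the helper name are hypothetical, and nothing is assumed about which of gen0 and gen1 comes from the OI and which from the on-disk inode.

```python
import re

# Matches the "(file:line:func()) unmatched inode: ..." tail of a debug log line.
UNMATCHED_RE = re.compile(
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"unmatched inode: ino = (?P<ino>\d+), "
    r"gen0 = (?P<gen0>\d+), gen1 = (?P<gen1>\d+)"
)


def parse_unmatched(log_line):
    """Return the source location and inode fields, or None if no match."""
    m = UNMATCHED_RE.search(log_line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "file": d["file"],
        "line": int(d["line"]),
        "func": d["func"],
        "ino": int(d["ino"]),
        "gen0": int(d["gen0"]),
        "gen1": int(d["gen1"]),
    }


if __name__ == "__main__":
    line = ("00000004:00000002:0.0:1369882480.737242:0:7229:0:"
            "(osd_handler.c:226:osd_iget()) unmatched inode: "
            "ino = 102, gen0 = 2698313523, gen1 = 294820613")
    print(parse_unmatched(line))
```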

      According to osd_fid_lookup(), OI scrubbing is not triggered in this case.
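
The failure mode can be illustrated with a toy model. This is only a sketch of the behaviour described in this report, not the code in osd_fid_lookup() and not the eventual patch: ToyOSD, InodeRef, the trigger_on_stale switch, and the "SCRUB_STARTED" return value are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InodeRef:
    ino: int  # inode number recorded in the OI
    gen: int  # inode generation recorded in the OI


class ToyOSD:
    """Toy stand-in for an OSD holding an OI mapping and an inode table."""

    def __init__(self, oi, inodes, trigger_on_stale):
        self.oi = oi                      # FID -> InodeRef, as stored in the OI
        self.inodes = inodes              # ino -> generation currently on disk
        self.trigger_on_stale = trigger_on_stale
        self.scrub_started = False

    def fid_lookup(self, fid):
        ref = self.oi.get(fid)
        if ref is None:
            return "ENOENT"
        if self.inodes.get(ref.ino) != ref.gen:
            # Stale mapping: after a file-level restore the inode behind this
            # number is no longer the object the OI entry was created for.
            if self.trigger_on_stale:
                self.scrub_started = True
                return "SCRUB_STARTED"    # hypothetical outcome
            return "ENOENT"               # behaviour observed in this report
        return ref


if __name__ == "__main__":
    oi = {"fid-mdt1": InodeRef(ino=102, gen=1)}
    restored = {102: 2}  # same inode number, different generation
    print(ToyOSD(oi, restored, trigger_on_stale=False).fid_lookup("fid-mdt1"))
```

With trigger_on_stale set to False the model reproduces the reported symptom: the lookup fails with ENOENT and the scrub never starts.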

          Activity

            yong.fan nasf (Inactive) added a comment - The patch has landed in Lustre 2.5.

            yong.fan nasf (Inactive) added a comment - I have made a patch to fix it: http://review.whamcloud.com/#change,6515 The underlying reason is described in the patch commit message.

            liwei Li Wei (Inactive) added a comment - Andreas, all MDTs (MDSCOUNT=2, so both MDT 0 and 1) were backed up and restored during the test. The problem, as far as I discussed with Fan Yong yesterday, was on MDT 1: the direct FID lookup (without a prior name lookup) does not trigger OI scrubbing.

            adilger Andreas Dilger added a comment - Fan Yong, I understand that remote directory checking for DNE MDTs is part of LFSCK Phase III, but could you please investigate what work would be needed to fix the file-level backup/restore? Li Wei, do you know if this is a problem on mdt0 or mdt1? Were both of them backed up and restored, or just mdt1?

            liwei Li Wei (Inactive) added a comment - This issue and LU-3332 depend on each other.

            liwei Li Wei (Inactive) added a comment - CC'ed Wang Di and Fan Yong.

            liwei Li Wei (Inactive) added a comment - Attached the debug log. Note that this was a single-node setup.

            People

              yong.fan nasf (Inactive)
              liwei Li Wei (Inactive)