LU-2142: "lctl lfsck_start" should start a scrub

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: Lustre 2.3.0, Lustre 2.4.0
    • Fix Version/s: Lustre 2.3.0, Lustre 2.4.0
    • Labels: None
    • 3
    • 5150

    Description

      Running "lctl lfsck_start -M

      {fsname}

      -MDT0000" should start a scrub, unless one is already running. However, if the scrub was previously run and completed (leaving last_checkpoint_position == inode_count, it appears a new scrub will not be run because the start position is not reset at the end of the previous lfsck run or the start of the new run:

      latest_start_position: 143392770
      last_checkpoint_position: 143392769
      

      It makes sense to restart the scrub at the last checkpoint position if it didn't complete for some reason, but if latest_start_position >= inode_count then the start position should be reset to start again. Both Cliff and I were confused by the current behaviour, and it took us a while to determine that "-r" was needed, and I expect that most users will have the same problem. The "-r" option should only be needed in case the admin has to handle some unusual condition where a previous scrub was interrupted, but a new full scrub is desired.
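
      Below is a minimal sketch, in C, of the start-position logic being requested here. The names (scrub_state, scrub_pick_start, SCRUB_FIRST_INODE) are hypothetical and do not correspond to the real osd-ldiskfs code; it only illustrates "resume if interrupted, otherwise reset":

      #include <stdio.h>

      /* Hypothetical, simplified scrub bookkeeping; these are not the real
       * osd-ldiskfs structures. */
      struct scrub_state {
              unsigned long long latest_start_position;
              unsigned long long last_checkpoint_position;
              unsigned long long inode_count;  /* total inodes on the device */
      };

      /* First non-reserved ldiskfs/ext4 inode; the full scans in the
       * transcripts later in this ticket start at position 11. */
      #define SCRUB_FIRST_INODE 11ULL

      /* Requested behaviour: resume from the last checkpoint only if the
       * previous scrub was interrupted; if it completed (or never ran),
       * start a fresh scan without requiring "-r". */
      static unsigned long long scrub_pick_start(const struct scrub_state *s)
      {
              if (s->last_checkpoint_position == 0 ||
                  s->latest_start_position >= s->inode_count)
                      return SCRUB_FIRST_INODE;           /* reset to the start */

              return s->last_checkpoint_position;         /* resume */
      }

      int main(void)
      {
              /* Values from the completed run shown above. */
              struct scrub_state done = {
                      .latest_start_position = 143392770ULL,
                      .last_checkpoint_position = 143392769ULL,
                      .inode_count = 143392769ULL,
              };

              printf("next start position: %llu\n", scrub_pick_start(&done));
              return 0;
      }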

        Activity

          [LU-2142] "lctl lfsck_start" should start a scrub

          adilger Andreas Dilger added a comment -

          Fan Yong, you are correct. This patch fixes the problem. I must have been testing on my system after rebuilding, but not reloading the modules.

          yong.fan nasf (Inactive) added a comment -

          This is the output from my own test against the latest master branch (top change ID I2ff03a611267292d0cd6a465c1eb14023516234b), which contains the patch for LU-2142 (change ID I5b8e9ee51ccbf95ed131b963389c4ecfb92b9035):

          [root@RHEL6-nasf-CSW tests]# cat /proc/fs/lustre/osd-ldiskfs/lustre-MDT0000/oi_scrub 
          name: OI scrub
          magic: 0x4c5fd252
          oi_files: 64
          status: init
          flags:
          param:
          time_since_last_completed: N/A
          time_since_latest_start: N/A
          time_since_last_checkpoint: N/A
          latest_start_position: N/A
          last_checkpoint_position: N/A
          first_failure_position: N/A
          checked: 0
          updated: 0
          failed: 0
          prior_updated: 0
          noscrub: 0
          igif: 0
          success_count: 0
          run_time: 0 seconds
          average_speed: 0 objects/sec
          real-time_speed: N/A
          current_position: N/A
          [root@RHEL6-nasf-CSW tests]# ../utils/lctl lfsck_start -M lustre-MDT0000
          Started LFSCK on the MDT device lustre-MDT0000.
          [root@RHEL6-nasf-CSW tests]# cat /proc/fs/lustre/osd-ldiskfs/lustre-MDT0000/oi_scrub 
          name: OI scrub
          magic: 0x4c5fd252
          oi_files: 64
          status: completed
          flags:
          param:
          time_since_last_completed: 3 seconds
          time_since_latest_start: 3 seconds
          time_since_last_checkpoint: 3 seconds
          latest_start_position: 11
          last_checkpoint_position: 100001
          first_failure_position: N/A
          checked: 206
          updated: 0
          failed: 0
          prior_updated: 0
          noscrub: 38
          igif: 168
          success_count: 1
          run_time: 0 seconds
          average_speed: 206 objects/sec
          real-time_speed: N/A
          current_position: N/A
          [root@RHEL6-nasf-CSW tests]# 
          [root@RHEL6-nasf-CSW tests]# 
          [root@RHEL6-nasf-CSW tests]# 
          [root@RHEL6-nasf-CSW tests]# ../utils/lctl lfsck_start -M lustre-MDT0000
          Started LFSCK on the MDT device lustre-MDT0000.
          [root@RHEL6-nasf-CSW tests]# cat /proc/fs/lustre/osd-ldiskfs/lustre-MDT0000/oi_scrub 
          name: OI scrub
          magic: 0x4c5fd252
          oi_files: 64
          status: completed
          flags:
          param:
          time_since_last_completed: 1 seconds
          time_since_latest_start: 1 seconds
          time_since_last_checkpoint: 1 seconds
          latest_start_position: 11
          last_checkpoint_position: 100001
          first_failure_position: N/A
          checked: 206
          updated: 0
          failed: 0
          prior_updated: 0
          noscrub: 0
          igif: 206
          success_count: 2
          run_time: 0 seconds
          average_speed: 206 objects/sec
          real-time_speed: N/A
          current_position: N/A
          [root@RHEL6-nasf-CSW tests]# 
          

          As you can see, repeatedly running "lctl lfsck_start" re-triggers the OI scrub each time, as expected. The condition for re-triggering the OI scrub is that the previous OI scrub has completed ("status: completed").

          You can judge whether the OI scrub was re-triggered by checking the "checked:" item: if it is "0", the scrub was not re-triggered; otherwise, it was.

          On the other hand, the OI scrub may skip inodes created since the last OI scrub run (it skips them only once), so the "checked:" item may not match the real count of allocated inodes.
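
          For illustration only, here is a minimal C sketch of that check: it reads the oi_scrub proc file shown in the transcript above and looks at the "checked:" item. The proc path is the one from this test setup (lustre-MDT0000); the program is just an illustrative helper, not part of Lustre:

          #include <stdio.h>

          int main(void)
          {
                  /* Path taken from the transcript above; adjust the fsname
                   * and MDT index for your own configuration. */
                  const char *path =
                          "/proc/fs/lustre/osd-ldiskfs/lustre-MDT0000/oi_scrub";
                  char line[256];
                  long checked = -1;
                  FILE *f = fopen(path, "r");

                  if (f == NULL) {
                          perror(path);
                          return 1;
                  }

                  /* Find the "checked:" item: 0 means the scrub was not
                   * re-triggered, anything larger means it ran again. */
                  while (fgets(line, sizeof(line), f) != NULL)
                          if (sscanf(line, "checked: %ld", &checked) == 1)
                                  break;
                  fclose(f);

                  if (checked < 0)
                          printf("no \"checked:\" item found in %s\n", path);
                  else if (checked > 0)
                          printf("OI scrub re-triggered (checked: %ld)\n", checked);
                  else
                          printf("OI scrub not re-triggered (checked: 0)\n");
                  return 0;
          }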

          So Andreas, would you please describe in detail which operations you performed to reproduce the issue? Then I can analyze what happened. Thanks!

          adilger Andreas Dilger added a comment -

          Fan Yong, was there another patch landed? It seemed in my testing that this didn't actually fix the problem. As previously stated, it appears that LFSCK is started, but since the starting inode is not reset, LFSCK immediately exits without doing anything...

          Running "lfsck -r" appears to actually runs check, but not "lfsck" by itself does not appear to start a new scrub.

          yong.fan nasf (Inactive) added a comment - edited

          The issue has been fixed as Andreas suggested.

          adilger Andreas Dilger added a comment -

          I tested this patch by hand (on master, where it landed after b2_3, where I assumed it had been tested), but it doesn't appear to have fixed lctl lfsck_start to actually run a scrub when asked. It now reports "Started LFSCK" every time:

          # lctl lfsck_start -M testfs-MDT0000 -s 4
          Started LFSCK on the MDT device testfs-MDT0000.
          

          But it doesn't actually seem to run a scrub (-s 4 is used to make the scrub slow enough to watch):

          time_since_last_completed: 5 seconds
          time_since_latest_start: 5 seconds
          time_since_last_checkpoint: 5 seconds
          latest_start_position: 50002
          last_checkpoint_position: 50001
          success_count: 17
          run_time: 32 seconds
          

          It resets the start time, but not latest_start_position or the run time, so the scrub takes zero seconds to "finish" but doesn't actually do anything. Running with the "-r" option does seem to start a full scrub:

          time_since_last_completed: 88 seconds
          time_since_latest_start: 10 seconds
          time_since_last_checkpoint: 10 seconds
          latest_start_position: 11
          last_checkpoint_position: N/A
          run_time: 10 seconds
          

          But I would think that lctl lfsck_start should actually start a scrub, as the command name implies, instead of only doing so if -r is given. If there is already a scrub running, it should continue to run, but if one is not running, a new full scrub should be started...
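
          As a sketch only, the proposed semantics could look like the following C fragment; the names (scrub_ctl, lfsck_start_request) are hypothetical and not the real Lustre code:

          #include <stdbool.h>
          #include <stdio.h>

          /* Hypothetical control state for one MDT's OI scrub. */
          struct scrub_ctl {
                  bool running;
                  unsigned long long start_position;
          };

          /* Proposed lfsck_start behaviour: if a scrub is already running,
           * leave it alone; otherwise reset the position and start a new
           * full scan.  "-r" would then only be needed to force a restart
           * of an interrupted scrub. */
          static void lfsck_start_request(struct scrub_ctl *c)
          {
                  if (c->running) {
                          printf("scrub already running, letting it continue\n");
                          return;
                  }

                  c->start_position = 11;  /* full scan from the beginning */
                  c->running = true;
                  printf("started new full scrub from position %llu\n",
                         c->start_position);
          }

          int main(void)
          {
                  struct scrub_ctl c = { .running = false, .start_position = 50001 };

                  lfsck_start_request(&c);  /* not running: starts a fresh full scan */
                  lfsck_start_request(&c);  /* already running: leaves it alone */
                  return 0;
          }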

          Seems the patch isn't quite working yet.

          yong.fan nasf (Inactive) added a comment - Patch for master: http://review.whamcloud.com/#change,4250
          adilger Andreas Dilger added a comment - Patch for b2_3 is http://review.whamcloud.com/4252

          yong.fan nasf (Inactive) added a comment -

          If the OI scrub scanning policy is adjusted as described above, we need to consider more cases. For example:

          Suppose the last OI scrub scan finished at ino# 100'000, and then a new file is created. Its ino# may be larger than the position where the last OI scrub finished, such as 100'001, or it may reuse a deleted inode, so its ino# may be smaller than that position, such as 50'001. In such a case, if the system administrator runs the OI scrub again, the behaviour differs: in the former case it continues scanning from 100'000 and finishes at ino# 100'001; in the latter case it resets the scan to the beginning of the device and re-scans the whole MDT. So from the sysadmin's point of view, the OI scrub behaviour becomes unpredictable. I do not think that is what is expected.

          So I suggest to use "-r" explicitly to reset the scanning position. If someone wants to re-run OI scrub before former instance finished, he/she can stop current OI scrub explicitly by "lctl lfsck_stop" firstly, then runs OI scrub again by "lctl lfsck_start -r". I do not think it is so trouble.

          People

            Assignee: yong.fan nasf (Inactive)
            Reporter: adilger Andreas Dilger
            Votes: 0
            Watchers: 2
