[LU-5241] 2.4.3<->2.5.2 interop: sanity-lfsck test_0: FAIL: (9) Expect 'completed', but got 'scanning-phase1'

    Description

      sanity-lfsck test 0 failed as follows:

      Started LFSCK on the device lustre-MDT0000: namespace.
      CMD: shadow-4vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
      name: lfsck_namespace
      magic: 0xa0629d03
      version: 2
      status: scanning-phase1
      flags:
      param:
      time_since_last_completed: N/A
      time_since_latest_start: 1 seconds
      time_since_last_checkpoint: N/A
      latest_start_position: 13, N/A, N/A
      last_checkpoint_position: N/A, N/A, N/A
      first_failure_position: N/A, N/A, N/A
      checked_phase1: 0
      checked_phase2: 0
      updated_phase1: 0
      updated_phase2: 0
      failed_phase1: 0
      failed_phase2: 0
      dirs: 0
      M-linked: 0
      nlinks_repaired: 0
      lost_found: 0
      success_count: 0
      run_time_phase1: 0 seconds
      run_time_phase2: 0 seconds
      average_speed_phase1: 0 items/sec
      average_speed_phase2: N/A
      real-time_speed_phase1: 0 items/sec
      real-time_speed_phase2: N/A
      current_position: 12, N/A, N/A
      CMD: shadow-4vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
      CMD: shadow-4vm4 /usr/sbin/lctl lfsck_stop -M lustre-MDT0000
      Stopped LFSCK on the device lustre-MDT0000.
      CMD: shadow-4vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
      CMD: shadow-4vm4 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace
      Started LFSCK on the device lustre-MDT0000: namespace.
      CMD: shadow-4vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
      CMD: shadow-4vm4 /usr/sbin/lctl set_param fail_loc=0
      fail_loc=0
      CMD: shadow-4vm4 /usr/sbin/lctl set_param fail_val=0
      fail_val=0
      CMD: shadow-4vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
       sanity-lfsck test_0: @@@@@@ FAIL: (9) Expect 'completed', but got 'scanning-phase1' 
      

      Maloo report: https://maloo.whamcloud.com/test_sets/8d110b74-f903-11e3-9283-52540035b04c
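The test decides pass or fail from the `status:` line of the `lfsck_namespace` output shown above. The following is a minimal sketch of pulling that field out with awk, using a few lines copied from the log as sample input. The helper name `lfsck_status` is hypothetical and only for illustration; the real test script may do this differently.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Lustre test framework):
# read lfsck_namespace-style output on stdin and print the value
# of the first "status:" line.
lfsck_status() {
    awk '/^[[:space:]]*status:/ { print $2; exit }'
}

# Sample input: the first lines of the output captured in the log above.
sample='name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: scanning-phase1
flags:'

status=$(printf '%s\n' "$sample" | lfsck_status)
echo "$status"
```

For the sample above this prints `scanning-phase1`, which is the value the failing check compared against `completed`.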

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.5.4


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/10818/
            Subject: LU-5241 tests: speed up sanity-lfsck and sanity-scrub tests
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set:
            Commit: 2ab8b98ea5dafbce59043e5d8477e794197116a0

            yujian Jian Yu added a comment -

            The back-ported patch for Lustre b2_5 branch http://review.whamcloud.com/10818 was updated to depend on http://review.whamcloud.com/11006, which is the patch for LU-5248.

            emoly.liu Emoly Liu added a comment -

            This problem is blocked by LU-5248. We should land that fix first.

            emoly.liu Emoly Liu added a comment - - edited

            The backported patch to b2_5 is here: http://review.whamcloud.com/#/c/10818/
            The backported patch to b2_4 is here: http://review.whamcloud.com/#/c/10892/

            pjones Peter Jones added a comment -

            Thanks Fanyong. Emoly, could you please make the appropriate change to the test for b2_5 and b2_4?


            yong.fan nasf (Inactive) added a comment -

            The failure is related to the following test script:

                    do_facet $SINGLEMDS $LCTL set_param fail_loc=0
                    do_facet $SINGLEMDS $LCTL set_param fail_val=0
                    sleep 3
                    STATUS=$($SHOW_NAMESPACE | awk '/^status/ { print $2 }')
                    [ "$STATUS" == "completed" ] ||
                            error "(9) Expect 'completed', but got '$STATUS'"

            I cannot find anything abnormal in the test log that would indicate a Lustre bug. Instead, I suspect the failure is caused by the "sleep 3". The 3 seconds is only an estimate of the average time LFSCK needs to finish scanning, and that estimate can be thrown off by various factors, such as VM scheduling delays. We have improved the test script on master as follows:

                    do_facet $SINGLEMDS $LCTL set_param fail_loc=0 fail_val=0
                    wait_update_facet $SINGLEMDS "$LCTL get_param -n \
                            mdd.${MDT_DEV}.lfsck_namespace |
                            awk '/^status/ { print \\\$2 }'" "completed" 6 || {
                            $SHOW_NAMESPACE
                            error "(9) unexpected status"
                    }

            So if possible, we should back-port the patch http://review.whamcloud.com/9704, which improved the sanity-scrub/sanity-lfsck test scripts, to b2_5 and b2_4.

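The difference between a fixed sleep and polling with a timeout can be sketched in plain bash. This is only an illustration under stated assumptions: `wait_for_output` is a hypothetical stand-in and not the framework's `wait_update_facet` implementation, and `fake_status` simulates a scan that needs a couple of polls to finish instead of querying a real MDT.

```shell
#!/bin/bash
# Hypothetical stand-in for the polling idea behind wait_update_facet:
# run a command once per second until its output equals the expected
# value, or give up after max_wait seconds.
wait_for_output() {
    local cmd=$1 expected=$2 max_wait=$3
    local waited=0 result

    while true; do
        result=$(eval "$cmd")
        if [ "$result" = "$expected" ]; then
            return 0
        fi
        if [ "$waited" -ge "$max_wait" ]; then
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "timed out after ${max_wait}s: got '$result', expected '$expected'" >&2
    return 1
}

# Simulated status source (an assumption for this demo): reports
# scanning-phase1 on the first two polls and completed afterwards.
state_file=$(mktemp)
echo 0 > "$state_file"
fake_status() {
    local n
    n=$(cat "$state_file")
    echo $((n + 1)) > "$state_file"
    if [ "$n" -lt 2 ]; then
        echo "scanning-phase1"
    else
        echo "completed"
    fi
}

if wait_for_output fake_status completed 6; then
    echo "status reached completed"
else
    echo "(9) unexpected status"
fi
rm -f "$state_file"
```

Unlike `sleep 3`, the loop returns as soon as the expected status appears and tolerates a slow run up to the timeout, so a scheduling delay shorter than the timeout no longer fails the test.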
            yujian Jian Yu added a comment -

            Hi Nasf,

            Could you please take a look at the failure to see whether this is an issue on the Lustre b2_5 side? Thanks.


            People

              emoly.liu Emoly Liu
              yujian Jian Yu