[LU-5241] 2.4.3<->2.5.2 interop: sanity-lfsck test_0: FAIL: (9) Expect 'completed', but got 'scanning-phase1' Created: 21/Jun/14 Updated: 14/Jun/15 Resolved: 04/Dec/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.2, Lustre 2.4.3 |
| Fix Version/s: | Lustre 2.5.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Jian Yu | Assignee: | Emoly Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | mn4 | ||
| Environment: |
Lustre client build: https://build.hpdd.intel.com/job/lustre-b2_4/73/ (2.4.3) |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 14614 | ||||||||
| Description |
|
sanity-lfsck test 0 failed as follows:
Started LFSCK on the device lustre-MDT0000: namespace.
CMD: shadow-4vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
name: lfsck_namespace
magic: 0xa0629d03
version: 2
status: scanning-phase1
flags:
param:
time_since_last_completed: N/A
time_since_latest_start: 1 seconds
time_since_last_checkpoint: N/A
latest_start_position: 13, N/A, N/A
last_checkpoint_position: N/A, N/A, N/A
first_failure_position: N/A, N/A, N/A
checked_phase1: 0
checked_phase2: 0
updated_phase1: 0
updated_phase2: 0
failed_phase1: 0
failed_phase2: 0
dirs: 0
M-linked: 0
nlinks_repaired: 0
lost_found: 0
success_count: 0
run_time_phase1: 0 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 0 items/sec
average_speed_phase2: N/A
real-time_speed_phase1: 0 items/sec
real-time_speed_phase2: N/A
current_position: 12, N/A, N/A
CMD: shadow-4vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
CMD: shadow-4vm4 /usr/sbin/lctl lfsck_stop -M lustre-MDT0000
Stopped LFSCK on the device lustre-MDT0000.
CMD: shadow-4vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
CMD: shadow-4vm4 /usr/sbin/lctl lfsck_start -M lustre-MDT0000 -t namespace
Started LFSCK on the device lustre-MDT0000: namespace.
CMD: shadow-4vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
CMD: shadow-4vm4 /usr/sbin/lctl set_param fail_loc=0
fail_loc=0
CMD: shadow-4vm4 /usr/sbin/lctl set_param fail_val=0
fail_val=0
CMD: shadow-4vm4 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace
sanity-lfsck test_0: @@@@@@ FAIL: (9) Expect 'completed', but got 'scanning-phase1'
Maloo report: https://maloo.whamcloud.com/test_sets/8d110b74-f903-11e3-9283-52540035b04c |
| Comments |
| Comment by Jian Yu [ 21/Jun/14 ] |
|
Hi Nasf, Could you please take a look at the failure to see whether this is an issue on Lustre b2_5 side? Thanks. |
| Comment by nasf (Inactive) [ 22/Jun/14 ] |
|
The failure is related to the following test script:
do_facet $SINGLEMDS $LCTL set_param fail_loc=0
do_facet $SINGLEMDS $LCTL set_param fail_val=0
sleep 3
STATUS=$($SHOW_NAMESPACE | awk '/^status/ { print $2 }')
[ "$STATUS" == "completed" ] ||
error "(9) Expect 'completed', but got '$STATUS'"
From the test log, I cannot find anything abnormal that would indicate a Lustre bug. Instead, I suspect the failure is related to the "sleep 3": it is only an estimate of the average time LFSCK needs to finish scanning, and that estimate can be thrown off by various factors, such as VM scheduling delays. We have improved the test script in master as follows:
do_facet $SINGLEMDS $LCTL set_param fail_loc=0 fail_val=0
wait_update_facet $SINGLEMDS "$LCTL get_param -n \
mdd.${MDT_DEV}.lfsck_namespace |
awk '/^status/ { print \\\$2 }'" "completed" 6 || {
$SHOW_NAMESPACE
error "(9) unexpected status"
}
So if possible, we should back-port the patch http://review.whamcloud.com/9704, which improved the sanity-scrub/sanity-lfsck test scripts, to b2_5 and b2_4. |
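For readers unfamiliar with the test framework, the difference between the fixed "sleep 3" and the polling approach can be sketched in plain shell. This is only an illustration, not the Lustre implementation of wait_update_facet: the helper wait_for_status, the stand-in fake_show_namespace, and COUNTER_FILE are all hypothetical names, and the stand-in simply pretends that LFSCK needs two polls before it reports "completed".

```shell
#!/bin/bash
# Hypothetical helper (not part of the Lustre test framework): poll a
# command until its output equals the expected value or a timeout expires.
wait_for_status() {
	local cmd="$1"        # command whose output is the current status
	local expected="$2"   # status to wait for, e.g. "completed"
	local timeout="$3"    # maximum number of seconds to wait
	local elapsed=0
	local status

	while true; do
		status=$(eval "$cmd")
		[ "$status" = "$expected" ] && return 0
		[ "$elapsed" -ge "$timeout" ] && break
		sleep 1
		elapsed=$((elapsed + 1))
	done

	echo "timed out after ${timeout}s: expected '$expected', got '$status'" >&2
	return 1
}

# Hypothetical stand-in for "lctl get_param -n mdd.<device>.lfsck_namespace":
# reports scanning-phase1 on the first two calls, then completed.
COUNTER_FILE=$(mktemp)
echo 0 > "$COUNTER_FILE"
fake_show_namespace() {
	local n
	n=$(cat "$COUNTER_FILE")
	echo $((n + 1)) > "$COUNTER_FILE"
	if [ "$n" -lt 2 ]; then
		echo "status: scanning-phase1"
	else
		echo "status: completed"
	fi
}

# Succeeds as soon as the status flips, instead of checking once after a
# fixed delay.
wait_for_status "fake_show_namespace | awk '/^status/ { print \$2 }'" completed 6 &&
	echo "LFSCK reached completed"
```

With a fixed sleep, a slow VM makes the single check fail even though the scan would have finished a moment later; with polling, the timeout only bounds the worst case and a fast run returns early.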
| Comment by Peter Jones [ 22/Jun/14 ] |
|
Thanks Fanyong. Emoly, could you please make the appropriate change to the test for b2_5 and b2_4? |
| Comment by Emoly Liu [ 27/Jun/14 ] |
|
The backported patch to b2_5 is here: http://review.whamcloud.com/#/c/10818/ |
| Comment by Emoly Liu [ 22/Aug/14 ] |
|
This problem is being blocked by |
| Comment by Jian Yu [ 20/Sep/14 ] |
|
The back-ported patch for Lustre b2_5 branch http://review.whamcloud.com/10818 was updated to depend on http://review.whamcloud.com/11006, which is the patch for |
| Comment by Gerrit Updater [ 04/Dec/14 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/10818/ |
| Comment by Peter Jones [ 04/Dec/14 ] |
|
Landed for 2.5.4 |