[LU-5208] sanity-lfsck test_18c failure: Expect 3 fixed on mds1, but got: 2
Created: 16/Jun/14 Updated: 25/Aug/14 Resolved: 25/Aug/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | lfsck |
| Environment: |
Lustre 2.5.60 on the OpenSFS cluster, CentOS 6.5: one server (mds01) running the MGS and an MDS with two MDTs, another server (mds02) running an MDS with two MDTs, four OSSs with two OSTs each, and four clients. |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 14536 |
| Description |
|
Running sanity-lfsck with the stated environment, tests 18c, 18d, 18e, and 19a fail and test 19b hangs. Test results are at https://maloo.whamcloud.com/test_sessions/5ad54b54-f5a5-11e3-b29e-52540035b04c .

sanity-lfsck test 18c fails with the error:

sanity-lfsck test_18c: @@@@@@ FAIL: (4) Expect 3 fixed on mds1, but got: 2

Right before this test fails, the output from /proc/fs/lustre/mdd/scratch-MDT0000/lfsck_layout on mds01 (MDT0) is:

name: lfsck_layout
magic: 0xb173ae14
version: 2
status: completed
flags:
param: all_targets,orphan
time_since_last_completed: 2912 seconds
time_since_latest_start: 2912 seconds
time_since_last_checkpoint: 2912 seconds
latest_start_position: 0
last_checkpoint_position: 25098
first_failure_position: 0
success_count: 1
repaired_dangling: 0
repaired_unmatched_pair: 0
repaired_multiple_referenced: 0
repaired_orphan: 2
repaired_inconsistent_owner: 0
repaired_others: 0
skipped: 0
failed_phase1: 0
failed_phase2: 0
checked_phase1: 8
checked_phase2: 2
run_time_phase1: 0 seconds
run_time_phase2: 0 seconds
average_speed_phase1: 8 items/sec
average_speed_phase2: 2 objs/sec
real-time_speed_phase1: N/A
real-time_speed_phase2: N/A
current_position: N/A |
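The failure message compares an expected repair count with a counter from the lfsck_layout output; the only non-zero repair counter above is `repaired_orphan: 2`, which matches the reported "got: 2". The sketch below models that comparison, assuming the check reads `repaired_orphan`. The helpers `parse_lfsck_layout` and `check_repaired` are hypothetical and are not part of sanity-lfsck, which is a shell script.

```python
# Excerpt of the lfsck_layout output quoted above (trimmed to a few fields).
SAMPLE = """\
name: lfsck_layout
status: completed
flags:
param: all_targets,orphan
repaired_dangling: 0
repaired_orphan: 2
checked_phase1: 8
run_time_phase1: 0 seconds
"""


def parse_lfsck_layout(text: str) -> dict:
    """Split 'key: value' lines into a dict of strings (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as "0 seconds" stay intact.
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            stats[key.strip()] = value.strip()
    return stats


def check_repaired(stats: dict, counter: str, expected: int, facet: str = "mds1"):
    """Return a failure message in the test's wording, or None if the count matches."""
    got = int(stats.get(counter, "0"))
    if got != expected:
        return f"Expect {expected} fixed on {facet}, but got: {got}"
    return None


stats = parse_lfsck_layout(SAMPLE)
print(check_repaired(stats, "repaired_orphan", expected=3))
# prints: Expect 3 fixed on mds1, but got: 2
```

A missing counter is treated as 0 here, which is an assumption of this sketch rather than documented LFSCK behavior.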
| Comments |
| Comment by nasf (Inactive) [ 08/Jul/14 ] |
|
Please refer to the comment in the |
| Comment by James Nunez (Inactive) [ 15/Jul/14 ] |
|
sanity-lfsck test logs for 2.6.0-RC1 are at: https://testing.hpdd.intel.com/test_sessions/5e3c96b0-0c68-11e4-9892-5254006e85c2 . |
| Comment by nasf (Inactive) [ 30/Jul/14 ] |
|
Here is the patch: |
| Comment by James Nunez (Inactive) [ 05/Aug/14 ] |
|
I tried patch 11275 and the test passes, but the output and the comments don't match. From the output of sanity-lfsck test 18c with this patch:

Trigger layout LFSCK on all devices to find out orphan OST-object
Started LFSCK on the device scratch-MDT0000: scrub layout
There should be some stub under .lustre/lost+found/MDT0001/
ls: cannot access /lustre/scratch/.lustre/lost+found/MDT0001/*-N-0: No such file or directory
There should be some stub under .lustre/lost+found/MDT0000/
216172799310430210 -r-------- 1 root root 2097152 Aug 4 16:42 /lustre/scratch/.lustre/lost+found/MDT0000/[0x300000401:0x2:0x0]-N-0
216172799310430211 -r-------- 1 root root 2097152 Aug 4 16:42 /lustre/scratch/.lustre/lost+found/MDT0000/[0x300000401:0x3:0x0]-N-0
Resetting fail_loc on all nodes...done.
PASS 18c (7s)

So, the comment expects something to be in $mount/.lustre/lost+found/scratch-MDT0001, but there is no scratch-MDT0001 subdirectory under lost+found. Maybe, with this patch's change to "$LFS setstripe -c 1", we shouldn't expect an MDT0001 subdirectory at all?

I put some debug prints in the test, and there is no scratch-MDT0001 subdirectory:

Trigger layout LFSCK on all devices to find out orphan OST-object
Started LFSCK on the device scratch-MDT0000: scrub layout
ls -ail /lustre/scratch/.lustre/lost+found/
total 8
144115188109410307 dr-x------ 3 root root 4096 Aug 4 17:37 .
216172799310430209 drwx------ 3 root root 4096 Aug 4 17:38 MDT0000
ls -ail /lustre/scratch/.lustre/lost+found/MDT*
total 4104
216172799310430209 drwx------ 3 root root 4096 Aug 4 17:38 .
144115188109410307 dr-x------ 3 root root 4096 Aug 4 17:37 ..
216172799310430210 -r-------- 1 root root 2097152 Aug 4 17:38 [0x300000401:0x2:0x0]-N-0
216172799310430211 -r-------- 1 root root 2097152 Aug 4 17:38 [0x300000401:0x3:0x0]-N-0
There should be some stub under .lustre/lost+found/MDT0001/
ls: cannot access /lustre/scratch/.lustre/lost+found/MDT0001/*-N-0: No such file or directory |
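The inode numbers in the `ls -ail` listing can be cross-checked against the FIDs embedded in the stub names. The sketch below assumes the 64-bit FID-to-inode mapping of the Lustre client's `fid_flatten()` helper as we understand it, with the IGIF special case left out. `parse_stub_name` and the labels `infix`/`index` for the two suffix fields are hypothetical. Under these assumptions, [0x300000401:0x2:0x0] maps to 216172799310430210, matching the listing. The lost+found inode 144115188109410307 is consistent with FID [0x200000002:0x3:0x0].

```python
import re

# Stub names in the listing look like "[0x300000401:0x2:0x0]-N-0":
# a bracketed FID (sequence:object-id:version) followed by two suffix fields.
STUB_RE = re.compile(r"^\[(0x[0-9a-f]+):(0x[0-9a-f]+):(0x[0-9a-f]+)\]-([A-Z])-(\d+)$")


def parse_stub_name(name: str):
    """Split a lost+found stub name into (seq, oid, ver, infix, index)."""
    m = STUB_RE.match(name)
    if m is None:
        raise ValueError(f"not a lost+found stub name: {name!r}")
    seq, oid, ver = (int(g, 16) for g in m.group(1, 2, 3))
    return seq, oid, ver, m.group(4), int(m.group(5))


def fid_flatten(seq: int, oid: int) -> int:
    """64-bit inode number for a FID, modeled on Lustre's fid_flatten().

    Assumption: the IGIF special case is omitted, and C's unsigned 64-bit
    wraparound is emulated with an explicit mask.
    """
    mask64 = (1 << 64) - 1
    ino = ((seq << 24) + ((seq >> 24) & 0xFFFFFF0000) + oid) & mask64
    return ino if ino else oid


for name in ("[0x300000401:0x2:0x0]-N-0", "[0x300000401:0x3:0x0]-N-0"):
    seq, oid, _ver, _infix, _idx = parse_stub_name(name)
    print(name, fid_flatten(seq, oid))
```

The regular expression accepts only lowercase hex and a single uppercase letter in the middle field, which covers the names shown here but may not cover every stub name LFSCK can produce.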
| Comment by nasf (Inactive) [ 05/Aug/14 ] |
|
This is a test script issue: the comment should be "There should NOT be some stub under .lustre/lost+found/MDT0001/". I will update the patch. |
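With the corrected comment, test 18c expects orphan stubs under MDT0000 and none under MDT0001. The sketch below is a minimal Python model of those two checks, mirroring the messages printed in the output above. The real test is a shell script that runs `ls` on the glob `*-N-0`. The `LISTING` dict, `count_stubs`, and `check_lost_found` are hypothetical stand-ins for the directory contents and the test logic.

```python
from fnmatch import fnmatchcase

# Hypothetical snapshot of .lustre/lost+found/ after test 18c, taken from the
# debug listing in the comment above: only the MDT0000 subdirectory exists.
LISTING = {
    "MDT0000": ["[0x300000401:0x2:0x0]-N-0", "[0x300000401:0x3:0x0]-N-0"],
}


def count_stubs(listing: dict, mdt: str, pattern: str = "*-N-0") -> int:
    """Count entries under lost+found/<mdt>/ matching the glob the test uses."""
    # A missing subdirectory counts as zero stubs, like the failed ls above.
    return sum(1 for name in listing.get(mdt, []) if fnmatchcase(name, pattern))


def check_lost_found(listing: dict) -> list:
    """Corrected expectation: stubs under MDT0000, none under MDT0001."""
    errors = []
    if count_stubs(listing, "MDT0001") != 0:
        errors.append("There should NOT be some stub under .lustre/lost+found/MDT0001/")
    if count_stubs(listing, "MDT0000") == 0:
        errors.append("There should be some stub under .lustre/lost+found/MDT0000/")
    return errors


print(check_lost_found(LISTING))  # prints: []
```

`fnmatchcase` is used instead of `fnmatch` so the match is case-sensitive on every platform, as a shell glob on Lustre would be.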
| Comment by nasf (Inactive) [ 25/Aug/14 ] |
|
The patch has been landed to master. |