[LU-5833] sanity-lfsck test_6b: namespace lfsck completed unexpectedly Created: 31/Oct/14 Updated: 19/Nov/14 Resolved: 19/Nov/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 16356 |
| Description |
|
This issue was created by maloo for nasf <fan.yong@intel.com> This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/29838f2c-60fb-11e4-a66b-5254006e85c2. The sub-test test_6b failed with the following error: (6) Expect 'scanning-phase1', but got 'completed' Please provide additional information about the failure here. Info required for matching: sanity-lfsck 6b |
| Comments |
| Comment by nasf (Inactive) [ 31/Oct/14 ] |
|
According to the log on the MDS, when the namespace LFSCK was started for the last time (resuming from the former run), the low-level iteration seemed to return no more objects, so the injected failure stub (OBD_FAIL_LFSCK_DELAY2) was not triggered as expected. Without that delay, the LFSCK completed quickly. I have seen this situation before. Although I have not found the root cause, it should not be related to the patch http://review.whamcloud.com/#/c/11848/14, because it has also happened without this patch.
00000004:00000080:0.0:1414753244.871791:0:9874:0:(mdt_handler.c:5682:mdt_iocontrol()) handling ioctl cmd 0xc00866e6
00100000:10000000:1.0:1414753245.981101:0:9876:0:(lfsck_engine.c:1620:lfsck_assistant_engine()) lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread start
00100000:10000000:1.0:1414753245.981150:0:9875:0:(lfsck_namespace.c:3966:lfsck_namespace_prep()) lustre-MDT0000-osd: namespace LFSCK prep done, start pos [732, [0x200000bd4:0xdf:0x0], 0xa6f862b9510000]: rc = 0
00100000:10000000:1.0:1414753245.981685:0:9875:0:(lfsck_namespace.c:4181:lfsck_namespace_post()) lustre-MDT0000-osd: namespace LFSCK post done: rc = 0
00100000:10000000:1.0:1414753245.981695:0:9876:0:(lfsck_engine.c:1691:lfsck_assistant_engine()) lustre-MDT0000-osd: lfsck_namespace LFSCK assistant thread post |
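Why a missing delay turns into "got 'completed'" can be modelled in a few lines. This is a hypothetical C sketch, not Lustre code: `observed_status()`, `enum scan_status`, and the assumption that the delay check sits inside the per-object iteration loop are illustrative only; just the stub name OBD_FAIL_LFSCK_DELAY2 and the two status strings come from this ticket.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not Lustre code. */
enum scan_status { SCANNING_PHASE1, COMPLETED };

/*
 * Return the status the test script would observe. The injected delay
 * (playing the role of OBD_FAIL_LFSCK_DELAY2) is assumed to be checked
 * once per object inside the iteration loop; while the engine sleeps
 * there, the status is still scanning-phase1. If the iterator yields no
 * objects (resume position already at the end), the delay is never
 * reached and the scan finishes straight away.
 */
static enum scan_status observed_status(int start_pos, int nobjs,
					bool delay_injected)
{
	enum scan_status status = SCANNING_PHASE1;

	assert(start_pos >= 0 && start_pos <= nobjs);

	for (int pos = start_pos; pos < nobjs; pos++) {
		/* ... process object 'pos' ... */
		if (delay_injected)
			return status; /* test samples status during the delay */
	}

	status = COMPLETED;
	return status;
}
```

With the resume position already at the end, `observed_status(10, 10, true)` returns `COMPLETED` even though the delay was requested, which matches the "Expect 'scanning-phase1', but got 'completed'" failure.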
| Comment by nasf (Inactive) [ 31/Oct/14 ] |
|
Here is another failure instance without the patch 11848: |
| Comment by nasf (Inactive) [ 03/Nov/14 ] |
|
Another failure instance with the patch 11848: |
| Comment by nasf (Inactive) [ 03/Nov/14 ] |
|
Inside lfsck_prep(), the return value from lfsck_open_dir() was not properly handled before being returned to the caller. For example, if the LFSCK has already reached the end of the current directory when lfsck_open_dir() is called, lfsck_open_dir() returns a positive value. If lfsck_prep() passes that value straight back to its caller, the whole first-stage LFSCK scan is wrongly treated as done. Here is the patch to fix that: |
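The return-value problem can be sketched as follows. This is a hypothetical model, not the code in patch 12533: `open_dir_model()`, `prep_buggy()`, `prep_fixed()`, `first_stage_result()`, and `enum stage` are invented for illustration. Only the convention that lfsck_open_dir() returns a positive value at the end of the current directory, and that the caller then treats the first stage as done, comes from the comment above. Clamping the positive value to 0 in the prep step is one plausible way to handle it; the real fix may differ.

```c
#include <assert.h>

/* Hypothetical model of the outcome the caller derives from prep's rc. */
enum stage { STAGE_FAILED, STAGE_SCANNING, STAGE_DONE };

/* Stand-in for lfsck_open_dir(): returns a positive value when the
 * resume position is already at the end of the current directory. */
static int open_dir_model(int at_dir_end)
{
	return at_dir_end ? 1 : 0;
}

/* Buggy prep: passes the positive value straight through. */
static int prep_buggy(int at_dir_end)
{
	return open_dir_model(at_dir_end);
}

/* Fixed prep (assumed approach): a positive result only means the
 * current directory is exhausted, not the whole scan, so absorb it. */
static int prep_fixed(int at_dir_end)
{
	int rc = open_dir_model(at_dir_end);

	if (rc > 0)
		rc = 0;
	assert(rc <= 0); /* only errors or success reach the caller */
	return rc;
}

/* Caller model: any positive rc from prep is read as "first stage done". */
static enum stage first_stage_result(int prep_rc)
{
	if (prep_rc < 0)
		return STAGE_FAILED;
	if (prep_rc > 0)
		return STAGE_DONE;
	return STAGE_SCANNING;
}
```

For a resume that starts at the end of a directory, `first_stage_result(prep_buggy(1))` yields `STAGE_DONE` (the premature "completed"), while `first_stage_result(prep_fixed(1))` keeps the scan in `STAGE_SCANNING` so iteration can continue with the next directory.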
| Comment by Gerrit Updater [ 19/Nov/14 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12533/ |