[LU-7190] sanity-lfsck test_18a: FAIL: (6.1) Expect 1 fixed on mds1, but got: 0
Created: 22/Sep/15 | Updated: 12/May/16 | Resolved: 06/Oct/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jian Yu | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Lustre Build: https://build.hpdd.intel.com/job/lustre-master/3179 |
||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
sanity-lfsck test 18a failed as follows:
CMD: shadow-17vm12 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.lfsck_layout
sanity-lfsck test_18a: @@@@@@ FAIL: (6.1) Expect 1 fixed on mds1, but got: 0
Maloo report: https://testing.hpdd.intel.com/test_sets/bf6d7fc0-5a65-11e5-9147-5254006e85c2 |
| Comments |
| Comment by Joseph Gmitter (Inactive) [ 22/Sep/15 ] |
|
Hi Fan Yong, |
| Comment by nasf (Inactive) [ 29/Sep/15 ] |
|
The failure is caused by an unexpected MDT-OST connection issue. During the second-phase scan, the layout LFSCK slave engine on the OST periodically queries the MDT for the master engine's status. The query RPC sometimes fails, either because of network trouble or because of problems on the MDS node. So that LFSCK can make progress, the slave engine does not wait forever; instead, it assumes that the master engine has exited without notifying it (or failed to notify it). The slave engine then exits as well and cleans up the LFSCK environment on the OST, including the OST-object access bitmap that is used to find orphan OST-objects. However, the assumption that the master engine has exited may be wrong. If the master engine is still running and the network between the MDS and OSS recovers after the slave engine has exited, the master engine will still try to find orphan OST-objects during its second-phase scan. Because the slave engine has already exited and released the OST-object access bitmap, the master engine has no way to find them. That is why test_18a failed. |
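The sequence described above can be illustrated with a small toy model. This is only a sketch for reasoning about the race, not Lustre code and not a description of patch 16667: the class `ToySlave`, its methods, the `simulate` helper, and the object IDs are all hypothetical names invented for this example. The "bitmap" is modelled as a plain set of object IDs that the first-phase scan saw referenced.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ToySlave:
    """Hypothetical stand-in for the layout LFSCK slave engine on one OST."""
    all_objects: Set[int]            # every OST-object on this OST
    accessed: Optional[Set[int]]     # toy "access bitmap"; None once released
    running: bool = True

    def poll_master(self, query_ok: bool, master_running: bool) -> None:
        """One periodic status query sent to the master engine."""
        if not self.running:
            return
        if not query_ok or not master_running:
            # A failed query is treated the same as "master has exited":
            # the slave stops and releases its bitmap.
            self.running = False
            self.accessed = None

    def find_orphans(self) -> List[int]:
        """Answer the master's second-phase request for orphan objects."""
        if self.accessed is None:
            return []  # bitmap already released, nothing can be identified
        return sorted(self.all_objects - self.accessed)


def simulate(query_results: List[bool]) -> int:
    """Return how many orphans the master could find after the given polls.

    The master is assumed to stay alive throughout; only the query RPC
    outcome (True = success, False = failure) varies.
    """
    slave = ToySlave(all_objects={1, 2, 3}, accessed={1, 2})
    for ok in query_results:
        slave.poll_master(query_ok=ok, master_running=True)
    return len(slave.find_orphans())
```

In this model, `simulate([True, True])` finds the single orphan (object 3), while `simulate([True, False, True])` finds none, because one failed query made the slave release its bitmap even though the master was still running. That mirrors the "Expect 1 fixed on mds1, but got: 0" symptom.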
| Comment by Gerrit Updater [ 29/Sep/15 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/16667 |
| Comment by Gerrit Updater [ 06/Oct/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16667/ |
| Comment by Peter Jones [ 06/Oct/15 ] |
|
Landed for 2.8 |