[LU-924] Test failure on test suite recovery-small, subtest test_105 Created: 14/Dec/11 Updated: 29/Jun/12 Resolved: 04/Jan/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.2.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 4788 | ||||||||
| Description |
|
This issue was created by maloo for Chris Gearing <chris@whamcloud.com>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/467f4a76-25fd-11e1-ae7f-5254004bbbd3. The sub-test test_105 failed with the following error:
Info required for matching: recovery-small 105 |
| Comments |
| Comment by Peter Jones [ 14/Dec/11 ] |
|
Is Jinshan looking into this one? |
| Comment by Jinshan Xiong (Inactive) [ 14/Dec/11 ] |
|
The problem is clear: the client data was not written to persistent storage before the OSTs failed over. From the log, client UUID 8f9b83bd-22e0-c555-20a1-fe22e2547e58 was the old one, and in the test case we remounted the client with a new UUID, 7419f6fe-3688-8bc3-9560-7367776851cd. However, the new client data was not written to storage in time, so when the OST came back up it still waited for the old client. |
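The failure mode described above can be modelled with a small simulation. This is a minimal sketch, not Lustre code: the `OstState` class and its `connect`/`disconnect`/`failover` methods are hypothetical, `on_disk` is assumed to stand for the per-client records the OST keeps on stable storage, and the `persist` flag simply says whether a record change reached disk before the failover. The two UUIDs are the ones quoted from the log.

```python
from dataclasses import dataclass, field


@dataclass
class OstState:
    """Hypothetical model of the client records an OST keeps.

    `on_disk` stands in for the persistent per-client records;
    `in_memory` is lost when the OST fails over.
    """
    on_disk: set = field(default_factory=set)
    in_memory: set = field(default_factory=set)

    def connect(self, uuid: str, persist: bool) -> None:
        # Memory is always updated; disk only if the record was persisted.
        self.in_memory.add(uuid)
        if persist:
            self.on_disk.add(uuid)

    def disconnect(self, uuid: str, persist: bool) -> None:
        # Removal reaches disk only if it was persisted as well.
        self.in_memory.discard(uuid)
        if persist:
            self.on_disk.discard(uuid)

    def failover(self) -> set:
        # After a restart only the on-disk records survive, and recovery
        # waits for exactly those client UUIDs to reconnect.
        self.in_memory = set(self.on_disk)
        return set(self.on_disk)


OLD = "8f9b83bd-22e0-c555-20a1-fe22e2547e58"
NEW = "7419f6fe-3688-8bc3-9560-7367776851cd"

ost = OstState()
ost.connect(OLD, persist=True)      # original mount: record reached disk
ost.disconnect(OLD, persist=False)  # remount: removal not yet on disk
ost.connect(NEW, persist=False)     # new UUID: record not yet on disk
waiting_for = ost.failover()
print(waiting_for == {OLD})  # True: recovery waits for the stale client
```

In this model, persisting both the disconnect and the new connect before the failover makes `failover()` return `{NEW}` instead.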
| Comment by Chris Gearing (Inactive) [ 15/Dec/11 ] |
|
Hi Jinshan,
Can you elaborate, please? Autotest relies heavily on LVM, so I'm somewhat concerned. |
| Comment by Niu Yawei (Inactive) [ 16/Dec/11 ] |
We always write client data to disk synchronously, so I don't see why it wasn't updated. The test failed because the client that invoked the test was evicted, and that client isn't the remounted one, am I right? |
| Comment by Chris Gearing (Inactive) [ 16/Dec/11 ] |
|
I'm afraid I don't know which client was evicted and which was remounted. Perhaps Jinshan knows more about the test. |
| Comment by Niu Yawei (Inactive) [ 19/Dec/11 ] |
|
After looking more closely at the code, I realized that we don't write the client data to disk synchronously; we just set exp_need_sync = 1 so that a following operation syncs the client data, probably to avoid a flood of synchronous writes when many clients connect at once. The log shows that the df client (the one invoking the test) was evicted by ost-0000: ost-0000 didn't find the old export on disk and regarded the df client as a new client, so the df was not resent after recovery, which resulted in the test failure. I think we'd better make sure that the df client's data is synced to disk (otherwise it can be evicted as a new client during recovery). There are probably two ways to achieve this:
Jinshan, what's your opinion? |
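The deferred-sync behaviour described in the comment above, and why it can get the df client evicted, can be sketched as a toy model. This is hypothetical, not the OST implementation: `Export.need_sync` only loosely mirrors `exp_need_sync`, `sync_on_connect=True` is assumed to stand for syncing the client data at connect time (the two options from the comment are not preserved here), and `recover` assumes, as the comment describes, that a reconnecting client with no on-disk record is evicted as a new client. The client name `"df-client"` is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Export:
    uuid: str
    need_sync: bool = False  # loosely mirrors exp_need_sync


@dataclass
class Server:
    """Hypothetical OST model; sync_on_connect=True models syncing at connect."""
    sync_on_connect: bool
    exports: dict = field(default_factory=dict)
    on_disk: set = field(default_factory=set)

    def connect(self, uuid: str) -> None:
        exp = Export(uuid)
        self.exports[uuid] = exp
        if self.sync_on_connect:
            self.on_disk.add(uuid)  # synchronous write of the client data
        else:
            exp.need_sync = True    # deferred: a later operation syncs it

    def operation(self, uuid: str) -> None:
        # In this model, any later operation flushes a pending record.
        exp = self.exports[uuid]
        if exp.need_sync:
            self.on_disk.add(uuid)
            exp.need_sync = False

    def recover(self, reconnecting: list) -> list:
        """Fail over, then return the clients evicted as 'new' clients."""
        self.exports.clear()  # in-memory exports are lost
        return sorted(u for u in reconnecting if u not in self.on_disk)


results = {}
for sync in (False, True):
    srv = Server(sync_on_connect=sync)
    srv.connect("df-client")
    # The failover happens before any later operation performs the sync.
    results[sync] = srv.recover(["df-client"])
print(results)  # {False: ['df-client'], True: []}
```

The trade-off is the one the comment points out: syncing at connect time closes the window before the first flush, at the cost of one synchronous write per connecting client.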
| Comment by Niu Yawei (Inactive) [ 19/Dec/11 ] |
|
patch for master: http://review.whamcloud.com/1888 |
| Comment by Jinshan Xiong (Inactive) [ 21/Dec/11 ] |
|
Though we have talked this over Skype, I'm writing it here for the record: sync seems fine to me. Sorry for the delayed response. |
| Comment by Peter Jones [ 04/Jan/12 ] |
|
Landed for 2.2 |
| Comment by Build Master (Inactive) [ 04/Jan/12 ] |
|
Integrated in Result = SUCCESS
|