HSM _not only_ small fixes and to do list goes here
(LU-3647)
|
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.5.0, Lustre 2.7.0 |
| Type: | Technical task | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HSM |
| Issue Links: |
|
| Rank (Obsolete): | 9973 |
| Description |
|
This issue was created by maloo for Minh Diep <minh.diep@intel.com>. This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/3c70376c-0fbb-11e3-bb21-52540035b04c. The sub-test test_251 failed with the following error:
== sanity-hsm test 251: Coordinator request timeout == 00:59:34 (1377676774) |
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 28/Aug/13 ] |
|
patch is at http://review.whamcloud.com/7484 |
| Comment by Jian Yu [ 29/Aug/13 ] |
|
More instances: |
| Comment by Bobbie Lind (Inactive) [ 03/Sep/13 ] |
|
Another instance, just in case it's needed: https://maloo.whamcloud.com/test_sets/9a710002-0fcb-11e3-a63c-52540035b04c |
| Comment by nasf (Inactive) [ 09/Sep/13 ] |
|
Another failure instance: https://maloo.whamcloud.com/test_sets/a4ed5d50-189f-11e3-aa54-52540035b04c |
| Comment by James Nunez (Inactive) [ 26/Sep/13 ] |
|
Reopening tickets due to 'No space left on device' failures seen in sanity-hsm again. All tests fail with error 'request on sanity-hsm is not @@@@@@'. https://maloo.whamcloud.com/test_sets/16a0a15e-2639-11e3-8d26-52540035b04c - tests 28, 104, 110b and 251 and |
| Comment by Jinshan Xiong (Inactive) [ 26/Sep/13 ] |
|
Hi James, Does your environment include the patch in
Jinshan |
| Comment by James Nunez (Inactive) [ 26/Sep/13 ] |
|
Yes, I'm using 2.4.93 build # 1687 and that includes the |
| Comment by Jinshan Xiong (Inactive) [ 30/Sep/13 ] |
|
After checking with James, it turned out there was an issue with setup. I will close it again. |
| Comment by Doug Oucharek (Inactive) [ 05/Dec/13 ] |
|
I just looked at a Maloo failure which has test_251 failing this way. Maloo stats indicate only a 50% pass rate right now. Looks like this issue is back. Here is the Maloo failure I was looking at: https://maloo.whamcloud.com/test_sessions/250897b4-5d1c-11e3-956b-52540035b04c |
| Comment by Bruno Faccini (Inactive) [ 09/Dec/13 ] |
|
Again, the root cause of the "sanity-hsm test_251: @@@@@@ FAIL: request on sanity-hsm is not @@@@@@" failure in this ticket is the "dd: writing `/mnt/lustre2/d0.sanity-hsm/d251/f.sanity-hsm.251': No space left on device" error. Since 2013-09-25 21:03:00, the date of the last occurrence before Change #7484 landed, Maloo stats report only 5 occurrences, all since 2013-12-05 13:03:17: 2 on client-26vm2 and 3 on client-26vm6. |
| Comment by Bruno Faccini (Inactive) [ 16/Dec/13 ] |
|
More occurrences, still only on client-26vm*, where the Lustre file system appears much smaller than on other nodes and is likely to fill up when test_251 creates its big (103MB) file:
bruno@brent:~$ ssh root@client-26vm6 df /mnt/lustre
root@client-26vm6's password:
Filesystem 1K-blocks Used Available Use% Mounted on
client-26vm3@tcp:/lustre
1464484 191516 1194876 14% /mnt/lustre
bruno@brent:~$ ssh root@client-27vm6 df /mnt/lustre
root@client-27vm6's password:
Filesystem 1K-blocks Used Available Use% Mounted on
client-27vm3@tcp:/lustre
14449456 797740 12917724 6% /mnt/lustre
bruno@brent:~$
Opened TEI-1289 for this issue. |
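The size comparison above can be turned into a pre-flight check. The following is only a sketch: `avail_kb_from_df` and `has_room_kb` are hypothetical helper names, not functions from sanity-hsm, and the 10% headroom is an arbitrary assumption. It reads `df -P` style output (POSIX format keeps each filesystem on one line, unlike the wrapped output quoted above) and compares the available space with the 103MB file that test_251 writes.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not taken from sanity-hsm.

# Print the "Available" column (1K blocks) from `df -P` style output on stdin.
avail_kb_from_df() {
    awk 'NR == 2 { print $4 }'
}

# Succeed if avail_kb covers need_kb plus margin_pct percent of headroom.
# The default margin of 10% is an assumption, not a value from the ticket.
has_room_kb() {
    local avail_kb=$1 need_kb=$2 margin_pct=${3:-10}
    local want_kb=$(( need_kb + need_kb * margin_pct / 100 ))
    [ "$avail_kb" -ge "$want_kb" ]
}

# Figures from the client-26vm6 output above, rewritten in `df -P` layout.
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
client-26vm3@tcp:/lustre 1464484 191516 1194876 14% /mnt/lustre'

avail=$(printf '%s\n' "$sample" | avail_kb_from_df)
file_kb=$(( 103 * 1024 ))

if has_room_kb "$avail" "$file_kb"; then
    echo "enough space: ${avail} KB available, ${file_kb} KB needed"
else
    echo "skip: only ${avail} KB available, ${file_kb} KB needed"
fi
```

Note that with the figures quoted, the aggregate available space is well above 103MB, so a check on the client-side `df` total alone would not have predicted the ENOSPC failure. A Lustre client's `df` reports the sum over all OSTs, so the limiting value is presumably the free space on the individual OST holding the file (as listed per target by `lfs df`); that is an assumption here, and that per-target number would have to be passed to `has_room_kb` instead.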
| Comment by Andreas Dilger [ 26/Dec/13 ] |
|
This subtest has been disabled at the autotest level, so a regular test run will skip it. You need to explicitly request testing on the subtest - hopefully Test-Parameters works. |
| Comment by Bruno Faccini (Inactive) [ 27/Dec/13 ] |
|
Oops, sorry, I think I forgot to mention that I created TEI-1289 to address the very small Lustre filesystem on client-26vm*. |
| Comment by Bruno Faccini (Inactive) [ 04/Jan/14 ] |
|
Andreas, I am not sure that test_251 has been "fully" disabled at the autotest level because I still see runs of it for recent patch submissions, and there has been at least one more failure for this same issue reported on December 28th at https://maloo.whamcloud.com/test_sets/87584406-6fbd-11e3-9a1b-52540035b04c. Thus, I will raise priority of TEI-1289 to get some update. |
| Comment by Andreas Dilger [ 24/Jan/14 ] |
|
The sanity-hsm test_251 may only be disabled for tests on master, which is OK I think. Could you please submit a patch to add it to ALWAYS_EXCEPT in the script? When that lands we can remove it from the autotest config so that it will be possible to re-enable it for any patch that is trying to fix the problem. |
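As a rough illustration of what an ALWAYS_EXCEPT entry does, here is a self-contained sketch. It is not the Lustre test framework code: `is_excepted` and `run_subtest` are hypothetical names, and the real script's handling of the variable may differ. It only shows the idea of a space-separated skip list kept inside the script, so that a patch can re-enable a sub-test by editing that list.

```shell
#!/bin/bash
# Simplified illustration; the real logic lives in the Lustre test framework
# and is not reproduced here. is_excepted and run_subtest are hypothetical.

# Space-separated list of sub-tests to skip; 251 is the one from this ticket.
ALWAYS_EXCEPT="${ALWAYS_EXCEPT:-} 251"

# Succeed if the given sub-test number appears in ALWAYS_EXCEPT.
is_excepted() {
    local t
    for t in $ALWAYS_EXCEPT; do
        [ "$t" = "$1" ] && return 0
    done
    return 1
}

# Report whether a sub-test would run or be skipped.
run_subtest() {
    if is_excepted "$1"; then
        echo "SKIP test_$1"
    else
        echo "RUN test_$1"
    fi
}

run_subtest 250
run_subtest 251
```

Keeping the exception in the script rather than in the autotest configuration means that the change which fixes the sub-test can remove the entry in the same patch and have the sub-test exercised by its own test run.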
| Comment by Bruno Faccini (Inactive) [ 27/Jan/14 ] |
|
Patch to disable test_251 internally in sanity-hsm has been pushed at http://review.whamcloud.com/9014. |
| Comment by Bruno Faccini (Inactive) [ 11/Mar/14 ] |
|
Requested that the Gerrit gate-keeper land patch #9014. Next actions will be to re-allow test_251 in the autotest config (through a new TEI?), and then to push a patch that finds a way to handle this "small fs size vs file big enough for timing need" requirement and also re-allows test_251 (and also test_[200,221,223b] disabled for the same issue in
| Comment by Jodi Levi (Inactive) [ 12/Mar/14 ] |
|
Test has been retriggered: unclear what happened to results |
| Comment by Bruno Faccini (Inactive) [ 21/Mar/14 ] |
|
Patch #9014 to disable test_251 internally in sanity-hsm has landed. So now I will be able to work on a definitive patch to strengthen/protect test_251 against a too-small Lustre FS. |
| Comment by Bruno Faccini (Inactive) [ 28/Oct/14 ] |
|
Sorry to have been soooo late on this ... |
| Comment by Bruno Faccini (Inactive) [ 02/Dec/14 ] |
|
I forgot to indicate that change #12456 also re-enables sanity-hsm/test_[200,221,223b,251] sub-tests. |
| Comment by Gerrit Updater [ 17/Dec/14 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12456/ |
| Comment by Bruno Faccini (Inactive) [ 07/Jan/15 ] |
|
Patch has landed. |
| Comment by Gerrit Updater [ 19/Feb/15 ] |
|
James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/13803 |