[LU-14358] interop: sanity-pcc and sanity-flr tests fail with ‘cannot open volatile file’ on the MDS Created: 22/Jan/21 Updated: 05/Feb/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Qian Yingjin |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | interop | ||
| Environment: |
2.13.0 servers and master clients with Lustre version >= 2.13.55.16 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
A variety of sanity-pcc tests fail with different error messages, but all have the following in the MDS console log
[67028.845779] LustreError: 3916:0:(mdt_open.c:1613:mdt_reint_open()) lustre-MDT0000: cannot open volatile file [0x2000766a8:0x4:0x0], orphan file will be left in PENDING directory until next reboot, rc = -2
and the action on that file does not complete.

This issue has only been seen in interop testing for 2.13.0 servers and master clients with Lustre version >= 2.13.55.16. It looks like this error message and failures started on 08 AUG 2020 with failures in sanity-pcc and sanity-flr in the test session https://testing.whamcloud.com/test_sessions/7175020e-210c-436a-a2a6-be91c6c12aad

A few examples of this failure are:
2021-01-07 Lustre server 2.13.0 and Lustre client 2.13.57.44 - https://testing.whamcloud.com/test_sets/c445c494-4d76-4185-9554-f5eb131d5b03
2021-12-25 Lustre server 2.13.0 and Lustre client 2.13.57.12 - https://testing.whamcloud.com/test_sets/3f774e46-eb0f-4b4b-90e5-a70a48ae9c56

We do see this error message occasionally in ost-pools test 28 and the test passes: |
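Separately from the ost-pools note, a quick way to check whether a given MDS console log contains this message is to scan it with a regular expression. The following is a minimal sketch, not part of any Lustre tooling: the pattern is built only from the single console line quoted in this description, and the names `VOLATILE_ERR` and `find_volatile_open_errors` are hypothetical. Other Lustre versions may word the message differently.

```python
import errno
import re

# Pattern derived from the one console line quoted in this ticket
# (assumption: other versions may format the message differently).
VOLATILE_ERR = re.compile(
    r"\(mdt_open\.c:\d+:\w+\(\)\)\s+"
    r"(?P<target>\S+): cannot open volatile file "
    r"(?P<fid>\[0x[0-9a-f]+:0x[0-9a-f]+:0x[0-9a-f]+\]).*?rc = (?P<rc>-?\d+)"
)


def find_volatile_open_errors(lines):
    """Return a (target, fid, rc) tuple for every matching console line."""
    hits = []
    for text in lines:
        m = VOLATILE_ERR.search(text)
        if m:
            hits.append((m.group("target"), m.group("fid"), int(m.group("rc"))))
    return hits


# The console line from the description, split only for readability.
sample = [
    "[67028.845779] LustreError: 3916:0:(mdt_open.c:1613:mdt_reint_open()) "
    "lustre-MDT0000: cannot open volatile file [0x2000766a8:0x4:0x0], "
    "orphan file will be left in PENDING directory until next reboot, rc = -2",
]

for target, fid, rc in find_volatile_open_errors(sample):
    # Kernel-style return codes are negated errno values.
    print(target, fid, rc, errno.errorcode.get(-rc, "unknown"))
```

On the quoted line this reports target `lustre-MDT0000`, FID `[0x2000766a8:0x4:0x0]`, and rc = -2, which is the negated errno value for ENOENT.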
| Comments |
| Comment by Gerrit Updater [ 22/Jan/21 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41301 |
| Comment by James Nunez (Inactive) [ 28/Jan/21 ] |
|
The question was asked whether we see this error when using 2.13.0 for both servers and clients. In patch https://review.whamcloud.com/#/c/41301/, we tried to reproduce this error using 2.13.0 clients and servers but, so far, have not been able to. |
| Comment by John Hammond [ 28/Jan/21 ] |
|
https://testing.whamcloud.com/test_sessions/7175020e-210c-436a-a2a6-be91c6c12aad

Earlier in the same session I see sanity 185 passing, and sanity-hsm passes. This needs to be reproduced with trace enabled on the MDT. Unfortunately the MDT debug logs for sanity-pcc test_1c are useless due to 829055 messages of the form:
00000004:00020000:0.0:1596889762.316083:0:25563:0:(mdd_orphans.c:329:mdd_orphan_destroy()) lustre-MDD0000: could not delete orphan [0x1d6:0xf096f8eb:0x0]: rc = -2
00000004:00080000:0.0:1596889762.316093:0:25563:0:(mdd_orphans.c:376:mdd_orphan_key_test_and_delete()) Found orphan [0x1d6:0xf096f8eb:0x0], delete it
00000004:00020000:0.0:1596889762.316098:0:25563:0:(mdd_orphans.c:329:mdd_orphan_destroy()) lustre-MDD0000: could not delete orphan [0x1d6:0xf096f8eb:0x0]: rc = -2

I wonder if this is related. |
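When a debug log is flooded like this, one option is to count messages per call site and drop the noisy source file before reading the rest. The sketch below is an illustration only: it assumes every line of interest carries a `(file.c:line:function())` field like the ones quoted above, and the helper names `count_call_sites` and `drop_noise` are hypothetical.

```python
import re
from collections import Counter

# Matches the "(file.c:line:function())" call-site field seen in the quoted
# lines; the layout is inferred from those samples only (assumption).
CALL_SITE = re.compile(r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)")


def count_call_sites(lines):
    """Count log lines per (source file, function) pair."""
    counts = Counter()
    for text in lines:
        m = CALL_SITE.search(text)
        if m:
            counts[(m.group("file"), m.group("func"))] += 1
    return counts


def drop_noise(lines, noisy_files=("mdd_orphans.c",)):
    """Yield only lines whose call site is not in a noisy source file."""
    for text in lines:
        m = CALL_SITE.search(text)
        if m and m.group("file") in noisy_files:
            continue
        yield text


# Two debug lines quoted in this comment, plus the console message from the
# description, used here only as an example of a line that should be kept.
sample = [
    "00000004:00080000:0.0:1596889762.316093:0:25563:0:"
    "(mdd_orphans.c:376:mdd_orphan_key_test_and_delete()) "
    "Found orphan [0x1d6:0xf096f8eb:0x0], delete it",
    "00000004:00020000:0.0:1596889762.316098:0:25563:0:"
    "(mdd_orphans.c:329:mdd_orphan_destroy()) lustre-MDD0000: "
    "could not delete orphan [0x1d6:0xf096f8eb:0x0]: rc = -2",
    "[67028.845779] LustreError: 3916:0:(mdt_open.c:1613:mdt_reint_open()) "
    "lustre-MDT0000: cannot open volatile file [0x2000766a8:0x4:0x0], "
    "orphan file will be left in PENDING directory until next reboot, rc = -2",
]

for (src, func), n in count_call_sites(sample).most_common():
    print(f"{n:8d}  {src}:{func}")

kept = list(drop_noise(sample))
print(len(kept), "line(s) kept after filtering")
```

Filtering by source file rather than by message text keeps the sketch independent of the exact FID values, which differ from one orphan to the next.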
| Comment by Peter Jones [ 29/Jan/21 ] |
|
Lai,
Could you please comment on this one?
Thanks,
Peter |
| Comment by Lai Siyao [ 05/Feb/21 ] |
|
I reproduced it on a local system, but I'm not familiar with the PCC code and test cases; Qian will look into it. |