[LU-16733] recovery-small: cannot remove '/mnt/lustre/d110h.recovery-small' Created: 12/Apr/23 Updated: 20/Jun/23 Resolved: 20/Jun/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Feng Lei |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>. This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/d38d511e-7fdf-4ab8-bca4-a3f9d540464f
The test session reports "No sub tests failed in this test set."

Test session details:

I have seen this failure in a few different patches. The suite is unable to clean up at the end:

    == recovery-small test complete, duration 7270 sec ======= 11:52:10 (1679917930)
    rm: cannot remove '/mnt/lustre/d110h.recovery-small': Input/output error
    recovery-small : @@@@@@ FAIL: remove sub-test dirs failed

I also saw it with d110i.recovery-small. |
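Because the session itself reports no failed sub-tests, the only trace of this problem is the cleanup error at the end of the log. The following is a minimal sketch, assuming the raw suite output is available as plain text, of how such runs could be flagged. It looks for the "remove sub-test dirs failed" marker and the "rm: cannot remove" lines quoted in the description. The function name, the constant names, and the regular expression are illustrative assumptions, not part of the Lustre test framework or Maloo.

```python
import re

# Marker printed by the test framework when end-of-suite cleanup fails
# (string copied from the log in the description).
CLEANUP_FAIL_MARKER = "remove sub-test dirs failed"

# Matches rm(1) errors such as:
#   rm: cannot remove '/mnt/lustre/d110h.recovery-small': Input/output error
# Group 1 is the sub-test directory name, group 2 the error text.
# The directory-name pattern (d<number><letters>.<suite>) is an assumption.
RM_ERROR = re.compile(r"rm: cannot remove '(?:[^']*/)?(d\d+[a-z]*\.[\w-]+)': (.+)")


def find_cleanup_failures(log_text):
    """Hypothetical helper: list (directory, error) pairs for sub-test
    directories that could not be removed, but only when the suite
    also printed the cleanup failure marker."""
    if CLEANUP_FAIL_MARKER not in log_text:
        return []
    failures = []
    for line in log_text.splitlines():
        match = RM_ERROR.search(line)
        if match:
            failures.append((match.group(1), match.group(2).strip()))
    return failures


if __name__ == "__main__":
    sample = (
        "== recovery-small test complete, duration 7270 sec ======= 11:52:10 (1679917930)\n"
        "rm: cannot remove '/mnt/lustre/d110h.recovery-small': Input/output error\n"
        "recovery-small : @@@@@@ FAIL: remove sub-test dirs failed\n"
    )
    # Prints [('d110h.recovery-small', 'Input/output error')]
    print(find_cleanup_failures(sample))
```

Requiring the cleanup marker before collecting "rm" errors is a deliberate choice: it avoids flagging unrelated removal errors that a sub-test might print and handle on its own.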
| Comments |
| Comment by Andreas Dilger [ 14/Apr/23 ] |
|
flei, can you please check whether some recently landed patch is causing this to be hit (or hit more frequently)? It looks like the first recent hit was on 2023-03-27 (ver 2.15.54.114), but on a patch that hasn't landed yet. There was also a single hit on 2023-01-19 (ver 2.15.53.56 full testing, so no patch), but it complained about d110j. The problem has definitely been hit much more frequently since 2023-04-04.

This Maloo search shows all of the failures tagged with LU-16733, since it isn't otherwise possible to search for "no failure", at least until patch https://review.whamcloud.com/49582 lands.

The patches landed after 2023-03-26 and before 2023-03-30 are:

    # git log --after 2023-03-25 --before 2023-03-30 --oneline
    7c52cbf65218 LU-16515 tests: disable sanity test_118c/118d
    a7222127c7a6 LU-16642 tests: improve sanity-sec test_61
    8f40a3d7110d LU-16639 misc: cleanup concole messages
    e998d21caf99 LU-16589 tests: add sanity/31l to test ln command
    17bbf5bdd6f9 LU-930 docs: fix whatis output
    36cbba150bce LU-16632 tests: more margin of error for sanity/56xh
    91a3726f313d LU-16633 obdclass: fix rpc slot leakage
    12c34651994b LU-14291 batch: don't include lustre_update.h for client only builds
    d5b26443a3d3 LU-16615 utils: add messages in l_getidentity
    b30f825232cb LU-16601 kernel: update SLES15 SP4 [5.14.21-150400.24.46.1]
    8f004bc53b1a LU-16599 obdclass: job_stats can parse escaped jobid string
    fc7a0d6013b4 LU-14668 lnet: add 'lock_prim_nid" lnet module parameter
    f5293fb66e79 LU-16598 osp: cleanup comment in osp_sync.c
    5e24b374f7bd LU-16595 test: save one second in wait_destroy_complete()
    da230373bd14 LU-16563 lnet: use discovered ni status to set initial health
    0366422cfd1e LU-16221 kernel: update RHEL 9.1 [5.14.0-162.18.1.el9_1]
    2d40d96b4ec8 LU-15053 tests: reset quota if ENABLE_QUOTA=1
    7e893c70955d LU-16382 build: udev files in /usr/lib
    b33808d3aebb LU-16338 readahead: clip readahead with kms
    ccee6b92ec4d LU-13107 utils: remove duplicate lctl erase/fork_lcfg
    2471d35c0e0e LU-16217 iokit: Add lst.sh wrapper and lst-survey
    bdbc7f9f42b9 LU-12805 tests: disable replay-single/36
    73ee638813a8 LU-16604 kfilnd: kfilnd_peer ref leak on send
    6fab1fe4a5c5 LU-9680 lnet: handle multi-rail setups
    0ecb2a167c56 LU-11912 ofd: reduce LUSTRE_DATA_SEQ_MAX_WIDTH
    c97d4cdf4dc7 LU-16629 osd: refill the existing env

I think there are a few approaches that could be used to debug this:
|
| Comment by Andreas Dilger [ 25/Apr/23 ] |
|
"Feng Lei <flei@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50683 |
| Comment by Gerrit Updater [ 20/Jun/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50683/ |
| Comment by Peter Jones [ 20/Jun/23 ] |
|
Landed for 2.16 |