[LU-11366] replay-single timeout test 80f: rm: cannot remove '/mnt/lustre/d80f.replay-single/remote_dir': Input/output error
Created: 11/Sep/18  Updated: 26/Feb/19  Resolved: 26/Feb/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0, Lustre 2.13.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Lai Siyao
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9157 replay-single test_80c: rmdir failed Resolved
is related to LU-11330 replay-single test_70d: Directory not... Resolved
is related to LU-10740 replay-single test_2d: FAIL: checksta... Resolved
is related to LU-11538 replay-single test 80g fails with '/... Resolved
is related to LU-10143 LBUG dt_object.h:2166:dt_declare_reco... Resolved
is related to LU-10589 sanity-dom test_251: test_sanity fail... Resolved
is related to LU-11748 Maloo: tests are reported as failed w... Closed
Severity: 3

 Description   

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/5b970932-b5dc-11e8-9df3-52540065bddc

test_80f in review-dne-zfs-part-4 timed out with the following error:

rm: cannot remove '/mnt/lustre/d80f.replay-single/remote_dir': Input/output error
rmdir failed

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
replay-single test_80f - rmdir failed



 Comments   
Comment by Gerrit Updater [ 12/Sep/18 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33153
Subject: LU-11366 tests: disable tests for DNE/ZFS testing
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ca1b8b35ff7ec0df64180fcfb19fcd56ef890e6d

Comment by Andreas Dilger [ 30/Oct/18 ]

Lai, could you please take a look at this? replay-single test_80f is failing 100% of the time in review-dne-zfs-part-4 and is causing a lot of test failures. We don't currently enforce this test session, but we'd like to get this test passing and enable it. James tried disabling that subtest, but then test_80g and the subsequent tests failed in a similar manner, so it is likely that the same issue is affecting all of those tests.

Comment by Gerrit Updater [ 30/Oct/18 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33515
Subject: LU-11366 test: test replay-single with full debug
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6606aca76ec770a27a6498fc75079eafd8d66517

Comment by Andreas Dilger [ 27/Nov/18 ]

Lai, I saw that you have abandoned the debug patch. Does that mean you understand what is causing the test failures here?

We'd like to enforce review-dne-zfs-part-4 and remove the non-DNE versions of the ZFS tests, to reduce the testing load and to get better test coverage.

Comment by Lai Siyao [ 28/Nov/18 ]

No, full debug mode failed to dump logs because of a crash. I'll return to this when I have spare time.

Comment by Bruno Faccini (Inactive) [ 03/Dec/18 ]

+1 at https://testing.whamcloud.com/test_sessions/86a18ffa-c1c0-4f89-99d9-bef2ad60e1ac

Comment by Bruno Faccini (Inactive) [ 08/Dec/18 ]

+1 at https://testing.whamcloud.com/test_sessions/d7f0ec38-7b1d-4cba-8e5f-ca45754a694d

Comment by Vladimir Saveliev [ 10/Dec/18 ]

+1 at https://testing.whamcloud.com/test_sessions/456887b0-6287-497f-8357-d511966ac54c

Comment by Patrick Farrell (Inactive) [ 18/Jan/19 ]

This has suddenly become brutally common:

test_80f
  • Error: 'rmdir failed' 
    Failure Rate: 34.15% of most recent 41 runs, 59 skipped (all branches)

That's in review-dne-zfs-part-4.

https://testing.whamcloud.com/test_sets/e1b9590e-1ac0-11e9-9ed8-52540065bddc

Comment by Alex Zhuravlev [ 21/Jan/19 ]

I was hitting this locally very often, but not with https://review.whamcloud.com/#/c/34069/ applied.

Comment by Alex Zhuravlev [ 22/Jan/19 ]

A few successful replay-single runs with ZFS/DNE using the patch above:
https://testing.whamcloud.com/test_sets/40843630-1dea-11e9-9ed8-52540065bddc
https://testing.whamcloud.com/test_sets/07b11042-1e5a-11e9-8388-52540065bddc

Comment by Alex Zhuravlev [ 14/Feb/19 ]

I think 2.13 should be fine now that LU-10143 has landed.

Comment by Alex Zhuravlev [ 14/Feb/19 ]

I checked with Maloo: all tests have passed since LU-10143 landed.
https://testing.whamcloud.com/test_sets/query?utf8=✓&warn%5Bnotice%5D=&test_set_script_id=f6a12204-32c3-11e0-a61c-52540025f9ae&query_bugs=&builds=&hosts=&commit_id=&test_groups%5B%5D=review-dne-zfs-part-4&horizon=1123200&window%5Bstart_date%5D=&window%5Bend_date%5D=&os_type_id=&distribution_type_id=&architecture_type_id=&file_system_type_id=&branch_type_id=24a6947e-04a9-11e1-bb5f-52540025f9af&network_type_id=&commit=Update+results&num_results=250

Comment by Andreas Dilger [ 26/Feb/19 ]

Issue was fixed via patch https://review.whamcloud.com/34069 "LU-10143 osd-zfs: allocate sequence in advance".

Generated at Sat Feb 10 02:43:15 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.