[LU-3169] sanity 24v: ls: reading directory /mnt/lustre/d0.sanity/d24: Input/output error
Created: 15/Apr/13  Updated: 25/Apr/13  Resolved: 25/Apr/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Maloo
Assignee: nasf (Inactive)
Resolution: Duplicate
Votes: 0
Labels: LB, zfs

Severity: 3
Rank (Obsolete): 7729

 Description   

This issue was created by maloo for Li Wei <liwei@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/31e6229a-a18c-11e2-8fc0-52540035b04c.

The sub-test test_24v failed with the following error:

test failed to respond and timed out

Info required for matching: sanity 24v

From the test log:

== sanity test 24v: list directory with large files (handle hash collision, bug: 17560) == 05:48:16 (1365511696)

  • created 10000 (time 1365511718.82 total 21.86 last 21.86)
  • created 20000 (time 1365511741.85 total 44.90 last 23.03)
  • created 30000 (time 1365511763.98 total 67.02 last 22.13)
  • created 40000 (time 1365511803.12 total 106.17 last 39.14)
  • created 50000 (time 1365511905.09 total 208.14 last 101.97)
  • created 60000 (time 1365511983.21 total 286.26 last 78.12)
  • created 70000 (time 1365512040.82 total 343.87 last 57.61)
  • created 80000 (time 1365512097.71 total 400.75 last 56.88)
  • created 90000 (time 1365512214.20 total 517.25 last 116.50)
    total: 100000 creates in 1401.14 seconds: 71.37 creates/second
    mdc.lustre-MDT0000-mdc-ffff88007abcd000.stats=clear
    ls: reading directory /mnt/lustre/d0.sanity/d24: Input/output error
    sanity test_24v: @@@@@@ FAIL: error in listing large dir
    Trace dump:
    = /usr/lib64/lustre/tests/test-framework.sh:4024:error_noexit()
    = /usr/lib64/lustre/tests/test-framework.sh:4047:error()
    = /usr/lib64/lustre/tests/sanity.sh:1018:test_24v()
    = /usr/lib64/lustre/tests/test-framework.sh:4301:run_one()
    = /usr/lib64/lustre/tests/test-framework.sh:4334:run_one_logged()
    = /usr/lib64/lustre/tests/test-framework.sh:4189:run_test()
    = /usr/lib64/lustre/tests/sanity.sh:1036:main()
    Dumping lctl log to /logdir/test_logs/2013-04-09/lustre-reviews-el6-x86_64-review-1_1_1_14707_-70245651123900-051101/sanity.test_24v.*.1365513300.log
    CMD: wtm-27vm3,wtm-27vm4,wtm-27vm5,wtm-27vm6.rosso.whamcloud.com /usr/sbin/lctl dk > /logdir/test_logs/2013-04-09/lustre-reviews-el6-x86_64-review-1_1_1_14707_-70245651123900-051101/sanity.test_24v.debug_log.\$(hostname -s).1365513300.log;
    dmesg > /logdir/test_logs/2013-04-09/lustre-reviews-el6-x86_64-review-1_1_1_14707_-70245651123900-051101/sanity.test_24v.dmesg.\$(hostname -s).1365513300.log
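The progress lines in the test log show the create rate degrading long before the listing failed. A minimal Python sketch for extracting per-batch durations from those lines follows. The regular expressions and the `batch_durations` helper are illustrative and are not part of the Lustre test framework, and the sketch assumes that the "total" value on the summary line is measured on the same clock as the "total" values on the progress lines. Under that assumption, the final 10000 creates account for 1401.14 - 517.25 = 883.89 seconds of the run.

```python
import re

# Progress lines copied from the test log above.
LOG = """\
created 10000 (time 1365511718.82 total 21.86 last 21.86)
created 20000 (time 1365511741.85 total 44.90 last 23.03)
created 30000 (time 1365511763.98 total 67.02 last 22.13)
created 40000 (time 1365511803.12 total 106.17 last 39.14)
created 50000 (time 1365511905.09 total 208.14 last 101.97)
created 60000 (time 1365511983.21 total 286.26 last 78.12)
created 70000 (time 1365512040.82 total 343.87 last 57.61)
created 80000 (time 1365512097.71 total 400.75 last 56.88)
created 90000 (time 1365512214.20 total 517.25 last 116.50)
total: 100000 creates in 1401.14 seconds: 71.37 creates/second
"""

# Hypothetical patterns written for this log excerpt only.
PROGRESS = re.compile(r"created (\d+) \(time [\d.]+ total ([\d.]+)")
SUMMARY = re.compile(r"total: (\d+) creates in ([\d.]+) seconds")


def batch_durations(log):
    """Return (files_created_so_far, seconds_spent_on_this_batch) pairs."""
    rows = []
    prev_total = 0.0
    for line in log.splitlines():
        match = PROGRESS.search(line) or SUMMARY.search(line)
        if match is None:
            continue
        count = int(match.group(1))
        total = float(match.group(2))
        # Duration of a batch is the difference between cumulative totals.
        rows.append((count, round(total - prev_total, 2)))
        prev_total = total
    return rows


for count, seconds in batch_durations(LOG):
    print(f"{count:>6} files: {seconds:8.2f} s for this batch")
```

Differences between cumulative totals are used instead of the logged "last" field so that the summary line, which has no "last" field, can be handled the same way as the progress lines.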

From the client console log:

06:15:00:Lustre: DEBUG MARKER: == sanity test 24v: list directory with large files (handle hash collision, bug: 17560) == 05:48:16 (1365511696)
06:15:00:Lustre: DEBUG MARKER: cancel_lru_locks mdc start
06:15:01:Lustre: DEBUG MARKER: cancel_lru_locks mdc stop
06:15:01:Lustre: 18546:0:(dir.c:463:ll_get_dir_page()) Page-wide hash collision: 6491135612813312
06:15:01:LustreError: 18546:0:(dir.c:594:ll_dir_read()) error reading dir [0x200000400:0xa3:0x0] at 6491135612813312: rc -5
06:15:01:Lustre: 18546:0:(dir.c:463:ll_get_dir_page()) Page-wide hash collision: 6491135612813312
06:15:01:LustreError: 18546:0:(dir.c:594:ll_dir_read()) error reading dir [0x200000400:0xa3:0x0] at 6491135612813312: rc -5
06:15:01:Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_24v: @@@@@@ FAIL: error in listing large dir
06:15:01:Lustre: DEBUG MARKER: sanity test_24v: @@@@@@ FAIL: error in listing large dir
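The "rc -5" reported by ll_dir_read() in the console log is consistent with the "Input/output error" that ls printed, since kernel code conventionally returns negated errno values. A small Python sketch for decoding such a return code is shown below. The `describe_rc` helper is hypothetical, and the sketch assumes it runs on a Linux host, where the userspace errno numbering matches the kernel's.

```python
import errno
import os


def describe_rc(rc):
    """Map a negative kernel-style return code to (errno name, message)."""
    code = -rc
    # errno.errorcode maps numeric values to symbolic names such as "EIO".
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# -5 is the value logged by ll_dir_read() above.
name, message = describe_rc(-5)
print(name, "-", message)
```

The message text comes from the C library on the host, so its exact wording may differ between platforms.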

Note that:

  • The failure happened on ZFS; recent ldiskfs sessions are clean.
  • The build being tested includes the LU-2990 fix.
  • This test, when run by Autotest, was previously skipped on ZFS because of "not enough free inodes 15833 required 100000", but it is enabled by the patch being tested (http://review.whamcloud.com/5806).


 Comments   
Comment by Li Wei (Inactive) [ 15/Apr/13 ]

https://maloo.whamcloud.com/test_sets/c230bad4-a198-11e2-bdac-52540035b04c

Comment by Peter Jones [ 22/Apr/13 ]

Fanyong

Is this LU-2990 reoccurring?

Peter

Comment by Oleg Drokin [ 22/Apr/13 ]

This seems to be the same issue as LU-2990, so Fan Yong should take a look.

Comment by nasf (Inactive) [ 23/Apr/13 ]

The build that failed for change 5806 (patch set 2) may not have contained the related fixes from LU-2990. I have refreshed the patch with sanity test_24v enabled to verify (http://review.whamcloud.com/#change,5806,set5).

Comment by Peter Jones [ 25/Apr/13 ]

The test passed, so it seems that this is a duplicate of LU-2990.
