[LU-10094] sanity test_17f: 'ls' fails with "ls: reading directory *: Input/output error"
Created: 06/Oct/17  Updated: 04/Sep/19  Resolved: 15/Aug/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0, Lustre 2.12.0, Lustre 2.13.0, Lustre 2.10.7, Lustre 2.12.1
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Bug
Priority: Critical
Reporter: James Casper
Assignee: Lai Siyao
Resolution: Fixed
Votes: 0
Labels: ppc
Environment:

trevis, full, x86_64 servers, ppc clients
servers: el7.4, ldiskfs, branch master, v2.10.53.1, b3642
clients: el7.4, branch master, v2.10.53.1, b3642


Issue Links:
Related
is related to LU-10095 sanity test_18: Failed to ls /mnt/lus... Closed
is related to LU-10096 sanity test_22: ls -lR /mnt/lustre/d2... Closed
is related to LU-10983 sanity tests fails with 'ls: reading ... Closed
is related to LU-10100 sanity test_27a: setstripe failed wit... Resolved
is related to LU-10099 sanity test_24A: Expected 5000 files,... Closed
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

https://testing.whamcloud.com/test_sessions/ba995751-659c-4e63-9b5b-fbf101137b78

From test_log:

ls: reading directory /mnt/lustre/d17f.sanity: Input/output error
 sanity test_17f: @@@@@@ FAIL: test_17f failed with 2 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5289:error()
  = /usr/lib64/lustre/tests/test-framework.sh:5565:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5604:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:5451:run_test()
  = /usr/lib64/lustre/tests/sanity.sh:459:main()


 Comments   
Comment by James Nunez (Inactive) [ 02/May/18 ]

Several sanity tests have 'ls' fail with the "reading directory ... Input/output error" for PPC architectures including test_17f, 18, 22, 24v, 24A, 32b, 32d, 32f, 32h, 48a, 48b, 48c, 51a, 51b, 56ab, and 154a.

For full test group results, the first time we see these tests fail for PPC is on 2017-09-17 20:43:36 UTC for master build # 3642, version 2.10.53.1.

Comment by Jian Yu [ 27/Jun/19 ]

On ppc64 client:

# ls /mnt/lustre/
ls: reading directory /mnt/lustre/: Input/output error

Dmesg:

[53353.003090] Lustre: Mounted lustre-client
[53354.406948] Lustre: DEBUG MARKER: Using TIMEOUT=20
[53372.104367] Lustre: 30604:0:(mdc_request.c:1549:mdc_read_page()) Page-wide hash collision: 0xfeffffffffffffff
[53378.035937] Lustre: lustre-OST0000-osc-c0000000788f6800: disconnect after 24s idle
[54485.730632] Lustre: 30675:0:(mdc_request.c:1549:mdc_read_page()) Page-wide hash collision: 0xfeffffffffffffff
[54485.730769] Lustre: 30675:0:(mdc_request.c:1549:mdc_read_page()) Skipped 1 previous similar message
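For triage, the collision messages in a saved dmesg capture can be counted with a small shell helper. This is only a sketch: `count_hash_collisions` is a hypothetical name, and the match string is copied from the log lines above rather than from any documented interface.

```shell
# Hypothetical helper: count "Page-wide hash collision" messages read from stdin.
# The fixed string is taken from the dmesg output in this ticket.
count_hash_collisions() {
    # grep -c exits non-zero when nothing matches; "|| true" keeps the
    # helper usable under "set -e" while still printing 0.
    grep -cF 'mdc_read_page()) Page-wide hash collision' || true
}

# Example use on a live client (may need root):
#   dmesg | count_hash_collisions
```

The kernel rate-limits repeated messages ("Skipped 1 previous similar message"), so the count is a lower bound.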

Comment by Lai Siyao [ 15/Jul/19 ]

Can you run 'getconf PAGE_SIZE' on ppc64 client?

Comment by Gerrit Updater [ 15/Jul/19 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35517
Subject: LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5fcc4cc687eb995946fad3bdf4e96caf4264b20c

Comment by Jian Yu [ 15/Jul/19 ]

Can you run 'getconf PAGE_SIZE' on ppc64 client?

# uname -m
ppc64
# getconf PAGE_SIZE
65536
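The 65536-byte result means this client runs with 64 KiB pages. A minimal sketch for flagging such clients before a sanity run follows; `check_page_size` is a hypothetical helper, and the 4096-byte baseline is an assumption (the usual x86_64 page size), not something stated in this ticket.

```shell
# Hypothetical helper: classify a page size in bytes.
# With no argument it queries the running system via getconf.
check_page_size() {
    page_size=${1:-$(getconf PAGE_SIZE)}
    # 4096 is an assumed baseline for comparison, not a value from the ticket.
    if [ "$page_size" -gt 4096 ]; then
        echo "large pages: $page_size bytes"
    else
        echo "standard pages: $page_size bytes"
    fi
}

# Example use:
#   check_page_size          # inspects the local machine
#   check_page_size 65536    # classifies the value reported above
```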

Comment by Lai Siyao [ 16/Jul/19 ]

Mmm, the above patch should be able to fix this issue.

Comment by Gerrit Updater [ 15/Aug/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35517/
Subject: LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d8b19ae6617733df003a906aca1791791a5f0eff

Comment by Peter Jones [ 15/Aug/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 18/Aug/19 ]

Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35812
Subject: LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 3511039df9f17f959fc57c4046ad00984824886d

Comment by Gerrit Updater [ 04/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35812/
Subject: LU-10094 mdc: dir page ldp_hash_end mistakenly adjusted
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: c7538cf4bf952fd5222e71541fbab544e35a5b77
