[LU-3119] System hang when running sanity test 24x on DNE with ZFS Created: 05/Apr/13  Updated: 08/Apr/13  Resolved: 08/Apr/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Sarah Liu
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: dne, zfs
Environment:

server and client: lustre-master build #1370


Issue Links:
Duplicate
duplicates LU-2990 Failure on sanity test_24v: error in ... Resolved
Severity: 3
Rank (Obsolete): 7574

Description

The system hangs when running sanity test_24x with 2 MDTs on ZFS.

MDS console shows:

Lustre: DEBUG MARKER: == sanity test 24x: cross rename/link should be failed == 11:47:46 (1365187666)
LustreError: 14710:0:(mdt_reint.c:944:mdt_reint_link()) Target directory [0x380000bd0:0x1ae82:0x0] is on another MDT
LNet: Service thread pid 14747 completed after 0.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).

Client console shows:

Lustre: DEBUG MARKER: == sanity test 24x: cross rename/link should be failed == 11:47:46 (1365187666)
Lustre: 8053:0:(dir.c:463:ll_get_dir_page()) Page-wide hash collision: 1706989648068149248
LustreError: 8053:0:(dir.c:594:ll_dir_read()) error reading dir [0x380000bd0:0x27db:0x0] at 1706989648068149248: rc -5
Lustre: 8053:0:(dir.c:463:ll_get_dir_page()) Page-wide hash collision: 1706989648068149248
LustreError: 8053:0:(dir.c:594:ll_dir_read()) error reading dir [0x380000bd0:0x27db:0x0] at 1706989648068149248: rc -5
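
As a reading aid for the client console output above: the bracketed triple is a Lustre FID (sequence, object ID, version), and `rc -5` is a negative errno value, which corresponds to EIO on Linux. The Python sketch below decodes such a line. The function name `decode_line` and the regular expressions are illustrative assumptions, not part of Lustre or its test suite.

```python
import errno
import re

# Hypothetical helpers for reading Lustre console lines; the patterns are
# assumptions based only on the log lines quoted in this ticket.
FID_RE = re.compile(r"\[(0x[0-9a-f]+):(0x[0-9a-f]+):(0x[0-9a-f]+)\]")
RC_RE = re.compile(r"\brc (-?\d+)\b")


def decode_line(line):
    """Extract the FID and the return code from one console line, if present."""
    result = {}
    fid = FID_RE.search(line)
    if fid:
        # FID fields are hexadecimal: sequence, object ID, version.
        result["fid"] = tuple(int(part, 16) for part in fid.groups())
    rc = RC_RE.search(line)
    if rc:
        code = int(rc.group(1))
        result["rc"] = code
        # Kernel-style return codes are negative errno values.
        result["errno"] = errno.errorcode.get(abs(code), "UNKNOWN")
    return result


line = ("LustreError: 8053:0:(dir.c:594:ll_dir_read()) error reading dir "
        "[0x380000bd0:0x27db:0x0] at 1706989648068149248: rc -5")
print(decode_line(line))
```

Lines without a FID or return code simply yield a dictionary with fewer keys, so the helper can be applied to a whole console log without filtering first.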


Comments
Comment by Di Wang [ 06/Apr/13 ]

I suspect this is already fixed. I tried DNE on ZFS (1 MDS, 2 MDTs) and the test passes locally:

== sanity test 24x: cross rename/link should be failed == 20:34:56 (1390278896)
rename returned -1: Invalid cross-device link
rename returned -1: Invalid cross-device link
ln: creating hard link `/mnt/lustre/d0.sanity/d24/remote_dir/tgt_file1' => `/mnt/lustre/d0.sanity/d24/src_file': Invalid cross-device link
Resetting fail_loc on all nodes...done.
PASS 24x (1s)

Please try the latest master.
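
In the passing run above, both the rename and the hard link fail with "Invalid cross-device link", which is the glibc message for errno EXDEV. A minimal Python sketch of that expectation follows. The helper name `fails_with_exdev` is hypothetical, and this is not how the sanity test script itself is implemented.

```python
import errno
import os


def fails_with_exdev(op, src, dst):
    """Return True only if op(src, dst) raises OSError with errno EXDEV."""
    try:
        op(src, dst)
    except OSError as exc:
        return exc.errno == errno.EXDEV
    # The operation succeeded, which is not the expected outcome here.
    return False


def check_cross_mdt(src, dst):
    """Check that both a rename and a hard link from src to dst are refused."""
    return {
        "rename": fails_with_exdev(os.rename, src, dst),
        "link": fails_with_exdev(os.link, src, dst),
    }
```

On a mounted client this could be called as `check_cross_mdt(src, dst)`, with `src` on one MDT and `dst` inside a directory on another. Any other failure, such as the EIO seen in the original report, makes the check return False rather than pass silently.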

Comment by Andreas Dilger [ 08/Apr/13 ]

Duplicate of LU-2990.
