[LU-1469] Hyperion DAT - failures with multiple loads (simul+mdtest+ior) Created: 02/Jun/12  Updated: 14/Aug/16  Resolved: 14/Aug/16

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.2
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Cliff White (Inactive)
Assignee: Oleg Drokin
Resolution: Won't Fix
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 3978

 Description   

Divided the 1012 DAT nodes into three groups of ~337 nodes each.
Group 1 - iorfpp
Group 2 - simul
Group 3 - mdtestfpp
Each group was run separately, and each passed. When all three groups run simultaneously, one or more of the tests fail.
Server-side errors start with:

Jun 1 18:29:00 ehyperion-dit29 kernel: LustreError: 13446:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1338600340, 200s ago); not entering recovery in server code, just going back to sleep ns: filter-lustre-OST0013_UUID lock: ffff8802a31d4b40/0x676dc3d48e273bbd lrc: 3/0,1 mode: -/PW res: 96739/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0>18446744073709551615) flags: 0x80004000 remote: 0x0 expref: -99 pid: 13446 timeout 0
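
For triage, the fields of a message like the one above can be pulled out with a small script. This is a minimal sketch, not part of the original report: the regular expression is an assumption based only on this single sample line, and the names `PATTERN` and `parse_lock_timeout` are hypothetical. The `mode` and `res` fields are kept as raw strings without interpretation. The extent end 18446744073709551615 equals 2^64 - 1, so the sketch flags the lock as covering the whole object.

```python
import re
from datetime import datetime, timezone

# Sample message copied from this report (syslog prefix omitted).
SAMPLE = (
    "LustreError: 13446:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) "
    "### lock timed out (enqueued at 1338600340, 200s ago); not entering "
    "recovery in server code, just going back to sleep "
    "ns: filter-lustre-OST0013_UUID lock: ffff8802a31d4b40/0x676dc3d48e273bbd "
    "lrc: 3/0,1 mode: -/PW res: 96739/0 rrc: 2 type: EXT "
    "[0->18446744073709551615] (req 0>18446744073709551615) "
    "flags: 0x80004000 remote: 0x0 expref: -99 pid: 13446 timeout 0"
)

# Assumed layout, inferred from the sample line only.
PATTERN = re.compile(
    r"lock timed out \(enqueued at (?P<enqueued>\d+), (?P<waited>\d+)s ago\)"
    r".*?ns: (?P<ns>\S+)"
    r".*?mode: (?P<mode>\S+)"
    r".*?res: (?P<res>\S+)"
    r".*?type: (?P<type>\S+) \[(?P<start>\d+)->(?P<end>\d+)\]"
)


def parse_lock_timeout(line):
    """Return a dict of fields from a lock-timeout message, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    info = m.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("enqueued", "waited", "start", "end"):
        info[key] = int(info[key])
    # The enqueue time is a Unix epoch value; render it in UTC.
    info["enqueued_utc"] = datetime.fromtimestamp(info["enqueued"], tz=timezone.utc)
    # An extent of [0, 2**64 - 1] spans the entire object.
    info["whole_object"] = info["start"] == 0 and info["end"] == 2**64 - 1
    return info


if __name__ == "__main__":
    info = parse_lock_timeout(SAMPLE)
    print(info["ns"], info["waited"], info["enqueued_utc"].isoformat())
```

Grouping the parsed `ns` values across all server logs would show whether the timeouts cluster on a few OSTs or are spread across the filesystem.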

Server debug logs, a debug log from one failing client, and messages are on the FTP site, filename DAT2.tar.gz



 Comments   
Comment by Peter Jones [ 03/Jun/12 ]

Oleg

Could you please comment on this one?

Thanks

Peter

Comment by Oleg Drokin [ 04/Jun/12 ]

Unfortunately, the critical log file dumped after the error message was not collected. Luckily, a new test run is being planned, so we might be able to reproduce the problem and capture the log this time.

In any case, all messages seen so far show only a heavily loaded system, with nothing indicating any sort of failure.

Cliff told me separately that one client gets evicted and this aborts the test, though I don't see any signs of that in the logs provided.

Comment by James A Simmons [ 14/Aug/16 ]

Really old blocker for unsupported version

Generated at Sat Feb 10 01:16:54 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.