[LU-1469] Hyperion DAT - failures with multiple loads (simul+mdtest+ior) Created: 02/Jun/12 Updated: 14/Aug/16 Resolved: 14/Aug/16 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Cliff White (Inactive) | Assignee: | Oleg Drokin |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 3978 |
| Description |
|
Divided the 1012 DAT nodes into three groups of ~337 nodes each.
Jun 1 18:29:00 ehyperion-dit29 kernel: LustreError: 13446:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1338600340, 200s ago); not entering recovery in server code, just going back to sleep ns: filter-lustre-OST0013_UUID lock: ffff8802a31d4b40/0x676dc3d48e273bbd lrc: 3/0,1 mode: -
Server debug logs, one debug log from a failed client, and messages are on the FTP site, filename DAT2.tar.gz |
| Comments |
| Comment by Peter Jones [ 03/Jun/12 ] |
|
Oleg, could you please comment on this one? Thanks, Peter |
| Comment by Oleg Drokin [ 04/Jun/12 ] |
|
Unfortunately, the critical log file dumped after the error message was not collected. Luckily, a new test run is being planned, so we might be able to reproduce the problem and capture that log this time. In any case, all messages seen so far show only a deeply loaded system, with nothing indicating any sort of failure. Cliff told me separately that one client gets evicted and this aborts the test, though I don't see any signs of that in the logs provided. |
| Comment by James A Simmons [ 14/Aug/16 ] |
|
Really old blocker for an unsupported version. |