Details
Bug
Resolution: Won't Fix
Blocker
None
Lustre 2.1.2
None
3
3978
Description
Divided the 1012 DAT nodes into three groups of ~337 nodes each:
Group 1 - iorfpp
Group 2 - simul
Group 3 - mdtestfpp
Each group passed when run separately. When all three groups run simultaneously, one or more of the tests fail.
Server-side errors start with:
Jun 1 18:29:00 ehyperion-dit29 kernel: LustreError: 13446:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1338600340, 200s ago); not entering recovery in server code, just going back to sleep ns: filter-lustre-OST0013_UUID lock: ffff8802a31d4b40/0x676dc3d48e273bbd lrc: 3/0,1 mode: -/PW res: 96739/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0>18446744073709551615) flags: 0x80004000 remote: 0x0 expref: -99 pid: 13446 timeout 0
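For anyone triaging many of these messages, the sketch below shows one way to pull the main fields out of a lock-timeout line and convert the enqueue time to a readable timestamp. The helper name `parse_lock_timeout` and the regular expression are illustrative assumptions, not part of Lustre or its tooling, and the pattern is only matched against the message format quoted above.

```python
import re
from datetime import datetime, timezone

# Hypothetical helper (not a Lustre tool): extracts a few fields from an
# ldlm_expired_completion_wait() message shaped like the one quoted above.
LOCK_TIMEOUT_RE = re.compile(
    r"enqueued at (?P<enqueued>\d+), (?P<waited>\d+)s ago.*?"
    r"ns: (?P<ns>\S+).*?"
    r"mode: (?P<mode>\S+).*?"
    r"res: (?P<res>\S+)"
)


def parse_lock_timeout(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LOCK_TIMEOUT_RE.search(line)
    if m is None:
        return None
    enqueued = int(m.group("enqueued"))
    return {
        "namespace": m.group("ns"),
        "mode": m.group("mode"),
        "resource": m.group("res"),
        "waited_s": int(m.group("waited")),
        # The enqueue time is logged as seconds since the Unix epoch.
        "enqueued_utc": datetime.fromtimestamp(enqueued, tz=timezone.utc),
    }


# Abbreviated copy of the message from this ticket (middle text shortened).
SAMPLE = (
    "LustreError: 13446:0:(ldlm_request.c:91:ldlm_expired_completion_wait()) "
    "### lock timed out (enqueued at 1338600340, 200s ago); just going back "
    "to sleep ns: filter-lustre-OST0013_UUID lock: "
    "ffff8802a31d4b40/0x676dc3d48e273bbd lrc: 3/0,1 mode: -/PW "
    "res: 96739/0 rrc: 2"
)

if __name__ == "__main__":
    info = parse_lock_timeout(SAMPLE)
    print(info["namespace"], info["resource"], info["waited_s"])
    print(info["enqueued_utc"].isoformat())
```

For the line above, the enqueue time decodes to 2012-06-02 01:25:40 UTC, and the affected namespace is filter-lustre-OST0013_UUID.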
The server debug logs, a debug log from one failing client, and the messages files are on the FTP site in DAT2.tar.gz.