Can you specify the timestamp of the troubled recovery process? This is what I see in c4-0c0s5n0.log:
[2013-03-08 21:03:28][c4-0c0s5n0]Lustre: 7635:0:(client.c:1866:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
[2013-03-08 21:06:04][c4-0c0s5n0]Lustre: routed1-OST00eb-osc-ffff88081e7e4000: Connection restored to routed1-OST00eb (at 10.36.227.92@o2ib)
[2013-03-08 21:06:04][c4-0c0s5n0]Lustre: routed1-OST008b-osc-ffff88081e7e4000: Connection restored to routed1-OST008b (at 10.36.227.92@o2ib)
[2013-03-08 21:28:08][c4-0c0s5n0]LustreError: 11-0: routed1-MDT0000-mdc-ffff88081e7e4000: Communicating with 10.36.227.211@o2ib, operation obd_ping failed with -107.
[2013-03-08 21:28:08][c4-0c0s5n0]Lustre: routed1-MDT0000-mdc-ffff88081e7e4000: Connection to routed1-MDT0000 (at 10.36.227.211@o2ib) was lost; in progress operations using this service will wait for recovery to complete
[2013-03-08 21:28:08][c4-0c0s5n0]LustreError: 167-0: routed1-MDT0000-mdc-ffff88081e7e4000: This client was evicted by routed1-MDT0000; in progress operations using this service will fail.
[2013-03-08 21:28:08][c4-0c0s5n0]Lustre: routed1-MDT0000-mdc-ffff88081e7e4000: Connection restored to routed1-MDT0000 (at 10.36.227.211@o2ib)
I only see that the client restored its connections to OST00eb and OST008b at 2013-03-08 21:06:04, then about 22 minutes later (21:28:08) lost its connection to the MDS, was evicted, and had the connection restored within the same second. Then at 22:32 it was unmounted (manually, I think).
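To make the timeline easier to check, here is a minimal sketch that pulls the connection events out of a client log and prints the time elapsed between them. It assumes every line has the `[YYYY-MM-DD HH:MM:SS][node]message` layout seen in the excerpt above; the names `parse_line` and `connection_events` are made up for this illustration and are not part of any Lustre tool. The sample lines are copied from the log quoted above.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt: [timestamp][node]message
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\[([^\]]+)\](.*)$")

def parse_line(line):
    """Return (timestamp, node, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return ts, m.group(2), m.group(3)

def connection_events(lines):
    """Collect (timestamp, kind, message) for restore / lost / evicted messages."""
    events = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _node, msg = parsed
        if "Connection restored to" in msg:
            events.append((ts, "restored", msg))
        elif "was lost" in msg:
            events.append((ts, "lost", msg))
        elif "was evicted by" in msg:
            events.append((ts, "evicted", msg))
    return events

# Lines copied verbatim from the c4-0c0s5n0 excerpt.
SAMPLE = [
    "[2013-03-08 21:06:04][c4-0c0s5n0]Lustre: routed1-OST00eb-osc-ffff88081e7e4000: Connection restored to routed1-OST00eb (at 10.36.227.92@o2ib)",
    "[2013-03-08 21:28:08][c4-0c0s5n0]Lustre: routed1-MDT0000-mdc-ffff88081e7e4000: Connection to routed1-MDT0000 (at 10.36.227.211@o2ib) was lost; in progress operations using this service will wait for recovery to complete",
    "[2013-03-08 21:28:08][c4-0c0s5n0]LustreError: 167-0: routed1-MDT0000-mdc-ffff88081e7e4000: This client was evicted by routed1-MDT0000; in progress operations using this service will fail.",
    "[2013-03-08 21:28:08][c4-0c0s5n0]Lustre: routed1-MDT0000-mdc-ffff88081e7e4000: Connection restored to routed1-MDT0000 (at 10.36.227.211@o2ib)",
]

events = connection_events(SAMPLE)
previous = None
for ts, kind, _msg in events:
    # Show each event with the time elapsed since the previous one.
    delta = "" if previous is None else f" (+{ts - previous})"
    print(f"{ts} {kind}{delta}")
    previous = ts
```

Running the same functions over the full log (for example `connection_events(open("c4-0c0s5n0.log"))`) should show whether there is any recovery window that the excerpt does not cover.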
I guess the assumption that the clients stopped pinging comes from the fact that they were evicted, which would be the result of not pinging.