Details
-
Bug
-
Resolution: Not a Bug
-
Major
-
None
-
Lustre 1.8.8
-
None
-
Sun Fire x4540 server, 48 internal 1TB disks, Lustre-patched kernel (kernel-2.6.18-308.4.1.el5), Lustre 1.8.8
-
3
-
10643
Description
Since our recent upgrade to 1.8.8, we've been experiencing problems with the md subsystem. Our OSTs are constructed as 8+2 RAID6 metadevices using the mdadm utility.
Every Sunday morning, cron.weekly runs the raid.check scripts, which start re-syncing the arrays. If the re-sync hits a medium error, the md subsystem hangs; for example, "cat /proc/mdstat" never returns. The load on the server then climbs until the server becomes unusable and we have to reboot the OSS server.
What could be causing this, and should we be running raid.check on the OST metadevices at all?
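To make the symptom easier to capture before the server locks up, here is a minimal sketch of how the state of each array could be pulled out of /proc/mdstat-style text. It only parses text that has already been read; the function name `parse_mdstat` and the sample listing (array names, device names, block counts) are made up for illustration and are not taken from the affected server. Since reading /proc/mdstat is itself what hangs once the problem starts, this would only be useful if run periodically beforehand, for example from a monitoring job.

```python
import re

# Header line of an array, e.g. "md10 : active raid6 sdk[0] sdl[1] ..."
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(\w+)")
# Progress line of a running sync action, e.g. "[==>...]  check = 12.3% (...)"
PROGRESS_RE = re.compile(r"(check|resync|recovery|reshape)\s*=\s*([\d.]+)%")


def parse_mdstat(text):
    """Return {array_name: (action, percent)} for arrays with a sync action running."""
    busy = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            # Remember which array the following indented lines belong to.
            current = header.group(1)
            continue
        progress = PROGRESS_RE.search(line)
        if progress and current:
            busy[current] = (progress.group(1), float(progress.group(2)))
    return busy


# Hypothetical listing: md10 is being checked, md11 is idle.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md10 : active raid6 sdk[0] sdl[1] sdm[2] sdn[3] sdo[4] sdp[5] sdq[6] sdr[7] sds[8] sdt[9]
      7814099968 blocks level 6, 64k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
      [==>..................]  check = 12.3% (120000000/976762496) finish=300.0min speed=50000K/sec

md11 : active raid6 sda[0] sdb[1] sdc[2] sdd[3] sde[4] sdf[5] sdg[6] sdh[7] sdi[8] sdj[9]
      7814099968 blocks level 6, 64k chunk, algorithm 2 [10/10] [UUUUUUUUUU]

unused devices: <none>
"""

if __name__ == "__main__":
    for name, (action, percent) in parse_mdstat(SAMPLE).items():
        print(f"{name}: {action} at {percent}%")
```

On a live system the text would come from reading /proc/mdstat instead of `SAMPLE`. Logging the result every minute or so during the Sunday run would show which array and roughly which position the check had reached when the hang began.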