Details
Description
We have a Lustre 2.1.5 system with two MDSes (active/standby) and two OSSes (active/active). Each OSS has 6 OSTs.
We filled the file system to 100%. To remove the files, one Lustre client ran the following script:
rm -rf /mnt/hss45/ost/ost-00/* &
rm -rf /mnt/hss45/ost/ost-01/* &
rm -rf /mnt/hss45/ost/ost-02/* &
rm -rf /mnt/hss45/ost/ost-03/* &
rm -rf /mnt/hss45/ost/ost-04/* &
rm -rf /mnt/hss45/ost/ost-05/* &
rm -rf /mnt/hss45/ost/ost-06/* &
rm -rf /mnt/hss45/ost/ost-07/* &
rm -rf /mnt/hss45/ost/ost-08/* &
rm -rf /mnt/hss45/ost/ost-09/* &
rm -rf /mnt/hss45/ost/ost-10/* &
rm -rf /mnt/hss45/ost/ost-11/* &
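For anyone trying to reproduce this, the twelve commands above can be written more compactly as a loop. This is only a sketch, not the script that was actually run. The function name `remove_all` and the `BASE` variable are hypothetical, and the final `wait` is an addition that does not appear in the original script.

```shell
#!/bin/sh
# Hypothetical compact form of the twelve rm commands above.
# BASE defaults to the path from this report; override it to test elsewhere.
BASE="${BASE:-/mnt/hss45/ost}"

remove_all() {
    for i in 00 01 02 03 04 05 06 07 08 09 10 11; do
        # Each rm is backgrounded, so all twelve deletions run concurrently,
        # as in the original script.
        rm -rf "$BASE/ost-$i"/* &
    done
    # Addition (not in the original script): block until every rm finishes.
    wait
}
```

Calling `remove_all` with the default `BASE` issues the same twelve concurrent deletions as the script above.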
One OSS crashed with this error:
BUG: soft lockup - CPU#25 stuck for 67s! [jbd2/dm-8-8:8966]
. . .
Kernel panic - not syncing: softlockup: hung tasks
The OSS was STONITH'ed.
Shortly thereafter, the second OSS got the same error:
BUG: soft lockup - CPU#17 stuck for 67s! [jbd2/dm-6-8:21440]
Kernel panic - not syncing: softlockup: hung tasks
I have attached the full console output. There was nothing in /var/log/messages.