Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.2.0, Lustre 2.3.0
-
None
-
3
-
4605
Description
After running mkfs for the whole file system, I was able to mount it and create a few files with a simple 'dd'. I then mounted it on 12 clients running Lustre 1.8.4 and started an IOR benchmark using 2 nodes for a total of 12 cores. The file system hung immediately and MDS01 had a kernel panic, as follows:
Message from syslogd@mds01 at May 8 12:00:59 ...
kernel:LustreError: 3523:0:(mdd_object.c:635:mdd_big_lmm_get()) ASSERTION( ma->ma_lmm_size > 0 ) failed:
Message from syslogd@mds01 at May 8 12:00:59 ...
kernel:LustreError: 3523:0:(mdd_object.c:635:mdd_big_lmm_get()) LBUG
Write failed: Broken pipe
Heartbeat then attempted a takeover, but mds02 immediately hit a kernel panic too:
Message from syslogd@mds02 at May 8 12:04:05 ...
kernel:LustreError: 3657:0:(mdd_object.c:635:mdd_big_lmm_get()) ASSERTION( ma->ma_lmm_size > 0 ) failed:
Message from syslogd@mds02 at May 8 12:04:05 ...
kernel:LustreError: 3657:0:(mdd_object.c:635:mdd_big_lmm_get()) LBUG
Write failed: Broken pipe
I created the file system as shown in the attached script weisshorn_mkfs.sh.
The SSD LUN is built on an LSI SSD controller with RAID10.
Are there any suggestions for things I can try to fix the problem?
I have also attached /var/log/messages, which contains the kernel messages.