[LU-1384] MDS Kernel Panic when trying to mount the lustre file system Created: 08/May/12 Updated: 01/Jun/12 Resolved: 01/Jun/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.2.0, Lustre 2.3.0 |
| Fix Version/s: | Lustre 2.3.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Fabio Verzelloni | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Linux 2.6.32-220.7.1.el6_lustre.g9c8f747.x86_64 #1 SMP Tue Apr 24 14:27:35 PDT 2012 x86_64 x86_64 x86_64 GNU/Linux |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 4605 |
| Description |
|
After running mkfs for the whole FS I was able to mount it and run a simple 'dd' to create a few files. Once I mounted it on 12 clients running Lustre 1.8.4 and tried to run an IOR benchmark, using 2 nodes for a total of 12 cores, the file system immediately hung and MDS01 had a kernel panic, as follows:
Message from syslogd@mds01 at May 8 12:00:59 ...
Heartbeat attempted a takeover, but the second MDS immediately had a kernel panic too:
Message from syslogd@mds02 at May 8 12:04:05 ...
Message from syslogd@mds02 at May 8 12:04:05 ...
I created the file system as shown in the attached file weisshorn_mkfs.sh. The SSD LUN is built on an LSI SSD controller with RAID10. Are there any suggestions or inputs I can try to fix the problem? |
| Comments |
| Comment by Fabio Verzelloni [ 08/May/12 ] |
|
That's the moment of the kernel panic, which happened as soon as I mounted the Lustre FS on the client running 1.8.4.
| Comment by Fabio Verzelloni [ 08/May/12 ] |
|
The Lustre packages on the client side that are crashing the MDS are: lustre-modules-1.8.4-2.6.32.36_0.5_default_201202291115 cray-liblustreconfig0-1.0-1.0400.30000.6.18.gem
| Comment by Peter Jones [ 10/May/12 ] |
|
Lai, could you please look into this one? Thanks, Peter
| Comment by Andreas Dilger [ 10/May/12 ] |
|
As a starting point, the client should never be able to crash the MDS. The MDS code needs to be updated to validate the incoming data and return an error if it is wrong. A separate case is that the 1.8.4 client will not work correctly with a 2.x server without several patches being applied. |
| Comment by Peter Jones [ 24/May/12 ] |
|
Bobijam, could you please look into this one? Thanks, Peter
| Comment by Zhenyu Xu [ 25/May/12 ] |
|
Patch tracking at http://review.whamcloud.com/2905
The MDS crashes when an unsupported 1.8.x client connects to it: kernel:LustreError: 3657:0:(mdd_object.c:635:mdd_big_lmm_get()) We need to validate the incoming @ma lest an old client crash the MDS. |
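The idea of validating incoming data and returning an error, rather than asserting, can be illustrated with a minimal sketch. This is not the code from the patch above: `struct ma_sketch`, `SKETCH_MAX_LMM_SIZE`, `ma_sketch_validate()` and `handle_request_sketch()` are hypothetical names invented for illustration, and the checks and error codes are assumptions about what such validation might look like.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the attribute buffer that arrives with a
 * client request. Field names are illustrative only and are not taken
 * from the Lustre source. */
struct ma_sketch {
    const void *lmm;      /* striping metadata buffer, may be NULL */
    size_t      lmm_size; /* number of bytes available in lmm */
};

/* Assumed upper bound on the buffer size, chosen for this sketch only. */
#define SKETCH_MAX_LMM_SIZE 4096u

/* Return 0 if the buffer looks usable, or a negative errno otherwise. */
static int ma_sketch_validate(const struct ma_sketch *ma)
{
    if (ma == NULL)
        return -EINVAL;
    if (ma->lmm == NULL || ma->lmm_size == 0)
        return -EPROTO;   /* request did not carry the expected buffer */
    if (ma->lmm_size > SKETCH_MAX_LMM_SIZE)
        return -E2BIG;    /* implausibly large size from the client */
    return 0;
}

/* Request handler: a malformed request is rejected with an error code
 * instead of tripping an assertion that would bring the server down. */
static int handle_request_sketch(const struct ma_sketch *ma)
{
    int rc = ma_sketch_validate(ma);

    if (rc != 0)
        return rc;        /* the client sees an error; the server keeps running */

    /* Normal processing of the validated buffer would go here. */
    return 0;
}
```

With this shape, a request from an incompatible client that lacks the expected buffer yields an error return to the caller, which the server can pass back to the client.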
| Comment by Peter Jones [ 01/Jun/12 ] |
|
Landed for 2.3 |