[LU-4224] SLES11SP3 / ldiskfs mkfs.lustre oops Created: 07/Nov/13  Updated: 18/Jul/14  Resolved: 18/Jul/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Stephen Champion
Assignee: Peter Jones
Resolution: Fixed
Votes: 0
Labels: None
Environment:

lustre: 2.5.51
kernel: patchless_client
build: jenkins-arch=x86_64,build_type=server,distro=sles11sp3,ib_stack=inkernel-1751--PRISTINE-3.0.93-0.8_lustre-default


Severity: 3
Rank (Obsolete): 11492

 Description   
  # mkfs.lustre --fsname=accfs1 --reformat --backfstype=ldiskfs --ost --mgsnode=n013-ib1@o2ib1 --mkfsoptions="-E stride=32" --index=0 --device-size=8000000 /dev/disk/by-id/scsi-360080e50001f7d80000006004a8efb3c

<6>[ 3178.817499] LDISKFS-fs (sdc): ldiskfs is supported in read-only mode only
<6>[ 3178.826654] LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro
<4>[ 4296.930514] ldiskfs: setting module read-write (unsupported)
<6>[ 4296.952857] LDISKFS-fs (sdc): allowing unsupported read-write mount.
<6>[ 4296.959752] LDISKFS-fs (sdc): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro
<1>[ 4296.960303] BUG: unable to handle kernel NULL pointer dereference at (null)
<1>[ 4296.968147] IP: [< (null)>] (null)
<4>[ 4296.973213] PGD 32bc44067 PUD 32bc5b067 PMD 0
<0>[ 4296.977798] Oops: 0010 [#1] SMP
<4>[ 4296.981120] CPU 8
<4>[ 4296.983015] Modules linked in: ldiskfs(FN) jbd2(F) crc16(F) binfmt_misc(F) nfs(F) lockd(F) fscache(F) auth_rpcgss(F) nfs_acl(F) sunrpc(F) rdma_ucm(F) ib_srp(F) scsi_transport_srp(F) ib_sdp(N) rdma_cm(F) iw_cm(F) ib_addr(F) cpufreq_conservative(F) cpufreq_userspace(F) cpufreq_powersave(F) acpi_cpufreq(F) mperf(F) ib_ipoib(F) ib_cm(F) ib_uverbs(F) ib_umad(F) iw_cxgb3(F) cxgb3(F) mdio(F) mlx4_en(F) mlx4_ib(F) ib_sa(F) ib_mthca(F) ib_mad(F) ib_core(F) xpmem(FX) xp(F) gru(F) xvma(FX) numatools(FX) joydev(F) usbhid(F) hid(F) uhci_hcd(F) ehci_hcd(F) qla2xxx(F) usbcore(F) processor(F) i7core_edac(F) ipv6(F) i2c_i801(F) iTCO_wdt(F) scsi_transport_fc(F) thermal_sys(F) usb_common(F) mlx4_core(F) ipv6_lib(F) i2c_core(F) scsi_tgt(F) iTCO_vendor_support(F) edac_core(F) pcspkr(F) hwmon(F) ioatdma(F) button(F) serio_raw(F) e1000(F) bnx2(F) ipg(F) igb(F) ptp(F) pps_core(F) mii(F) dca(F) af_packet(F) isci(FX) libsas(F) megaraid_mbox(F) megaraid_mm(F) megaraid_sas(F) mptctl(F) mpt2sas(F) mptsas(F) scsi_transport_sas(F) mptscsih(F) mptbase(F) raid_class(F) sg(F) sr_mod(F) sd_mod(F) ata_generic(F) ata_piix(F) ahci(F) libahci(F) libata(F) scsi_mod(F) ide_core(F) cdrom(F) dm_memcache(F) dm_log(F) dm_mod(F) loop(F) nls_utf8(F) vfat(F) xfs_dmapi(F) xfs(F) dmapi(F) fuse(F) ext3(F) jbd(F) mbcache(F) msdos(F) fat(F) autofs4(F) nls_cp437(F) rtc_cmos(F) crc_t10dif(F) [last unloaded: ldiskfs]
<4>[ 4297.107064] Supported: No, Unsupported modules are loaded
<4>[ 4297.112446]
<4>[ 4297.113990] Pid: 26175, comm: mkfs.lustre Tainted: GF NX 3.0.93-0.8_lustre-default #1 SGI.COM C1104-2TY9/X8DTT-IBQ
<4>[ 4297.125253] RIP: 0010:[<0000000000000000>] [< (null)>] (null)
<4>[ 4297.132867] RSP: 0018:ffff8802d69f9c20 EFLAGS: 00010206
<4>[ 4297.138302] RAX: ffff8805c9e7ca30 RBX: ffff8805c9e7ca18 RCX: 00000000000152f8
<4>[ 4297.145509] RDX: 0000000000000010 RSI: 0000000000000001 RDI: ffff8805c9e7c980
<4>[ 4297.152642] RBP: ffff8805c9e7c980 R08: 0000000000008100 R09: 0000000000000001
<4>[ 4297.159845] R10: 0000000000000004 R11: ffff8802d69f9c98 R12: ffff8805c9e7c9e0
<4>[ 4297.167114] R13: ffff8805c9e7c9b0 R14: 00000000ffffffff R15: ffff8805ca003898
<4>[ 4297.174418] FS: 00002aaaab246f20(0000) GS:ffff88033fc80000(0000) knlGS:0000000000000000
<4>[ 4297.182498] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 4297.188295] CR2: 0000000000000000 CR3: 00000002d6a08000 CR4: 00000000000007e0
<4>[ 4297.195590] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 4297.202737] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[ 4297.209944] Process mkfs.lustre (pid: 26175, threadinfo ffff8802d69f8000, task ffff8802ce3fa540)
<0>[ 4297.218855] Stack:
<4>[ 4297.220889] ffffffff811b060c ffff8802d69f9d08 ffff8805ca003878 0000000000000010
<4>[ 4297.228511] 0000000000000002 0000000000000006 ffffffff811b14f4 ffff8805c9e7c9b8
<4>[ 4297.236074] ffff8805ca0037b0 ffff8802ff440818 0000000000000000 000000d02c98f017
<0>[ 4297.243567] Call Trace:
<0>[ 4297.246121] Inexact backtrace:
<0>[ 4297.246122]
<4>[ 4297.250677] [<ffffffff811b060c>] ? dqput+0xac/0x1e0
<4>[ 4297.255668] [<ffffffff811b14f4>] ? __dquot_initialize+0x224/0x2a0
<4>[ 4297.262010] [<ffffffff8116fe15>] ? d_rehash+0x25/0x40
<4>[ 4297.267229] [<ffffffffa0a74339>] ? ldiskfs_create+0x29/0x190 [ldiskfs]
<4>[ 4297.273907] [<ffffffff81163bed>] ? generic_permission+0x1d/0xc0
<4>[ 4297.279971] [<ffffffff81166a44>] ? vfs_create+0xc4/0x130
<4>[ 4297.285447] [<ffffffff81166e94>] ? do_last+0x3e4/0x800
<4>[ 4297.290743] [<ffffffff81167f19>] ? path_openat+0xd9/0x420
<4>[ 4297.296229] [<ffffffff81120d2d>] ? handle_pte_fault+0x1cd/0x230
<4>[ 4297.302311] [<ffffffff8116839c>] ? do_filp_open+0x4c/0xc0
<4>[ 4297.307841] [<ffffffff81174cca>] ? alloc_fd+0x4a/0x140
<4>[ 4297.313172] [<ffffffff81158eef>] ? do_sys_open+0x17f/0x250
<4>[ 4297.318789] [<ffffffff81467d12>] ? system_call_fastpath+0x16/0x1b
<0>[ 4297.325022] Code: Bad RIP value.
<1>[ 4297.328379] RIP [< (null)>] (null)
<4>[ 4297.333622] RSP <ffff8802d69f9c20>
<0>[ 4297.337126] CR2: 0000000000000000
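
A note for triage, offered as a sketch rather than a diagnosis: the value after "Oops:" in the log above is the x86 page-fault error code. The helper below decodes it bit by bit; decode_oops_code is a name made up for this note and is not part of any kernel or Lustre tool. For 0010 it reports an instruction fetch from a not-present page in kernel mode, which is consistent with the RIP and CR2 values of 0 in the trace, i.e. a call through a NULL function pointer. It does not say which pointer was NULL.

```python
from typing import List

# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
# A clear-meaning of None means the bit is only reported when it is set.
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(hex_code: str) -> List[str]:
    """Decode the hex error code printed after 'Oops:' in a kernel log."""
    code = int(hex_code, 16)
    meanings = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)
    return meanings


if __name__ == "__main__":
    # Error code taken from the "Oops: 0010" line in this ticket.
    for meaning in decode_oops_code("0010"):
        print(meaning)
```

The table-driven layout keeps each bit's two readings side by side, so the same helper can be reused on other oops codes without editing the logic.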



 Comments   
Comment by Stephen Champion [ 07/Nov/13 ]

Core will be in ftp://shell.sgi.com/collect/lu4224 within an hour and should remain for a week.
-rw-r--r-- 1 root root 2085751161 Nov 7 01:06 LU-4224.tbz

This is using "lastSuccessfulBuild" on 2013.11.06.

My own builds have not had this problem, but there are two notable differences:
1) The last build I tested was a pull from about a week ago.
2) I have been building both server and client against the SuSE provided default kernel.

On a side note: does automated testing for sles11sp[23] exist? If so, is there an easy way to check the test reports? I noticed that even SLES-specific changes (e.g. http://review.whamcloud.com/#/c/7752/) are tested against CentOS.

Comment by John Fuchs-Chesney (Inactive) [ 17/Jul/14 ]

Hello Stephen,
Is there any further work required on this ticket from 2013?

Many thanks,
~ jfc.

Comment by Stephen Champion [ 18/Jul/14 ]

If you are actively testing sles11sp3 servers with ldiskfs, then this must be resolved.

Comment by John Fuchs-Chesney (Inactive) [ 18/Jul/14 ]

Good news indeed – thank you for letting us know, Stephen.
Best regards,
~ jfc.
