Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.11.0, Lustre 2.10.4
-
None
-
SLES12 SP3 server/client
and SLES12 SP2 server/client
-
3
-
9223372036854775807
Description
parallel-scale-nfsv3 test_racer_on_nfs hangs for SLES12 SP3 and SLES12 SP2 server/client only.
From the MDS console:
[76114.715893] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 06:10:32 \(1516371032\) [76114.743657] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 06:10:32 (1516371032) [76114.921643] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 [76114.922573] IP: [<ffffffffa13ec095>] ll_xattr_set_common_4_3+0x5/0x10 [lustre] [76114.923403] PGD 0 [76114.923653] Oops: 0000 [#1] SMP [76114.924045] Modules linked in: nfsd(E) nfs_acl(E) lustre(OEN) lmv(OEN) mdc(OEN) osc(OEN) lov(OEN) osp(OEN) mdd(OEN) lod(OEN) mdt(OEN) lfsck(OEN) mgs(OEN) mgc(OEN) osd_ldiskfs(OEN) ldiskfs(OEN) lquota(OEN) fid(OEN) fld(OEN) ksocklnd(OEN) ptlrpc(OEN) obdclass(OEN) lnet(OEN) libcfs(OEN) loop(E) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) fscache(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) rpcrdma(E) sunrpc(E) ib_isert(E) iscsi_target_mod(E) ib_iser(E) libiscsi(E) scsi_transport_iscsi(E) ib_srpt(E) target_core_mod(E) ib_srp(E) scsi_transport_srp(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) rdma_cm(E) configfs(E) ib_cm(E) iw_cm(E) ib_core(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) jitterentropy_rng(E) drbg(E) ansi_cprng(E) ppdev(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) 8139too(E) joydev(E) 8139cp(E) pcspkr(E) virtio_balloon(E) i2c_piix4(E) mii(E) parport_pc(E) parport(E) pvpanic(E) button(E) processor(E) ata_generic(E) ext4(E) crc16(E) jbd2(E) mbcache(E) ata_piix(E) ahci(E) libahci(E) virtio_blk(E) floppy(E) serio_raw(E) virtio_pci(E) virtio_ring(E) virtio(E) uhci_hcd(E) ehci_hcd(E) usbcore(E) usb_common(E) libata(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E) [last unloaded: lnet_selftest] [76114.938788] Supported: No, Unsupported modules are loaded [76114.939342] CPU: 0 PID: 28809 Comm: nfsd Tainted: G OE N 4.4.103-6.33_lustre-default #1 [76114.940280] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [76114.940875] task: ffff88007c3254c0 ti: ffff88002c48c000 task.ti: ffff88002c48c000 [76114.941635] RIP: 0010:[<ffffffffa13ec095>] [<ffffffffa13ec095>] ll_xattr_set_common_4_3+0x5/0x10 [lustre] [76114.942668] RSP: 0018:ffff88002c48fd90 EFLAGS: 00010246 [76114.943220] RAX: ffffffffa14094c0 RBX: 0000000000008000 RCX: 0000000000000000 [76114.943945] RDX: ffffffffa140b05b RSI: 0000000000000000 RDI: ffffffffa14094c0 [76114.944670] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 [76114.945399] R10: 0000000000000000 R11: ffff88006a9320fb R12: ffff880065cfce50 [76114.946129] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa140b05b [76114.946859] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [76114.947677] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [76114.948266] CR2: 0000000000000030 CR3: 000000007a2e6000 CR4: 00000000000406f0 [76114.948994] Stack: [76114.949215] ffffffffa13b499e 0000000000000000 ffff880065d1f000 ffff88002c48fe00 [76114.950059] ffff88003b8c0000 ffff88003b8c0008 ffff880065cfce50 ffff880078199900 [76114.950904] ffffffffa1479d8f ffff8800442dc000 ffff8800442dc000 ffff880044086018 [76114.951816] Call Trace: [76114.952142] [<ffffffffa13b499e>] ll_set_acl+0xee/0x370 [lustre] [76114.952790] [<ffffffffa1479d8f>] nfsd3_proc_setacl+0x19f/0x260 [nfsd] [76114.953482] [<ffffffffa1469e23>] nfsd_dispatch+0xc3/0x260 [nfsd] [76114.954139] [<ffffffffa05811f8>] svc_process_common+0x418/0x6a0 [sunrpc] [76114.954843] [<ffffffffa058157d>] svc_process+0xfd/0x1b0 [sunrpc] [76114.955483] [<ffffffffa146989a>] nfsd+0xea/0x160 [nfsd] [76114.956044] [<ffffffff8109bb57>] kthread+0xc7/0xe0 [76114.956552] [<ffffffff8160c67f>] ret_from_fork+0x3f/0x70 [76114.958488] DWARF2 unwinder stuck at ret_from_fork+0x3f/0x70 [76114.959116] [76114.959290] Leftover inexact backtrace: [76114.959290] [76114.959848] [<ffffffff8109ba90>] ? kthread_park+0x50/0x50 [76114.960412] Code: c0 c7 05 63 3e 04 00 00 00 04 00 e8 96 d0 3b ff 48 c7 c7 c0 fe 42 a1 e8 7a 26 3b ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> 8b 76 30 e9 32 ed ff ff 66 90 66 66 66 66 90 41 55 41 54 55 [76114.963792] RIP [<ffffffffa13ec095>] ll_xattr_set_common_4_3+0x5/0x10 [lustre] [76114.964576] RSP <ffff88002c48fd90> [76114.964949] CR2: 0000000000000030
In a different test session, https://testing.hpdd.intel.com/test_sets/bba5459c-fc9f-11e7-a10a-52540065bddc , we see the same hang, but there is a little more output in the MDS console log:
[32570.101935] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 10:52:35 (1516301555) [32570.288540] LustreError: 18012:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x240000404:0x1:0x0] without object type: valid = 0x100000001 [32570.288547] LustreError: 18012:0:(llite_lib.c:2354:ll_prep_inode()) new_inode -fatal: rc -12 [32570.402366] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 [32570.402366] IP: [<ffffffffa1461095>] ll_xattr_set_common_4_3+0x5/0x10 [lustre]
I can confirm that this test started hanging with this NULL pointer dereference on at least 2018-01-16. This test hangs so often that I can’t review all test hangs to see exactly when this started happening.
Logs for some of the failures are at
https://testing.hpdd.intel.com/test_sets/ebd266f6-fac8-11e7-bd00-52540065bddc
https://testing.hpdd.intel.com/test_sets/9dfe1932-fd36-11e7-a6ad-52540065bddc
https://testing.hpdd.intel.com/test_sets/ce6de31a-fe15-11e7-a10a-52540065bddc