[LU-11812] parallel-scale-nfsv4 test racer_on_nfs crashes with “BUG: unable to handle kernel NULL pointer dereference” Created: 19/Dec/18 Updated: 24/Jun/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
2.11.0 servers with 2.12.0 RC2 clients |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The parallel-scale-nfsv4 test test_racer_on_nfs crashes the client when testing 2.11.0 servers with 2.12.0 RC2 clients. The logs at https://testing.whamcloud.com/test_sets/d535d716-fd79-11e8-a97c-52540065bddc show the following stack trace on client 1 (vm10):

[47194.303686] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 18:06:33 (1544551593)
[47194.487648] Lustre: DEBUG MARKER: MDSCOUNT=1 OSTCOUNT=7 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/d0.parallel-scale-nfs
[47277.681873] 2[25283]: segfault at 8 ip 00007f3fedceb718 sp 00007ffdbca831f0 error 4 in ld-2.17.so[7f3fedce0000+22000]
[47414.009897] 15[3823]: segfault at 0 ip 00000000004043e0 sp 00007ffca5fa64e8 error 6 in 15[400000+6000]
[47419.350008] 14[17578]: segfault at 8 ip 00007ff5d6baa718 sp 00007ffecd59c0f0 error 4 in ld-2.17.so[7ff5d6b9f000+22000]
[47427.166966] 9[4070]: segfault at 8 ip 00007fe542938718 sp 00007ffeb0050000 error 4 in ld-2.17.so[7fe54292d000+22000]
[47450.889695] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[47450.890808] IP: [<ffffffffc073d6a6>] nfs_advise_use_readdirplus+0x6/0x40 [nfs]
[47450.891637] PGD 800000005bd1c067 PUD 7c171067 PMD 0
[47450.892238] Oops: 0000 [#1] SMP
[47450.892634] Modules linked in: nfsv3 nfs_acl mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel sunrpc ppdev aesni_intel pcspkr joydev lrw gf128mul glue_helper ablk_helper cryptd virtio_balloon i2c_piix4 parport_pc parport ip_tables ext4 mbcache jbd2 virtio_blk ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32c_intel serio_raw floppy ata_piix libata 8139too virtio_pci virtio_ring virtio 8139cp mii [last unloaded: lnet_selftest]
[47450.901754] CPU: 0 PID: 20258 Comm: rm Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
[47450.902787] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[47450.903355] task: ffff8d5a828f0000 ti: ffff8d5a8ca50000 task.ti: ffff8d5a8ca50000
[47450.904097] RIP: 0010:[<ffffffffc073d6a6>] [<ffffffffc073d6a6>] nfs_advise_use_readdirplus+0x6/0x40 [nfs]
[47450.905052] RSP: 0018:ffff8d5a8ca53df8 EFLAGS: 00010246
[47450.905560] RAX: ffff8d5aeca26000 RBX: ffff8d5acf009640 RCX: ffffff8000000000
[47450.906225] RDX: ffffff8100000000 RSI: ffffff8100000000 RDI: 0000000000000000
[47450.906903] RBP: ffff8d5a8ca53e40 R08: 0000000000000001 R09: 0000000000000000
[47450.907579] R10: 00007ffdb2737ca0 R11: 0000000000000246 R12: ffff8d5aeca26000
[47450.908233] R13: ffff8d5a80c236c0 R14: ffff8d5afa56d6a0 R15: ffff8d5a8ca53ec0
[47450.908910] FS: 00007f35ee41e740(0000) GS:ffff8d5affc00000(0000) knlGS:0000000000000000
[47450.909721] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[47450.910284] CR2: 0000000000000028 CR3: 000000007b4f6000 CR4: 00000000000606f0
[47450.911063] Call Trace:
[47450.911352] [<ffffffffc0743d19>] ? nfs_getattr+0xf9/0x250 [nfs]
[47450.911987] [<ffffffffa2246aa9>] vfs_getattr+0x49/0x80
[47450.912539] [<ffffffffa2246b25>] vfs_fstat+0x45/0x80
[47450.913081] [<ffffffffa2247094>] SYSC_newfstat+0x24/0x60
[47450.913687] [<ffffffffa2774d21>] ? system_call_after_swapgs+0xae/0x146
[47450.914387] [<ffffffffa2774d15>] ? system_call_after_swapgs+0xa2/0x146
[47450.915054] [<ffffffffa2774d21>] ? system_call_after_swapgs+0xae/0x146
[47450.915724] [<ffffffffa2774d15>] ? system_call_after_swapgs+0xa2/0x146
[47450.916373] [<ffffffffa2774d21>] ? system_call_after_swapgs+0xae/0x146
[47450.917022] [<ffffffffa2774d15>] ? system_call_after_swapgs+0xa2/0x146
[47450.917664] [<ffffffffa2774d21>] ? system_call_after_swapgs+0xae/0x146
[47450.918279] [<ffffffffa2774d15>] ? system_call_after_swapgs+0xa2/0x146
[47450.918908] [<ffffffffa2774d21>] ? system_call_after_swapgs+0xae/0x146
[47450.919551] [<ffffffffa2774d15>] ? system_call_after_swapgs+0xa2/0x146
[47450.920171] [<ffffffffa2774d21>] ? system_call_after_swapgs+0xae/0x146
[47450.920798] [<ffffffffa224746e>] SyS_newfstat+0xe/0x10
[47450.921293] [<ffffffffa2774ddb>] system_call_fastpath+0x22/0x27
[47450.921879] [<ffffffffa2774d21>] ? system_call_after_swapgs+0xae/0x146
[47450.922527] Code: 89 8d 60 ff ff ff e8 81 df 01 e2 8b 8d 60 ff ff ff e9 02 fe ff ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 <48> 8b 47 28 48 89 e5 48 8b 80 50 03 00 00 f6 80 fc 02 00 00 01
[47450.925633] RIP [<ffffffffc073d6a6>] nfs_advise_use_readdirplus+0x6/0x40 [nfs]
[47450.926354] RSP <ffff8d5a8ca53df8>
[47450.926704] CR2: 0000000000000028

A similar issue occurred at https://testing.whamcloud.com/test_sets/56a97530-beb1-11e8-b143-52540065bddc, with the following in the client 1 (vm5) console:

[ 4879.403401] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 21:00:54 (1537650054)
[ 4879.583288] Lustre: DEBUG MARKER: MDSCOUNT=1 OSTCOUNT=7 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/d0.parallel-scale-nfs
[ 4951.358555] 13[22550]: segfault at 8 ip 00007f1a4c2b6958 sp 00007fff9954d570 error 4 in ld-2.17.so[7f1a4c2ab000+22000]
[ 4964.483018] 1[23397]: segfault at 8 ip 00007f9b1355a958 sp 00007ffdc00ef9f0 error 4 in ld-2.17.so[7f9b1354f000+22000]
[ 5088.407853] 5[30393]: segfault at 8 ip 00007f310575e958 sp 00007ffdbc27b700 error 4 in ld-2.17.so[7f3105753000+22000]
[ 5103.960350] ------------[ cut here ]------------
[ 5103.960967] kernel BUG at fs/dcache.c:661!
[ 5103.961381] invalid opcode: 0000 [#1] SMP
[ 5103.961939] Modules linked in: nfsv3 nfs_acl lustre(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc ppdev iosf_mbi i2c_piix4 i2c_core crc32_pclmul pcspkr joydev ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd parport_pc virtio_balloon parport ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common 8139too crc32c_intel serio_raw virtio_pci 8139cp virtio_ring virtio mii floppy
[ 5103.970783] CPU: 1 PID: 2704 Comm: cp Kdump: loaded Tainted: G OE ------------ 3.10.0-862.9.1.el7.x86_64 #1
[ 5103.971789] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 5103.972336] task: ffff8c90f9fd8000 ti: ffff8c90b5e9c000 task.ti: ffff8c90b5e9c000
[ 5103.973032] RIP: 0010:[<ffffffffb5033bd2>] [<ffffffffb5033bd2>] dget_parent+0x72/0x80
[ 5103.973808] RSP: 0018:ffff8c90b5e9fde0 EFLAGS: 00010246
[ 5103.974305] RAX: 0000000000000000 RBX: ffff8c90f859acc0 RCX: 0000000000000000
[ 5103.974968] RDX: 0000000000000000 RSI: 0000000100000000 RDI: ffff8c90f859ad18
[ 5103.975633] RBP: ffff8c90b5e9fdf8 R08: 0000000000000000 R09: 90e45f4528000000
[ 5103.976302] R10: 00007ffeb1207460 R11: 0000000000000246 R12: ffff8c90f873e240
[ 5103.976973] R13: ffff8c90f859ad18 R14: ffff8c90fbdfb9a0 R15: ffff8c90b5e9fec0
[ 5103.977640] FS: 00007fb39631c840(0000) GS:ffff8c90ffd00000(0000) knlGS:0000000000000000
[ 5103.978391] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5103.978926] CR2: 0000000000406538 CR3: 000000007a4b2000 CR4: 00000000000606e0
[ 5103.979600] Call Trace:
[ 5103.979885] [<ffffffffc07bbcad>] nfs_getattr+0xed/0x250 [nfs]
[ 5103.980453] [<ffffffffb5020e09>] vfs_getattr+0x49/0x80
[ 5103.980943] [<ffffffffb5020e85>] vfs_fstat+0x45/0x80
[ 5103.981423] [<ffffffffb50215a4>] SYSC_newfstat+0x24/0x60
[ 5103.981927] [<ffffffffb502ccdd>] ? putname+0x3d/0x60
[ 5103.982528] [<ffffffffb55206e1>] ? system_call_after_swapgs+0xae/0x146
[ 5103.983146] [<ffffffffb55206d5>] ? system_call_after_swapgs+0xa2/0x146
[ 5103.983775] [<ffffffffb55206e1>] ? system_call_after_swapgs+0xae/0x146
[ 5103.984418] [<ffffffffb55206d5>] ? system_call_after_swapgs+0xa2/0x146
[ 5103.985048] [<ffffffffb55206e1>] ? system_call_after_swapgs+0xae/0x146
[ 5103.985668] [<ffffffffb55206d5>] ? system_call_after_swapgs+0xa2/0x146
[ 5103.986277] [<ffffffffb55206e1>] ? system_call_after_swapgs+0xae/0x146
[ 5103.986895] [<ffffffffb55206d5>] ? system_call_after_swapgs+0xa2/0x146
[ 5103.987518] [<ffffffffb55206e1>] ? system_call_after_swapgs+0xae/0x146
[ 5103.988137] [<ffffffffb55206d5>] ? system_call_after_swapgs+0xa2/0x146
[ 5103.988759] [<ffffffffb55206e1>] ? system_call_after_swapgs+0xae/0x146
[ 5103.989382] [<ffffffffb502179e>] SyS_newfstat+0xe/0x10
[ 5103.989880] [<ffffffffb5520795>] system_call_fastpath+0x1c/0x21
[ 5103.990451] [<ffffffffb55206e1>] ? system_call_after_swapgs+0xae/0x146
[ 5103.991062] Code: 4c 89 ef e8 71 2c 4e 00 49 3b 5c 24 18 75 1e 8b 53 5c 85 d2 74 15 83 c2 01 4c 89 ef 89 53 5c ff 14 25 d0 07 a3 b5 48 89 d8 eb bd <0f> 0b 4c 89 ef ff 14 25 d0 07 a3 b5 eb be 66 66 66 66 90 55 48
[ 5103.994142] RIP [<ffffffffb5033bd2>] dget_parent+0x72/0x80
[ 5103.994694] RSP <ffff8c90b5e9fde0>

Similar crashes at |
| Comments |
| Comment by James Nunez (Inactive) [ 09/Apr/19 ] |
|
We are also seeing this kernel crash in non-interop testing: https://testing.whamcloud.com/test_sets/52455738-5af2-11e9-a256-52540065bddc |