[LU-11037] parallel-scale-nfsv4 test_racer_on_nfs: BUG: unable to handle kernel NULL pointer dereference
Created: 20/May/18  Updated: 24/Mar/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.4, Lustre 2.10.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/1e658bea-5bcf-11e8-93e6-52540065bddc

test_racer_on_nfs failed with the following error:

Test crashed during parallel-scale-nfsv4 test_racer_on_nfs

 Env: 2.10.4-RC2 EL7.5 ldiskfs

[99369.354283] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 01:39:35 \(1526780375\)
[99369.557883] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 01:39:35 (1526780375)
[99406.945403] BUG: unable to handle kernel NULL pointer dereference at (null)
[99406.946849] IP: [< (null)>] (null)
[99406.947588] PGD 8000000038ce1067 PUD 61368067 PMD 0 
[99406.948349] Oops: 0010 [#1] SMP 
[99406.948900] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) osc(OE) mdc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) nfsv3 nfs_acl brd loop rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel ppdev lrw gf128mul glue_helper ablk_helper cryptd parport_pc i2c_piix4 pcspkr parport i2c_core joydev virtio_balloon auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk ata_piix libata crct10dif_pclmul crct10dif_common 8139too crc32c_intel
[99406.960607] serio_raw virtio_pci 8139cp virtio_ring mii virtio floppy [last unloaded: libcfs]
[99406.961831] CPU: 0 PID: 27439 Comm: mv Kdump: loaded Tainted: G W OE ------------ 3.10.0-862.2.3.el7.x86_64 #1
[99406.963296] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[99406.964083] task: ffff94fd95e5cf10 ti: ffff94fd95f48000 task.ti: ffff94fd95f48000
[99406.965079] RIP: 0010:[<0000000000000000>] [< (null)>] (null)
[99406.966146] RSP: 0018:ffff94fd95f4bd08 EFLAGS: 00010246
[99406.966870] RAX: 0000000000000000 RBX: ffff94fdbbc9c540 RCX: ffff94fdbbc9c578
[99406.967841] RDX: 0000000000000800 RSI: ffff94fdbbc9c540 RDI: ffff94fdb8d10e10
[99406.968808] RBP: ffff94fd95f4bd20 R08: ffff94fdbbe700a0 R09: ffffffffbe0355d5
[99406.969761] R10: 0000000000000000 R11: 0000000000000000 R12: ffff94fdbbe70000
[99406.970728] R13: 00000000ffffff9c R14: 00000000fffffffe R15: ffff94fd761bc000
[99406.971681] FS: 00007fdd44a27840(0000) GS:ffff94fdbfc00000(0000) knlGS:0000000000000000
[99406.972777] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[99406.973573] CR2: 0000000000000000 CR3: 0000000038ca8000 CR4: 00000000000606f0
[99406.974540] Call Trace:
[99406.974994] [<ffffffffbe025953>] ? lookup_real+0x23/0x60
[99406.975753] [<ffffffffbe026362>] __lookup_hash+0x42/0x60
[99406.976505] [<ffffffffbe02d179>] SYSC_renameat2+0x3a9/0x5a0
[99406.977339] [<ffffffffbe51f6e1>] ? system_call_after_swapgs+0xae/0x146
[99406.978253] [<ffffffffbe51f6d5>] ? system_call_after_swapgs+0xa2/0x146
[99406.979158] [<ffffffffbe51f6e1>] ? system_call_after_swapgs+0xae/0x146
[99406.980040] [<ffffffffbe51f6d5>] ? system_call_after_swapgs+0xa2/0x146
[99406.980979] [<ffffffffbe02e1ee>] SyS_renameat2+0xe/0x10
[99406.981708] [<ffffffffbe02e22e>] SyS_rename+0x1e/0x20
[99406.982411] [<ffffffffbe51f795>] system_call_fastpath+0x1c/0x21
[99406.983241] [<ffffffffbe51f6e1>] ? system_call_after_swapgs+0xae/0x146
[99406.984144] Code: Bad RIP value.
[99406.984669] RIP [< (null)>] (null)
[99406.985396] RSP <ffff94fd95f4bd08>
[99406.985881] CR2: 0000000000000000
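
As a reading aid for the trace above: the "Oops: 0010" line carries the x86 page-fault error code, and its bits can be decoded mechanically. The sketch below is a minimal illustration, not part of Lustre or any kernel tool; the helper name decode_oops_code is hypothetical. For 0010 it reports a kernel-mode instruction fetch from a non-present page. Together with RIP and CR2 both being zero, this is consistent with a call through a NULL function pointer somewhere under lookup_real, though that reading is an assumption and is not confirmed in this ticket.

```python
# Decode the x86 page-fault error code printed on the "Oops:" line.
# Bit meanings follow the x86 architecture definition of the page-fault
# error code. decode_oops_code is a hypothetical helper for illustration.
PF_BITS = [
    # (mask, meaning when set, meaning when clear or None to omit)
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "not a write"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(hex_code):
    """Return a list of human-readable flags for an Oops error code string."""
    code = int(hex_code, 16)
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


if __name__ == "__main__":
    # Error code taken from the "Oops: 0010 [#1] SMP" line in this report.
    print(decode_oops_code("0010"))
```

A data access through a NULL pointer would normally show a code such as 0000 (read) or 0002 (write) with a valid RIP, so the instruction-fetch bit is what distinguishes this crash.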

 

MDS console

[99277.931966] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 01:39:35 \(1526780375\)
[99278.152024] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 01:39:35 (1526780375)
[99279.975074] LustreError: 11900:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x200077e13:0x4562:0x0] get parent: rc = -2
[99279.976991] LustreError: 11900:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 11 previous similar messages
[99300.490404] LustreError: 11897:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x2000785e3:0x313a:0x0] get parent: rc = -2

<ConMan> Console [trevis-11vm8] disconnected from <trevis-11:6007> at 05-20 01:40.

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
parallel-scale-nfsv4 test_racer_on_nfs - Test crashed during parallel-scale-nfsv4 test_racer_on_nfs



 Comments   
Comment by Sarah Liu [ 20/May/18 ]

Not sure whether this is a duplicate of LU-10541, since the MDS console output and the trace are different. Also, the same test passed in 2.10.4-RC1 testing (build 111).

Generated at Sat Feb 10 02:40:23 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.