Details
-
Bug
-
Resolution: Won't Fix
-
Minor
-
None
-
Lustre 2.13.0, Lustre 2.10.6, Lustre 2.10.7, Lustre 2.12.2, Lustre 2.12.4, Lustre 2.12.5, Lustre 2.12.6
-
RHEL 7.6 servers and RHEL 6.10 clients
-
3
Description
parallel-scale-nfsv3 test_racer_on_nfs crashes. The logs at https://testing.whamcloud.com/test_sets/0093481e-ef54-11e8-815b-52540065bddc show the following in the kernel-crash log:
[112389.058995] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client == 17:20:35 \(1542993635\)
[112389.244554] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client == 17:20:35 (1542993635)
[112392.193506] BUG: unable to handle kernel paging request at ffffffc0acacf0ff
[112392.195074] IP: [<ffffffc0acacf0ff>] 0xffffffc0acacf0ff
[112392.196184] PGD 52a14067 PUD 0
[112392.196979] Oops: 0010 [#1] SMP
[112392.197736] Modules linked in: nfsd nfs_acl osc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) loop rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc dm_mod ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr virtio_balloon parport_pc parport i2c_piix4 ip_tables ext4 mbcache jbd2 virtio_blk ata_generic pata_acpi crct10dif_pclmul
[112392.214016] crct10dif_common crc32c_intel serio_raw floppy 8139too ata_piix libata virtio_pci virtio_ring 8139cp mii virtio [last unloaded: lnet_selftest]
[112392.216970] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7_lustre.x86_64 #1
[112392.219149] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[112392.220252] task: ffffffffb0618480 ti: ffffffffb0600000 task.ti: ffffffffb0600000
[112392.221596] RIP: 0010:[<ffffffc0acacf0ff>] [<ffffffc0acacf0ff>] 0xffffffc0acacf0ff
[112392.222878] RSP: 0018:ffff8c913fc03eb8 EFLAGS: 00010286
[112392.223743] RAX: ffffffc0acacf0ff RBX: ffffffffb06784c0 RCX: 0000000009fdbc69
[112392.224907] RDX: ffff8c911d664827 RSI: fffff79281759900 RDI: ffff8c911d664827
[112392.226060] RBP: ffff8c913fc03f10 R08: 000000000001f0a0 R09: ffffffffafb5498c
[112392.227239] R10: ffff8c913fc1f0a0 R11: fffff79281e4a1c0 R12: 000000000000000a
[112392.228383] R13: 0000000000000013 R14: ff8c9113342e28ff R15: ffff8c913fc162c0
[112392.229533] FS: 0000000000000000(0000) GS:ffff8c913fc00000(0000) knlGS:0000000000000000
[112392.230832] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[112392.231763] CR2: ffffffc0acacf0ff CR3: 0000000077d72000 CR4: 00000000000606f0
[112392.232922] Call Trace:
[112392.233358] <IRQ>
[112392.233740] [<ffffffffafb54940>] ? rcu_process_callbacks+0x1e0/0x580
[112392.234872] [<ffffffffafaa0f05>] __do_softirq+0xf5/0x280
[112392.235779] [<ffffffffb017832c>] call_softirq+0x1c/0x30
[112392.236658] [<ffffffffafa2e675>] do_softirq+0x65/0xa0
[112392.237526] [<ffffffffafaa1285>] irq_exit+0x105/0x110
[112392.238384] [<ffffffffb01796c8>] smp_apic_timer_interrupt+0x48/0x60
[112392.239423] [<ffffffffb0175df2>] apic_timer_interrupt+0x162/0x170
[112392.240426] <EOI>
[112392.240783] [<ffffffffafadafb0>] ? switched_to_idle+0x10/0x10
[112392.241802] [<ffffffffb0169a20>] ? __cpuidle_text_start+0x8/0x8
[112392.242778] [<ffffffffb0169c26>] ? native_safe_halt+0x6/0x10
[112392.243706] [<ffffffffb0169a3e>] default_idle+0x1e/0xc0
[112392.244588] [<ffffffffafa366f0>] arch_cpu_idle+0x20/0xc0
[112392.245487] [<ffffffffafafc3ba>] cpu_startup_entry+0x14a/0x1e0
[112392.246463] [<ffffffffb014feb7>] rest_init+0x77/0x80
[112392.247318] [<ffffffffb07861c6>] start_kernel+0x44b/0x46c
[112392.248218] [<ffffffffb0785b7b>] ? repair_env_string+0x5c/0x5c
[112392.249183] [<ffffffffb0785120>] ? early_idt_handler_array+0x120/0x120
[112392.250250] [<ffffffffb078572f>] x86_64_start_reservations+0x24/0x26
[112392.251295] [<ffffffffb0785885>] x86_64_start_kernel+0x154/0x177
[112392.252324] [<ffffffffafa000d5>] start_cpu+0x5/0x14
[112392.253143] Code: Bad RIP value.
[112392.253787] RIP [<ffffffc0acacf0ff>] 0xffffffc0acacf0ff
[112392.254730] RSP <ffff8c913fc03eb8>
[112392.255323] CR2: ffffffc0acacf0ff
The last thing seen in the client test_log is:
== parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client == 17:20:35 (1542993635)
Running /usr/lib64/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
Unfortunately, there’s not much else in the console logs.
Similar crashes with similar call traces exist (for example https://testing.whamcloud.com/test_sets/5a91cd96-e21f-11e8-b67f-52540065bddc ), but they show ll_dir_get_parent_fid() errors before the crash, so it is not clear whether this is the same issue.
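One way to compare two crash logs such as the ones above is to reduce each oops to a coarse signature and compare the symbolized call path rather than the raw addresses. The following is only a minimal sketch: the helper names `oops_signature` and `same_call_path`, the choice of fields, and the regular expressions are assumptions made for illustration, not part of the Lustre test framework. The sample text is a short excerpt of the log quoted in this ticket. A matching call path does not prove a shared root cause; it only helps triage.

```python
import re


def oops_signature(log_text):
    """Reduce a kernel oops log to a coarse signature (hypothetical helper)."""
    fault = re.search(
        r"unable to handle kernel paging request at ([0-9a-f]+)", log_text)
    rip = re.search(r"RIP: \d+:\[<([0-9a-f]+)>\]", log_text)
    # Symbolized frames look like "[<addr>] ? func+0xoff/0xlen";
    # unsymbolized entries such as "[<addr>] 0xaddr" are skipped.
    frames = re.findall(
        r"\[<[0-9a-f]+>\] (?:\? )?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+",
        log_text)
    return {
        "fault_addr": fault.group(1) if fault else None,
        "rip": rip.group(1) if rip else None,
        "frames": frames,
        # The related crashes logged ll_dir_get_parent_fid() errors first.
        "has_parent_fid_error": "ll_dir_get_parent_fid" in log_text,
    }


def same_call_path(sig_a, sig_b, depth=5):
    """Compare the first `depth` symbolized frames, ignoring addresses."""
    return sig_a["frames"][:depth] == sig_b["frames"][:depth]


# Excerpt of the oops quoted in this ticket, joined into one string.
sample = (
    "[112392.193506] BUG: unable to handle kernel paging request at ffffffc0acacf0ff "
    "[112392.221596] RIP: 0010:[<ffffffc0acacf0ff>] [<ffffffc0acacf0ff>] 0xffffffc0acacf0ff "
    "[112392.233740] [<ffffffffafb54940>] ? rcu_process_callbacks+0x1e0/0x580 "
    "[112392.234872] [<ffffffffafaa0f05>] __do_softirq+0xf5/0x280"
)
sig = oops_signature(sample)
```

Running `oops_signature` on the second test set's crash log and passing both results to `same_call_path` would indicate whether the two crashes went through the same sequence of symbolized functions, while `has_parent_fid_error` flags the one visible difference noted above.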
Issue Links
- is related to
- LU-8584 parallel-scale-nfsv3 test_racer_on_nfs: BUG: unable to handle kernel paging request (Resolved)
- LU-11222 parallel-scale-nfsv3 test racer_on_nfs crashes with ‘BUG: unable to handle kernel paging request at ffffffc09d0c20ff’ (Resolved)
- LU-11766 parallel-scale-nfsv3 test racer_on_nfs crash MDS (Resolved)