[LU-11718] parallel-scale-nfsv3 test racer_on_nfs crashes with ‘BUG: unable to handle kernel paging request’

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.13.0, Lustre 2.10.6, Lustre 2.10.7, Lustre 2.12.2, Lustre 2.12.4, Lustre 2.12.5, Lustre 2.12.6
    • Labels: None
    • Environment: RHEL 7.6 servers and RHEL 6.10 clients
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      parallel-scale-nfsv3 test_racer_on_nfs crashes. The kernel-crash log at https://testing.whamcloud.com/test_sets/0093481e-ef54-11e8-815b-52540065bddc shows the following:

      [112389.058995] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client == 17:20:35 \(1542993635\)
      [112389.244554] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client == 17:20:35 (1542993635)
      [112392.193506] BUG: unable to handle kernel paging request at ffffffc0acacf0ff
      [112392.195074] IP: [<ffffffc0acacf0ff>] 0xffffffc0acacf0ff
      [112392.196184] PGD 52a14067 PUD 0 
      [112392.196979] Oops: 0010 [#1] SMP 
      [112392.197736] Modules linked in: nfsd nfs_acl osc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) loop rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc dm_mod ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr virtio_balloon parport_pc parport i2c_piix4 ip_tables ext4 mbcache jbd2 virtio_blk ata_generic pata_acpi crct10dif_pclmul
      [112392.214016]  crct10dif_common crc32c_intel serio_raw floppy 8139too ata_piix libata virtio_pci virtio_ring 8139cp mii virtio [last unloaded: lnet_selftest]
      [112392.216970] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-957.el7_lustre.x86_64 #1
      [112392.219149] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [112392.220252] task: ffffffffb0618480 ti: ffffffffb0600000 task.ti: ffffffffb0600000
      [112392.221596] RIP: 0010:[<ffffffc0acacf0ff>]  [<ffffffc0acacf0ff>] 0xffffffc0acacf0ff
      [112392.222878] RSP: 0018:ffff8c913fc03eb8  EFLAGS: 00010286
      [112392.223743] RAX: ffffffc0acacf0ff RBX: ffffffffb06784c0 RCX: 0000000009fdbc69
      [112392.224907] RDX: ffff8c911d664827 RSI: fffff79281759900 RDI: ffff8c911d664827
      [112392.226060] RBP: ffff8c913fc03f10 R08: 000000000001f0a0 R09: ffffffffafb5498c
      [112392.227239] R10: ffff8c913fc1f0a0 R11: fffff79281e4a1c0 R12: 000000000000000a
      [112392.228383] R13: 0000000000000013 R14: ff8c9113342e28ff R15: ffff8c913fc162c0
      [112392.229533] FS:  0000000000000000(0000) GS:ffff8c913fc00000(0000) knlGS:0000000000000000
      [112392.230832] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [112392.231763] CR2: ffffffc0acacf0ff CR3: 0000000077d72000 CR4: 00000000000606f0
      [112392.232922] Call Trace:
      [112392.233358]  <IRQ> 
      [112392.233740]  [<ffffffffafb54940>] ? rcu_process_callbacks+0x1e0/0x580
      [112392.234872]  [<ffffffffafaa0f05>] __do_softirq+0xf5/0x280
      [112392.235779]  [<ffffffffb017832c>] call_softirq+0x1c/0x30
      [112392.236658]  [<ffffffffafa2e675>] do_softirq+0x65/0xa0
      [112392.237526]  [<ffffffffafaa1285>] irq_exit+0x105/0x110
      [112392.238384]  [<ffffffffb01796c8>] smp_apic_timer_interrupt+0x48/0x60
      [112392.239423]  [<ffffffffb0175df2>] apic_timer_interrupt+0x162/0x170
      [112392.240426]  <EOI> 
      [112392.240783]  [<ffffffffafadafb0>] ? switched_to_idle+0x10/0x10
      [112392.241802]  [<ffffffffb0169a20>] ? __cpuidle_text_start+0x8/0x8
      [112392.242778]  [<ffffffffb0169c26>] ? native_safe_halt+0x6/0x10
      [112392.243706]  [<ffffffffb0169a3e>] default_idle+0x1e/0xc0
      [112392.244588]  [<ffffffffafa366f0>] arch_cpu_idle+0x20/0xc0
      [112392.245487]  [<ffffffffafafc3ba>] cpu_startup_entry+0x14a/0x1e0
      [112392.246463]  [<ffffffffb014feb7>] rest_init+0x77/0x80
      [112392.247318]  [<ffffffffb07861c6>] start_kernel+0x44b/0x46c
      [112392.248218]  [<ffffffffb0785b7b>] ? repair_env_string+0x5c/0x5c
      [112392.249183]  [<ffffffffb0785120>] ? early_idt_handler_array+0x120/0x120
      [112392.250250]  [<ffffffffb078572f>] x86_64_start_reservations+0x24/0x26
      [112392.251295]  [<ffffffffb0785885>] x86_64_start_kernel+0x154/0x177
      [112392.252324]  [<ffffffffafa000d5>] start_cpu+0x5/0x14
      [112392.253143] Code:  Bad RIP value.
      [112392.253787] RIP  [<ffffffc0acacf0ff>] 0xffffffc0acacf0ff
      [112392.254730]  RSP <ffff8c913fc03eb8>
      [112392.255323] CR2: ffffffc0acacf0ff
      

      The last thing seen in the client test_log is

      == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client == 17:20:35 (1542993635)
      Running /usr/lib64/lustre/tests/racer/racer.sh for 300 seconds. CTRL-C to exit
      

      Unfortunately, there’s not much else in the console logs.
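
      Even with little else in the logs, the oops header itself can be decoded. The following is a minimal sketch, not a Lustre or kernel tool: the helper names decode_oops_code and summarize_oops are hypothetical, and the bit meanings are the architectural x86 page-fault error-code bits. For the crash above, code 0010 together with RIP equal to CR2 means the CPU faulted while fetching an instruction from the bad address, which is consistent with "Code:  Bad RIP value." and with a call through a corrupted function pointer (possibly an RCU callback, given the rcu_process_callbacks frame, though the trace marks that frame as unreliable).

```python
import re

# x86 page-fault error-code bits as printed after "Oops:" (hexadecimal).
# Each entry: (mask, meaning when set, meaning when clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code):
    """Return a human-readable list of flags for a page-fault error code."""
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            flags.append(if_set)
        elif if_clear:
            flags.append(if_clear)
    return flags


def summarize_oops(text):
    """Extract the error code, RIP and CR2 from an oops and compare them."""
    code = int(re.search(r"Oops: ([0-9a-fA-F]{4})", text).group(1), 16)
    rip = int(re.search(r"RIP: [0-9a-f]{4}:\[<([0-9a-f]{16})>\]", text).group(1), 16)
    cr2 = int(re.search(r"CR2: ([0-9a-f]{16})", text).group(1), 16)
    return {"flags": decode_oops_code(code), "rip_equals_cr2": rip == cr2}


# Three lines copied from the crash log in the description.
sample = """\
[112392.196979] Oops: 0010 [#1] SMP
[112392.221596] RIP: 0010:[<ffffffc0acacf0ff>]  [<ffffffc0acacf0ff>] 0xffffffc0acacf0ff
[112392.232922] CR2: ffffffc0acacf0ff CR3: 0000000077d72000 CR4: 00000000000606f0
"""

print(summarize_oops(sample))
```

      The same decoding applies to the other two crash logs in this ticket, which also report Oops: 0010 with RIP equal to CR2.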

      There are similar crashes with similar call traces that show ll_dir_get_parent_fid() errors before the crash (for example, https://testing.whamcloud.com/test_sets/5a91cd96-e21f-11e8-b67f-52540065bddc ), so it’s not clear whether this is the same issue.
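
      One way to sort crash logs into "with" and "without" preceding ll_dir_get_parent_fid() errors is to scan the console output up to the BUG line. This is only a sketch under assumptions: the function name parent_fid_errors_before_bug is hypothetical, and the regular expressions are modelled on the console lines quoted in this ticket. The rc = -2 in those messages is -ENOENT on Linux.

```python
import re

# Patterns modelled on the console lines quoted in this ticket.
ERR_RE = re.compile(r"ll_dir_get_parent_fid\(\)\) .*get parent: rc = (-?\d+)")
SKIP_RE = re.compile(r"ll_dir_get_parent_fid\(\)\) Skipped (\d+) previous similar messages?")
BUG_RE = re.compile(r"BUG: unable to handle kernel paging request")


def parent_fid_errors_before_bug(lines):
    """Count ll_dir_get_parent_fid() errors seen before the first BUG line."""
    printed = 0   # error lines actually printed
    skipped = 0   # errors suppressed by console rate limiting
    rcs = set()   # distinct return codes seen
    for line in lines:
        if BUG_RE.search(line):
            return {"printed": printed, "skipped": skipped,
                    "rcs": sorted(rcs), "bug_seen": True}
        m = SKIP_RE.search(line)
        if m:
            skipped += int(m.group(1))
            continue
        m = ERR_RE.search(line)
        if m:
            printed += 1
            rcs.add(int(m.group(1)))
    return {"printed": printed, "skipped": skipped,
            "rcs": sorted(rcs), "bug_seen": False}


# A few lines in the format seen in the linked test sets.
sample = [
    "[50469.784819] LustreError: 21031:0:(llite_nfs.c:348:ll_dir_get_parent_fid()) lustre: failure inode [0x200075703:0x314d:0x0] get parent: rc = -2",
    "[50470.297950] LustreError: 21035:0:(llite_nfs.c:348:ll_dir_get_parent_fid()) Skipped 6 previous similar messages",
    "[50516.866079] BUG: unable to handle kernel paging request at ffffffc096309000",
]

print(parent_fid_errors_before_bug(sample))
```

      A log like the one in this description, where the BUG line follows the DEBUG MARKER directly, would report zero printed and zero skipped errors.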

          Activity

            adilger Andreas Dilger added a comment - This is as much an NFS issue as it might be Lustre, so we do not plan to test or debug NFSv3 racer issues at this point.

            Similar crash with different call trace at https://testing.whamcloud.com/test_sets/c442450a-f785-11e9-b62b-52540065bddc

            [50466.671214] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 22:10:12 (1572041412)
            [50469.784819] LustreError: 21031:0:(llite_nfs.c:348:ll_dir_get_parent_fid()) lustre: failure inode [0x200075703:0x314d:0x0] get parent: rc = -2
            [50470.296114] LustreError: 21035:0:(llite_nfs.c:348:ll_dir_get_parent_fid()) lustre: failure inode [0x200075703:0x314d:0x0] get parent: rc = -2
            [50470.297950] LustreError: 21035:0:(llite_nfs.c:348:ll_dir_get_parent_fid()) Skipped 6 previous similar messages
            [50471.306730] LustreError: 21033:0:(llite_nfs.c:348:ll_dir_get_parent_fid()) lustre: failure inode [0x200075703:0x314d:0x0] get parent: rc = -2
            [50471.308524] LustreError: 21033:0:(llite_nfs.c:348:ll_dir_get_parent_fid()) Skipped 32 previous similar messages
            [50478.476518] LustreError: 21035:0:(llite_nfs.c:348:ll_dir_get_parent_fid()) lustre: failure inode [0x200075703:0x31c9:0x0] get parent: rc = -2
            [50478.479936] LustreError: 21035:0:(llite_nfs.c:348:ll_dir_get_parent_fid()) Skipped 11 previous similar messages
            [50484.152611] LustreError: 21033:0:(llite_nfs.c:348:ll_dir_get_parent_fid()) lustre: failure inode [0x200075703:0x3291:0x0] get parent: rc = -2
            [50484.156389] LustreError: 21033:0:(llite_nfs.c:348:ll_dir_get_parent_fid()) Skipped 2 previous similar messages
            [50492.474909] LustreError: 21030:0:(llite_nfs.c:348:ll_dir_get_parent_fid()) lustre: failure inode [0x200075703:0x32fa:0x0] get parent: rc = -2
            [50492.477103] LustreError: 21030:0:(llite_nfs.c:348:ll_dir_get_parent_fid()) Skipped 5 previous similar messages
            [50516.866079] BUG: unable to handle kernel paging request at ffffffc096309000
            [50516.867324] IP: [<ffffffc096309000>] 0xffffffc096309000
            [50516.868095] PGD 4d414067 PUD 0 
            [50516.868571] Oops: 0010 [#1] SMP 
            [50516.869065] Modules linked in: nfsd nfs_acl lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) dm_flakey osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc dm_mod iosf_mbi crc32_pclmul ppdev ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr virtio_balloon parport_pc i2c_piix4 parport ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk ata_piix
            [50516.880168]  8139too crct10dif_pclmul crct10dif_common libata crc32c_intel virtio_pci virtio_ring serio_raw 8139cp virtio mii floppy [last unloaded: dm_flakey]
            [50516.882159] CPU: 1 PID: 21036 Comm: nfsd Kdump: loaded Tainted: G           OE  ------------   3.10.0-1062.1.1.el7_lustre.x86_64 #1
            [50516.883670] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [50516.884419] task: ffff9e3f24d61070 ti: ffff9e3f06b98000 task.ti: ffff9e3f06b98000
            [50516.885387] RIP: 0010:[<ffffffc096309000>]  [<ffffffc096309000>] 0xffffffc096309000
            [50516.886399] RSP: 0018:ffff9e3f3fd03eb8  EFLAGS: 00010292
            [50516.887093] RAX: ffffffc096309000 RBX: ffffffff86278b00 RCX: 0000000006730eca
            [50516.888005] RDX: ffff9e3edb99491f RSI: fffff676411ed680 RDI: ffff9e3edb99491f
            [50516.888938] RBP: ffff9e3f3fd03f10 R08: 000000000001f0a0 R09: ffffffff85758954
            [50516.889849] R10: ffff9e3f3fd1f0a0 R11: fffff676406ee1c0 R12: 000000000000000a
            [50516.890772] R13: 0000000000000004 R14: 00000000000000ff R15: ffff9e3f3fd162e0
            [50516.891693] FS:  0000000000000000(0000) GS:ffff9e3f3fd00000(0000) knlGS:0000000000000000
            [50516.892733] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            [50516.893482] CR2: ffffffc096309000 CR3: 000000007a190000 CR4: 00000000000606e0
            [50516.894422] Call Trace:
            [50516.894761]  <IRQ> 
            [50516.895062]  [<ffffffff85758908>] ? rcu_process_callbacks+0x1d8/0x570
            [50516.895957]  [<ffffffff856a41e5>] __do_softirq+0xf5/0x280
            [50516.896672]  [<ffffffff85d9042c>] call_softirq+0x1c/0x30
            [50516.897376]  [<ffffffff8562f675>] do_softirq+0x65/0xa0
            [50516.898081]  [<ffffffff856a4565>] irq_exit+0x105/0x110
            [50516.898752]  [<ffffffff85d917f8>] smp_apic_timer_interrupt+0x48/0x60
            [50516.899770]  [<ffffffff85d8defa>] apic_timer_interrupt+0x16a/0x170
            [50516.901136]  <EOI> 
            [50516.901527]  [<ffffffff8598b409>] ? number.isra.2+0x269/0x360
            [50516.902940]  [<ffffffffc1398bbb>] ? osc_lru_alloc+0x3b/0x3a0 [osc]
            [50516.904288]  [<ffffffff8598c849>] pointer.isra.19+0x1c9/0x4d0
            [50516.905540]  [<ffffffff8598cd84>] ? vsnprintf+0x234/0x6a0
            [50516.906811]  [<ffffffff8598ccda>] vsnprintf+0x18a/0x6a0
            [50516.907638]  [<ffffffffc0982282>] lu_cdebug_printer+0xb2/0x160 [obdclass]
            [50516.908572]  [<ffffffffc09821d0>] ? lu_cache_shrink+0x2d0/0x2d0 [obdclass]
            [50516.909502]  [<ffffffffc098bc22>] cl_page_print+0x52/0xd0 [obdclass]
            [50516.910430]  [<ffffffffc147d8cc>] ll_write_end+0x2bc/0x5f0 [lustre]
            [50516.911261]  [<ffffffff857bb944>] generic_file_buffered_write+0x164/0x270
            [50516.912146]  [<ffffffff857be152>] __generic_file_aio_write+0x1e2/0x400
            [50516.913011]  [<ffffffffc148bebb>] __generic_file_write_iter+0xcb/0x340 [lustre]
            [50516.913984]  [<ffffffffc14908f6>] vvp_io_write_start+0x796/0x800 [lustre]
            [50516.914881]  [<ffffffffc0990497>] ? cl_lock_request+0x67/0x1f0 [obdclass]
            [50516.915783]  [<ffffffffc09922b8>] cl_io_start+0x68/0x130 [obdclass]
            [50516.916662]  [<ffffffffc099444c>] cl_io_loop+0xcc/0x1c0 [obdclass]
            [50516.917518]  [<ffffffffc1440b2a>] ll_file_io_generic+0x5da/0xaf0 [lustre]
            [50516.918409]  [<ffffffffc14416dd>] ll_file_aio_write+0x40d/0x6c0 [lustre]
            [50516.919309]  [<ffffffff85847c6b>] do_sync_readv_writev+0x7b/0xd0
            [50516.920095]  [<ffffffff858498ae>] do_readv_writev+0xce/0x260
            [50516.920837]  [<ffffffffc14412d0>] ? ll_file_splice_read+0x290/0x290 [lustre]
            [50516.921758]  [<ffffffffc1441990>] ? ll_file_aio_write+0x6c0/0x6c0 [lustre]
            [50516.922662]  [<ffffffff85927723>] ? ima_get_action+0x23/0x30
            [50516.923407]  [<ffffffff85926c3e>] ? process_measurement+0x8e/0x250
            [50516.924216]  [<ffffffff85845f8a>] ? do_dentry_open+0x24a/0x2c0
            [50516.924980]  [<ffffffff85849ad5>] vfs_writev+0x35/0x60
            [50516.925681]  [<ffffffffc1508e95>] nfsd_vfs_write+0xb5/0x390 [nfsd]
            [50516.926489]  [<ffffffffc1509529>] nfsd_write+0x189/0x1e0 [nfsd]
            [50516.927264]  [<ffffffffc1511a8b>] nfsd3_proc_write+0xfb/0x180 [nfsd]
            [50516.928092]  [<ffffffffc1503810>] nfsd_dispatch+0xe0/0x290 [nfsd]
            [50516.928938]  [<ffffffffc04a2323>] svc_process_common+0x3d3/0x7c0 [sunrpc]
            [50516.929820]  [<ffffffffc04a2813>] svc_process+0x103/0x190 [sunrpc]
            [50516.930636]  [<ffffffffc150316f>] nfsd+0xdf/0x150 [nfsd]
            [50516.931338]  [<ffffffffc1503090>] ? nfsd_destroy+0x80/0x80 [nfsd]
            [50516.932139]  [<ffffffff856c50d1>] kthread+0xd1/0xe0
            [50516.932780]  [<ffffffff856c5000>] ? insert_kthread_work+0x40/0x40
            [50516.933577]  [<ffffffff85d8cd37>] ret_from_fork_nospec_begin+0x21/0x21
            [50516.934428]  [<ffffffff856c5000>] ? insert_kthread_work+0x40/0x40
            [50516.935217] Code:  Bad RIP value.
            [50516.935719] RIP  [<ffffffc096309000>] 0xffffffc096309000
            [50516.936456]  RSP <ffff9e3f3fd03eb8>
            [50516.936923] CR2: ffffffc096309000
            
            jamesanunez James Nunez (Inactive) added a comment - Similar crash with different call trace at https://testing.whamcloud.com/test_sets/c442450a-f785-11e9-b62b-52540065bddc (crash log as shown above)

            Similar issue at https://testing.whamcloud.com/test_sets/8848c9b8-7939-11e9-af1f-52540065bddc with the following in the crash log

            [104670.957517] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 05:36:42 (1558157802)
            [104675.485217] LustreError: 10159:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20006e1d3:0x3135:0x0] get parent: rc = -2
            [104685.400677] LustreError: 10157:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20006e1d3:0x3139:0x0] get parent: rc = -2
            [104685.403322] LustreError: 10157:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 1 previous similar message
            [104693.625389] LustreError: 10153:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20006e1d3:0x31d8:0x0] get parent: rc = -2
            [104693.627916] LustreError: 10153:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 1 previous similar message
            [104700.788764] LustreError: 10155:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20006e1d3:0x3217:0x0] get parent: rc = -2
            [104700.792043] LustreError: 10155:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 8 previous similar messages
            [104721.931000] LustreError: 10156:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20006e1d3:0x318d:0x0] get parent: rc = -2
            [104733.470104] LustreError: 10152:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20006e1d3:0x31e7:0x0] get parent: rc = -2
            [104733.473052] LustreError: 10152:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 5 previous similar messages
            [104751.466331] LustreError: 10157:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20006e1d3:0x3314:0x0] get parent: rc = -2
            [104751.468955] LustreError: 10157:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 7 previous similar messages
            [104790.875773] LustreError: 10158:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20006e1d3:0x33d4:0x0] get parent: rc = -2
            [104790.878380] LustreError: 10158:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 12 previous similar messages
            [104857.119682] LustreError: 10153:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x20006e1d3:0x3440:0x0] get parent: rc = -2
            [104857.122268] LustreError: 10153:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 34 previous similar messages
            [104878.860396] BUG: unable to handle kernel paging request at ffffffc0aa65a0ff
            [104878.861895] IP: [<ffffffc0aa65a0ff>] 0xffffffc0aa65a0ff
            [104878.862964] PGD 42014067 PUD 0 
            [104878.863583] Oops: 0010 [#1] SMP 
            [104878.864223] Modules linked in: nfsd nfs_acl lustre(OE) lmv(OE) mdc(OE) osc(OE) lov(OE) ofd(OE) ost(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) zfs(POE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core sunrpc dm_mod zunicode(POE) zavl(POE) icp(POE) ppdev iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd zcommon(POE) znvpair(POE) joydev spl(OE) parport_pc parport virtio_balloon pcspkr i2c_piix4
            [104878.878232]  ip_tables ext4 mbcache jbd2 virtio_blk ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32c_intel serio_raw floppy ata_piix 8139too libata virtio_pci 8139cp virtio_ring virtio mii [last unloaded: obdecho]
            [104878.881930] CPU: 0 PID: 30227 Comm: ll_agl_10159 Kdump: loaded Tainted: P           OE  ------------   3.10.0-957.10.1.el7_lustre.x86_64 #1
            [104878.883922] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [104878.884861] task: ffff96933ab04100 ti: ffff969339a80000 task.ti: ffff969339a80000
            [104878.886074] RIP: 0010:[<ffffffc0aa65a0ff>]  [<ffffffc0aa65a0ff>] 0xffffffc0aa65a0ff
            [104878.887365] RSP: 0018:ffff96933fc03e98  EFLAGS: 00010286
            [104878.888253] RAX: ffffffc0aa65a0ff RBX: ffffffff990784c0 RCX: 000000000001f0a0
            [104878.889417] RDX: ffff9692e041d6a7 RSI: 0000000000000006 RDI: ffff9692e041d6a7
            [104878.890577] RBP: ffff96933fc03ef0 R08: ffff9692fbe4afc0 R09: 000000018040003f
            [104878.891747] R10: 0000000000000001 R11: ffffde0cc0ef9280 R12: 000000000000000a
            [104878.892909] R13: 0000000000000007 R14: ff9692e041dd68ff R15: ffff96933fc162c0
            [104878.894074] FS:  0000000000000000(0000) GS:ffff96933fc00000(0000) knlGS:0000000000000000
            [104878.895376] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            [104878.896317] CR2: ffffffc0aa65a0ff CR3: 000000007b070000 CR4: 00000000000606f0
            [104878.897479] Call Trace:
            [104878.897911]  <IRQ> 
            [104878.898257]  [<ffffffff98554c70>] ? rcu_process_callbacks+0x1e0/0x580
            [104878.899377]  [<ffffffff984a0f45>] __do_softirq+0xf5/0x280
            [104878.900261]  [<ffffffff98b7932c>] call_softirq+0x1c/0x30
            [104878.901140]  [<ffffffff9842e675>] do_softirq+0x65/0xa0
            [104878.902000]  [<ffffffff984a12c5>] irq_exit+0x105/0x110
            [104878.902850]  [<ffffffff98b7a5e6>] do_IRQ+0x56/0xf0
            [104878.903651]  [<ffffffff98b6c362>] common_interrupt+0x162/0x162
            [104878.904607]  <EOI> 
            [104878.904955]  [<ffffffff984d08c4>] ? finish_task_switch+0x54/0x1c0
            [104878.906012]  [<ffffffff984d08c0>] ? finish_task_switch+0x50/0x1c0
            [104878.907008]  [<ffffffff98b6878f>] __schedule+0x3ff/0x890
            [104878.907898]  [<ffffffff9865f6fb>] ? iput+0x3b/0x190
            [104878.908711]  [<ffffffff98b68c49>] schedule+0x29/0x70
            [104878.909553]  [<ffffffffc12752ce>] ll_agl_thread+0x2de/0x4e0 [lustre]
            [104878.910605]  [<ffffffff984d67f0>] ? wake_up_state+0x20/0x20
            [104878.911542]  [<ffffffffc1274ff0>] ? ll_agl_trigger+0x520/0x520 [lustre]
            [104878.912636]  [<ffffffff984c1c71>] kthread+0xd1/0xe0
            [104878.913434]  [<ffffffff984c1ba0>] ? insert_kthread_work+0x40/0x40
            [104878.914437]  [<ffffffff98b75c37>] ret_from_fork_nospec_begin+0x21/0x21
            [104878.915506]  [<ffffffff984c1ba0>] ? insert_kthread_work+0x40/0x40
            [104878.916504] Code:  Bad RIP value.
            [104878.917144] RIP  [<ffffffc0aa65a0ff>] 0xffffffc0aa65a0ff
            [104878.918092]  RSP <ffff96933fc03e98>
            [104878.918689] CR2: ffffffc0aa65a0ff
            
            jamesanunez James Nunez (Inactive) added a comment - Similar issue at https://testing.whamcloud.com/test_sets/8848c9b8-7939-11e9-af1f-52540065bddc (crash log as shown above)

            People

              Assignee: wc-triage WC Triage
              Reporter: jamesanunez James Nunez (Inactive)