Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-8584

parallel-scale-nfsv3 test_racer_on_nfs: BUG: unable to handle kernel paging request

Details

    • Bug
    • Resolution: Won't Fix
    • Major
    • None
    • Lustre 2.9.0, Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0, Lustre 2.10.2, Lustre 2.10.4, Lustre 2.12.1
    • None
    • Interop: 2.8.0 EL7.2 Server/EL7.2 Client
      Server: b2_8, build# 12
      Client: master, build# 3431
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/e91156c4-7276-11e6-8afd-5254006e85c2.

      The sub-test test_racer_on_nfs failed with the following error:

      test failed to respond and timed out
      

      test log:

      ./file_exec.sh: line 13: 23336 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      ./file_exec.sh: line 13: 25007 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      ./file_exec.sh: line 13: 26378 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      ./file_exec.sh: line 13: 31986 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      ./file_exec.sh: line 13:  1896 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      ./file_exec.sh: line 13:  1981 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      ./file_exec.sh: line 13:  2975 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      ./file_exec.sh: line 13:  6421 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
      racer cleanup
      sleeping 5 sec ...
      file_create.sh: no process found
      dir_create.sh: no process found
      file_rm.sh: no process found
      file_rename.sh: no process found
      file_link.sh: no process found
      file_symlink.sh: no process found
      file_list.sh: no process found
      file_concat.sh: no process found
      file_exec.sh: no process found
      file_chown.sh: no process found
      file_chmod.sh: no process found
      file_mknod.sh: no process found
      file_truncate.sh: no process found
      file_delxattr.sh: no process found
      file_getxattr.sh: no process found
      file_setxattr.sh: no process found
      there should be NO racer processes:
      USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      

      Client dmesg:

      df              D ffff88005170fa80     0  6802  12502 0x00000080
      [ 7070.189011]  ffff88005170fa08 0000000000000082 ffff8800799c5c00 ffff88005170ffd8
      [ 7070.189011]  ffff88005170ffd8 ffff88005170ffd8 ffff8800799c5c00 ffff8800799c5c00
      [ 7070.189011]  ffff88007ff66c58 0000000000000082 ffffffffa0225d20 ffff88005170fa80
      [ 7070.189011] Call Trace:
      [ 7070.189011]  [<ffffffffa0225d20>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
      [ 7070.189011]  [<ffffffff8163bc59>] schedule+0x29/0x70
      [ 7070.189011]  [<ffffffffa0225d55>] rpc_wait_bit_killable+0x35/0x80 [sunrpc]
      [ 7070.189011]  [<ffffffff81639ad0>] __wait_on_bit+0x60/0x90
      [ 7070.189011]  [<ffffffffa0225d20>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
      [ 7070.189011]  [<ffffffff81639b87>] out_of_line_wait_on_bit+0x87/0xb0
      [ 7070.189011]  [<ffffffff810a6ba0>] ? wake_atomic_t_function+0x40/0x40
      [ 7070.189011]  [<ffffffffa021cfe0>] ? call_bind_status+0x290/0x290 [sunrpc]
      [ 7070.189011]  [<ffffffffa021cfe0>] ? call_bind_status+0x290/0x290 [sunrpc]
      [ 7070.189011]  [<ffffffffa0226c44>] __rpc_execute+0x154/0x420 [sunrpc]
      [ 7070.189011]  [<ffffffff810a6ae5>] ? wake_up_bit+0x25/0x30
      [ 7070.189011]  [<ffffffffa022877e>] rpc_execute+0x5e/0xa0 [sunrpc]
      [ 7070.189011]  [<ffffffffa021e240>] rpc_run_task+0x70/0x90 [sunrpc]
      [ 7070.189011]  [<ffffffffa021e2b0>] rpc_call_sync+0x50/0xc0 [sunrpc]
      [ 7070.189011]  [<ffffffffa06bf5cb>] nfs3_rpc_wrapper.constprop.10+0x6b/0xb0 [nfsv3]
      [ 7070.189011]  [<ffffffffa06c02a6>] nfs3_proc_getattr+0x56/0xb0 [nfsv3]
      [ 7070.189011]  [<ffffffffa0514f7f>] __nfs_revalidate_inode+0xbf/0x310 [nfs]
      [ 7070.189011]  [<ffffffffa0515722>] nfs_revalidate_inode+0x22/0x60 [nfs]
      [ 7070.189011]  [<ffffffff811e9bdc>] ? lookup_fast+0xcc/0x2e0
      [ 7070.189011]  [<ffffffffa050cc5b>] nfs_weak_revalidate+0x4b/0xf0 [nfs]
      [ 7070.189011]  [<ffffffff811e9257>] complete_walk+0x87/0xe0
      [ 7070.189011]  [<ffffffff811ec21f>] do_last+0x46f/0x1270
      [ 7070.189011]  [<ffffffff812f96f2>] ? radix_tree_lookup_slot+0x22/0x50
      [ 7070.189011]  [<ffffffff811eed02>] path_openat+0xc2/0x490
      [ 7070.189011]  [<ffffffff811f04cb>] do_filp_open+0x4b/0xb0
      [ 7070.189011]  [<ffffffff811fd057>] ? __alloc_fd+0xa7/0x130
      [ 7070.189011]  [<ffffffff811dde73>] do_sys_open+0xf3/0x1f0
      [ 7070.189011]  [<ffffffff811ddf8e>] SyS_open+0x1e/0x20
      [ 7070.189011]  [<ffffffff81646c49>] system_call_fastpath+0x16/0x1b
      

      MDS console:

      20:40:23:[ 3465.447651] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 20:35:50 (1472960150)
      20:40:23:[ 3499.433180] LustreError: 10949:0:(mdt_handler.c:893:mdt_getattr_internal()) lustre-MDT0000: getattr error for [0x200000bd2:0x1800:0x0]: rc = -2
      20:40:23:[ 3499.438171] LustreError: 10949:0:(mdt_handler.c:893:mdt_getattr_internal()) Skipped 1 previous similar message
      20:40:23:[ 3735.618177] BUG: unable to handle kernel paging request at ffffffa07d7040ff
      20:40:23:[ 3735.619144] IP: [<ffffffa07d7040ff>] 0xffffffa07d7040fe
      20:40:23:[ 3735.619144] PGD 194d067 PUD 0 
      20:40:23:[ 3735.619144] Oops: 0010 [#1] SMP 
      20:40:23:[ 3735.619144] Modules linked in: osc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_generic crypto_null libcfs(OE) ldiskfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic crct10dif_common ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ppdev pcspkr virtio_balloon i2c_piix4 parport_pc parport nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi cirrus syscopyarea sysfillrect sysimgblt virtio_blk drm_kms_helper ttm 8139too ata_piix libata drm serio_raw virtio_pci virtio_ring virtio 8139cp mii i2c_core floppy
      20:40:23:[ 3735.619144] CPU: 0 PID: 13 Comm: rcuos/1 Tainted: G           OE  ------------   3.10.0-327.3.1.el7_lustre.x86_64 #1
      20:40:23:[ 3735.619144] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      20:40:23:[ 3735.619144] task: ffff88007c058b80 ti: ffff88007c060000 task.ti: ffff88007c060000
      20:40:23:[ 3735.619144] RIP: 0010:[<ffffffa07d7040ff>]  [<ffffffa07d7040ff>] 0xffffffa07d7040fe
      20:40:23:[ 3735.619144] RSP: 0018:ffff88007c063e10  EFLAGS: 00010286
      20:40:23:[ 3735.619144] RAX: ffff88007c063fd8 RBX: ffff880040551227 RCX: 0000000180400035
      20:40:23:[ 3735.619144] RDX: ffffffa07d7040ff RSI: ffffea000188a4c0 RDI: ffff880040551227
      20:40:23:[ 3735.619144] RBP: ffff88007c063ec0 R08: ffff880062293840 R09: 0000000180400035
      20:40:23:[ 3735.619144] R10: ffffea000188a4c0 R11: ffffffff81123f70 R12: 0000000000000001
      20:40:23:[ 3735.619144] R13: ff880008399070ff R14: 0000000000000039 R15: ffff880039ef1750
      20:40:23:[ 3735.619144] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
      20:40:23:[ 3735.619144] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      20:40:23:[ 3735.619144] CR2: ffffffa07d7040ff CR3: 00000000364cf000 CR4: 00000000000006f0
      20:40:23:[ 3735.619144] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      20:40:23:[ 3735.619144] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      

      This issue was first seen on 08/23/2016

      Attachments

        Issue Links

          Activity

            [LU-8584] parallel-scale-nfsv3 test_racer_on_nfs: BUG: unable to handle kernel paging request

            This is as much an NFS issue as it might be Lustre, so we do not plan to test or debug NFSv3 racer issues at this point.

            adilger Andreas Dilger added a comment - This is as much an NFS issue as it might be Lustre, so we do not plan to test or debug NFSv3 racer issues at this point.
            standan Saurabh Tandan (Inactive) added a comment - +1 in 2.10.3 https://testing.hpdd.intel.com/test_sets/46461c1a-3d4d-11e8-8f8a-52540065bddc
            mdiep Minh Diep added a comment - +1 on 2.10.2 https://testing.hpdd.intel.com/test_sets/096d7c0c-f3f1-11e7-854b-52540065bddc
            sarah Sarah Liu added a comment - - edited

            another one on 2.10.1 , for RHEL7.4 server/client ldiskfs
            https://testing.hpdd.intel.com/test_sets/bfce3ad8-9c2e-11e7-b778-5254006e85c2

            crash dump of MDS node
            /scratch/dumps/trevis-46vm3.trevis.hpdd.intel.com/10.9.6.30-2017-09-18-02:47:50

            [35950.500759] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 02:46:50 \(1505702810\)
            [35950.698782] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 02:46:50 (1505702810)
            [35961.771804] LustreError: 1836:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x2000255c2:0x31fa:0x0] get parent: rc = -2
            [35976.340533] LustreError: 1837:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x2000255c2:0x338d:0x0] get parent: rc = -2
            [35976.344929] LustreError: 1837:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 1 previous similar message
            [35980.978075] LustreError: 1839:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x2000255c2:0x32bb:0x0] get parent: rc = -2
            [35992.178172] LustreError: 1839:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x2000255c2:0x34c9:0x0] get parent: rc = -2
            [35992.182851] LustreError: 1839:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 1 previous similar message
            [36001.007252] LustreError: 1835:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x2000255c2:0x3650:0x0] get parent: rc = -2
            [36001.012130] LustreError: 1835:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 4 previous similar messages
            [36005.130074] BUG: unable to handle kernel paging request at ffffff810b7290ff
            [36005.131016] IP: [<ffffff810b7290ff>] 0xffffff810b7290fe
            [36005.131016] PGD 19f5067 PUD 0 
            [36005.131016] Oops: 0010 [#1] SMP 
            [36005.131016] Modules linked in: osc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr virtio_balloon joydev i2c_piix4 parport_pc nfsd parport nfs_acl lockd auth_rpcgss grace sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi cirrus drm_kms_helper
            [36005.131016]  syscopyarea sysfillrect sysimgblt virtio_blk fb_sys_fops ata_piix ttm crct10dif_pclmul crct10dif_common drm crc32c_intel libata 8139too serio_raw virtio_pci virtio_ring floppy virtio 8139cp i2c_core mii [last unloaded: lnet_selftest]
            [36005.131016] CPU: 1 PID: 13 Comm: ksoftirqd/1 Tainted: G           OE  ------------   3.10.0-693.2.2.el7_lustre.x86_64 #1
            [36005.131016] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
            [36005.131016] task: ffff88007c09cf10 ti: ffff88007c0b0000 task.ti: ffff88007c0b0000
            [36005.131016] RIP: 0010:[<ffffff810b7290ff>]  [<ffffff810b7290ff>] 0xffffff810b7290fe
            [36005.131016] RSP: 0018:ffff88007c0b3db0  EFLAGS: 00010286
            [36005.131016] RAX: ffffff810b7290ff RBX: ffffffff81a550c0 RCX: ffff88007fd10240
            [36005.131016] RDX: ffff880051ad4ecf RSI: ffff8800198a1028 RDI: ffff880051ad4ecf
            [36005.131016] RBP: ffff88007c0b3e08 R08: 0000000000019bc0 R09: ffff88007fd10240
            [36005.131016] R10: ffff88007fd19bc0 R11: ffffea00013ff040 R12: 000000000000000a
            [36005.131016] R13: 0000000000000000 R14: ff88007b9c68d0ff R15: ffff88007fd10260
            [36005.131016] FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
            [36005.131016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
            [36005.131016] CR2: ffffff810b7290ff CR3: 0000000079f5d000 CR4: 00000000000406e0
            [36005.131016] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
            [36005.131016] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
            [36005.131016] Stack:
            [36005.131016]  ffffffff81139dad ffff8800198a1028 0000000000000000 ffff88007fd10288
            [36005.131016]  ffff88007c09cf10 ffff88007c0b3fd8 0000000000000009 0000000000000009
            [36005.131016]  ffffffff81a27440 ffffffff819eb108 0000000000000001 ffff88007c0b3e78
            [36005.131016] Call Trace:
            [36005.131016]  [<ffffffff81139dad>] ? rcu_process_callbacks+0x1dd/0x550
            [36005.131016]  [<ffffffff81090b3f>] __do_softirq+0xef/0x280
            [36005.131016]  [<ffffffff81090d08>] run_ksoftirqd+0x38/0x50
            [36005.131016]  [<ffffffff810b909f>] smpboot_thread_fn+0x12f/0x180
            [36005.131016]  [<ffffffff810b8f70>] ? lg_double_unlock+0x40/0x40
            [36005.131016]  [<ffffffff810b098f>] kthread+0xcf/0xe0
            [36005.131016]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
            [36005.131016]  [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
            [36005.131016]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
            [36005.131016] Code:  Bad RIP value.
            [36005.131016] RIP  [<ffffff810b7290ff>] 0xffffff810b7290fe
            [36005.131016]  RSP <ffff88007c0b3db0>
            [36005.131016] CR2: ffffff810b7290ff
            
            
            sarah Sarah Liu added a comment - - edited another one on 2.10.1 , for RHEL7.4 server/client ldiskfs https://testing.hpdd.intel.com/test_sets/bfce3ad8-9c2e-11e7-b778-5254006e85c2 crash dump of MDS node /scratch/dumps/trevis-46vm3.trevis.hpdd.intel.com/10.9.6.30-2017-09-18-02:47:50 [35950.500759] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 02:46:50 \(1505702810\) [35950.698782] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 02:46:50 (1505702810) [35961.771804] LustreError: 1836:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x2000255c2:0x31fa:0x0] get parent: rc = -2 [35976.340533] LustreError: 1837:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x2000255c2:0x338d:0x0] get parent: rc = -2 [35976.344929] LustreError: 1837:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 1 previous similar message [35980.978075] LustreError: 1839:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x2000255c2:0x32bb:0x0] get parent: rc = -2 [35992.178172] LustreError: 1839:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x2000255c2:0x34c9:0x0] get parent: rc = -2 [35992.182851] LustreError: 1839:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 1 previous similar message [36001.007252] LustreError: 1835:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) lustre: failure inode [0x2000255c2:0x3650:0x0] get parent: rc = -2 [36001.012130] LustreError: 1835:0:(llite_nfs.c:336:ll_dir_get_parent_fid()) Skipped 4 previous similar messages [36005.130074] BUG: unable to handle kernel paging request at ffffff810b7290ff [36005.131016] IP: [<ffffff810b7290ff>] 0xffffff810b7290fe [36005.131016] PGD 19f5067 PUD 0 [36005.131016] Oops: 0010 [#1] SMP [36005.131016] Modules linked in: osc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) ldiskfs(OE) libcfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr virtio_balloon joydev i2c_piix4 parport_pc nfsd parport nfs_acl lockd auth_rpcgss grace sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi cirrus drm_kms_helper [36005.131016] syscopyarea sysfillrect sysimgblt virtio_blk fb_sys_fops ata_piix ttm crct10dif_pclmul crct10dif_common drm crc32c_intel libata 8139too serio_raw virtio_pci virtio_ring floppy virtio 8139cp i2c_core mii [last unloaded: lnet_selftest] [36005.131016] CPU: 1 PID: 13 Comm: ksoftirqd/1 Tainted: G OE ------------ 3.10.0-693.2.2.el7_lustre.x86_64 #1 [36005.131016] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [36005.131016] task: ffff88007c09cf10 ti: ffff88007c0b0000 task.ti: ffff88007c0b0000 [36005.131016] RIP: 0010:[<ffffff810b7290ff>] [<ffffff810b7290ff>] 0xffffff810b7290fe [36005.131016] RSP: 0018:ffff88007c0b3db0 EFLAGS: 00010286 [36005.131016] RAX: ffffff810b7290ff RBX: ffffffff81a550c0 RCX: ffff88007fd10240 [36005.131016] RDX: ffff880051ad4ecf RSI: ffff8800198a1028 RDI: ffff880051ad4ecf [36005.131016] RBP: ffff88007c0b3e08 R08: 0000000000019bc0 R09: ffff88007fd10240 [36005.131016] R10: ffff88007fd19bc0 R11: ffffea00013ff040 R12: 000000000000000a [36005.131016] R13: 0000000000000000 R14: ff88007b9c68d0ff R15: ffff88007fd10260 [36005.131016] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 [36005.131016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [36005.131016] CR2: ffffff810b7290ff CR3: 0000000079f5d000 CR4: 00000000000406e0 [36005.131016] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [36005.131016] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [36005.131016] Stack: [36005.131016] ffffffff81139dad ffff8800198a1028 0000000000000000 ffff88007fd10288 [36005.131016] ffff88007c09cf10 ffff88007c0b3fd8 0000000000000009 0000000000000009 [36005.131016] ffffffff81a27440 ffffffff819eb108 0000000000000001 ffff88007c0b3e78 [36005.131016] Call Trace: [36005.131016] [<ffffffff81139dad>] ? rcu_process_callbacks+0x1dd/0x550 [36005.131016] [<ffffffff81090b3f>] __do_softirq+0xef/0x280 [36005.131016] [<ffffffff81090d08>] run_ksoftirqd+0x38/0x50 [36005.131016] [<ffffffff810b909f>] smpboot_thread_fn+0x12f/0x180 [36005.131016] [<ffffffff810b8f70>] ? lg_double_unlock+0x40/0x40 [36005.131016] [<ffffffff810b098f>] kthread+0xcf/0xe0 [36005.131016] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40 [36005.131016] [<ffffffff816b4f18>] ret_from_fork+0x58/0x90 [36005.131016] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40 [36005.131016] Code: Bad RIP value. [36005.131016] RIP [<ffffff810b7290ff>] 0xffffff810b7290fe [36005.131016] RSP <ffff88007c0b3db0> [36005.131016] CR2: ffffff810b7290ff
            sarah Sarah Liu added a comment -

            Amir, we have core dump configed on the cluster, please refer to this wiki for location:
            https://wiki.hpdd.intel.com/display/TEI/Core+dump+location+for+autotest+nodes

            sarah Sarah Liu added a comment - Amir, we have core dump configed on the cluster, please refer to this wiki for location: https://wiki.hpdd.intel.com/display/TEI/Core+dump+location+for+autotest+nodes

            can we get access to the core and vmlinux for this crash. I'll need that to investigate further.

            ashehata Amir Shehata (Inactive) added a comment - can we get access to the core and vmlinux for this crash. I'll need that to investigate further.
            sarah Sarah Liu added a comment -

            another instance seen on master: https://testing.hpdd.intel.com/test_sets/8ba3a5f4-f882-11e6-887f-5254006e85c2

            The OSS console

            18:24:42:[47407.437891] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 18:24:15 (1487701455)
            18:24:42:
            18:24:42:[47411.114011] BUG: unable to handle kernel paging request at ffffffa0864ae0ff
            18:24:42:[47411.114011] IP: [<ffffffa0864ae0ff>] 0xffffffa0864ae0fe
            18:24:42:[47411.114011] PGD 19bd067 PUD 0 
            18:24:42:[47411.114011] Oops: 0010 [#1] SMP 
            18:24:42:[47411.114011] Modules linked in: osc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_ssse3 sha512_generic crypto_null libcfs(OE) ldiskfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi ppdev crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr virtio_balloon i2c_piix4 parport_pc parport nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk 8139too cirrus drm_kms_helper crct10dif_pclmul crct10dif_common syscopyarea sysfillrect crc32c_intel serio_raw virtio_pci virtio_ring virtio 8139cp mii sysimgblt fb_sys_fops ttm ata_piix drm libata i2c_core floppy [last unloaded: lnet_selftest]
            18:24:42:[47411.114011] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           OE  ------------   3.10.0-514.6.1.el7_lustre.x86_64 #1
            18:24:42:[47411.114011] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
            18:24:42:[47411.114011] task: ffffffff819c1460 ti: ffffffff819ac000 task.ti: ffffffff819ac000
            18:24:42:[47411.114011] RIP: 0010:[<ffffffa0864ae0ff>]  [<ffffffa0864ae0ff>] 0xffffffa0864ae0fe
            18:24:42:[47411.114011] RSP: 0018:ffff88007fc03e78  EFLAGS: 00010286
            18:24:42:[47411.114011] RAX: ffffffa0864ae0ff RBX: ffffffff81a1a580 RCX: 0000000003090de6
            18:24:42:[47411.114011] RDX: ffff880057f9d227 RSI: ffffea000028bc00 RDI: ffff880057f9d227
            18:24:42:[47411.114011] RBP: ffff88007fc03ed0 R08: 0000000000019a80 R09: ffffffff81139719
            
            sarah Sarah Liu added a comment - another instance seen on master: https://testing.hpdd.intel.com/test_sets/8ba3a5f4-f882-11e6-887f-5254006e85c2 The OSS console 18:24:42:[47407.437891] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test racer_on_nfs: racer on NFS client ======================================= 18:24:15 (1487701455) 18:24:42: 18:24:42:[47411.114011] BUG: unable to handle kernel paging request at ffffffa0864ae0ff 18:24:42:[47411.114011] IP: [<ffffffa0864ae0ff>] 0xffffffa0864ae0fe 18:24:42:[47411.114011] PGD 19bd067 PUD 0 18:24:42:[47411.114011] Oops: 0010 [#1] SMP 18:24:42:[47411.114011] Modules linked in: osc(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) lquota(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) sha512_ssse3 sha512_generic crypto_null libcfs(OE) ldiskfs(OE) dm_mod rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi ppdev crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr virtio_balloon i2c_piix4 parport_pc parport nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_blk 8139too cirrus drm_kms_helper crct10dif_pclmul crct10dif_common syscopyarea sysfillrect crc32c_intel serio_raw virtio_pci virtio_ring virtio 8139cp mii sysimgblt fb_sys_fops ttm ata_piix drm libata i2c_core floppy [last unloaded: lnet_selftest] 18:24:42:[47411.114011] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE ------------ 3.10.0-514.6.1.el7_lustre.x86_64 #1 18:24:42:[47411.114011] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 18:24:42:[47411.114011] task: ffffffff819c1460 ti: ffffffff819ac000 task.ti: ffffffff819ac000 18:24:42:[47411.114011] RIP: 0010:[<ffffffa0864ae0ff>] [<ffffffa0864ae0ff>] 0xffffffa0864ae0fe 18:24:42:[47411.114011] RSP: 0018:ffff88007fc03e78 EFLAGS: 00010286 18:24:42:[47411.114011] RAX: ffffffa0864ae0ff RBX: ffffffff81a1a580 RCX: 0000000003090de6 18:24:42:[47411.114011] RDX: ffff880057f9d227 RSI: ffffea000028bc00 RDI: ffff880057f9d227 18:24:42:[47411.114011] RBP: ffff88007fc03ed0 R08: 0000000000019a80 R09: ffffffff81139719
            sarah Sarah Liu added a comment -

            it also failed under regular config, RHEL7.2 server and client
            https://testing.hpdd.intel.com/test_sessions/8003512e-724d-11e6-8a8c-5254006e85c2

            sarah Sarah Liu added a comment - it also failed under regular config, RHEL7.2 server and client https://testing.hpdd.intel.com/test_sessions/8003512e-724d-11e6-8a8c-5254006e85c2

            People

              ashehata Amir Shehata (Inactive)
              standan Saurabh Tandan (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: