Lustre / LU-1474

Test failure on test suite parallel-scale-nfsv3, subtest test_iorssf


Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • None
    • Lustre 2.1.2, Lustre 2.1.3
    • None
    • 3
    • 4172

    Description

      This issue was created by maloo for yujian <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/c496b4f4-aecd-11e1-b0f7-52540035b04c.

      The sub-test test_iorssf failed with the following error:

      test failed to respond and timed out

      Info required for matching: parallel-scale-nfsv3 iorssf

      The console log on the node acting as MDS, Lustre client, and NFS server showed:

      17:34:14:BUG: unable to handle kernel paging request at fffffffb8a512d10
      17:34:14:IP: [<ffffffff81052994>] update_curr+0x144/0x1f0
      17:34:14:PGD 1a87067 PUD 0 
      17:34:14:Thread overran stack, or stack corrupted
      17:34:14:Oops: 0000 [#1] SMP 
      17:34:14:last sysfs file: /sys/module/nfsd/initstate
      17:34:14:CPU 0 
      17:34:14:Modules linked in: lustre(U) obdfilter(U) ost(U) osd_ldiskfs(U) cmm(U) fsfilt_ldiskfs(U) mdt(U) mdd(U) mds(U) mgs(U) ldiskfs(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) lquota(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) nfs fscache jbd2 nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: lnet_selftest]
      17:34:14:
      17:34:14:Pid: 16667, comm: nfsd Not tainted 2.6.32-220.17.1.el6_lustre.g636ddbf.x86_64 #1 Red Hat KVM
      17:34:15:RIP: 0010:[<ffffffff81052994>]  [<ffffffff81052994>] update_curr+0x144/0x1f0
      17:34:15:RSP: 0018:ffff880002203db8  EFLAGS: 00010086
      17:34:15:RAX: ffff880071a40b40 RBX: ffffffff811237e6 RCX: ffff88007faa60c0
      17:34:15:RDX: 0000000000018b48 RSI: ffff880071a97538 RDI: ffff880071a40b78
      17:34:15:RBP: ffff880002203de8 R08: ffffffff8160b6a5 R09: 0000000000000000
      17:34:15:R10: 0000000000000010 R11: 0000000000000000 R12: ffff880002215fe8
      17:34:15:R13: 00000000000f469a R14: 000054a5051b5d6b R15: ffff880071a40b40
      17:34:15:FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      17:34:15:CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      17:34:15:CR2: fffffffb8a512d10 CR3: 0000000037697000 CR4: 00000000000006f0
      17:34:15:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      17:34:15:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      17:34:15:Process nfsd (pid: 16667, threadinfo ffff880068a62000, task ffff880071a40b40)
      17:34:15:Stack:
      17:34:15: ffff880002203dc8 0000000000000000 ffff880071a40b78 ffff880002215fe8
      17:34:15:<0> 0000000000000000 0000000000000000 ffff880002203e18 ffffffff81052f4b
      17:34:17:<0> ffff880002215f80 0000000000000000 0000000000015f80 0000000000000000
      17:34:17:Call Trace:
      17:34:17: <IRQ> 
      17:34:17: [<ffffffff81052f4b>] task_tick_fair+0xdb/0x160
      17:34:17:BUG: unable to handle kernel NULL pointer dereference at 00000000000009e9
      17:34:17:IP: [<ffffffff8100f5ad>] print_context_stack+0xad/0x140
      17:34:17:PGD 0 
      17:34:17:Thread overran stack, or stack corrupted
      17:34:17:Oops: 0000 [#2] SMP 
      17:34:17:last sysfs file: /sys/module/nfsd/initstate
      17:34:17:CPU 0 
      17:34:17:Modules linked in: lustre(U) obdfilter(U) ost(U) osd_ldiskfs(U) cmm(U) fsfilt_ldiskfs(U) mdt(U) mdd(U) mds(U) mgs(U) ldiskfs(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) lquota(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) nfs fscache jbd2 nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: lnet_selftest]
      17:34:17:
      17:34:18:Pid: 16667, comm: nfsd Not tainted 2.6.32-220.17.1.el6_lustre.g636ddbf.x86_64 #1 Red Hat KVM
      17:34:18:RIP: 0010:[<ffffffff8100f5ad>]  [<ffffffff8100f5ad>] print_context_stack+0xad/0x140
      17:34:18:RSP: 0018:ffff8800022038c8  EFLAGS: 00010002
      17:34:18:RAX: 0000000000000001 RBX: ffff880002203df0 RCX: 0000000000000000
      17:34:18:RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
      17:34:18:RBP: ffff880002203928 R08: ffffffff81c00740 R09: 0000000000000000
      17:34:18:R10: 0000000000000002 R11: 0000000000000001 R12: ffff880002203e18
      17:34:18:R13: ffff880068a62000 R14: ffffffff81600460 R15: ffff880002203fc0
      17:34:18:FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      17:34:18:CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      17:34:18:CR2: 00000000000009e9 CR3: 0000000037697000 CR4: 00000000000006f0
      17:34:18:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      17:34:18:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      17:34:18:Process nfsd (pid: 16667, threadinfo ffff880068a62000, task ffff880071a40b40)
      17:34:18:Stack:
      17:34:18: ffff880000000018 ffff880068a63ff8 ffff880002203e18 ffff880002201fc0
      17:34:19:<0> ffffffff81779900 ffffffff81052f4b ffffffff817c60ba ffff880002203db8
      17:34:19:<0> 000000000000cbe0 ffffffff81600460 ffffffff81779900 ffff880002203fc0
      17:34:19:Call Trace:
      17:34:19: <IRQ> 
      17:34:19: [<ffffffff81052f4b>] ? task_tick_fair+0xdb/0x160
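
      The two oopses above can be pulled out of a console log mechanically. The following is a minimal sketch, not part of the test framework: the function name `summarize_oopses` and the returned dictionary keys are hypothetical, and it assumes every console line carries the `HH:MM:SS:` prefix seen in this report.

```python
import re

# Assumption: each console line starts with an "HH:MM:SS:" capture-time prefix.
LINE_RE = re.compile(r"^\s*(\d{2}:\d{2}:\d{2}):(.*)$")
BUG_RE = re.compile(r"BUG: unable to handle kernel (.+?) at ([0-9a-f]+)")
# Matches the short "IP: [<addr>] symbol" line, not the longer "RIP: 0010:[<addr>]" line.
IP_RE = re.compile(r"IP: \[<([0-9a-f]+)>\] (\S+)")


def summarize_oopses(log_text):
    """Return one dict per 'BUG:' line, with faulting symbol and stack-corruption flag."""
    events = []
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw)
        if not m:
            continue
        stamp, msg = m.group(1), m.group(2).strip()

        bug = BUG_RE.search(msg)
        if bug:
            events.append({
                "time": stamp,
                "kind": bug.group(1),       # e.g. "paging request"
                "address": bug.group(2),    # faulting address (CR2)
                "symbol": None,
                "stack_corrupted": False,
            })
            continue

        if not events:
            continue  # ignore anything before the first BUG line

        ip = IP_RE.search(msg)
        if ip and events[-1]["symbol"] is None:
            events[-1]["symbol"] = ip.group(2)
        elif "Thread overran stack" in msg:
            events[-1]["stack_corrupted"] = True
    return events
```

      Run against the log above, this would report two events on the same nfsd thread, both flagged with the kernel's "Thread overran stack, or stack corrupted" note, the second one occurring inside print_context_stack while the first oops was still being printed.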
      

      The console log on the NFS client showed:

      17:37:09:INFO: task IOR:7271 blocked for more than 120 seconds.
      17:37:09:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      17:37:09:IOR           D 0000000000000000     0  7271   7268 0x00000080
      17:37:09: ffff880079e21c88 0000000000000082 ffff880077682078 ffff880002215f80
      17:37:09: ffff880079e21c48 ffffffff81053190 ffff880079e21c28 ffffffff8106196b
      17:37:09: ffff8800750b3b38 ffff880079e21fd8 000000000000f4e8 ffff8800750b3b38
      17:37:09:Call Trace:
      17:37:09: [<ffffffff81053190>] ? check_preempt_wakeup+0x1c0/0x260
      17:37:09: [<ffffffff8106196b>] ? enqueue_task_fair+0xfb/0x100
      17:37:09: [<ffffffff8104da7c>] ? check_preempt_curr+0x7c/0x90
      17:37:09: [<ffffffff814ee6fe>] __mutex_lock_slowpath+0x13e/0x180
      17:37:09: [<ffffffff814ee59b>] mutex_lock+0x2b/0x50
      17:37:09: [<ffffffff81113239>] generic_file_aio_write+0x59/0xe0
      17:37:09: [<ffffffffa03b4ffe>] nfs_file_write+0xde/0x1f0 [nfs]
      17:37:09: [<ffffffff8117661a>] do_sync_write+0xfa/0x140
      17:37:09: [<ffffffff81090d30>] ? autoremove_wake_function+0x0/0x40
      17:37:09: [<ffffffff8120c646>] ? security_file_permission+0x16/0x20
      17:37:09: [<ffffffff81176918>] vfs_write+0xb8/0x1a0
      17:37:11: [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
      17:37:11: [<ffffffff81177321>] sys_write+0x51/0x90
      17:37:11: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      17:38:47:nfs: server client-23vm3 not responding, still trying
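
      The client-side symptoms (a hung-task warning followed by an unresponsive NFS server) can be extracted in the same way. This is again only a sketch under assumptions: `client_symptoms` is a hypothetical helper, and the two patterns are taken from the message wording in this log.

```python
import re

# Patterns copied from the message wording in the client console log above.
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
NFS_RE = re.compile(r"nfs: server (\S+) not responding")


def client_symptoms(log_text):
    """Return (hung_tasks, unresponsive_servers) found in a client console log.

    hung_tasks is a list of (command, pid, seconds) tuples;
    unresponsive_servers is a list of server names without duplicates.
    """
    hung, servers = [], []
    for line in log_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            hung.append((m.group(1), int(m.group(2)), int(m.group(3))))
        m = NFS_RE.search(line)
        if m and m.group(1) not in servers:
            servers.append(m.group(1))
    return hung, servers
```

      For the log above this would yield the IOR task with pid 7271 blocked for more than 120 seconds, and client-23vm3 as the server that stopped responding, which is consistent with the NFS server node having crashed a few minutes earlier.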
      

            People

              wc-triage WC Triage
              maloo Maloo