Details
Type: Bug
Resolution: Duplicate
Priority: Blocker
Fix Version/s: None
Affects Version/s: Lustre 2.1.2, Lustre 2.1.3
Labels: None
Environment:
Lustre Tag: v2_1_2_RC2
Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/87/
Distro/Arch: RHEL6.2/x86_64 (kernel version: 2.6.32-220.17.1.el6)
Network: TCP (1GigE)
ENABLE_QUOTA=yes
Severity: 3
Rank: 4172
Description
This issue was created by maloo for yujian <yujian@whamcloud.com>
This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/c496b4f4-aecd-11e1-b0f7-52540035b04c.
The sub-test test_iorssf failed with the following error:
test failed to respond and timed out
Info required for matching: parallel-scale-nfsv3 iorssf
The console log on the MDS/Lustre client/NFS server showed:
17:34:14:BUG: unable to handle kernel paging request at fffffffb8a512d10
17:34:14:IP: [<ffffffff81052994>] update_curr+0x144/0x1f0
17:34:14:PGD 1a87067 PUD 0
17:34:14:Thread overran stack, or stack corrupted
17:34:14:Oops: 0000 [#1] SMP
17:34:14:last sysfs file: /sys/module/nfsd/initstate
17:34:14:CPU 0
17:34:14:Modules linked in: lustre(U) obdfilter(U) ost(U) osd_ldiskfs(U) cmm(U) fsfilt_ldiskfs(U) mdt(U) mdd(U) mds(U) mgs(U) ldiskfs(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) lquota(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) nfs fscache jbd2 nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: lnet_selftest]
17:34:14:
17:34:14:Pid: 16667, comm: nfsd Not tainted 2.6.32-220.17.1.el6_lustre.g636ddbf.x86_64 #1 Red Hat KVM
17:34:15:RIP: 0010:[<ffffffff81052994>] [<ffffffff81052994>] update_curr+0x144/0x1f0
17:34:15:RSP: 0018:ffff880002203db8 EFLAGS: 00010086
17:34:15:RAX: ffff880071a40b40 RBX: ffffffff811237e6 RCX: ffff88007faa60c0
17:34:15:RDX: 0000000000018b48 RSI: ffff880071a97538 RDI: ffff880071a40b78
17:34:15:RBP: ffff880002203de8 R08: ffffffff8160b6a5 R09: 0000000000000000
17:34:15:R10: 0000000000000010 R11: 0000000000000000 R12: ffff880002215fe8
17:34:15:R13: 00000000000f469a R14: 000054a5051b5d6b R15: ffff880071a40b40
17:34:15:FS: 0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
17:34:15:CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
17:34:15:CR2: fffffffb8a512d10 CR3: 0000000037697000 CR4: 00000000000006f0
17:34:15:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
17:34:15:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
17:34:15:Process nfsd (pid: 16667, threadinfo ffff880068a62000, task ffff880071a40b40)
17:34:15:Stack:
17:34:15: ffff880002203dc8 0000000000000000 ffff880071a40b78 ffff880002215fe8
17:34:15:<0> 0000000000000000 0000000000000000 ffff880002203e18 ffffffff81052f4b
17:34:17:<0> ffff880002215f80 0000000000000000 0000000000015f80 0000000000000000
17:34:17:Call Trace:
17:34:17: <IRQ>
17:34:17: [<ffffffff81052f4b>] task_tick_fair+0xdb/0x160
17:34:17:BUG: unable to handle kernel NULL pointer dereference at 00000000000009e9
17:34:17:IP: [<ffffffff8100f5ad>] print_context_stack+0xad/0x140
17:34:17:PGD 0
17:34:17:Thread overran stack, or stack corrupted
17:34:17:Oops: 0000 [#2] SMP
17:34:17:last sysfs file: /sys/module/nfsd/initstate
17:34:17:CPU 0
17:34:17:Modules linked in: lustre(U) obdfilter(U) ost(U) osd_ldiskfs(U) cmm(U) fsfilt_ldiskfs(U) mdt(U) mdd(U) mds(U) mgs(U) ldiskfs(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) lquota(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) nfs fscache jbd2 nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon 8139too 8139cp mii i2c_piix4 i2c_core ext3 jbd mbcache virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: lnet_selftest]
17:34:17:
17:34:18:Pid: 16667, comm: nfsd Not tainted 2.6.32-220.17.1.el6_lustre.g636ddbf.x86_64 #1 Red Hat KVM
17:34:18:RIP: 0010:[<ffffffff8100f5ad>] [<ffffffff8100f5ad>] print_context_stack+0xad/0x140
17:34:18:RSP: 0018:ffff8800022038c8 EFLAGS: 00010002
17:34:18:RAX: 0000000000000001 RBX: ffff880002203df0 RCX: 0000000000000000
17:34:18:RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
17:34:18:RBP: ffff880002203928 R08: ffffffff81c00740 R09: 0000000000000000
17:34:18:R10: 0000000000000002 R11: 0000000000000001 R12: ffff880002203e18
17:34:18:R13: ffff880068a62000 R14: ffffffff81600460 R15: ffff880002203fc0
17:34:18:FS: 0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
17:34:18:CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
17:34:18:CR2: 00000000000009e9 CR3: 0000000037697000 CR4: 00000000000006f0
17:34:18:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
17:34:18:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
17:34:18:Process nfsd (pid: 16667, threadinfo ffff880068a62000, task ffff880071a40b40)
17:34:18:Stack:
17:34:18: ffff880000000018 ffff880068a63ff8 ffff880002203e18 ffff880002201fc0
17:34:19:<0> ffffffff81779900 ffffffff81052f4b ffffffff817c60ba ffff880002203db8
17:34:19:<0> 000000000000cbe0 ffffffff81600460 ffffffff81779900 ffff880002203fc0
17:34:19:Call Trace:
17:34:19: <IRQ>
17:34:19: [<ffffffff81052f4b>] ? task_tick_fair+0xdb/0x160
The console log on the NFS client showed:
17:37:09:INFO: task IOR:7271 blocked for more than 120 seconds.
17:37:09:"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
17:37:09:IOR D 0000000000000000 0 7271 7268 0x00000080
17:37:09: ffff880079e21c88 0000000000000082 ffff880077682078 ffff880002215f80
17:37:09: ffff880079e21c48 ffffffff81053190 ffff880079e21c28 ffffffff8106196b
17:37:09: ffff8800750b3b38 ffff880079e21fd8 000000000000f4e8 ffff8800750b3b38
17:37:09:Call Trace:
17:37:09: [<ffffffff81053190>] ? check_preempt_wakeup+0x1c0/0x260
17:37:09: [<ffffffff8106196b>] ? enqueue_task_fair+0xfb/0x100
17:37:09: [<ffffffff8104da7c>] ? check_preempt_curr+0x7c/0x90
17:37:09: [<ffffffff814ee6fe>] __mutex_lock_slowpath+0x13e/0x180
17:37:09: [<ffffffff814ee59b>] mutex_lock+0x2b/0x50
17:37:09: [<ffffffff81113239>] generic_file_aio_write+0x59/0xe0
17:37:09: [<ffffffffa03b4ffe>] nfs_file_write+0xde/0x1f0 [nfs]
17:37:09: [<ffffffff8117661a>] do_sync_write+0xfa/0x140
17:37:09: [<ffffffff81090d30>] ? autoremove_wake_function+0x0/0x40
17:37:09: [<ffffffff8120c646>] ? security_file_permission+0x16/0x20
17:37:09: [<ffffffff81176918>] vfs_write+0xb8/0x1a0
17:37:11: [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
17:37:11: [<ffffffff81177321>] sys_write+0x51/0x90
17:37:11: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
17:38:47:nfs: server client-23vm3 not responding, still trying
Trackbacks
Lustre 2.1.2 release testing tracker
Lustre 2.1.2 RC2 Tag: v212RC2 Build:

Lustre 2.1.3 release testing tracker
Lustre 2.1.3 RC1 Tag: v213RC1 Build: