[LU-2940] Oops in vvp_pgcache_show() Created: 09/Mar/13 Updated: 17/Jan/14 Resolved: 10/Aug/13 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.5.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | John Hammond | Assignee: | Emoly Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llite, patch, procfs | ||
| Severity: | 3 |
| Rank (Obsolete): | 7058 |
| Description |
|
To reproduce start racer and run while true; do cat /proc/fs/lustre/llite/*/dump_page_cache > /dev/null; done crash> bt -l
PID: 8892 TASK: ffff880118dc8080 CPU: 1 COMMAND: "cat"
#0 [ffff88013a59b920] machine_kexec at ffffffff81031f7b
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/arch/x86/kernel/machine_kexec_64.c: 336
#1 [ffff88013a59b980] crash_kexec at ffffffff810b8c22
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/kernel/kexec.c: 1106
#2 [ffff88013a59ba50] oops_end at ffffffff814eef80
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/arch/x86/kernel/dumpstack.c: 242
#3 [ffff88013a59ba80] no_context at ffffffff81042a0b
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/arch/x86/mm/fault.c: 698
#4 [ffff88013a59bad0] __bad_area_nosemaphore at ffffffff81042c95
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/arch/x86/mm/fault.c: 764
#5 [ffff88013a59bb20] bad_area at ffffffff81042dbe
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/arch/x86/mm/fault.c: 793
#6 [ffff88013a59bb50] __do_page_fault at ffffffff81043570
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/arch/x86/mm/fault.c: 1140
#7 [ffff88013a59bc70] do_page_fault at ffffffff814f0f5e
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/arch/x86/mm/fault.c: 1208
#8 [ffff88013a59bca0] page_fault at ffffffff814ee315
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/arch/x86_64/kernel/entry.S
[exception RIP: vvp_pgcache_show+263]
RIP: ffffffffa0df5ab7 RSP: ffff88013a59bd58 RFLAGS: 00010246
RAX: ffff88010f064ac0 RBX: ffffea00039cdd28 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffffa0e02b12 RDI: ffff88010f064a00
RBP: ffff88013a59be18 R8: 00000000fffffffb R9: 00000000000035bb
R10: 0000000000000002 R11: 000000000000000f R12: ffff880132b22440
R13: ffff88015b589db8 R14: ffff88014ff48948 R15: ffff88010f064a00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff88013a59be20] seq_read at ffffffff8119a0a9
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/fs/seq_file.c: 235
#10 [ffff88013a59bea0] proc_reg_read at ffffffff811dcbde
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/fs/proc/inode.c: 202
#11 [ffff88013a59bef0] vfs_read at ffffffff811782b5
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/fs/read_write.c: 295
#12 [ffff88013a59bf30] sys_read at ffffffff811783f1
/usr/src/debug/kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/fs/read_write.c: 369
#13 [ffff88013a59bf80] system_call_fastpath at ffffffff8100b072
/usr/src/debug///////kernel-2.6.32-279.19.1.el6/linux-2.6.32-279.19.1.el6_lustre_gcov.x86_64/arch/x86/kernel/entry_64.S: 489
RIP: 0000003282eda360 RSP: 00007fffaf2f4050 RFLAGS: 00010206
RAX: 0000000000000000 RBX: ffffffff8100b072 RCX: 0000000000b68030
RDX: 0000000000008000 RSI: 0000000000b60000 RDI: 0000000000000003
RBP: 0000000000b60000 R8: 000000328318dee8 R9: 0000000000000001
R10: 0000000000008fff R11: 0000000000000246 R12: ffffffffffff8000
R13: 0000000000000003 R14: 0000000000008000 R15: 0000000000000003
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
crash> dis -l vvp_pgcache_show+263
/root/lustre-release/lustre/llite/vvp_dev.c: 420
0xffffffffa0df5ab7 <vvp_pgcache_show+263>: mov (%rdx),%rcx
|
| Comments |
| Comment by Gaurav Mahajan (Inactive) [ 29/Jul/13 ] |
|
Hi John, I tried reproducing the bug but could not reproduce it. Tried following:
Can you please add more information about how to reproduce this bug? |
| Comment by John Hammond [ 29/Jul/13 ] |
|
I think you need to create a workload with more page teardown: while true; do cat /proc/fs/lustre/llite/*/dump_page_cache &> /dev/null; done & MOUNT_2=y ./lustre/tests/llmount.sh sh ./lustre/tests/racer.sh |
| Comment by Peter Jones [ 31/Jul/13 ] |
|
I just noticed this patch in gerrit - http://review.whamcloud.com/#/c/7192/ |
| Comment by John Hammond [ 10/Aug/13 ] |
|
Patch landed to master. |
| Comment by Jian Yu [ 17/Jan/14 ] |
|
Lustre client build: http://build.whamcloud.com/job/lustre-b2_4/70/ (2.4.2) sanity test 118a hung: == sanity test 118a: verify O_SYNC works ============ 01:54:01 (1389779641) 7+0 records in 7+0 records out 458752 bytes (459 kB) copied, 0.00423288 s, 108 MB/s Console log on client: 01:54:20:Lustre: DEBUG MARKER: == sanity test 118a: verify O_SYNC works ============ 01:54:01 (1389779641) 01:54:21:BUG: unable to handle kernel NULL pointer dereference at (null) 01:54:21:IP: [<ffffffffa0ac68d7>] vvp_pgcache_show+0x107/0x370 [lustre] 01:54:21:PGD 66733067 PUD 19fc1067 PMD 0 01:54:21:Oops: 0000 [#1] SMP 01:54:22:last sysfs file: /sys/devices/system/cpu/possible 01:54:22:CPU 1 01:54:22:Modules linked in: ext2 lustre(U) obdecho(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) sha512_generic sha256_generic libcfs(U) nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core microcode virtio_balloon i2c_piix4 i2c_core 8139too 8139cp mii ext3 jbd mbcache virtio_blk pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] 01:54:23: 01:54:23:Pid: 28122, comm: lctl Not tainted 2.6.32-358.23.2.el6.x86_64 #1 Red Hat KVM 01:54:23:RIP: 0010:[<ffffffffa0ac68d7>] [<ffffffffa0ac68d7>] vvp_pgcache_show+0x107/0x370 [lustre] 01:54:24:RSP: 0018:ffff88007a96fd58 EFLAGS: 00010246 01:54:24:RAX: ffff88007a9960c0 RBX: ffffea0000eac3d0 RCX: 0000000000000000 01:54:24:RDX: 0000000000000000 RSI: ffffffffa0ad3c90 RDI: ffff88007a996000 01:54:24:RBP: ffff88007a96fe18 R08: 00000000ffffff20 R09: 0000000000000121 01:54:24:R10: 0000000000000002 R11: 0000000000000008 R12: ffff88004b06b940 01:54:25:R13: ffff880064057a70 R14: ffff88006665f840 R15: ffff88007a996000 01:54:25:FS: 00007fe431f94700(0000) GS:ffff880002300000(0000) knlGS:0000000000000000 01:54:25:CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b 01:54:26:CR2: 0000000000000000 CR3: 000000003b217000 CR4: 00000000000006e0 01:54:26:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 01:54:26:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 01:54:27:Process lctl (pid: 28122, threadinfo ffff88007a96e000, task ffff880079e26080) 01:54:27:Stack: 01:54:27: ffff88007a96fd68 ffffffffa03d8b2e ffff88007a96fd88 ffff880064057a68 01:54:27:<d> ffff880064057aa8 ffff880064057a70 ffff88004b06b940 ffff88007a96fe58 01:54:28:<d> ffff88007a96fda8 ffffffffa03d8a4e ffff88007a96fdd8 ffffffffa058d03f 01:54:28:Call Trace: 01:54:28: [<ffffffffa03d8b2e>] ? cfs_free+0xe/0x10 [libcfs] 01:54:29: [<ffffffffa03d8a4e>] ? cfs_mem_cache_free+0xe/0x10 [libcfs] 01:54:29: [<ffffffffa058d03f>] ? cl_env_put+0x20f/0x370 [obdclass] 01:54:29: [<ffffffff811a54e9>] seq_read+0x289/0x400 01:54:29: [<ffffffff81057520>] ? __dequeue_entity+0x30/0x50 01:54:30: [<ffffffff811e99ae>] proc_reg_read+0x7e/0xc0 01:54:30: [<ffffffff81181ac5>] vfs_read+0xb5/0x1a0 01:54:30: [<ffffffff81181c01>] sys_read+0x51/0x90 01:54:30: [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290 01:54:30: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Maloo report: https://maloo.whamcloud.com/test_sets/039a8440-7e7b-11e3-925a-52540035b04c The patch http://review.whamcloud.com/7192 needs to be back-ported to Lustre b2_4 branch. |