[LU-5198] general protection fault on rhel7 kernel Created: 15/Jun/14  Updated: 27/Jul/15  Resolved: 27/Jul/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.8.0

Type: Bug
Priority: Blocker
Reporter: Yang Sheng
Assignee: Yang Sheng
Resolution: Duplicate
Votes: 0
Labels: None

Attachments: Text File shadow-41vm4.log.txt    
Severity: 3
Rank (Obsolete): 14519

 Description   

During RHEL7 server testing I encountered the crashes below. I believe they are all the same issue, and that it is related to the kernel rather than to Lustre.

[23222.689609] LustreError: Skipped 3 previous similar messages
[23223.888043] general protection fault: 0000 [#1] SMP 
[23223.888158] Modules linked in: lustre(OF) ofd(OF) osp(OF) lod(OF) ost(OF) mdt(OF) mdd(OF) mgs(OF) nodemap(OF) osd_ldiskfs(OF) ldiskfs(OF) lquota(OF) lfsck(OF) obdecho(OF) mgc(OF) lov(OF) osc(OF) mdc(OF) lmv(OF) fid(OF) fld(OF) ptlrpc(OF) obdclass(OF) ksocklnd(OF) lnet(OF) libcfs(OF) loop mbcache jbd2 sha512_generic netconsole ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg i2c_piix4 virtio_balloon pcspkr virtio_console mperf dm_mirror dm_region_hash dm_log dm_mod serio_raw xfs libcrc32c sr_mod cdrom ata_generic pata_acpi virtio_blk virtio_net qxl drm_kms_helper ttm drm ata_piix virtio_pci virtio_ring virtio libata i2c_core floppy [last unloaded: libcfs]
[23223.889011] CPU: 1 PID: 29758 Comm: umount Tainted: GF          O--------------   3.10.0-121.el7.x86_64 #1
[23223.889011] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[23223.889011] task: ffff88001e11db00 ti: ffff88001e59c000 task.ti: ffff88001e59c000
[23223.889011] RIP: 0010:[<ffffffff81194905>]  [<ffffffff81194905>] __kmalloc+0x95/0x230
[23223.889011] RSP: 0018:ffff88001e59dab8  EFLAGS: 00010282
[23223.889011] RAX: 0000000000000000 RBX: ffff88002032dfa0 RCX: 00000000000114bf
[23223.889011] RDX: 00000000000114be RSI: 0000000000000000 RDI: 0000000000000001
[23223.889011] RBP: ffff88001e59dae8 R08: 00000000000172a0 R09: ffffffffa0c5a9c8
[23223.889011] R10: ffff88003e001d00 R11: 000000000000000f R12: 0000000000008050
[23223.889011] R13: 697463615f6d7368 R14: 000000000000000e R15: ffff88003e001d00
[23223.889011] FS:  00007f0345e41880(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[23223.889011] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[23223.889011] CR2: 00007f0d215b6600 CR3: 000000001e106000 CR4: 00000000000006e0
[23223.889011] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[23223.889011] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[23223.889011] Stack:
[23223.889011]  ffffffffa0c5a9c8 ffff88002032dfa0 000000000000000e 000000000000000e
[23223.889011]  ffff88001e59dc28 ffff88001e59db90 ffff88001e59db10 ffffffffa0c5a9c8
[23223.889011]  ffff88002bd4e156 ffffffffa0c16f00 ffff88002bd4c458 ffff88001e59db60
[23223.889011] Call Trace:
[23223.889011]  [<ffffffffa0c5a9c8>] ? mgs_direntry_alloc+0x78/0x400 [mgs]
[23223.889011]  [<ffffffffa0c5a9c8>] mgs_direntry_alloc+0x78/0x400 [mgs]
[23223.889011]  [<ffffffffa0c674db>] class_dentry_readdir+0x8b/0x310 [mgs]
[23223.889011]  [<ffffffffa0c6c9b6>] mgs_erase_logs+0x66/0x490 [mgs]
[23223.889011]  [<ffffffffa0c6d355>] mgs_params_fsdb_cleanup+0x15/0x20 [mgs]
[23223.889011]  [<ffffffffa0c50bc4>] mgs_device_fini+0x94/0x580 [mgs]
[23223.889011]  [<ffffffffa0537497>] class_cleanup+0x8c7/0xca0 [obdclass]
[23223.889011]  [<ffffffffa053d746>] class_process_config+0x1df6/0x2990 [obdclass]
[23223.889011]  [<ffffffffa0487ff4>] ? libcfs_log_return+0x24/0x30 [libcfs]
[23223.889011]  [<ffffffffa053e3cf>] class_manual_cleanup+0xef/0x6b0 [obdclass]
[23223.889011]  [<ffffffffa05795f7>] server_put_super+0x737/0xe30 [obdclass]
[23223.889011]  [<ffffffff811b1f16>] generic_shutdown_super+0x56/0xe0
[23223.889011]  [<ffffffff811b2182>] kill_anon_super+0x12/0x20
[23223.889011]  [<ffffffffa0541742>] lustre_kill_super+0x32/0x50 [obdclass]
[23223.889011]  [<ffffffff811b259d>] deactivate_locked_super+0x3d/0x60
[23223.889011]  [<ffffffff811b2606>] deactivate_super+0x46/0x60
[23223.889011]  [<ffffffff811cf395>] mntput_no_expire+0xc5/0x120
[23223.889011]  [<ffffffff811d04cf>] SyS_umount+0x9f/0x3c0
[23223.889011]  [<ffffffff815fc819>] system_call_fastpath+0x16/0x1b
[23223.889011] Code: dc 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 30 01 00 00 48 85 c0 0f 84 27 01 00 00 49 63 42 20 48 8d 4a 01 4d 8b 02 <49> 8b 5c 05 00 4c 89 e8 65 49 0f c7 08 0f 94 c0 84 c0 74 b8 49 
[23223.889011] RIP  [<ffffffff81194905>] __kmalloc+0x95/0x230
[23223.889011]  RSP <ffff88001e59dab8>

[ 2566.503039] BUG: unable to handle kernel paging request at ffff880030310064
[ 2566.503039] IP: [<ffffffff812c624d>] memcpy+0xd/0x110
[ 2566.503039] PGD 1e2f067 PUD 1e30067 PMD 30288063 PTE 8000000030310161
[ 2566.503039] Oops: 0003 [#1] SMP 
[ 2566.503039] Modules linked in: lustre(OF) ofd(OF) osp(OF) lod(OF) ost(OF) mdt(OF) mdd(OF) mgs(OF) nodemap(OF) osd_ldiskfs(OF) ldiskfs(OF) lquota(OF) lfsck(OF) obdecho(OF) mgc(OF) lov(OF) osc(OF) mdc(OF) lmv(OF) fid(OF) fld(OF) ptlrpc(OF) obdclass(OF) ksocklnd(OF) lnet(OF) libcfs(OF) fuse btrfs zlib_deflate raid6_pq xor vfat msdos fat ext4 binfmt_misc loop mbcache jbd2 sha512_generic netconsole ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg serio_raw pcspkr i2c_piix4 virtio_balloon virtio_console dm_mirror dm_region_hash dm_log mperf dm_mod xfs libcrc32c sr_mod cdrom ata_generic pata_acpi qxl drm_kms_helper virtio_blk virtio_net ata_piix ttm drm floppy libata virtio_pci virtio_ring virtio i2c_core [last unloaded: libcfs]
[ 2566.503039] CPU: 1 PID: 240 Comm: systemd-journal Tainted: GF       W  O--------------   3.10.0-121.el7.x86_64 #1
[ 2566.503039] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 2566.503039] task: ffff88003ac0ad80 ti: ffff88003c4ae000 task.ti: ffff88003c4ae000
[ 2566.503039] RIP: 0010:[<ffffffff812c624d>]  [<ffffffff812c624d>] memcpy+0xd/0x110
[ 2566.503039] RSP: 0018:ffff88003c4afc70  EFLAGS: 00010202
[ 2566.503039] RAX: ffff880030310064 RBX: ffff88003c9270f8 RCX: 0000000000000001
[ 2566.503039] RDX: 0000000000000007 RSI: ffff88003c9270f8 RDI: ffff880030310064
[ 2566.503039] RBP: ffff88003c4afc90 R08: 00000000000172a0 R09: ffff88003e001d00
[ 2566.503039] R10: ffff8800375bb200 R11: 0000000000000002 R12: 000000000000000f
[ 2566.503039] R13: 00000000000000d0 R14: ffff88000c149f00 R15: ffff88003bb15d90
[ 2566.503039] FS:  00007fb76e32c840(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[ 2566.503039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2566.503039] CR2: ffff880030310064 CR3: 000000003bb32000 CR4: 00000000000006e0
[ 2566.503039] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2566.503039] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2566.503039] Stack:
[ 2566.503039]  ffffffff8115d6a9 0000000000000002 ffff88003bb15ae8 ffff88003c9270f8
[ 2566.503039]  ffff88003c4afce0 ffffffff811f280e ffff88003c4afcc8 ffffffff811cf30e
[ 2566.503039]  0000000008000002 ffff88003db945a0 0000000000000002 0000000000000002
[ 2566.503039] Call Trace:
[ 2566.503039]  [<ffffffff8115d6a9>] ? kstrdup+0x49/0x60
[ 2566.503039]  [<ffffffff811f280e>] fsnotify_create_event+0x8e/0x1a0
[ 2566.503039]  [<ffffffff811cf30e>] ? mntput_no_expire+0x3e/0x120
[ 2566.503039]  [<ffffffff811f1952>] send_to_group+0x192/0x230
[ 2566.503039]  [<ffffffff811bf16d>] ? do_last+0x1ed/0x1220
[ 2566.503039]  [<ffffffff811f1d05>] fsnotify+0x315/0x350
[ 2566.503039]  [<ffffffff811f1f6d>] __fsnotify_parent+0xdd/0xf0
[ 2566.503039]  [<ffffffff811cc3d0>] notify_change+0x300/0x3d0
[ 2566.503039]  [<ffffffff811adb43>] do_truncate+0x73/0xc0
[ 2566.503039]  [<ffffffff811b22f8>] ? __sb_start_write+0x58/0x110
[ 2566.503039]  [<ffffffff811aded4>] do_sys_ftruncate.constprop.15+0x114/0x170
[ 2566.503039]  [<ffffffff811adf6e>] SyS_ftruncate+0xe/0x10
[ 2566.503039]  [<ffffffff815fc819>] system_call_fastpath+0x16/0x1b
[ 2566.503039] Code: 43 4e 5b 5d c3 66 0f 1f 84 00 00 00 00 00 e8 fb fb ff ff eb e2 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 
[ 2566.503039] RIP  [<ffffffff812c624d>] memcpy+0xd/0x110
[ 2566.503039]  RSP <ffff88003c4afc70>
[ 2566.503039] CR2: ffff880030310064
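
The `Oops: 0003` value in the memcpy trace above is an x86 page-fault error code, and the `PTE` line gives the page-table entry for the faulting address. As a rough illustration (not part of the original report; the helper names `decode_oops_code` and `pte_is_writable` are hypothetical), the sketch below decodes both using the architectural bit definitions. The result suggests a kernel-mode write to a page that is present but not writable.

```python
# Meaning of the low bits of the x86 page-fault error code
# (the hex value printed after "Oops:"). Each entry is
# (mask, meaning if set, meaning if clear or None).
PF_BITS = [
    (0x01, "protection violation (page present)", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code):
    """Return a list of human-readable descriptions for an error code."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


def pte_is_writable(pte):
    """Bit 1 (R/W) of an x86-64 page-table entry: set means writable."""
    return bool(pte & 0x2)


if __name__ == "__main__":
    # Values copied from the "Oops:" and "PTE" lines of the trace.
    print(decode_oops_code(0x0003))
    print(pte_is_writable(0x8000000030310161))
```

Here the destination of the copy (RDI, equal to CR2) lies in a read-only page, which would be unusual for a buffer freshly returned by the allocator inside kstrdup.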

[ 2294.477604] general protection fault: 0000 [#1] SMP 
[ 2294.477859] Modules linked in: mgc(OF+) lov(OF) osc(OF) mdc(OF) lmv(OF) fid(OF) fld(OF) ptlrpc(OF) obdclass(OF) ksocklnd(OF) lnet(OF) libcfs(OF) loop mbcache jbd2 sha512_generic netconsole ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg serio_raw i2c_piix4 pcspkr virtio_balloon virtio_console dm_mirror dm_region_hash mperf dm_log dm_mod xfs libcrc32c sr_mod cdrom ata_generic pata_acpi virtio_blk virtio_net qxl drm_kms_helper ttm drm virtio_pci virtio_ring virtio ata_piix libata i2c_core floppy [last unloaded: libcfs]
[ 2294.478015] CPU: 1 PID: 26425 Comm: modprobe Tainted: GF       W  O--------------   3.10.0-121.el7.x86_64 #1
[ 2294.478015] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 2294.478015] task: ffff88002fc5a220 ti: ffff8800279f8000 task.ti: ffff8800279f8000
[ 2294.478015] RIP: 0010:[<ffffffff81197605>]  [<ffffffff81197605>] __kmalloc_track_caller+0x95/0x230
[ 2294.478015] RSP: 0018:ffff8800279f9d30  EFLAGS: 00010286
[ 2294.478015] RAX: 0000000000000000 RBX: ffffc90008292799 RCX: 000000000000908f
[ 2294.478015] RDX: 000000000000908e RSI: 0000000000000000 RDI: 0000000000000001
[ 2294.478015] RBP: ffff8800279f9d68 R08: 00000000000172a0 R09: ffff88003e001d00
[ 2294.478015] R10: ffff88003e001400 R11: ffffc90008292747 R12: 00000000000000d0
[ 2294.478015] R13: 2e617461646f722e R14: 000000000000000b R15: ffff88003e001d00
[ 2294.478015] FS:  00007f79ce9af740(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
[ 2294.478015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2294.478015] CR2: 00007f79ce58ad90 CR3: 00000000279ff000 CR4: 00000000000006e0
[ 2294.478015] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2294.478015] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2294.478015] Stack:
[ 2294.478015]  ffff88003e001d00 ffffffff810cad84 ffffc90008292799 000000000000000b
[ 2294.478015]  00000000000000d0 ffff8800279f9ef0 ffffffffa045dc40 ffff8800279f9d90
[ 2294.478015]  ffffffff8115d691 ffffffffa045dc58 ffffffffa045dc90 ffff88002769e000
[ 2294.478015] Call Trace:
[ 2294.478015]  [<ffffffff810cad84>] ? load_module+0x1824/0x1a90
[ 2294.478015]  [<ffffffff8115d691>] kstrdup+0x31/0x60
[ 2294.478015]  [<ffffffff810cad84>] load_module+0x1824/0x1a90
[ 2294.478015]  [<ffffffff812da3d0>] ? ddebug_proc_write+0xf0/0xf0
[ 2294.478015]  [<ffffffff810c7133>] ? copy_module_from_fd.isra.43+0x53/0x150
[ 2294.478015]  [<ffffffff810cb1a6>] SyS_finit_module+0xa6/0xd0
[ 2294.478015]  [<ffffffff815fc819>] system_call_fastpath+0x16/0x1b
[ 2294.478015] Code: dc 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 0f 84 20 01 00 00 48 85 c0 0f 84 17 01 00 00 49 63 47 20 48 8d 4a 01 4d 8b 07 <49> 8b 5c 05 00 4c 89 e8 65 49 0f c7 08 0f 94 c0 84 c0 74 b8 49 
[ 2294.478015] RIP  [<ffffffff81197605>] __kmalloc_track_caller+0x95/0x230
[ 2294.478015]  RSP <ffff8800279f9d30>
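
The two general protection faults above (in `__kmalloc` and `__kmalloc_track_caller`) stop at the same instruction bytes, `<49> 8b 5c 05 00`, which appears to be a load through R13, and in both traces R13 is not a canonical kernel address. A minimal sketch (not part of the original report; `reg_as_ascii` is a hypothetical helper) reinterprets those R13 values as the little-endian bytes they would occupy in memory. Both turn out to be printable text, which would be consistent with string data having overwritten an allocator pointer, though it does not prove it.

```python
def reg_as_ascii(value):
    """Render a 64-bit register value as the 8 bytes it occupies in
    memory (little-endian), replacing non-printable bytes with '.'."""
    raw = value.to_bytes(8, "little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    # R13 values copied from the two general protection fault traces.
    for label, r13 in [
        ("__kmalloc", 0x697463615F6D7368),
        ("__kmalloc_track_caller", 0x2E617461646F722E),
    ]:
        print(label, repr(reg_as_ascii(r13)))
```

Neither value is a valid pointer, so any dereference through it faults regardless of which caller happened to be allocating at the time. That fits the observation that unrelated code paths (umount, modprobe) crash in the same way.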




 Comments   
Comment by Yang Sheng [ 23/Jan/15 ]

I have kept hitting this issue up to the rhel7.1 release, so I have started working on finding the root cause. The issue still cannot be reproduced reliably. I need to narrow down a set of test cases that triggers it.

Comment by James A Simmons [ 15/Apr/15 ]

Is this problem still present?

Comment by Yang Sheng [ 26/Jun/15 ]

I have not hit it in recent RHEL7 tests, so I am closing it for now.

Comment by Jian Yu [ 27/Jul/15 ]

The patch for LU-6395 resolves the issue in this ticket.
