[LU-3638] GPF crash in osp_key_exit Created: 25/Jul/13 Updated: 09/Jan/20 Resolved: 09/Jan/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9371 |
| Description |
|
Just hit this running recent master:

<4>[113366.463322] Lustre: DEBUG MARKER: == replay-single test 0c: check replay-barrier == 10:40:38 (1374676838)
<3>[113367.223979] LustreError: 22867:0:(osd_handler.c:1191:osd_ro()) *** setting lustre-MDT0000 read-only ***
<4>[113367.225361] Turning device loop0 (0x700000) read-only
<4>[113367.303612] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
<4>[113367.345628] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
<4>[113367.526705] Lustre: Unmounted lustre-client
<4>[113367.907894] Lustre: Failing over lustre-MDT0000
<1>[113368.092078] BUG: unable to handle kernel paging request at ffff8800b6966c68
<1>[113368.092813] IP: [<ffffffffa0936019>] osp_key_exit+0x9/0x20 [osp]
<4>[113368.093486] PGD 1a26063 PUD 501067 PMD 6b6067 PTE 80000000b6966060
<4>[113368.094166] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
<4>[113368.094774] last sysfs file: /sys/devices/system/cpu/possible
<4>[113368.095424] CPU 2
<4>[113368.095515] Modules linked in:
<3>[113368.096298] LustreError: 11-0: lustre-MDT0000-lwp-OST0001: Communicating with 0@lo, operation obd_ping failed with -107.
<4>[113368.096304] Lustre: lustre-MDT0000-lwp-OST0001: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
<4>[113368.096034]  lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd
<3>[113368.116346] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 0@lo (no target)
<4>[113368.096034]  mgs lquota lfsck obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon i2c_piix4 i2c_core virtio_console virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
<4>[113368.096034]
<4>[113368.096034] Pid: 22149, comm: mdt00_002 Not tainted 2.6.32-rhe6.4-debug #2 Red Hat KVM
<4>[113368.096034] RIP: 0010:[<ffffffffa0936019>]  [<ffffffffa0936019>] osp_key_exit+0x9/0x20 [osp]
<4>[113368.096034] RSP: 0018:ffff8800973dfe10  EFLAGS: 00010282
<4>[113368.096034] RAX: ffffffffa0936010 RBX: 00000000000000c8 RCX: 0000000000000000
<4>[113368.096034] RDX: ffff8800b6966bf0 RSI: ffffffffa095ac00 RDI: ffff8800b7121610
<4>[113368.096034] RBP: ffff8800973dfe10 R08: 0000000000000001 R09: 0000000000000000
<4>[113368.096034] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800b7121610
<4>[113368.096034] R13: ffff88008a49af30 R14: ffff88006bbafef0 R15: ffff8800b52aac20
<4>[113368.096034] FS:  0000000000000000(0000) GS:ffff880006280000(0000) knlGS:0000000000000000
<4>[113368.096034] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>[113368.096034] CR2: ffff8800b6966c68 CR3: 0000000001a25000 CR4: 00000000000006e0
<4>[113368.096034] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[113368.096034] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[113368.096034] Process mdt00_002 (pid: 22149, threadinfo ffff8800973de000, task ffff8800973dc380)
<4>[113368.096034] Stack:
<4>[113368.096034]  ffff8800973dfe30 ffffffffa0ff4488 ffff8800b69677f0 ffff8800b52aabf0
<4>[113368.096034] <d> ffff8800973dfed0 ffffffffa1190899 ffff8800973dfe50 ffff880000000000
<4>[113368.096034] <d> ffff8800b52aac80 00000000973dffd8 ffff88006bbafef0 00000000973dc938
<4>[113368.096034] Call Trace:
<4>[113368.096034]  [<ffffffffa0ff4488>] lu_context_exit+0x58/0xa0 [obdclass]
<4>[113368.096034]  [<ffffffffa1190899>] ptlrpc_main+0x9d9/0x1650 [ptlrpc]
<4>[113368.096034]  [<ffffffffa118fec0>] ? ptlrpc_main+0x0/0x1650 [ptlrpc]
<4>[113368.096034]  [<ffffffff81094606>] kthread+0x96/0xa0
<4>[113368.096034]  [<ffffffff8100c10a>] child_rip+0xa/0x20
<4>[113368.096034]  [<ffffffff81094570>] ? kthread+0x0/0xa0
<4>[113368.096034]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
<4>[113368.096034] Code: <48> c7 42 78 00 00 00 00 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00

Crashdump and modules are in /exports/crashdumps/192.168.10.221-2013-07-24-10\:40\:42 |
| Comments |
| Comment by Oleg Drokin [ 25/Jul/13 ] |
|
Hm, apparently I hit it twice; here's another report:

<4>[113381.495165] Lustre: DEBUG MARKER: == replay-single test 13: open chmod 0 |x| write close == 10:40:54 (1374676854)
<4>[113382.445236] Turning device loop0 (0x700000) read-only
<4>[113382.590732] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
<4>[113382.610038] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
<1>[113383.065720] BUG: unable to handle kernel paging request at ffff880094800c68
<1>[113383.068910] IP: [<ffffffffa0944019>] osp_key_exit+0x9/0x20 [osp]
<4>[113383.069122] PGD 1a26063 PUD 501067 PMD 5a6067 PTE 8000000094800060
<4>[113383.069122] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
<4>[113383.069122] last sysfs file: /sys/devices/system/cpu/possible
<4>[113383.069122] CPU 3
<4>[113383.069122] Modules linked in: lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota lfsck obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
<4>[113383.069122]
<4>[113383.069122] Pid: 30405, comm: mdt00_001 Not tainted 2.6.32-rhe6.4-debug #2 Red Hat KVM
<4>[113383.069122] RIP: 0010:[<ffffffffa0944019>]  [<ffffffffa0944019>] osp_key_exit+0x9/0x20 [osp]
<4>[113383.069122] RSP: 0018:ffff8800954c7e10  EFLAGS: 00010282
<4>[113383.069122] RAX: ffffffffa0944010 RBX: 00000000000000c8 RCX: 0000000000000000
<4>[113383.069122] RDX: ffff880094800bf0 RSI: ffffffffa0968c00 RDI: ffff8800b3328df8
<4>[113383.069122] RBP: ffff8800954c7e10 R08: ffff88009a39c7f8 R09: 000000000000007c
<4>[113383.069122] R10: 0000000000000001 R11: 20736e6172742029 R12: ffff8800b3328df8
<4>[113383.069122] R13: ffff8800b3d9df30 R14: ffff8800846fcef0 R15: ffff880077d93c20
<4>[113383.069122] FS:  0000000000000000(0000) GS:ffff8800062c0000(0000) knlGS:0000000000000000
<4>[113383.069122] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>[113383.069122] CR2: ffff880094800c68 CR3: 0000000001a25000 CR4: 00000000000006e0
<4>[113383.069122] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[113383.069122] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[113383.069122] Process mdt00_001 (pid: 30405, threadinfo ffff8800954c6000, task ffff8800b03882c0)
<4>[113383.069122] Stack:
<4>[113383.069122]  ffff8800954c7e30 ffffffffa1000488 0000000000000001 ffff880077d93bf0
<4>[113383.069122] <d> ffff8800954c7ed0 ffffffffa119c8eb 00000000000f4240 ffff880000000000
<4>[113383.069122] <d> ffff880077d93c80 00000000954c7fd8 ffff8800846fcef0 00000000b0388878
<4>[113383.069122] Call Trace:
<4>[113383.069122]  [<ffffffffa1000488>] lu_context_exit+0x58/0xa0 [obdclass]
<4>[113383.069122]  [<ffffffffa119c8eb>] ptlrpc_main+0xa2b/0x1650 [ptlrpc]
<4>[113383.069122]  [<ffffffffa119bec0>] ? ptlrpc_main+0x0/0x1650 [ptlrpc]
<4>[113383.069122]  [<ffffffff81094606>] kthread+0x96/0xa0
<4>[113383.069122]  [<ffffffff8100c10a>] child_rip+0xa/0x20
<4>[113383.069122]  [<ffffffff81094570>] ? kthread+0x0/0xa0
<4>[113383.069122]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
<4>[113383.069122] Code: <48> c7 42 78 00 00 00 00 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00

Crashdump is in /exports/crashdumps/192.168.10.222-2013-07-24-10\:40\:58 |
| Comment by Oleg Drokin [ 01/Apr/14 ] |
|
This is still happening; I just got two more crashes there on current master. |
| Comment by Andreas Dilger [ 09/Jan/20 ] |
|
Close old bug |