[LU-3252] MDT crash in lu_object_put+0x1d8 Created: 01/May/13 Updated: 28/Feb/20 Resolved: 28/Feb/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 8054 |
| Description |
|
Running racer on a recent master, it crashed after about 21 hours with:

[76336.978485] BUG: unable to handle kernel paging request at ffff880079ed5ea8
[76336.978811] IP: [<ffffffffa0dc22f8>] lu_object_put+0x1d8/0x330 [obdclass]
[76336.979138] PGD 1a26063 PUD 300067 PMD 4d0067 PTE 8000000079ed5060
[76336.979443] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[76336.979704] last sysfs file: /sys/devices/system/cpu/possible
[76336.979980] CPU 3
[76336.980018] Modules linked in: lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
[76336.982446]
[76336.982446] Pid: 5799, comm: mdt00_008 Not tainted 2.6.32-rhe6.4-debug #2 Bochs Bochs
[76336.982446] RIP: 0010:[<ffffffffa0dc22f8>] [<ffffffffa0dc22f8>] lu_object_put+0x1d8/0x330 [obdclass]
[76336.982446] RSP: 0018:ffff880082f49a00 EFLAGS: 00010246
[76336.982446] RAX: 0000000000000000 RBX: ffff880079ed5ea8 RCX: 0000000000000002
[76336.982446] RDX: 0000000000000002 RSI: ffffc900015ca000 RDI: 0000000000000001
[76336.982446] RBP: ffff880082f49a60 R08: 0000000000000400 R09: 0000000000000ffa
[76336.982446] R10: 0000000000000693 R11: cc00000000000000 R12: ffff880010703668
[76336.982446] R13: ffff880079ed5f00 R14: ffff8800b738c168 R15: ffff880082f49a20
[76336.982446] FS: 00007fd3d883b700(0000) GS:ffff8800062c0000(0000) knlGS:0000000000000000
[76336.982446] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[76336.982446] CR2: ffff880079ed5ea8 CR3: 000000008ca47000 CR4: 00000000000006e0
[76336.982446] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[76336.982446] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[76336.982446] Process mdt00_008 (pid: 5799, threadinfo ffff880082f48000, task ffff880096508040)
[76336.982446] Stack:
[76336.982446]  ffffc90008672f78 ffff88004ce3af30 ffffc900015da028 ffffc900015ca000
[76336.982446] <d> ffffc900015ca000 0000000000000967 ffff880082f49a60 ffff880079ed5ea8
[76336.982446] <d> ffff880010703668 00000000fffffffe 0000000200010001 0000000000000000
[76336.982446] Call Trace:
[76336.982446]  [<ffffffffa070df4d>] mdt_object_unlock_put+0x3d/0x110 [mdt]
[76336.982446]  [<ffffffffa074019f>] mdt_reint_open+0x95f/0x20c0 [mdt]
[76336.982446]  [<ffffffffa0cb9b3f>] ? upcall_cache_get_entry+0x3bf/0x870 [libcfs]
[76336.982446]  [<ffffffffa115c78c>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
[76336.982446]  [<ffffffffa0de21f0>] ? lu_ucred+0x20/0x30 [obdclass]
[76336.982446]  [<ffffffffa072b621>] mdt_reint_rec+0x41/0xe0 [mdt]
[76336.982446]  [<ffffffffa0724ae3>] mdt_reint_internal+0x4e3/0x7d0 [mdt]
[76336.982446]  [<ffffffffa072509d>] mdt_intent_reint+0x1ed/0x520 [mdt]
[76336.982446]  [<ffffffffa0720c6e>] mdt_intent_policy+0x3ae/0x750 [mdt]
[76336.982446]  [<ffffffffa111470a>] ldlm_lock_enqueue+0x2ea/0x870 [ptlrpc]
[76336.982446]  [<ffffffffa113ae67>] ldlm_handle_enqueue0+0x4f7/0x10b0 [ptlrpc]
[76336.982446]  [<ffffffffa0721146>] mdt_enqueue+0x46/0x110 [mdt]
[76336.982446]  [<ffffffffa0712d18>] mdt_handle_common+0x648/0x1660 [mdt]
[76336.982446]  [<ffffffffa074ede5>] mds_regular_handle+0x15/0x20 [mdt]
[76336.982446]  [<ffffffffa116c898>] ptlrpc_server_handle_request+0x3a8/0xc70 [ptlrpc]
[76336.982446]  [<ffffffffa0c9d5ee>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[76336.982446]  [<ffffffffa0caee9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[76336.982446]  [<ffffffffa1163fe1>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc]
[76336.982446]  [<ffffffff81054613>] ? __wake_up+0x53/0x70
[76336.982446]  [<ffffffffa116db95>] ptlrpc_main+0xa35/0x1640 [ptlrpc]
[76336.982446]  [<ffffffffa116d160>] ? ptlrpc_main+0x0/0x1640 [ptlrpc]
[76336.982446]  [<ffffffff8100c10a>] child_rip+0xa/0x20
[76336.982446]  [<ffffffffa116d160>] ? ptlrpc_main+0x0/0x1640 [ptlrpc]
[76336.982446]  [<ffffffffa116d160>] ? ptlrpc_main+0x0/0x1640 [ptlrpc]
[76336.982446]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
[76336.982446] Code: b0 48 8b 70 10 48 83 c2 08 e8 75 56 4c e0 49 8b 06 be 01 00 00 00 48 8b 7d c0 48 8b 40 20 ff 50 18 e9 da fe ff ff 0f 1f 44 00 00 <f6> 03 01 0f 84 cc fe ff ff 48 8b 7d b0 48 83 c7 18 e8 22 b4 ed
[76336.982446] RIP [<ffffffffa0dc22f8>] lu_object_put+0x1d8/0x330 [obdclass]
[76336.982446] RSP <ffff880082f49a00>
[76336.982446] CR2: ffff880079ed5ea8

Crashdump and modules are in /exports/crashdumps/192.168.10.220-2013-04-30-16:12:30

lu_object_put+0x1d8 is lustre/obdclass/lu_object.c:107:

107    if (lu_object_is_dying(top)) {

Tag in my tree is master-20130430 |
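For context, the faulting line (the lu_object_is_dying() check at lu_object.c:107) is only safe to execute while something guarantees the object stays allocated. The put-then-free pattern involved can be sketched with a minimal, hypothetical refcount model — this is not the actual Lustre code, and all identifiers (obj_header, obj_put, etc.) are illustrative only:

```c
#include <stdlib.h>

/* Hypothetical, simplified model of a refcounted lu_object-like header. */
struct obj_header {
	int refcount;
	int dying;	/* set when the object is unlinked and must be freed */
};

struct obj_header *obj_new(void)
{
	struct obj_header *o = calloc(1, sizeof(*o));
	o->refcount = 1;
	return o;
}

struct obj_header *obj_get(struct obj_header *o)
{
	o->refcount++;
	return o;
}

/* Returns 1 if this put freed the object, i.e. the pointer is now dead. */
int obj_put(struct obj_header *o)
{
	int freed = 0;

	if (--o->refcount == 0 && o->dying) {
		free(o);	/* slab page may now be unmapped (DEBUG_PAGEALLOC) */
		freed = 1;
	}
	/* Dereferencing o here, after the final put, would be exactly the
	 * use-after-free reported in the oops above. */
	return freed;
}
```

The use-after-free window opens as soon as the last reference is dropped: any later dereference lands on a freed page, which DEBUG_PAGEALLOC turns into the paging-request fault seen in the trace.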
| Comments |
| Comment by Bruno Faccini (Inactive) [ 13/May/13 ] |
|
Initial crash analysis indicates that the page containing the mdt_object in question has been unmapped, yet the mdt_obj slab still references it!... |
| Comment by Bruno Faccini (Inactive) [ 24/May/13 ] |
|
OK, what initially made me think this problem could be VM-related was that the page containing the object being accessed had its PAGE_PRESENT bit clear, which caused the exception. However, when running with DEBUG_PAGEALLOC configured (as is the case here, since full debug is enabled in the booted kernel), this is always done, even while the page is still physically present, in order to trap any access to freed kmem/slab areas. So the real problem is that the mdt_object[_header] being accessed had already been freed; the containing slab page was then emptied (there is only one object per slab/page!) and its PAGE_PRESENT bit cleared. The good news is that, since we are running with full debug, the stack of the thread responsible for freeing the object (running on another CPU, i.e. #1 vs. #3 for the panicking thread) was dumped in place of the object itself and can be reconstructed as follows:

<kmem_cache_free+114>
<cfs_mem_cache_free+14>
<mdt_object_free+244>
<lu_object_free+291>
<lu_object_put+173>
<mdt_object_put+63>
<mdt_reint_unlink+1417>
<mdt_reint_rec+65>
<mdt_reint_internal+1251>
<mdt_reint+68>
<mdt_handle_common+1608>
<mds_regular_handle+21>
<ptlrpc_server_handle_request+936>
<ptlrpc_main+2613>

So it seems we face a race on the mdt_object between competing unlink/rmdir (MDS_REINT) and open+lock (LDLM_ENQUEUE) requests. Also, where can I find your git tree and the master-20130430 tag? |
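The suspected interleaving between the two request handlers can be replayed deterministically. The sketch below is a hypothesis only: it assumes the open path keeps using an mdt_object reference that the unlink path has already released, a `freed` flag stands in for the unmapped slab page, and none of these identifiers are real Lustre symbols:

```c
/* Deterministic replay of the suspected unlink-vs-open race, with a
 * 'freed' flag instead of a real free() so the stale access stays
 * observable rather than undefined behaviour. */
struct fake_obj {
	int refcount;
	int dying;
	int freed;
};

static void put(struct fake_obj *o)
{
	if (--o->refcount == 0 && o->dying)
		o->freed = 1;	/* models kmem_cache_free() + DEBUG_PAGEALLOC unmap */
}

/* Returns 1 if the access would have faulted (page unmapped). */
static int access_obj(struct fake_obj *o)
{
	return o->freed;	/* models reading lu_object_is_dying(top) */
}

int replay_race(void)
{
	struct fake_obj o = { .refcount = 1, .dying = 0, .freed = 0 };

	/* CPU#3, open path (mdt_reint_open): believes it holds a reference,
	 * but the lookup raced and no extra reference was actually taken. */
	struct fake_obj *open_ref = &o;

	/* CPU#1, unlink path (mdt_reint_unlink): marks the object dying and
	 * drops the last reference; the object is freed here. */
	o.dying = 1;
	put(&o);

	/* CPU#3 resumes in mdt_object_unlock_put() -> lu_object_put(). */
	return access_obj(open_ref);	/* 1: faults, matching the oops */
}
```

Under this hypothesis the crash requires only that the open path's reference acquisition and the unlink path's final put race on the same object, which matches the two stacks above.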
| Comment by Bruno Faccini (Inactive) [ 27/May/13 ] |
|
Unfortunately, the Lustre trace extracted from the crash dump does not allow finer-grained debugging of the timing sequence that may lead to the crash; at minimum, the D_INODE/D_MALLOC/D_DENTRY traces are missing for this case. However, further analysis of the panicking thread's context clearly indicates that the parent object was freed during mdt_reint_open() execution. More to come. |
| Comment by Oleg Drokin [ 27/May/13 ] |
|
My tree (with tags) is in /exports/centos6-nfsroot/home/green/git/lustre-release |
| Comment by Bruno Faccini (Inactive) [ 31/May/13 ] |
|
Thanks Oleg. I don't see any difference from the current master tree in the server LDLM/MDT layers involved in this problem, so this crash is likely to recur, even though I strongly suspect it is very infrequent. At the moment I am stuck: the Lustre log does not let me establish the timing sequence by which this can happen, and my reading of the relevant source code suggests it simply can't happen, given the locking and other protections in place. I am also trying to find a reproducer based on what I have already learned about the threads involved and the configuration used. |
| Comment by Bruno Faccini (Inactive) [ 05/Jun/13 ] |
|
My attempts to find a reproducer have so far been unsuccessful. Oleg, do you remember how you configured and ran racer during the 21 hours before the problem triggered? |
| Comment by Oleg Drokin [ 07/Oct/13 ] |
|
I just had another similar crash, but in another place:

<1>[181878.026545] BUG: unable to handle kernel paging request at ffff880010f60e90
<1>[181878.027555] IP: [<ffffffffa04e62d8>] lu_object_put+0x1d8/0x330 [obdclass]
<4>[181878.028283] PGD 1a26063 PUD 1a2a063 PMD 187067 PTE 10f60060
<4>[181878.028949] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
<4>[181878.029562] last sysfs file: /sys/devices/system/cpu/possible
<4>[181878.030198] CPU 4
<4>[181878.030293] Modules linked in: lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota lfsck obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
<4>[181878.030534]
<4>[181878.030534] Pid: 5775, comm: mdt_rdpg01_001 Not tainted 2.6.32-rhe6.4-debug #2 Red Hat KVM
<4>[181878.030534] RIP: 0010:[<ffffffffa04e62d8>] [<ffffffffa04e62d8>] lu_object_put+0x1d8/0x330 [obdclass]
<4>[181878.030534] RSP: 0018:ffff880031001c60 EFLAGS: 00010246
<4>[181878.030534] RAX: 0000000000000000 RBX: ffff880010f60e90 RCX: 0000000000000002
<4>[181878.030534] RDX: 0000000000000002 RSI: ffffc90004116000 RDI: 0000000000000001
<4>[181878.030534] RBP: ffff880031001cc0 R08: 0000000000000401 R09: 0000000000000be5
<4>[181878.030534] R10: 0000000000000693 R11: cc00000000000000 R12: ffff88001525f878
<4>[181878.030534] R13: ffff880010f60ee8 R14: ffff8800b7bc4160 R15: ffff880031001c80
<4>[181878.030534] FS: 0000000000000000(0000) GS:ffff880006300000(0000) knlGS:0000000000000000
<4>[181878.030534] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>[181878.030534] CR2: ffff880010f60e90 CR3: 00000000643d8000 CR4: 00000000000006e0
<4>[181878.030534] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[181878.030534] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[181878.030534] Process mdt_rdpg01_001 (pid: 5775, threadinfo ffff880031000000, task ffff880096e76440)
<4>[181878.030534] Stack:
<4>[181878.030534]  ffff880031001cb0 ffff88006c146f30 ffffc90004126028 ffffc90004116000
<4>[181878.030534] <d> ffffc90004116000 ffffffff00000552 0000000000000000 ffff88006462f000
<4>[181878.030534] <d> ffff880010f60e90 ffff88001525f878 ffff88000b35ef90 0000000000000000
<4>[181878.030534] Call Trace:
<4>[181878.030534]  [<ffffffffa0cbb38f>] mdt_thread_info_fini+0x9f/0x190 [mdt]
<4>[181878.030534]  [<ffffffffa0cc3aa3>] mdt_handle_common+0x653/0x1470 [mdt]
<4>[181878.030534]  [<ffffffffa0cfcfe5>] mds_readpage_handle+0x15/0x20 [mdt]
<4>[181878.030534]  [<ffffffffa086b3f5>] ptlrpc_server_handle_request+0x395/0xc20 [ptlrpc]
<4>[181878.030534]  [<ffffffffa07a835f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
<4>[181878.030534]  [<ffffffffa0862d61>] ? ptlrpc_wait_event+0xc1/0x2e0 [ptlrpc]
<4>[181878.030534]  [<ffffffffa086c6da>] ptlrpc_main+0xa5a/0x1690 [ptlrpc]
<4>[181878.030534]  [<ffffffffa086bc80>] ? ptlrpc_main+0x0/0x1690 [ptlrpc]
<4>[181878.030534]  [<ffffffff81094606>] kthread+0x96/0xa0
<4>[181878.030534]  [<ffffffff8100c10a>] child_rip+0xa/0x20
<4>[181878.030534]  [<ffffffff81094570>] ? kthread+0x0/0xa0
<4>[181878.030534]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
<4>[181878.030534] Code: b0 48 8b 70 10 48 83 c2 08 e8 95 16 da e0 49 8b 06 be 01 00 00 00 48 8b 7d c0 48 8b 40 20 ff 50 18 e9 da fe ff ff 0f 1f 44 00 00 <f6> 03 01 0f 84 cc fe ff ff 48 8b 7d b0 31 c9 31 d2 be 03 00 00
<1>[181878.030534] RIP [<ffffffffa04e62d8>] lu_object_put+0x1d8/0x330 [obdclass]

crashdump in /exports/crashdumps/192.168.10.210-2013-10-06-17:31:50

This is how I run my racer tests:

while :; do
    rm -rf /tmp/*
    SLOW=yes REFORMAT=yes DURATION=$((900*3)) \
        PTLDEBUG="vfstrace rpctrace dlmtrace neterror ha config ioctl super cache" \
        DEBUG_SIZE=100 sh racer.sh
    sh llmountcleanup.sh
done |
| Comment by Bruno Faccini (Inactive) [ 15/Oct/13 ] |
|
Sorry Oleg, I am late on this. I will look into the new crash dump and see if I can get more out of it. |
| Comment by Oleg Drokin [ 02/Jul/14 ] |
|
I think this has hit again, with yet another different but similar stack trace:

<1>[100578.943075] BUG: unable to handle kernel paging request at ffff880095e6dea8
<1>[100578.944975] IP: [<ffffffffa100a048>] lu_object_put+0x1d8/0x330 [obdclass]
<4>[100578.945716] PGD 1a26063 PUD 501067 PMD 5b1067 PTE 8000000095e6d060
<4>[100578.946250] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
<4>[100578.946641] last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/virtio0/net/eth0/broadcast
<4>[100578.947062] CPU 7
<4>[100578.947062] Modules linked in: lustre ofd osp lod ost mdt mdd mgs nodemap osd_ldiskfs ldiskfs lquota lfsck obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 jbd2 mbcache virtio_balloon virtio_console i2c_piix4 i2c_core virtio_net virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
<4>[100578.947062]
<4>[100578.947062] Pid: 21227, comm: mdt_rdpg01_004 Not tainted 2.6.32-rhe6.5-debug #2 Red Hat KVM
<4>[100578.947062] RIP: 0010:[<ffffffffa100a048>] [<ffffffffa100a048>] lu_object_put+0x1d8/0x330 [obdclass]
<4>[100578.947062] RSP: 0018:ffff88001dd3dcd0 EFLAGS: 00010246
<4>[100578.947062] RAX: 0000000000000000 RBX: ffff880095e6de98 RCX: 0000000000000002
<4>[100578.947062] RDX: 0000000000000002 RSI: ffffc90006482000 RDI: 0000000000000001
<4>[100578.947062] RBP: ffff88001dd3dd30 R08: 0000000000000401 R09: 0000000000000dfd
<4>[100578.947062] R10: 0000000000000052 R11: cc00000000000000 R12: ffff88007ba29c40
<4>[100578.947062] R13: ffff880095e6dee8 R14: ffff88007d52c0c8 R15: ffff88001dd3dcf0
<4>[100578.947062] FS: 0000000000000000(0000) GS:ffff8800063c0000(0000) knlGS:0000000000000000
<4>[100578.947062] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>[100578.947062] CR2: ffff880095e6dea8 CR3: 000000005cc42000 CR4: 00000000000006e0
<4>[100578.947062] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[100578.947062] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[100578.947062] Process mdt_rdpg01_004 (pid: 21227, threadinfo ffff88001dd3c000, task ffff880019fb6300)
<4>[100578.947062] Stack:
<4>[100578.947062]  ffff880089793b68 ffff880094e95f30 ffffc90006492028 ffffc90006482000
<4>[100578.947062] <d> ffffc90006482000 0000000000000dab ffff88002f40f840 ffff88002f40f840
<4>[100578.947062] <d> ffff88008913ff30 0000000000000104 ffffffffa0684520 0000000000000000
<4>[100578.947062] Call Trace:
<4>[100578.947062]  [<ffffffffa154681d>] tgt_request_handle+0x34d/0xac0 [ptlrpc]
<4>[100578.947062]  [<ffffffffa14f7778>] ptlrpc_main+0xcc8/0x1950 [ptlrpc]
<4>[100578.947062]  [<ffffffffa14f6ab0>] ? ptlrpc_main+0x0/0x1950 [ptlrpc]
<4>[100578.947062]  [<ffffffff81098c06>] kthread+0x96/0xa0
<4>[100578.947062]  [<ffffffff8100c24a>] child_rip+0xa/0x20
<4>[100578.947062]  [<ffffffff81098b70>] ? kthread+0x0/0xa0
<4>[100578.947062]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
<4>[100578.947062] Code: 55 b0 48 8b 70 10 48 83 c2 08 e8 c4 8f 28 e0 49 8b 06 be 01 00 00 00 48 8b 7d c0 48 8b 40 20 ff 50 18 e9 d9 fe ff ff 0f 1f 40 00 <f6> 43 10 01 0f 84 cb fe ff ff 48 8b 7d b0 31 c9 31 d2 be 03 00
<1>[100578.947062] RIP [<ffffffffa100a048>] lu_object_put+0x1d8/0x330 [obdclass]
<4>[100578.947062] RSP <ffff88001dd3dcd0>
<4>[100578.947062] CR2: ffff880095e6dea8

Crashdump is in /exports/crashdumps/192.168.10.210-2014-07-02-03\:25\:14 |
| Comment by Oleg Drokin [ 10/Jun/15 ] |
|
And I hit this once more today:

<3>[38977.244122] LustreError: 13269:0:(lcommon_cl.c:189:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0xb650:0x0]: -16
<1>[39066.074447] BUG: unable to handle kernel paging request at ffff8800979d8ee8
<1>[39066.074813] IP: [<ffffffffa0dccda8>] lu_object_put+0x1d8/0x310 [obdclass]
<4>[39066.075143] PGD 1a26063 PUD 501067 PMD 5be067 PTE 80000000979d8060
<4>[39066.075409] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
<4>[39066.075648] last sysfs file: /sys/devices/system/cpu/possible
<4>[39066.075896] CPU 1
<4>[39066.075925] Modules linked in: lustre ofd osp lod ost mdt mdd mgs osd_ldiskfs ldiskfs lquota lfsck obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass ksocklnd lnet libcfs zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl zlib_deflate exportfs jbd sha512_generic sha256_generic ext4 jbd2 mbcache virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net pata_acpi ata_generic ata_piix virtio_pci virtio_ring virtio dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
<4>[39066.078005]
<4>[39066.078005] Pid: 1668, comm: mdt00_008 Tainted: P --------------- 2.6.32-rhe6.6-debug #1 Red Hat KVM
<4>[39066.078005] RIP: 0010:[<ffffffffa0dccda8>] [<ffffffffa0dccda8>] lu_object_put+0x1d8/0x310 [obdclass]
<4>[39066.078005] RSP: 0018:ffff880072da9b60 EFLAGS: 00010246
<4>[39066.078005] RAX: 0000000000000000 RBX: ffff8800979d8ed8 RCX: 0000000000000002
<4>[39066.078005] RDX: 0000000000000002 RSI: ffffc900045c6000 RDI: 0000000000000001
<4>[39066.078005] RBP: ffff880072da9bc0 R08: 0000000000000401 R09: 0000000000000879
<4>[39066.078005] R10: 00000000000001a4 R11: cc00000000000000 R12: ffff880072da9be0
<4>[39066.078005] R13: ffff8800979d8f28 R14: ffff88008214a0b0 R15: ffff880072da9b80
<4>[39066.078005] FS: 0000000000000000(0000) GS:ffff880006240000(0000) knlGS:0000000000000000
<4>[39066.078005] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>[39066.078005] CR2: ffff8800979d8ee8 CR3: 0000000077b81000 CR4: 00000000000006e0
<4>[39066.078005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[39066.078005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[39066.078005] Process mdt00_008 (pid: 1668, threadinfo ffff880072da8000, task ffff88008bda25c0)
<4>[39066.078005] Stack:
<4>[39066.078005]  ffff880072da9b70 ffff88002b772f30 ffffc900045d6028 ffffc900045c6000
<4>[39066.078005] <d> ffffc900045c6000 ffff8800000006d5 ffff880073a79000 ffff8800979d8ed8
<4>[39066.078005] <d> ffff880073a79000 0000000000000050 ffff880084140e68 ffff880072da9be0
<4>[39066.078005] Call Trace:
<4>[39066.078005]  [<ffffffffa05e6d8c>] mdt_lvbo_fill+0x55c/0x800 [mdt]
<4>[39066.078005]  [<ffffffffa12ca7b4>] ldlm_lvbo_fill+0x64/0x2c0 [ptlrpc]
<4>[39066.078005]  [<ffffffffa12d0f8e>] ldlm_handle_enqueue0+0xe3e/0x13e0 [ptlrpc]
<4>[39066.078005]  [<ffffffffa134f8c1>] tgt_enqueue+0x61/0x230 [ptlrpc]
<4>[39066.078005]  [<ffffffffa13505ae>] tgt_request_handle+0x95e/0x10b0 [ptlrpc]
<4>[39066.078005]  [<ffffffffa1301614>] ptlrpc_main+0xdf4/0x1940 [ptlrpc]
<4>[39066.078005]  [<ffffffffa1300820>] ? ptlrpc_main+0x0/0x1940 [ptlrpc]
<4>[39066.078005]  [<ffffffff8109ce4e>] kthread+0x9e/0xc0
<4>[39066.078005]  [<ffffffff8100c24a>] child_rip+0xa/0x20
<4>[39066.078005]  [<ffffffff8109cdb0>] ? kthread+0x0/0xc0
<4>[39066.078005]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
<4>[39066.078005] Code: e8 1e f0 4c e0 48 8b 4d b0 be 01 00 00 00 48 83 01 01 49 8b 06 48 8b 7d c0 48 8b 40 20 ff 50 18 e9 db fe ff ff 66 0f 1f 44 00 00 <f6> 43 10 01 0f 84 cb fe ff ff 48 8b 7d b0 31 c9 31 d2 be 03 00
<1>[39066.078005] RIP [<ffffffffa0dccda8>] lu_object_put+0x1d8/0x310 [obdclass]
<4>[39066.078005] RSP <ffff880072da9b60>
<4>[39066.078005] CR2: ffff8800979d8ee8

crashdump is in /exports/crashdumps/192.168.10.211-2015-06-09-12\:45\:41/ |
| Comment by Oleg Drokin [ 03/Jul/15 ] |
|
And once more:

<1>[13618.149942] BUG: unable to handle kernel paging request at ffff880093637f38
<1>[13618.150425] IP: [<ffffffffa152f178>] lu_object_put+0x1d8/0x310 [obdclass]
<4>[13618.150834] PGD 1a26063 PUD 501067 PMD 59d067 PTE 8000000093637060
<4>[13618.151154] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
<4>[13618.151639] last sysfs file: /sys/devices/system/cpu/possible
<4>[13618.152659] CPU 0
<4>[13618.152698] Modules linked in: lustre ofd osp lod ost mdt mdd mgs osd_ldiskfs ldiskfs lquota lfsck obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass ksocklnd lnet libcfs zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl zlib_deflate exportfs jbd sha512_generic sha256_generic ext4 jbd2 mbcache virtio_balloon virtio_console i2c_piix4 i2c_core virtio_net virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
<4>[13618.153072]
<4>[13618.153072] Pid: 31385, comm: ptlrpcd_4 Tainted: P --------------- 2.6.32-rhe6.6-debug #1 Red Hat KVM
<4>[13618.153072] RIP: 0010:[<ffffffffa152f178>] [<ffffffffa152f178>] lu_object_put+0x1d8/0x310 [obdclass]
<4>[13618.153072] RSP: 0018:ffff8800ad931ab0 EFLAGS: 00010246
<4>[13618.153072] RAX: 0000000000000000 RBX: ffff880093637f28 RCX: 0000000000000009
<4>[13618.153072] RDX: 0000000000000009 RSI: ffffc90010cbb000 RDI: 0000000000000008
<4>[13618.153072] RBP: ffff8800ad931b10 R08: 0000000000000402 R09: 0000000000000935
<4>[13618.153072] R10: 0000000000001a4f R11: cc00000000000000 R12: ffff8800ad931e00
<4>[13618.153072] R13: ffff880093637fa0 R14: ffff8800b40deef0 R15: ffff8800ad931ad0
<4>[13618.153072] FS: 0000000000000000(0000) GS:ffff880006200000(0000) knlGS:0000000000000000
<4>[13618.153072] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>[13618.153072] CR2: ffff880093637f38 CR3: 000000004026d000 CR4: 00000000000006f0
<4>[13618.153072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[13618.153072] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[13618.153072] Process ptlrpcd_4 (pid: 31385, threadinfo ffff8800ad930000, task ffff88003f8c2580)
<4>[13618.153072] Stack:
<4>[13618.153072]  ffff8800b66ca500 ffff88006e736f30 ffffc90010ccb028 ffffc90010cbb000
<4>[13618.153072] <d> ffffc90010cbb000 ffff880000000ee6 ffff8800ad931e00 ffff88007137ba30
<4>[13618.153072] <d> ffff8800ad931e00 0000000000000000 ffff88007137ba60 ffff8800501fcec0
<4>[13618.153072] Call Trace:
<4>[13618.153072]  [<ffffffffa15360fe>] cl_object_put+0xe/0x10 [obdclass]
<4>[13618.153072]  [<ffffffffa153f4d4>] cl_req_completion+0xe4/0x680 [obdclass]
<4>[13618.153072]  [<ffffffffa0736ba8>] brw_interpret+0x9e8/0x2330 [osc]
<4>[13618.153072]  [<ffffffff8152245e>] ? _spin_unlock+0xe/0x10
<4>[13618.153072]  [<ffffffff8152245e>] ? _spin_unlock+0xe/0x10
<4>[13618.153072]  [<ffffffffa05863e7>] ? ptlrpc_unregister_bulk+0xb7/0xae0 [ptlrpc]
<4>[13618.153072]  [<ffffffff8152245e>] ? _spin_unlock+0xe/0x10
<4>[13618.153072]  [<ffffffffa057f2a3>] ptlrpc_check_set+0x613/0x1bf0 [ptlrpc]
<4>[13618.153072]  [<ffffffffa05ad863>] ptlrpcd_check+0x3e3/0x630 [ptlrpc]
<4>[13618.153072]  [<ffffffffa05ae0cb>] ptlrpcd+0x35b/0x430 [ptlrpc]
<4>[13618.153072]  [<ffffffff81061630>] ? default_wake_function+0x0/0x20
<4>[13618.153072]  [<ffffffffa05add70>] ? ptlrpcd+0x0/0x430 [ptlrpc]
<4>[13618.153072]  [<ffffffff8109ce4e>] kthread+0x9e/0xc0
<4>[13618.153072]  [<ffffffff8100c24a>] child_rip+0xa/0x20
<4>[13618.153072]  [<ffffffff8109cdb0>] ? kthread+0x0/0xc0
<4>[13618.153072]  [<ffffffff8100c240>] ? child_rip+0x0/0x20
<4>[13618.153072] Code: e8 4e cc d6 df 48 8b 4d b0 be 01 00 00 00 48 83 01 01 49 8b 06 48 8b 7d c0 48 8b 40 20 ff 50 18 e9 db fe ff ff 66 0f 1f 44 00 00 <f6> 43 10 01 0f 84 cb fe ff ff 48 8b 7d b0 31 c9 31 d2 be 03 00
<1>[13618.196141] RIP [<ffffffffa152f178>] lu_object_put+0x1d8/0x310 [obdclass]
<4>[13618.196141] RSP <ffff8800ad931ab0>
<4>[13618.196141] CR2: ffff880093637f38 |
| Comment by Andreas Dilger [ 28/Feb/20 ] |
|
Closing this old bug; it has not been seen in a long time. |