[LU-2521] sanity test 60a crash Created: 22/Dec/12 Updated: 08/Nov/17 Resolved: 08/Nov/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Oleg Drokin | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 5937 |
| Description |
|
I have been hitting hangs/crashes in sanity test 60a for quite a while and thought they were OOM-related, but today it happened on a box with more memory, and it crashed like this:
[500819.731843] Lustre: 14303:0:(llog-test.c:864:llog_test_7()) 7e: test llog_changelog_rec
[500821.282423] BUG: unable to handle kernel paging request at ffff880011cfced0
[500821.282912] IP: [<ffffffffa06e45ce>] ldiskfs_journal_commit_callback+0x6e/0xc0 [ldiskfs]
[500821.283618] PGD 1a26063 PUD 1a2a063 PMD 18e067 PTE 11cfc160
[500821.284042] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[500821.284422] last sysfs file: /sys/devices/virtual/block/loop6/queue/scheduler
[500821.284517] CPU 1
[500821.284517] Modules linked in: llog_test lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs ext2 exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache nfs_acl auth_rpcgss sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
[500821.284517]
[500821.284517] Pid: 24793, comm: jbd2/loop3-8 Not tainted 2.6.32-debug #6 Bochs Bochs
[500821.284517] RIP: 0010:[<ffffffffa06e45ce>] [<ffffffffa06e45ce>] ldiskfs_journal_commit_callback+0x6e/0xc0 [ldiskfs]
[500821.284517] RSP: 0018:ffff88002190fcd0 EFLAGS: 00010202
[500821.284517] RAX: ffff88001d951f40 RBX: ffff88001d951f40 RCX: ffff880011cfced0
[500821.284517] RDX: ffff88006337ff40 RSI: 000000001d953160 RDI: ffff8800a209cb60
[500821.284517] RBP: ffff88002190fd10 R08: 0000000000000001 R09: ffff880000000000
[500821.284517] R10: ffff880023d28000 R11: 0000000087654321 R12: ffff88006337ff40
[500821.284517] R13: ffff8800a209cb60 R14: ffff8800419b0bf0 R15: ffff880011cfced0
[500821.284517] FS: 0000000000000000(0000) GS:ffff880006280000(0000) knlGS:0000000000000000
[500821.284517] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[500821.284517] CR2: ffff880011cfced0 CR3: 0000000001a25000 CR4: 00000000000006e0
[500821.284517] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[500821.284517] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[500821.284517] Process jbd2/loop3-8 (pid: 24793, threadinfo ffff88002190e000, task ffff8800203a4180)
[500821.284517] Stack:
[500821.284517] ffff88002190fce0 00000000810376d9 ffff88002190fd10 ffff8800256b2c58
[500821.284517] <d> ffff880011cfcdf0 ffff8800256b27f0 0000000000000000 00000000000000fc
[500821.284517] <d> ffff88002190fe50 ffffffffa0391d37 ffff88002190fd80 ffffffff81009310
[500821.284517] Call Trace:
[500821.284517] [<ffffffffa0391d37>] jbd2_journal_commit_transaction+0x13d7/0x16e0 [jbd2]
[500821.284517] [<ffffffff81009310>] ? __switch_to+0xd0/0x320
[500821.284517] [<ffffffff8107c65b>] ? try_to_del_timer_sync+0x7b/0xe0
[500821.284517] [<ffffffffa0397627>] kjournald2+0xb7/0x210 [jbd2]
[500821.284517] [<ffffffff8108fd60>] ? autoremove_wake_function+0x0/0x40
[500821.284517] [<ffffffffa0397570>] ? kjournald2+0x0/0x210 [jbd2]
[500821.284517] [<ffffffff8108fa16>] kthread+0x96/0xa0
[500821.284517] [<ffffffff8100c14a>] child_rip+0xa/0x20
[500821.284517] [<ffffffff8108f980>] ? kthread+0x0/0xa0
[500821.284517] [<ffffffff8100c140>] ? child_rip+0x0/0x20
[500821.284517] Code: 00 00 00 49 81 c7 e0 00 00 00 4c 39 fb 4c 8b 23 48 89 d8 74 48 4c 89 e2 eb 06 0f 1f 00 49 89 d4 48 8b 4b 08 4c 89 ef 48 89 4a 08 <48> 89 11 48 89 03 48 89 43 08 e8 53 69 e1 e0 8b 55 cc 48 89 de
I have a crashdump with modules in /exports/crashdumps/192.168.10.219-2012-12-22-07:51:12 |
| Comments |
| Comment by Oleg Drokin [ 23/Dec/12 ] |
|
Just hit it again:
[48129.462822] Lustre: 7823:0:(llog-test.c:541:llog_test_5()) 5c: Cancel 65536 records, see one log zapped
[48131.392359] Lustre: 7823:0:(llog-test.c:549:llog_test_5()) 5c: print the catalog entries.. we expect 1
[48133.896995] BUG: unable to handle kernel paging request at ffff88009485bed0
[48133.897043] IP: [<ffffffffa06d35ce>] ldiskfs_journal_commit_callback+0x6e/0xc0 [ldiskfs]
[48133.897043] PGD 1a26063 PUD 501067 PMD 5a6067 PTE 800000009485b160
[48133.897043] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[48133.897043] last sysfs file: /sys/devices/virtual/block/loop4/queue/scheduler
[48133.897043] CPU 2
[48133.897043] Modules linked in: llog_test lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs ext2 exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache nfs_acl auth_rpcgss sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
[48133.897043]
[48133.897043] Pid: 18270, comm: jbd2/loop1-8 Not tainted 2.6.32-debug #6 Bochs Bochs
[48133.897043] RIP: 0010:[<ffffffffa06d35ce>] [<ffffffffa06d35ce>] ldiskfs_journal_commit_callback+0x6e/0xc0 [ldiskfs]
[48133.897043] RSP: 0018:ffff8800598b3cd0 EFLAGS: 00010283
[48133.897043] RAX: ffff88002cc58f40 RBX: ffff88002cc58f40 RCX: ffff88009485bed0
[48133.897043] RDX: ffff880020836f40 RSI: 000000002cc5a160 RDI: ffff8800094f4b60
[48133.897043] RBP: ffff8800598b3d10 R08: 0000000000000001 R09: ffff880000000000
[48133.897043] R10: ffff880050ea6000 R11: 0000000087654321 R12: ffff880020836f40
[48133.897043] R13: ffff8800094f4b60 R14: ffff88009ba5abf0 R15: ffff88009485bed0
[48133.897043] FS: 0000000000000000(0000) GS:ffff880006300000(0000) knlGS:0000000000000000
[48133.897043] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[48133.897043] CR2: ffff88009485bed0 CR3: 0000000001a25000 CR4: 00000000000006e0
[48133.897043] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[48133.897043] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[48133.897043] Process jbd2/loop1-8 (pid: 18270, threadinfo ffff8800598b2000, task ffff88006965c140)
[48133.897043] Stack:
[48133.897043] ffff8800598b3ce0 00000000810376d9 ffff8800598b3d10 ffff88000c996c58
[48133.897043] <d> ffff88009485bdf0 ffff88000c9967f0 0000000000000000 00000000000000f8
[48133.897043] <d> ffff8800598b3e50 ffffffffa03a3d37 ffff8800598b3d80 ffffffff81009310
[48133.897043] Call Trace:
[48133.897043] [<ffffffffa03a3d37>] jbd2_journal_commit_transaction+0x13d7/0x16e0 [jbd2]
[48133.897043] [<ffffffff81009310>] ? __switch_to+0xd0/0x320
[48133.897043] [<ffffffff8107c65b>] ? try_to_del_timer_sync+0x7b/0xe0
[48133.897043] [<ffffffffa03a9627>] kjournald2+0xb7/0x210 [jbd2]
[48133.897043] [<ffffffff8108fd60>] ? autoremove_wake_function+0x0/0x40
[48133.897043] [<ffffffffa03a9570>] ? kjournald2+0x0/0x210 [jbd2]
[48133.897043] [<ffffffff8108fa16>] kthread+0x96/0xa0
[48133.897043] [<ffffffff8100c14a>] child_rip+0xa/0x20
[48133.897043] [<ffffffff8108f980>] ? kthread+0x0/0xa0
[48133.897043] [<ffffffff8100c140>] ? child_rip+0x0/0x20
[48133.897043] Code: 00 00 00 49 81 c7 e0 00 00 00 4c 39 fb 4c 8b 23 48 89 d8 74 48 4c 89 e2 eb 06 0f 1f 00 49 89 d4 48 8b 4b 08 4c 89 ef 48 89 4a 08 <48> 89 11 48 89 03 48 89 43 08 e8 53 79 e2 e0 8b 55 cc 48 89 de
[48133.897043] RIP [<ffffffffa06d35ce>] ldiskfs_journal_commit_callback+0x6e/0xc0 [ldiskfs]
The crashdump is in /exports/crashdumps/192.168.10.219-2012-12-23-06\:02\:21/ |
| Comment by Andreas Dilger [ 08/Nov/17 ] |
|
No reports of this issue in a long time. |