Details
-
Bug
-
Resolution: Cannot Reproduce
-
Critical
-
None
-
Lustre 2.4.0
-
None
-
3
-
5937
Description
I have been hitting hangs/crashes in sanity test 60a for quite a while and thought they were OOM-related, but today it happened on a box with more memory, and it crashed like this:
[500819.731843] Lustre: 14303:0:(llog-test.c:864:llog_test_7()) 7e: test llog_changelog_rec
[500821.282423] BUG: unable to handle kernel paging request at ffff880011cfced0
[500821.282912] IP: [<ffffffffa06e45ce>] ldiskfs_journal_commit_callback+0x6e/0xc0 [ldiskfs]
[500821.283618] PGD 1a26063 PUD 1a2a063 PMD 18e067 PTE 11cfc160
[500821.284042] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[500821.284422] last sysfs file: /sys/devices/virtual/block/loop6/queue/scheduler
[500821.284517] CPU 1
[500821.284517] Modules linked in: llog_test lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs ext2 exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache nfs_acl auth_rpcgss sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
[500821.284517]
[500821.284517] Pid: 24793, comm: jbd2/loop3-8 Not tainted 2.6.32-debug #6 Bochs Bochs
[500821.284517] RIP: 0010:[<ffffffffa06e45ce>] [<ffffffffa06e45ce>] ldiskfs_journal_commit_callback+0x6e/0xc0 [ldiskfs]
[500821.284517] RSP: 0018:ffff88002190fcd0 EFLAGS: 00010202
[500821.284517] RAX: ffff88001d951f40 RBX: ffff88001d951f40 RCX: ffff880011cfced0
[500821.284517] RDX: ffff88006337ff40 RSI: 000000001d953160 RDI: ffff8800a209cb60
[500821.284517] RBP: ffff88002190fd10 R08: 0000000000000001 R09: ffff880000000000
[500821.284517] R10: ffff880023d28000 R11: 0000000087654321 R12: ffff88006337ff40
[500821.284517] R13: ffff8800a209cb60 R14: ffff8800419b0bf0 R15: ffff880011cfced0
[500821.284517] FS: 0000000000000000(0000) GS:ffff880006280000(0000) knlGS:0000000000000000
[500821.284517] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[500821.284517] CR2: ffff880011cfced0 CR3: 0000000001a25000 CR4: 00000000000006e0
[500821.284517] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[500821.284517] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[500821.284517] Process jbd2/loop3-8 (pid: 24793, threadinfo ffff88002190e000, task ffff8800203a4180)
[500821.284517] Stack:
[500821.284517] ffff88002190fce0 00000000810376d9 ffff88002190fd10 ffff8800256b2c58
[500821.284517] <d> ffff880011cfcdf0 ffff8800256b27f0 0000000000000000 00000000000000fc
[500821.284517] <d> ffff88002190fe50 ffffffffa0391d37 ffff88002190fd80 ffffffff81009310
[500821.284517] Call Trace:
[500821.284517] [<ffffffffa0391d37>] jbd2_journal_commit_transaction+0x13d7/0x16e0 [jbd2]
[500821.284517] [<ffffffff81009310>] ? __switch_to+0xd0/0x320
[500821.284517] [<ffffffff8107c65b>] ? try_to_del_timer_sync+0x7b/0xe0
[500821.284517] [<ffffffffa0397627>] kjournald2+0xb7/0x210 [jbd2]
[500821.284517] [<ffffffff8108fd60>] ? autoremove_wake_function+0x0/0x40
[500821.284517] [<ffffffffa0397570>] ? kjournald2+0x0/0x210 [jbd2]
[500821.284517] [<ffffffff8108fa16>] kthread+0x96/0xa0
[500821.284517] [<ffffffff8100c14a>] child_rip+0xa/0x20
[500821.284517] [<ffffffff8108f980>] ? kthread+0x0/0xa0
[500821.284517] [<ffffffff8100c140>] ? child_rip+0x0/0x20
[500821.284517] Code: 00 00 00 49 81 c7 e0 00 00 00 4c 39 fb 4c 8b 23 48 89 d8 74 48 4c 89 e2 eb 06 0f 1f 00 49 89 d4 48 8b 4b 08 4c 89 ef 48 89 4a 08 <48> 89 11 48 89 03 48 89 43 08 e8 53 69 e1 e0 8b 55 cc 48 89 de
I have a crash dump with modules in /exports/crashdumps/192.168.10.219-2012-12-22-07:51:12