[LU-2521] sanity test 60a crash Created: 22/Dec/12  Updated: 08/Nov/17  Resolved: 08/Nov/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Oleg Drokin Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 5937

 Description   

I have been hitting hangs/crashes in sanity test 60a for quite a while and assumed they were OOM-related, but today it happened on a box with more memory, and it crashed like this:

[500819.731843] Lustre: 14303:0:(llog-test.c:864:llog_test_7()) 7e: test llog_changelog_rec
[500821.282423] BUG: unable to handle kernel paging request at ffff880011cfced0
[500821.282912] IP: [<ffffffffa06e45ce>] ldiskfs_journal_commit_callback+0x6e/0xc0 [ldiskfs]
[500821.283618] PGD 1a26063 PUD 1a2a063 PMD 18e067 PTE 11cfc160
[500821.284042] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[500821.284422] last sysfs file: /sys/devices/virtual/block/loop6/queue/scheduler
[500821.284517] CPU 1 
[500821.284517] Modules linked in: llog_test lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs ext2 exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache nfs_acl auth_rpcgss sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
[500821.284517] 
[500821.284517] Pid: 24793, comm: jbd2/loop3-8 Not tainted 2.6.32-debug #6 Bochs Bochs
[500821.284517] RIP: 0010:[<ffffffffa06e45ce>]  [<ffffffffa06e45ce>] ldiskfs_journal_commit_callback+0x6e/0xc0 [ldiskfs]
[500821.284517] RSP: 0018:ffff88002190fcd0  EFLAGS: 00010202
[500821.284517] RAX: ffff88001d951f40 RBX: ffff88001d951f40 RCX: ffff880011cfced0
[500821.284517] RDX: ffff88006337ff40 RSI: 000000001d953160 RDI: ffff8800a209cb60
[500821.284517] RBP: ffff88002190fd10 R08: 0000000000000001 R09: ffff880000000000
[500821.284517] R10: ffff880023d28000 R11: 0000000087654321 R12: ffff88006337ff40
[500821.284517] R13: ffff8800a209cb60 R14: ffff8800419b0bf0 R15: ffff880011cfced0
[500821.284517] FS:  0000000000000000(0000) GS:ffff880006280000(0000) knlGS:0000000000000000
[500821.284517] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[500821.284517] CR2: ffff880011cfced0 CR3: 0000000001a25000 CR4: 00000000000006e0
[500821.284517] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[500821.284517] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[500821.284517] Process jbd2/loop3-8 (pid: 24793, threadinfo ffff88002190e000, task ffff8800203a4180)
[500821.284517] Stack:
[500821.284517]  ffff88002190fce0 00000000810376d9 ffff88002190fd10 ffff8800256b2c58
[500821.284517] <d> ffff880011cfcdf0 ffff8800256b27f0 0000000000000000 00000000000000fc
[500821.284517] <d> ffff88002190fe50 ffffffffa0391d37 ffff88002190fd80 ffffffff81009310
[500821.284517] Call Trace:
[500821.284517]  [<ffffffffa0391d37>] jbd2_journal_commit_transaction+0x13d7/0x16e0 [jbd2]
[500821.284517]  [<ffffffff81009310>] ? __switch_to+0xd0/0x320
[500821.284517]  [<ffffffff8107c65b>] ? try_to_del_timer_sync+0x7b/0xe0
[500821.284517]  [<ffffffffa0397627>] kjournald2+0xb7/0x210 [jbd2]
[500821.284517]  [<ffffffff8108fd60>] ? autoremove_wake_function+0x0/0x40
[500821.284517]  [<ffffffffa0397570>] ? kjournald2+0x0/0x210 [jbd2]
[500821.284517]  [<ffffffff8108fa16>] kthread+0x96/0xa0
[500821.284517]  [<ffffffff8100c14a>] child_rip+0xa/0x20
[500821.284517]  [<ffffffff8108f980>] ? kthread+0x0/0xa0
[500821.284517]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
[500821.284517] Code: 00 00 00 49 81 c7 e0 00 00 00 4c 39 fb 4c 8b 23 48 89 d8 74 48 4c 89 e2 eb 06 0f 1f 00 49 89 d4 48 8b 4b 08 4c 89 ef 48 89 4a 08 <48> 89 11 48 89 03 48 89 43 08 e8 53 69 e1 e0 8b 55 cc 48 89 de 

I have a crashdump with modules in /exports/crashdumps/192.168.10.219-2012-12-22-07:51:12
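
For triage, the error code on the "Oops:" line of the trace above can be decoded mechanically. The following is a minimal sketch, assuming the standard x86 page-fault error-code bit layout; the helper name decode_oops_code is hypothetical and is not part of Lustre or the kernel.

```python
import re

# x86 page-fault error-code bits (architectural, not Lustre-specific).
# Each entry: (bit mask, meaning when set, meaning when clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(line):
    """Return a list of plain-text meanings for the hex code after 'Oops:'."""
    match = re.search(r"Oops: ([0-9a-fA-F]{4})", line)
    if match is None:
        raise ValueError("no Oops error code found in line")
    code = int(match.group(1), 16)
    meanings = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text is not None:
            meanings.append(text)
    return meanings


if __name__ == "__main__":
    # Line copied from the trace in this ticket.
    print(decode_oops_code("[500821.284422] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC"))
```

For the 0002 code reported here, this gives a kernel-mode write to a page that is not present.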



 Comments   
Comment by Oleg Drokin [ 23/Dec/12 ]

Just hit it again

[48129.462822] Lustre: 7823:0:(llog-test.c:541:llog_test_5()) 5c: Cancel 65536 records, see one log zapped
[48131.392359] Lustre: 7823:0:(llog-test.c:549:llog_test_5()) 5c: print the catalog entries.. we expect 1
[48133.896995] BUG: unable to handle kernel paging request at ffff88009485bed0
[48133.897043] IP: [<ffffffffa06d35ce>] ldiskfs_journal_commit_callback+0x6e/0xc0 [ldiskfs]
[48133.897043] PGD 1a26063 PUD 501067 PMD 5a6067 PTE 800000009485b160
[48133.897043] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[48133.897043] last sysfs file: /sys/devices/virtual/block/loop4/queue/scheduler
[48133.897043] CPU 2 
[48133.897043] Modules linked in: llog_test lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs ext2 exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache nfs_acl auth_rpcgss sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
[48133.897043] 
[48133.897043] Pid: 18270, comm: jbd2/loop1-8 Not tainted 2.6.32-debug #6 Bochs Bochs
[48133.897043] RIP: 0010:[<ffffffffa06d35ce>]  [<ffffffffa06d35ce>] ldiskfs_journal_commit_callback+0x6e/0xc0 [ldiskfs]
[48133.897043] RSP: 0018:ffff8800598b3cd0  EFLAGS: 00010283
[48133.897043] RAX: ffff88002cc58f40 RBX: ffff88002cc58f40 RCX: ffff88009485bed0
[48133.897043] RDX: ffff880020836f40 RSI: 000000002cc5a160 RDI: ffff8800094f4b60
[48133.897043] RBP: ffff8800598b3d10 R08: 0000000000000001 R09: ffff880000000000
[48133.897043] R10: ffff880050ea6000 R11: 0000000087654321 R12: ffff880020836f40
[48133.897043] R13: ffff8800094f4b60 R14: ffff88009ba5abf0 R15: ffff88009485bed0
[48133.897043] FS:  0000000000000000(0000) GS:ffff880006300000(0000) knlGS:0000000000000000
[48133.897043] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[48133.897043] CR2: ffff88009485bed0 CR3: 0000000001a25000 CR4: 00000000000006e0
[48133.897043] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[48133.897043] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[48133.897043] Process jbd2/loop1-8 (pid: 18270, threadinfo ffff8800598b2000, task ffff88006965c140)
[48133.897043] Stack:
[48133.897043]  ffff8800598b3ce0 00000000810376d9 ffff8800598b3d10 ffff88000c996c58
[48133.897043] <d> ffff88009485bdf0 ffff88000c9967f0 0000000000000000 00000000000000f8
[48133.897043] <d> ffff8800598b3e50 ffffffffa03a3d37 ffff8800598b3d80 ffffffff81009310
[48133.897043] Call Trace:
[48133.897043]  [<ffffffffa03a3d37>] jbd2_journal_commit_transaction+0x13d7/0x16e0 [jbd2]
[48133.897043]  [<ffffffff81009310>] ? __switch_to+0xd0/0x320
[48133.897043]  [<ffffffff8107c65b>] ? try_to_del_timer_sync+0x7b/0xe0
[48133.897043]  [<ffffffffa03a9627>] kjournald2+0xb7/0x210 [jbd2]
[48133.897043]  [<ffffffff8108fd60>] ? autoremove_wake_function+0x0/0x40
[48133.897043]  [<ffffffffa03a9570>] ? kjournald2+0x0/0x210 [jbd2]
[48133.897043]  [<ffffffff8108fa16>] kthread+0x96/0xa0
[48133.897043]  [<ffffffff8100c14a>] child_rip+0xa/0x20
[48133.897043]  [<ffffffff8108f980>] ? kthread+0x0/0xa0
[48133.897043]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
[48133.897043] Code: 00 00 00 49 81 c7 e0 00 00 00 4c 39 fb 4c 8b 23 48 89 d8 74 48 4c 89 e2 eb 06 0f 1f 00 49 89 d4 48 8b 4b 08 4c 89 ef 48 89 4a 08 <48> 89 11 48 89 03 48 89 43 08 e8 53 79 e2 e0 8b 55 cc 48 89 de 
[48133.897043] RIP  [<ffffffffa06d35ce>] ldiskfs_journal_commit_callback+0x6e/0xc0 [ldiskfs]

Crashdump is in /exports/crashdumps/192.168.10.219-2012-12-23-06:02:21/
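
Both traces fault at the same offset, ldiskfs_journal_commit_callback+0x6e. To compare oopses like these, it can help to list which general-purpose registers hold the faulting address reported in CR2. The following is a rough sketch that assumes the register dump format shown in this ticket; the function name registers_matching_cr2 is hypothetical.

```python
import re


def registers_matching_cr2(oops_text):
    """Return (faulting address, names of registers whose value equals CR2)."""
    cr2 = re.search(r"\bCR2: ([0-9a-f]{16})", oops_text)
    if cr2 is None:
        raise ValueError("no CR2 value found in oops text")
    fault_addr = cr2.group(1)
    # Matches entries such as "RCX: ffff880011cfced0" or "R15: ...".
    # RSP and RIP are skipped because a segment selector precedes their value.
    regs = re.findall(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})", oops_text)
    return fault_addr, [name for name, value in regs if value == fault_addr]


if __name__ == "__main__":
    # Register lines copied from the first trace in this ticket.
    sample = """
    RAX: ffff88001d951f40 RBX: ffff88001d951f40 RCX: ffff880011cfced0
    R13: ffff8800a209cb60 R14: ffff8800419b0bf0 R15: ffff880011cfced0
    CR2: ffff880011cfced0 CR3: 0000000001a25000 CR4: 00000000000006e0
    """
    print(registers_matching_cr2(sample))
```

Run against either trace, this reports RCX and R15 as holding the faulting address.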

Comment by Andreas Dilger [ 08/Nov/17 ]

No reports of this issue in a long time.
