Details
Bug
Resolution: Fixed
Minor
Lustre 2.4.0
None
3
8277
Description
This crash happened several times over the weekend while running recovery-small in a loop:
[45479.612899] Lustre: DEBUG MARKER: == recovery-small test 111: mdd setup fail should not cause umount oops == 23:46:34 (1368157594)
[45482.064957] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=on. Opts:
[45482.424338] Lustre: *** cfs_fail_loc=151, val=0***
[45482.424579] LustreError: 2688:0:(mdd_device.c:378:mdd_changelog_init()) lustre-MDD0000: changelog setup during init failed: rc = -5
[45482.430003] LustreError: 2688:0:(mdd_device.c:879:mdd_prepare()) lustre-MDD0000: failed to initialize changelog: rc = -5
[45482.430628] LustreError: 2688:0:(obd_mount_server.c:1699:server_fill_super()) Unable to start targets: -5
[45482.548805] BUG: unable to handle kernel paging request at ffff8800b7addb3c
[45482.549105] IP: [<ffffffffa07c5c64>] osp_sync_thread+0x404/0x800 [osp]
Crashdump and modules are in /exports/crashdumps/192.168.10.211-2013-05-09-23\:46\:39/
A second occurrence, during test_51:
[219436.299761] Lustre: DEBUG MARKER: test_51: failover in 1 sec
[219437.524624] Lustre: Failing over lustre-MDT0000
[219437.524899] Lustre: Skipped 1 previous similar message
[219437.538081] LustreError: 15244:0:(ldlm_lib.c:2137:target_stop_recovery_thread()) lustre-MDT0000: Aborting recovery
[219437.539501] Lustre: 15164:0:(ldlm_lib.c:1801:target_recovery_overseer()) recovery is aborted, evict exports in recovery
[219437.540030] Lustre: 15164:0:(ldlm_lib.c:1801:target_recovery_overseer()) Skipped 2 previous similar messages
[219437.561840] Lustre: lustre-OST0001: deleting orphan objects from 0x0:8387 to 0x0:8417
[219437.561859] LustreError: 15158:0:(osp_precreate.c:737:osp_precreate_cleanup_orphans()) lustre-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -108
[219437.805332] BUG: unable to handle kernel paging request at ffff88005b433b3c
[219437.805644] IP: [<ffffffffa07c1c64>] osp_sync_thread+0x404/0x800 [osp]
[219437.805943] PGD 1a26063 PUD 300067 PMD 3db067 PTE 800000005b433060
[219437.806237] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[219437.806495] last sysfs file: /sys/devices/system/cpu/possible
[219437.806763] CPU 1
[219437.806803] Modules linked in: lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
[219437.809282]
[219437.809282] Pid: 15159, comm: osp-syn-0 Not tainted 2.6.32-rhe6.4-debug #2 Bochs Bochs
[219437.809282] RIP: 0010:[<ffffffffa07c1c64>] [<ffffffffa07c1c64>] osp_sync_thread+0x404/0x800 [osp]
[219437.809282] RSP: 0018:ffff880097561e20 EFLAGS: 00010286
[219437.809282] RAX: 0000000000000001 RBX: ffff88005b4337f0 RCX: ffff88005b433ac0
[219437.809282] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000282
[219437.809282] RBP: ffff880097561f40 R08: 0000000000000000 R09: 0000000000000001
[219437.809282] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880097561e80
[219437.809282] R13: 0000000000000000 R14: ffff880097561ec0 R15: ffff8800865ea540
[219437.809282] FS: 00007f84050ce700(0000) GS:ffff880006240000(0000) knlGS:0000000000000000
[219437.809282] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[219437.809282] CR2: ffff88005b433b3c CR3: 0000000001a25000 CR4: 00000000000006e0
[219437.809282] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[219437.809282] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[219437.809282] Process osp-syn-0 (pid: 15159, threadinfo ffff880097560000, task ffff8800865ea540)
[219437.809282] Stack:
[219437.809282] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[219437.809282] <d> ffff88005b433aa8 ffff8800940b6ef0 ffff8800865ea540 ffff88007a731f30
[219437.809282] <d> 0000000000000000 0000000000000000 0000000000000000 ffff88005b433af8
[219437.809282] Call Trace:
[219437.809282] [<ffffffff814fe09e>] ? _spin_unlock_irq+0xe/0x20
[219437.809282] [<ffffffffa07c1860>] ? osp_sync_thread+0x0/0x800 [osp]
[219437.809282] [<ffffffff8100c10a>] child_rip+0xa/0x20
[219437.809282] [<ffffffffa07c1860>] ? osp_sync_thread+0x0/0x800 [osp]
[219437.809282] [<ffffffffa07c1860>] ? osp_sync_thread+0x0/0x800 [osp]
[219437.809282] [<ffffffff8100c100>] ? child_rip+0x0/0x20
[219437.809282] Code: b5 18 ff ff ff 4c 89 e7 e8 7a f7 ce ff 85 c0 0f 85 92 01 00 00 48 8b bd 00 ff ff ff c7 83 98 02 00 00 01 00 00 00 e8 0c 4b 0d 00 <44> 8b 9b 4c 03 00 00 45 85 db 0f 85 6f 03 00 00 4c 89 e7 e8 24
[219437.809282] RIP [<ffffffffa07c1c64>] osp_sync_thread+0x404/0x800 [osp]
Crashdump in /exports/crashdumps/192.168.10.211-2013-05-12-12\:44\:26/vmcore
Code snapshot: master-20130509
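Both oopses stop at osp_sync_thread+0x404. In the second dump, the bytes at the <44> marker of the Code: line are 44 8b 9b 4c 03 00 00. They encode `mov r11d, dword ptr [rbx+0x34c]`, and RBX plus 0x34c equals CR2. Just before that load, the dump shows c7 83 98 02 00 00 01 00 00 00, a store of 1 to [rbx+0x298] on the same 4 KiB page, followed by a call. The store did not fault, but the load after the call did. "Oops: 0000" (a kernel read of a not-present page) and a PTE with the present bit clear show that the page was unmapped by the time of the load.

With DEBUG_PAGEALLOC enabled, this pattern is consistent with the memory at RBX being freed during or concurrently with that call. That is a reading of the dump, not a confirmed root cause. The first crash faults at the same offset, and its address ffff8800b7addb3c has the same page offset (0xb3c), but its registers are not shown here.

The sketch below checks the arithmetic. `decode_mov_load` is a hypothetical helper that handles only this one MOV encoding; it is not a general x86-64 decoder.

```python
from typing import Tuple

# Hypothetical helper: decodes only "MOV r32, dword ptr [base + disp32]"
# (optional REX prefix without REX.W, opcode 0x8B, ModRM mod=0b10, no SIB).
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
          "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"]
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_load(code: bytes) -> Tuple[str, str, int]:
    """Return (destination register, base register, displacement)."""
    i, rex = 0, 0
    if 0x40 <= code[i] <= 0x4F:          # optional REX prefix
        rex, i = code[i], i + 1
    if rex & 0x08:
        raise ValueError("REX.W (64-bit operand) is not handled")
    if code[i] != 0x8B:
        raise ValueError("not MOV r32, r/m32 (opcode 0x8B)")
    modrm = code[i + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only [base + disp32] without SIB is handled")
    reg |= ((rex >> 2) & 1) << 3         # REX.R extends ModRM.reg
    rm |= (rex & 1) << 3                 # REX.B extends ModRM.rm
    disp = int.from_bytes(code[i + 2:i + 6], "little", signed=True)
    return REGS32[reg], REGS64[rm], disp


# Bytes from the <44> marker in the second oops' "Code:" line.
dst, base, disp = decode_mov_load(bytes.fromhex("44 8b 9b 4c 03 00 00"))
rbx = 0xFFFF88005B4337F0  # RBX from the register dump
cr2 = 0xFFFF88005B433B3C  # CR2: the faulting address

print(f"mov {dst}, dword ptr [{base}+{disp:#x}]")  # mov r11d, dword ptr [rbx+0x34c]
print(hex(rbx + disp), rbx + disp == cr2)          # 0xffff88005b433b3c True
# The earlier store to [rbx+0x298] targets the same 4 KiB page as the load.
print((rbx + 0x298) >> 12 == cr2 >> 12)            # True
```

The decoder only checks the encoding actually present in this dump. Adding the displacement back to RBX is what ties the faulting instruction to CR2.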