[LU-3328] Crash in osp_sync_thread Created: 13/May/13  Updated: 15/May/13  Resolved: 15/May/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug
Priority: Minor
Reporter: Oleg Drokin
Assignee: WC Triage
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 8277

 Description   

Had this crash happen a few times over the weekend running recovery-small in a loop:

[45479.612899] Lustre: DEBUG MARKER: == recovery-small test 111: mdd setup fail should not cause umount oops == 23:46:34 (1368157594)
[45482.064957] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=on. Opts: 
[45482.424338] Lustre: *** cfs_fail_loc=151, val=0***
[45482.424579] LustreError: 2688:0:(mdd_device.c:378:mdd_changelog_init()) lustre-MDD0000: changelog setup during init failed: rc = -5
[45482.430003] LustreError: 2688:0:(mdd_device.c:879:mdd_prepare()) lustre-MDD0000: failed to initialize changelog: rc = -5
[45482.430628] LustreError: 2688:0:(obd_mount_server.c:1699:server_fill_super()) Unable to start targets: -5
[45482.548805] BUG: unable to handle kernel paging request at ffff8800b7addb3c
[45482.549105] IP: [<ffffffffa07c5c64>] osp_sync_thread+0x404/0x800 [osp]

crashdump and modules in /exports/crashdumps/192.168.10.211-2013-05-09-23\:46\:39/

and

[219436.299761] Lustre: DEBUG MARKER: test_51: failover in 1 sec
[219437.524624] Lustre: Failing over lustre-MDT0000
[219437.524899] Lustre: Skipped 1 previous similar message
[219437.538081] LustreError: 15244:0:(ldlm_lib.c:2137:target_stop_recovery_thread()) lustre-MDT0000: Aborting recovery
[219437.539501] Lustre: 15164:0:(ldlm_lib.c:1801:target_recovery_overseer()) recovery is aborted, evict exports in recovery
[219437.540030] Lustre: 15164:0:(ldlm_lib.c:1801:target_recovery_overseer()) Skipped 2 previous similar messages
[219437.561840] Lustre: lustre-OST0001: deleting orphan objects from 0x0:8387 to 0x0:8417
[219437.561859] LustreError: 15158:0:(osp_precreate.c:737:osp_precreate_cleanup_orphans()) lustre-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -108
[219437.805332] BUG: unable to handle kernel paging request at ffff88005b433b3c
[219437.805644] IP: [<ffffffffa07c1c64>] osp_sync_thread+0x404/0x800 [osp]
[219437.805943] PGD 1a26063 PUD 300067 PMD 3db067 PTE 800000005b433060
[219437.806237] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[219437.806495] last sysfs file: /sys/devices/system/cpu/possible
[219437.806763] CPU 1 
[219437.806803] Modules linked in: lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
[219437.809282] 
[219437.809282] Pid: 15159, comm: osp-syn-0 Not tainted 2.6.32-rhe6.4-debug #2 Bochs Bochs
[219437.809282] RIP: 0010:[<ffffffffa07c1c64>]  [<ffffffffa07c1c64>] osp_sync_thread+0x404/0x800 [osp]
[219437.809282] RSP: 0018:ffff880097561e20  EFLAGS: 00010286
[219437.809282] RAX: 0000000000000001 RBX: ffff88005b4337f0 RCX: ffff88005b433ac0
[219437.809282] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000282
[219437.809282] RBP: ffff880097561f40 R08: 0000000000000000 R09: 0000000000000001
[219437.809282] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880097561e80
[219437.809282] R13: 0000000000000000 R14: ffff880097561ec0 R15: ffff8800865ea540
[219437.809282] FS:  00007f84050ce700(0000) GS:ffff880006240000(0000) knlGS:0000000000000000
[219437.809282] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[219437.809282] CR2: ffff88005b433b3c CR3: 0000000001a25000 CR4: 00000000000006e0
[219437.809282] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[219437.809282] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[219437.809282] Process osp-syn-0 (pid: 15159, threadinfo ffff880097560000, task ffff8800865ea540)
[219437.809282] Stack:
[219437.809282]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[219437.809282] <d> ffff88005b433aa8 ffff8800940b6ef0 ffff8800865ea540 ffff88007a731f30
[219437.809282] <d> 0000000000000000 0000000000000000 0000000000000000 ffff88005b433af8
[219437.809282] Call Trace:
[219437.809282]  [<ffffffff814fe09e>] ? _spin_unlock_irq+0xe/0x20
[219437.809282]  [<ffffffffa07c1860>] ? osp_sync_thread+0x0/0x800 [osp]
[219437.809282]  [<ffffffff8100c10a>] child_rip+0xa/0x20
[219437.809282]  [<ffffffffa07c1860>] ? osp_sync_thread+0x0/0x800 [osp]
[219437.809282]  [<ffffffffa07c1860>] ? osp_sync_thread+0x0/0x800 [osp]
[219437.809282]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
[219437.809282] Code: b5 18 ff ff ff 4c 89 e7 e8 7a f7 ce ff 85 c0 0f 85 92 01 00 00 48 8b bd 00 ff ff ff c7 83 98 02 00 00 01 00 00 00 e8 0c 4b 0d 00 <44> 8b 9b 4c 03 00 00 45 85 db 0f 85 6f 03 00 00 4c 89 e7 e8 24 
[219437.809282] RIP  [<ffffffffa07c1c64>] osp_sync_thread+0x404/0x800 [osp]

Crashdump in /exports/crashdumps/192.168.10.211-2013-05-12-12\:44\:26/vmcore
code snapshot: master-20130509
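
A quick cross-check of the second oops, offered as a reading of the register dump rather than a confirmed root cause. The bytes marked <44> on the Code: line appear to decode as a 32-bit load from RBX plus a displacement, and RBX plus that displacement equals CR2. The PTE printed in the oops has its present bit clear, which on a DEBUG_PAGEALLOC kernel is consistent with (but does not prove) a read from memory that had already been freed. The helper name decode_mov_r32_disp32 is made up for this sketch and only handles the single instruction form seen here.

```python
# Values copied from the second oops in this ticket.
CR2 = 0xffff88005b433b3c   # faulting address
RBX = 0xffff88005b4337f0   # RBX at the time of the fault
PTE = 0x800000005b433060   # page-table entry printed on the PGD/PUD/PMD/PTE line

# Bytes at the faulting RIP, starting at the <44> marker on the "Code:" line.
insn = bytes.fromhex("448b9b4c030000")

def decode_mov_r32_disp32(b):
    """Hypothetical helper: decode REX + 8B /r with mod=10 (disp32), no SIB."""
    rex, opcode, modrm = b[0], b[1], b[2]
    assert rex & 0xf0 == 0x40 and opcode == 0x8b
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 0b10 and rm != 0b100   # disp32 form, no SIB byte
    reg |= (rex & 0x4) << 1              # REX.R extends the destination register
    rm |= (rex & 0x1) << 3               # REX.B extends the base register
    disp = int.from_bytes(b[3:7], "little", signed=True)
    return reg, rm, disp

reg, base, disp = decode_mov_r32_disp32(insn)

# reg 11 is r11d, base 3 is rbx: the load reads [rbx + disp] into r11d.
fault_matches = (RBX + disp == CR2)
page_present = bool(PTE & 1)             # bit 0 of an x86-64 PTE is "present"

print(f"load from [reg{base} + {disp:#x}] -> reg{reg}")
print(f"RBX + disp == CR2: {fault_matches}, page present: {page_present}")
```

If this reading is right, the structure RBX points to was still writable just before the preceding call in the Code: bytes and unmapped right after it, which would narrow the search to whatever that call lets another thread free.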



 Comments   
Comment by Oleg Drokin [ 13/May/13 ]

Alex proposes this patch: http://review.whamcloud.com/6329

Comment by Oleg Drokin [ 15/May/13 ]

landed, fixed.
