  Lustre / LU-3328

Crash in osp_sync_thread


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • None
    • 3
    • 8277

    Description

      This crash happened a few times over the weekend while running recovery-small in a loop:

      [45479.612899] Lustre: DEBUG MARKER: == recovery-small test 111: mdd setup fail should not cause umount oops == 23:46:34 (1368157594)
      [45482.064957] LDISKFS-fs (loop0): mounted filesystem with ordered data mode. quota=on. Opts:
      [45482.424338] Lustre: *** cfs_fail_loc=151, val=0***
      [45482.424579] LustreError: 2688:0:(mdd_device.c:378:mdd_changelog_init()) lustre-MDD0000: changelog setup during init failed: rc = -5
      [45482.430003] LustreError: 2688:0:(mdd_device.c:879:mdd_prepare()) lustre-MDD0000: failed to initialize changelog: rc = -5
      [45482.430628] LustreError: 2688:0:(obd_mount_server.c:1699:server_fill_super()) Unable to start targets: -5
      [45482.548805] BUG: unable to handle kernel paging request at ffff8800b7addb3c
      [45482.549105] IP: [<ffffffffa07c5c64>] osp_sync_thread+0x404/0x800 [osp]
      

      Crashdump and modules are in /exports/crashdumps/192.168.10.211-2013-05-09-23\:46\:39/

      And a second occurrence:

      [219436.299761] Lustre: DEBUG MARKER: test_51: failover in 1 sec
      [219437.524624] Lustre: Failing over lustre-MDT0000
      [219437.524899] Lustre: Skipped 1 previous similar message
      [219437.538081] LustreError: 15244:0:(ldlm_lib.c:2137:target_stop_recovery_thread()) lustre-MDT0000: Aborting recovery
      [219437.539501] Lustre: 15164:0:(ldlm_lib.c:1801:target_recovery_overseer()) recovery is aborted, evict exports in recovery
      [219437.540030] Lustre: 15164:0:(ldlm_lib.c:1801:target_recovery_overseer()) Skipped 2 previous similar messages
      [219437.561840] Lustre: lustre-OST0001: deleting orphan objects from 0x0:8387 to 0x0:8417
      [219437.561859] LustreError: 15158:0:(osp_precreate.c:737:osp_precreate_cleanup_orphans()) lustre-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -108
      [219437.805332] BUG: unable to handle kernel paging request at ffff88005b433b3c
      [219437.805644] IP: [<ffffffffa07c1c64>] osp_sync_thread+0x404/0x800 [osp]
      [219437.805943] PGD 1a26063 PUD 300067 PMD 3db067 PTE 800000005b433060
      [219437.806237] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [219437.806495] last sysfs file: /sys/devices/system/cpu/possible
      [219437.806763] CPU 1 
      [219437.806803] Modules linked in: lustre ofd osp lod ost mdt osd_ldiskfs fsfilt_ldiskfs ldiskfs mdd mgs lquota obdecho mgc lov osc mdc lmv fid fld ptlrpc obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache auth_rpcgss nfs_acl sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: libcfs]
      [219437.809282] 
      [219437.809282] Pid: 15159, comm: osp-syn-0 Not tainted 2.6.32-rhe6.4-debug #2 Bochs Bochs
      [219437.809282] RIP: 0010:[<ffffffffa07c1c64>]  [<ffffffffa07c1c64>] osp_sync_thread+0x404/0x800 [osp]
      [219437.809282] RSP: 0018:ffff880097561e20  EFLAGS: 00010286
      [219437.809282] RAX: 0000000000000001 RBX: ffff88005b4337f0 RCX: ffff88005b433ac0
      [219437.809282] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000282
      [219437.809282] RBP: ffff880097561f40 R08: 0000000000000000 R09: 0000000000000001
      [219437.809282] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880097561e80
      [219437.809282] R13: 0000000000000000 R14: ffff880097561ec0 R15: ffff8800865ea540
      [219437.809282] FS:  00007f84050ce700(0000) GS:ffff880006240000(0000) knlGS:0000000000000000
      [219437.809282] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [219437.809282] CR2: ffff88005b433b3c CR3: 0000000001a25000 CR4: 00000000000006e0
      [219437.809282] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [219437.809282] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [219437.809282] Process osp-syn-0 (pid: 15159, threadinfo ffff880097560000, task ffff8800865ea540)
      [219437.809282] Stack:
      [219437.809282]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [219437.809282] <d> ffff88005b433aa8 ffff8800940b6ef0 ffff8800865ea540 ffff88007a731f30
      [219437.809282] <d> 0000000000000000 0000000000000000 0000000000000000 ffff88005b433af8
      [219437.809282] Call Trace:
      [219437.809282]  [<ffffffff814fe09e>] ? _spin_unlock_irq+0xe/0x20
      [219437.809282]  [<ffffffffa07c1860>] ? osp_sync_thread+0x0/0x800 [osp]
      [219437.809282]  [<ffffffff8100c10a>] child_rip+0xa/0x20
      [219437.809282]  [<ffffffffa07c1860>] ? osp_sync_thread+0x0/0x800 [osp]
      [219437.809282]  [<ffffffffa07c1860>] ? osp_sync_thread+0x0/0x800 [osp]
      [219437.809282]  [<ffffffff8100c100>] ? child_rip+0x0/0x20
      [219437.809282] Code: b5 18 ff ff ff 4c 89 e7 e8 7a f7 ce ff 85 c0 0f 85 92 01 00 00 48 8b bd 00 ff ff ff c7 83 98 02 00 00 01 00 00 00 e8 0c 4b 0d 00 <44> 8b 9b 4c 03 00 00 45 85 db 0f 85 6f 03 00 00 4c 89 e7 e8 24 
      [219437.809282] RIP  [<ffffffffa07c1c64>] osp_sync_thread+0x404/0x800 [osp]
      

      Crashdump in /exports/crashdumps/192.168.10.211-2013-05-12-12\:44\:26/vmcore
      Code snapshot: master-20130509

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Oleg Drokin (green)