ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 0 in progress, 0 in flight: -5

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.5.3, Lustre 2.8.0
    • Labels: None
    • Severity: 2
    • Rank (Obsolete): 9223372036854775807

    Description

      LustreError: 11-0: hw_nb-OST0016-osc-MDT0000: Communicating with 10.151.26.55@o2ib, operation ost_connect failed with -114.
      LustreError: 6488:0:(llog_cat.c:866:llog_cat_init_and_process()) hw_nb-OST0024-osc-MDT0000: llog_process() with cat_cancel_cb failed: rc = -5
      LustreError: 6580:0:(osp_sync.c:874:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 0 in progress, 0 in flight: -5
      LustreError: 6580:0:(osp_sync.c:874:osp_sync_thread()) LBUG
      Pid: 6580, comm: osp-syn-36-0
      
      Call Trace:
       [<ffffffffa05cf895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa05cfe97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa10d9243>] osp_sync_thread+0x753/0x7d0 [osp]
       [<ffffffff81559b9e>] ? thread_return+0x4e/0x770
       [<ffffffffa10d8af0>] ? osp_sync_thread+0x0/0x7d0 [osp]
      
      Entering kdb (current=0xffff8803b5e04080, pid 6580) on processor 3 Oops: (null)
      due to oops @ 0x0
      kdba_dumpregs: pt_regs not available, use bt* or pid to select a different task
      [3]kdb> 
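
The check that fires in the trace above can be restated as a small predicate. This is only a sketch of the condition quoted in the assertion message, not the code in osp_sync.c; the numeric value of LLOG_PROC_BREAK and both function names are assumptions made for illustration. A caller that tolerates errors would propagate a negative rc such as -5 instead of asserting on it.

```c
#include <stdbool.h>

/* Assumed value for illustration; check the Lustre headers for the real one. */
#define LLOG_PROC_BREAK 0x0001

/* True when rc is one of the two results the assertion accepts. */
bool llog_rc_is_expected(int rc)
{
	return rc == 0 || rc == LLOG_PROC_BREAK;
}

/*
 * Hypothetical tolerant alternative to asserting: accepted results
 * collapse to 0, any other rc (for example -5 or -115 as seen in the
 * logs) is returned unchanged so the caller can log it and stop cleanly.
 */
int llog_rc_to_error(int rc)
{
	return llog_rc_is_expected(rc) ? 0 : rc;
}
```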
      

          Activity

            [LU-6696] ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 0 in progress, 0 in flight: -5

            jgmitter Joseph Gmitter (Inactive) added a comment -

            Patch has landed to master for 2.9.0.

            The tool patch is being tracked by LU-7011.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19856/
            Subject: LU-6696 llog: improve error handling
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 53d2f414d75ac1302b53017376ca2f1fda1f3d17

            gerrit Gerrit Updater added a comment -

            Bobi Jam (bobijam@hotmail.com) uploaded a new patch: http://review.whamcloud.com/19856
            Subject: LU-6696 llog: improve error handling
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 12153490536bb3f1049631720b3629de68ad8574

            tappro Mikhail Pershin added a comment -

            Andreas, 15250 is not in master, so it can be ported. 15247 has a master patch, 15245, which is tracked under LU-7011; it has not landed, but it will not be lost when this ticket is closed.

            adilger Andreas Dilger added a comment -

            It doesn't appear that either http://review.whamcloud.com/15250 or http://review.whamcloud.com/15247 has landed to master. Are these patches no longer needed (and should be abandoned) because of different patches to master, or do they need to be ported to master?

            mhanafi Mahmoud Hanafi added a comment -

            We can close this LU.

            tappro Mikhail Pershin added a comment -

            I wonder about the -115 (EINPROGRESS) error code; I think it comes from obd_fid_alloc(), which may send an RPC to the master MDT. We do need better error handling in OSP, but in this particular case I also think the FID/SEQ code should not return -EINPROGRESS at all; it should be handled internally.
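
A minimal sketch of what "handled inside" could look like, assuming a bounded retry is acceptable. Everything here is hypothetical: alloc_fn, alloc_hide_inprogress() and stub_alloc() are illustrative names, not Lustre functions, and the real FID/SEQ code would wait on the pending RPC rather than loop. The point is only that callers such as the llog code never see -EINPROGRESS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical allocator callback: 0 on success or a negative errno. */
typedef int (*alloc_fn)(void *ctx);

/*
 * Retry while the callee reports -EINPROGRESS, for at most max_tries
 * attempts. Any other result is passed through unchanged. If the
 * operation is still pending after the last attempt, return -EAGAIN.
 */
int alloc_hide_inprogress(alloc_fn fn, void *ctx, int max_tries)
{
	int rc;
	int i;

	assert(fn != NULL);

	for (i = 0; i < max_tries; i++) {
		rc = fn(ctx);
		if (rc != -EINPROGRESS)
			return rc;
		/* A real implementation would wait for the RPC here. */
	}
	return -EAGAIN;
}

/* Stub allocator: reports "in progress" for the first *ctx calls. */
int stub_alloc(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EINPROGRESS;
	}
	return 0;
}
```

With the stub, two pending calls followed by a success yield 0 from the wrapper, while an allocator that never completes within the retry budget yields -EAGAIN.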
            sarah Sarah Liu added a comment -

            Hit this bug on the master branch: replay-single test_60 failed (lustre-master build #3175, RHEL7, DNE).

            https://testing.hpdd.intel.com/test_logs/d35a0490-54ed-11e5-9cd2-5254006e85c2/show_text

            15:51:00:[14402.025502] Lustre: DEBUG MARKER: == replay-single test 60: test llog post recovery init vs llog unlink ================================ 15:49:47 (1441468187)
            15:51:00:[14402.616182] Lustre: DEBUG MARKER: sync; sync; sync
            15:51:00:[14403.619443] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
            15:51:00:[14403.864368] Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
            15:51:00:[14403.990255] Turning device dm-0 (0xfc00000) read-only
            15:51:00:[14404.116309] Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
            15:51:00:[14404.237332] Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
            15:51:00:[14404.649105] Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
            15:51:00:[14404.886835] Lustre: DEBUG MARKER: umount -d /mnt/mds1
            15:51:00:[14411.183415] Removing read-only on unknown block (0xfc00000)
            15:51:00:[14411.343626] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
            15:51:00:[14421.656194] Lustre: DEBUG MARKER: hostname
            15:51:00:[14421.954993] Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
            15:51:00:[14422.192567] Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre   		                   /dev/lvm-Role_MDS/P1 /mnt/mds1
            15:51:00:[14422.457907] LDISKFS-fs (dm-0): recovery complete
            15:51:00:[14422.469317] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache
            15:51:00:[14422.622381] Lustre: lustre-MDT0000-o: trigger OI scrub by RPC for [0x1:0x21a:0x0], rc = 0 [1]
            15:51:00:[14422.623847] LustreError: 22791:0:(llog_cat.c:171:llog_cat_id2handle()) lustre-OST0000-osc-MDT0000: error opening log id 0x21a:1:0: rc = -115
            15:51:00:[14422.625103] LustreError: 22791:0:(llog_cat.c:545:llog_cat_process_cb()) lustre-OST0000-osc-MDT0000: cannot find handle for llog 0x21a:1: -115
            15:51:00:[14422.626273] LustreError: 22791:0:(osp_sync.c:1132:osp_sync_thread()) ASSERTION( rc == 0 || rc == LLOG_PROC_BREAK ) failed: 0 changes, 1 in progress, 0 in flight: -115
            15:51:00:[14422.627634] LustreError: 22791:0:(osp_sync.c:1132:osp_sync_thread()) LBUG
            15:51:00:[14422.628266] Pid: 22791, comm: osp-syn-0-0
            15:51:00:[14422.628640] 
            15:51:00:[14422.628640] Call Trace:
            15:51:00:[14422.629041]  [<ffffffffa06257d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
            15:51:00:[14422.629675]  [<ffffffffa0625d75>] lbug_with_loc+0x45/0xc0 [libcfs]
            15:51:00:[14422.630265]  [<ffffffffa0f532ea>] osp_sync_thread+0x7fa/0x8f0 [osp]
            15:51:00:[14422.630848]  [<ffffffff810125f6>] ? __switch_to+0x136/0x4a0
            15:51:00:[14422.631365]  [<ffffffffa0f52af0>] ? osp_sync_thread+0x0/0x8f0 [osp]
            15:51:00:[14422.631941]  [<ffffffff8109739f>] kthread+0xcf/0xe0
            15:51:00:[14422.632393]  [<ffffffff810972d0>] ? kthread+0x0/0xe0
            15:51:00:[14422.632866]  [<ffffffff81615018>] ret_from_fork+0x58/0x90
            15:51:00:[14422.633368]  [<ffffffff810972d0>] ? kthread+0x0/0xe0
            15:51:00:[14422.633818] 
            15:51:00:[14422.635637] Kernel panic - not syncing: LBUG
            15:51:00:[14422.636020] CPU: 0 PID: 22791 Comm: osp-syn-0-0 Tainted: GF          O--------------   3.10.0-229.7.2.el7_lustre.gea2bb60.x86_64 #1
            15:51:00:[14422.636020] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
            15:51:00:[14422.636020]  ffffffffa0642ecf 0000000071490b94 ffff88006e21fd28 ffffffff816051aa
            15:51:00:[14422.636020]  ffff88006e21fda8 ffffffff815fea1e ffffffff00000008 ffff88006e21fdb8
            15:51:00:[14422.636020]  ffff88006e21fd58 0000000071490b94 ffffffffa0f63e60 0000000000000246
            15:51:00:[14422.636020] Call Trace:
            15:51:00:[14422.636020]  [<ffffffff816051aa>] dump_stack+0x19/0x1b
            15:51:00:[14422.636020]  [<ffffffff815fea1e>] panic+0xd8/0x1e7
            15:51:00:[14422.636020]  [<ffffffffa0625ddb>] lbug_with_loc+0xab/0xc0 [libcfs]
            15:51:00:[14422.636020]  [<ffffffffa0f532ea>] osp_sync_thread+0x7fa/0x8f0 [osp]
            15:51:00:[14422.636020]  [<ffffffff810125f6>] ? __switch_to+0x136/0x4a0
            15:51:00:[14422.636020]  [<ffffffffa0f52af0>] ? osp_sync_process_queues+0x1660/0x1660 [osp]
            15:51:00:[14422.636020]  [<ffffffff8109739f>] kthread+0xcf/0xe0
            15:51:00:[14422.636020]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
            15:51:00:[14422.636020]  [<ffffffff81615018>] ret_from_fork+0x58/0x90
            15:51:00:[14422.636020]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
            15:51:00:[14422.636020] drm_kms_helper: panic occurred, switching back to text console
            15:51:00:[14422.636020] ------------[ cut here ]------------
            15:51:00:[14422.636020] kernel BUG at arch/x86/mm/pageattr.c:216!
            15:51:00:[14422.636020] invalid opcode: 0000 [#1] SMP 
            15:51:00:[14422.636020] Modules linked in: osp(OF) mdd(OF) lod(OF) mdt(OF) lfsck(OF) mgs(OF) mgc(OF) osd_ldiskfs(OF) lquota(OF) fid(OF) fld(OF) ksocklnd(OF) ptlrpc(OF) obdclass(OF) lnet(OF) sha512_generic libcfs(OF) ldiskfs(OF) dm_mod nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd fscache xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ppdev pcspkr serio_raw virtio_balloon parport_pc i2c_piix4 parport ext4 mbcache jbd2 ata_generic pata_acpi cirrus syscopyarea sysfillrect sysimgblt drm_kms_helper virtio_blk ttm 8139too drm ata_piix 8139cp mii virtio_pci virtio_ring virtio i2c_core libata floppy
            15:51:00:[14422.636020] CPU: 0 PID: 22791 Comm: osp-syn-0-0 Tainted: GF          O--------------   3.10.0-229.7.2.el7_lustre.gea2bb60.x86_64 #1
            15:51:00:[14422.636020] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
            15:51:00:[14422.636020] task: ffff88007acb6660 ti: ffff88006e21c000 task.ti: ffff88006e21c000
            15:51:00:[14422.636020] RIP: 0010:[<ffffffff8105c2ef>]  [<ffffffff8105c2ef>] change_page_attr_set_clr+0x4ef/0x500
            15:51:00:[14422.636020] RSP: 0018:ffff88006e21f530  EFLAGS: 00010046
            15:51:00:[14422.636020] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000010
            15:51:00:[14422.636020] RDX: 0000000000002000 RSI: 0000000000000000 RDI: 0000000080000000
            15:51:00:[14422.636020] RBP: ffff88006e21f5c8 R08: 0000000000000004 R09: 000000000006dcf8
            15:51:00:[14422.636020] R10: 0000000000003689 R11: ffffffff8118ff6f R12: 0000000000000010
            15:51:00:[14422.636020] R13: 0000000000000000 R14: 0000000000000200 R15: 0000000000000005
            15:51:00:[14422.636020] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
            15:51:00:[14422.636020] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
            15:51:00:[14422.636020] CR2: 00007fb7152a8220 CR3: 000000000190e000 CR4: 00000000000006f0
            15:51:00:[14422.636020] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
            15:51:00:[14422.636020] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
            15:51:00:[14422.636020] Stack:
            15:51:00:[14422.636020]  00000004b3216d38 ffff880000000000 0000000000000000 ffff88006b62c000
            15:51:00:[14422.636020]  ffff88007acb6660 0000000000000000 0000000000000000 0000000000000010
            15:51:00:[14422.636020]  0000000000000000 0000000500000001 000000000006dcf8 0000020000000000
            15:51:00:[14422.636020] Call Trace:
            15:51:00:[14422.636020]  [<ffffffff8105c646>] _set_pages_array+0xe6/0x130
            15:51:00:[14422.636020]  [<ffffffff8105c6c3>] set_pages_array_wc+0x13/0x20
            15:51:00:[14422.636020]  [<ffffffffa01133af>] ttm_set_pages_caching+0x2f/0x70 [ttm]
            15:51:00:[14422.636020]  [<ffffffffa01134f4>] ttm_alloc_new_pages.isra.7+0xb4/0x180 [ttm]
            15:51:00:[14422.636020]  [<ffffffffa0113e50>] ttm_pool_populate+0x3e0/0x500 [ttm]
            15:51:00:[14422.636020]  [<ffffffffa013132e>] cirrus_ttm_tt_populate+0xe/0x10 [cirrus]
            15:51:00:[14422.636020]  [<ffffffffa01106dd>] ttm_bo_move_memcpy+0x65d/0x6e0 [ttm]
            15:51:00:[14422.636020]  [<ffffffff8118f73e>] ? map_vm_area+0x2e/0x40
            15:51:00:[14422.636020]  [<ffffffffa010c2c9>] ? ttm_tt_init+0x69/0xb0 [ttm]
            15:51:00:[14422.636020]  [<ffffffffa01312d8>] cirrus_bo_move+0x18/0x20 [cirrus]
            15:51:00:[14422.636020]  [<ffffffffa010dde5>] ttm_bo_handle_move_mem+0x265/0x5b0 [ttm]
            15:51:00:[14422.636020]  [<ffffffff81601a64>] ? __slab_free+0x10e/0x277
            15:51:00:[14422.636020]  [<ffffffffa010e74a>] ? ttm_bo_mem_space+0x10a/0x310 [ttm]
            15:51:00:[14422.636020]  [<ffffffffa010ee17>] ttm_bo_validate+0x247/0x260 [ttm]
            15:51:00:[14422.636020]  [<ffffffff81059e69>] ? iounmap+0x79/0xa0
            15:51:00:[14422.636020]  [<ffffffff81050000>] ? kgdb_arch_late+0x80/0x180
            15:51:00:[14422.636020]  [<ffffffffa0131ac2>] cirrus_bo_push_sysram+0x82/0xe0 [cirrus]
            15:51:00:[14422.636020]  [<ffffffffa012fc84>] cirrus_crtc_do_set_base.isra.8.constprop.10+0x84/0x430 [cirrus]
            15:51:00:[14422.636020]  [<ffffffffa0130479>] cirrus_crtc_mode_set+0x449/0x4d0 [cirrus]
            15:51:00:[14422.636020]  [<ffffffffa00ee939>] drm_crtc_helper_set_mode+0x2e9/0x520 [drm_kms_helper]
            15:51:00:[14422.636020]  [<ffffffffa00ef6bf>] drm_crtc_helper_set_config+0x87f/0xaa0 [drm_kms_helper]
            15:51:00:[14422.636020]  [<ffffffffa00af711>] drm_mode_set_config_internal+0x61/0xe0 [drm]
            15:51:00:[14422.636020]  [<ffffffffa00f6e83>] restore_fbdev_mode+0xb3/0xe0 [drm_kms_helper]
            15:51:00:[14422.636020]  [<ffffffffa00f7045>] drm_fb_helper_force_kernel_mode+0x75/0xb0 [drm_kms_helper]
            15:51:00:[14422.636020]  [<ffffffffa00f7d59>] drm_fb_helper_panic+0x29/0x30 [drm_kms_helper]
            15:51:00:[14422.636020]  [<ffffffff81610a6c>] notifier_call_chain+0x4c/0x70
            15:51:00:[14422.636020]  [<ffffffff81610aca>] atomic_notifier_call_chain+0x1a/0x20
            15:51:00:[14422.636020]  [<ffffffff815fea4c>] panic+0x106/0x1e7
            15:51:00:[14422.636020]  [<ffffffffa0625ddb>] lbug_with_loc+0xab/0xc0 [libcfs]
            15:51:00:[14422.636020]  [<ffffffffa0f532ea>] osp_sync_thread+0x7fa/0x8f0 [osp]
            15:51:00:[14422.636020]  [<ffffffff810125f6>] ? __switch_to+0x136/0x4a0
            15:51:00:[14422.636020]  [<ffffffffa0f52af0>] ? osp_sync_process_queues+0x1660/0x1660 [osp]
            15:51:00:[14422.636020]  [<ffffffff8109739f>] kthread+0xcf/0xe0
            15:51:00:[14422.636020]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
            15:51:00:[14422.636020]  [<ffffffff81615018>] ret_from_fork+0x58/0x90
            15:51:00:[14422.636020]  [<ffffffff810972d0>] ? kthread_create_on_node+0x140/0x140
            16:50:26:********** Timeout by autotest system **********
            

            adilger Andreas Dilger added a comment -

            Mike, thanks for getting this tool working for the customer so quickly.

            It would be more useful in the long run if the kernel llog code would just skip the corrupted records itself, and/or have LFSCK repair them before use. That would let the filesystem keep working, instead of taking an outage and requiring the admin to discover that such a tool exists and run it.
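
The "skip the corrupted records" idea can be sketched as a processing loop that counts and skips bad records rather than aborting. This is a simplified illustration under stated assumptions: struct rec, struct scan_stats, scan_skip_corrupted() and sum_cb() are hypothetical names, and the "corrupted" flag stands in for whatever validation the real llog code would perform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified record; real llog records carry more fields. */
struct rec {
	int corrupted;	/* non-zero if the record failed validation */
	int payload;
};

struct scan_stats {
	size_t processed;
	size_t skipped;
};

typedef int (*rec_cb)(const struct rec *r, void *data);

/*
 * Walk all records. Corrupted records are counted and skipped; valid
 * records go to the callback. A non-zero callback result stops the
 * scan and is returned to the caller.
 */
int scan_skip_corrupted(const struct rec *recs, size_t n, rec_cb cb,
			void *data, struct scan_stats *st)
{
	size_t i;
	int rc;

	assert(st != NULL);
	st->processed = 0;
	st->skipped = 0;

	for (i = 0; i < n; i++) {
		if (recs[i].corrupted) {
			st->skipped++;
			continue;
		}
		rc = cb(&recs[i], data);
		if (rc != 0)
			return rc;
		st->processed++;
	}
	return 0;
}

/* Example callback: accumulate payloads into an int. */
int sum_cb(const struct rec *r, void *data)
{
	*(int *)data += r->payload;
	return 0;
}
```

The skipped count gives the caller something to log, so an admin can still see that records were dropped even though processing continued.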
            pjones Peter Jones added a comment -

            ok - thanks Mahmoud


            People

              Assignee: bobijam Zhenyu Xu
              Reporter: mhanafi Mahmoud Hanafi