Lustre / LU-6944

LBUG: (osp_sync.c:1139:osp_sync_thread()) ASSERTION( thread->t_flags != SVC_RUNNING ) failed: 806 changes, 230 in progress, 7 in flight


    Description

      performance-sanity test 3 hung:
      https://testing.hpdd.intel.com/test_sets/59acf684-34ea-11e5-be21-5254006e85c2

      Console log on MDS shadow-13vm8:

      Lustre: DEBUG MARKER: /usr/sbin/lctl set_param                           osd-ldiskfs.track_declares_assert=1 || true
      Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark ===== mdsrate-create-small.sh ### 1 NODE CREATE ###
      Lustre: DEBUG MARKER: ===== mdsrate-create-small.sh
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark ===== mdsrate-create-small.sh ### 1 NODE UNLINK ###
      Lustre: DEBUG MARKER: ===== mdsrate-create-small.sh
      LustreError: 30336:0:(osp_sync.c:1139:osp_sync_thread()) ASSERTION( thread->t_flags != SVC_RUNNING ) failed: 806 changes, 230 in progress, 7 in flight
      LustreError: 30336:0:(osp_sync.c:1139:osp_sync_thread()) LBUG
      Pid: 30336, comm: osp-syn-0-0

      Call Trace:
       [<ffffffffa0490875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa0490e77>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa10ae3c2>] osp_sync_thread+0x7e2/0x7f0 [osp]
       [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
       [<ffffffffa10adbe0>] ? osp_sync_thread+0x0/0x7f0 [osp]
       [<ffffffff8109e78e>] kthread+0x9e/0xc0
       [<ffffffff8100c28a>] child_rip+0xa/0x20
       [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
       [<ffffffff8100c280>] ? child_rip+0x0/0x20

      Kernel panic - not syncing: LBUG
      Pid: 30336, comm: osp-syn-0-0 Not tainted 2.6.32-504.30.3.el6_lustre.g0dba034.x86_64 #1
      Call Trace:
       [<ffffffff81529c9c>] ? panic+0xa7/0x16f
       [<ffffffffa0490ecb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
       [<ffffffffa10ae3c2>] ? osp_sync_thread+0x7e2/0x7f0 [osp]
       [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
       [<ffffffffa10adbe0>] ? osp_sync_thread+0x0/0x7f0 [osp]
       [<ffffffff8109e78e>] ? kthread+0x9e/0xc0
       [<ffffffff8100c28a>] ? child_rip+0xa/0x20
       [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
       [<ffffffff8100c280>] ? child_rip+0x0/0x20
      

      More instances:
      https://testing.hpdd.intel.com/test_sets/779e23da-32d7-11e5-a4fd-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/c1a1ee7c-34ed-11e5-b875-5254006e85c2
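Read at face value, the failed assertion means the osp-syn thread reached the check at osp_sync.c:1139 while its t_flags still equalled SVC_RUNNING. Assuming the check sits on the thread's exit path (the ticket does not show the source), that means the thread's work loop returned without anyone having asked it to stop. The C sketch below is a simplified model of that invariant only, not the Lustre code: struct sync_thread, run_sync_loop(), request_stop(), exit_invariant_holds() and the flag values are hypothetical; only t_flags and SVC_RUNNING come from the log.

```c
#include <assert.h>   /* for the checks in the usage example */
#include <stdbool.h>

/* Hypothetical flag values: only the name SVC_RUNNING comes from the log. */
enum svc_flags {
	SVC_STOPPED  = 1 << 0,
	SVC_STOPPING = 1 << 1,
	SVC_RUNNING  = 1 << 2,
};

/* Hypothetical stand-in for the sync thread's control structure. */
struct sync_thread {
	unsigned int t_flags;
};

/* What an unmount would do in this model: ask the thread to leave its loop. */
static void request_stop(struct sync_thread *t)
{
	t->t_flags = SVC_STOPPING;
}

/* Process up to 'queued' records while the thread is running. A stop is
 * requested after 'stop_at' records; stop_at == 0 means no stop is ever
 * requested. Returns the number of records processed. */
static int run_sync_loop(struct sync_thread *t, int queued, int stop_at)
{
	int done = 0;

	while (t->t_flags == SVC_RUNNING && done < queued) {
		done++;                     /* "process" one record */
		if (stop_at > 0 && done == stop_at)
			request_stop(t);
	}
	return done;
}

/* The condition asserted in the log: once the loop has returned, the
 * thread must no longer be marked SVC_RUNNING. */
static bool exit_invariant_holds(const struct sync_thread *t)
{
	return t->t_flags != SVC_RUNNING;
}
```

With a stop requested after four records, the loop exits because the flag changed and the check passes. With stop_at set to 0, the loop simply drains its queue and returns while the thread is still SVC_RUNNING, which is the state this LBUG reports.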

          Activity

            pjones Peter Jones added a comment - Yes - I think that this can be closed as a duplicate of LU-6714 as that was the ticket used to track the fix

            tappro Mikhail Pershin added a comment - patch was landed so I think this ticket can be closed
            yujian Jian Yu added a comment - Yes, Andreas, with the patch, conf-sanity and performance-sanity tests did not hit the LBUG.

            adilger Andreas Dilger added a comment - Jian, does the latest patch from LU-6714 http://review.whamcloud.com/15841 fix this problem?
            yujian Jian Yu added a comment - The patch for LU-6714 introduced the regression failure.

            People

              tappro Mikhail Pershin
              yujian Jian Yu
              Votes: 0
              Watchers: 8
