[LU-6944] LBUG: (osp_sync.c:1139:osp_sync_thread()) ASSERTION( thread->t_flags != SVC_RUNNING ) failed: 806 changes, 230 in progress, 7 in flight
Created: 03/Aug/15
Updated: 13/Aug/15
Resolved: 12/Aug/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Jian Yu
Assignee: Mikhail Pershin
Resolution: Duplicate
Votes: 0
Labels: None
Environment:
Lustre Build: https://build.hpdd.intel.com/job/lustre-master/3118

Issue Links:
Related
is related to LU-6714 llog_process_thread() may use wrong o... Resolved
is related to LU-7001 osp_sync.c: 1139: osp_sync_thread Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

performance-sanity test 3 hung:
https://testing.hpdd.intel.com/test_sets/59acf684-34ea-11e5-be21-5254006e85c2

Console log on MDS shadow-13vm8:

Lustre: DEBUG MARKER: /usr/sbin/lctl set_param                           osd-ldiskfs.track_declares_assert=1 || true
Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
Lustre: DEBUG MARKER: /usr/sbin/lctl mark ===== mdsrate-create-small.sh ### 1 NODE CREATE ###
Lustre: DEBUG MARKER: ===== mdsrate-create-small.sh
Lustre: DEBUG MARKER: /usr/sbin/lctl mark ===== mdsrate-create-small.sh ### 1 NODE UNLINK ###
Lustre: DEBUG MARKER: ===== mdsrate-create-small.sh
LustreError: 30336:0:(osp_sync.c:1139:osp_sync_thread()) ASSERTION( thread->t_flags != SVC_RUNNING ) failed: 806 changes, 230 in progress, 7 in flight
LustreError: 30336:0:(osp_sync.c:1139:osp_sync_thread()) LBUG
Pid: 30336, comm: osp-syn-0-0

Call Trace:
 [<ffffffffa0490875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0490e77>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa10ae3c2>] osp_sync_thread+0x7e2/0x7f0 [osp]
 [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
 [<ffffffffa10adbe0>] ? osp_sync_thread+0x0/0x7f0 [osp]
 [<ffffffff8109e78e>] kthread+0x9e/0xc0
 [<ffffffff8100c28a>] child_rip+0xa/0x20
 [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
 [<ffffffff8100c280>] ? child_rip+0x0/0x20

Kernel panic - not syncing: LBUG
Pid: 30336, comm: osp-syn-0-0 Not tainted 2.6.32-504.30.3.el6_lustre.g0dba034.x86_64 #1
Call Trace:
 [<ffffffff81529c9c>] ? panic+0xa7/0x16f
 [<ffffffffa0490ecb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa10ae3c2>] ? osp_sync_thread+0x7e2/0x7f0 [osp]
 [<ffffffff8152a39e>] ? thread_return+0x4e/0x7d0
 [<ffffffffa10adbe0>] ? osp_sync_thread+0x0/0x7f0 [osp]
 [<ffffffff8109e78e>] ? kthread+0x9e/0xc0
 [<ffffffff8100c28a>] ? child_rip+0xa/0x20
 [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
 [<ffffffff8100c280>] ? child_rip+0x0/0x20

More instances:
https://testing.hpdd.intel.com/test_sets/779e23da-32d7-11e5-a4fd-5254006e85c2
https://testing.hpdd.intel.com/test_sets/c1a1ee7c-34ed-11e5-b875-5254006e85c2
Comments
Comment by Jian Yu [ 03/Aug/15 ]

The patch for LU-6714 introduced this regression.

Comment by Andreas Dilger [ 05/Aug/15 ]

Jian, does the latest patch from LU-6714 http://review.whamcloud.com/15841 fix this problem?

Comment by Jian Yu [ 05/Aug/15 ]

Yes, Andreas. With the patch, the conf-sanity and performance-sanity tests did not hit the LBUG.

Comment by Mikhail Pershin [ 11/Aug/15 ]

The patch has landed, so I think this ticket can be closed.

Comment by Peter Jones [ 12/Aug/15 ]

Yes, I think this can be closed as a duplicate of LU-6714, since that was the ticket used to track the fix.
