[LU-4097] ptlrpc_main()) ASSERTION( svcpt->scp_nthrs_starting == 1 ) failed: Created: 12/Oct/13  Updated: 15/Oct/13  Resolved: 15/Oct/13

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Cliff White (Inactive) Assignee: Oleg Drokin
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Hyperion


Attachments: Text File lu-4097.txt    
Severity: 3
Rank (Obsolete): 11006

 Description   

On starting IOR ssf, the server crashed/wedged.

2013-10-12 11:13:33 LustreError: 6291:0:(service.c:2864:ptlrpc_start_thread()) cannot start thread 'll_ost01_009': rc -2816
2013-10-12 11:13:33 LustreError: 6321:0:(service.c:2467:ptlrpc_main()) ASSERTION( svcpt->scp_nthrs_starting == 1 ) failed:
2013-10-12 11:13:33 LustreError: 6321:0:(service.c:2467:ptlrpc_main()) LBUG
2013-10-12 11:13:33 Pid: 6321, comm: ll_ost01_010
2013-10-12 11:13:33
2013-10-12 11:13:33 Call Trace:
2013-10-12 11:13:33  [<ffffffffa06bf895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2013-10-12 11:13:33  [<ffffffffa06bfe97>] lbug_with_loc+0x47/0xb0 [libcfs]
2013-10-12 11:13:33  [<ffffffffa0a49bdc>] ptlrpc_main+0x153c/0x1740 [ptlrpc]
2013-10-12 11:13:33  [<ffffffffa0a486a0>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
2013-10-12 11:13:33  [<ffffffff81096a36>] kthread+0x96/0xa0
2013-10-12 11:13:33  [<ffffffff8100c0ca>] child_rip+0xa/0x20
2013-10-12 11:13:33  [<ffffffff810969a0>] ? kthread+0x0/0xa0
2013-10-12 11:13:33  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
2013-10-12 11:13:33


 Comments   
Comment by Cliff White (Inactive) [ 12/Oct/13 ]

Console log from hyperion-agb20 (dead OSS)

Comment by Cliff White (Inactive) [ 12/Oct/13 ]

System had started iorfpp; the node had repeated watchdogs after the LBUG.

Comment by Peter Jones [ 12/Oct/13 ]

Oleg, what do you suggest?

Comment by Oleg Drokin [ 14/Oct/13 ]

Cliff did a rerun of the test and it did not reproduce.

The error message itself does not make much sense, and we do not have any extra debugging info, so I do not think we can do anything about this. It could be a one-time fluke too.

Comment by Jodi Levi (Inactive) [ 15/Oct/13 ]

Cliff has rerun the tests and was unable to reproduce this issue.
