[LU-9135] sanity test_313: osp_sync.c:571:osp_sync_interpret()) LBUG Created: 16/Feb/17 Updated: 12/Mar/18 Resolved: 29/Nov/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.11.0, Lustre 2.10.4 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Bob Glossman (Inactive) | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/1d4a3042-f3cc-11e6-8862-5254006e85c2.

The sub-test test_313 failed with the following error:

test failed to respond and timed out

the following panic seen in console log for MDS:

20:34:29:[ 6925.325814] LustreError: 6289:0:(osp_sync.c:571:osp_sync_interpret()) ASSERTION( req->rq_transno == 0 || req->rq_import_generation < imp->imp_generation ) failed: transno 21474848133, rc -5, gen: req 1, imp 1
20:34:29:[ 6925.334095] LustreError: 6289:0:(osp_sync.c:571:osp_sync_interpret()) LBUG
20:34:29:[ 6925.337147] Pid: 6289, comm: ptlrpcd_00_00
20:34:29:[ 6925.338786]
20:34:29:[ 6925.338786] Call Trace:
20:34:29:[ 6925.341582] [<ffffffffa06e77f3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
20:34:29:[ 6925.343382] [<ffffffffa06e7861>] lbug_with_loc+0x41/0xb0 [libcfs]
20:34:29:[ 6925.345155] [<ffffffffa0fb99b3>] osp_sync_interpret+0x363/0x520 [osp]
20:34:29:[ 6925.347107] [<ffffffffa0a490b5>] ptlrpc_check_set.part.23+0x425/0x1dd0 [ptlrpc]
20:34:29:[ 6925.348959] [<ffffffffa0a4aabb>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
20:34:29:[ 6925.350719] [<ffffffffa0a76b8b>] ptlrpcd_check+0x4db/0x5d0 [ptlrpc]
20:34:29:[ 6925.352418] [<ffffffffa0a76f3b>] ptlrpcd+0x2bb/0x560 [ptlrpc]
20:34:29:[ 6925.354016] [<ffffffff810c4fd0>] ? default_wake_function+0x0/0x20
20:34:29:[ 6925.355646] [<ffffffffa0a76c80>] ? ptlrpcd+0x0/0x560 [ptlrpc]
20:34:29:[ 6925.357242] [<ffffffff810b064f>] kthread+0xcf/0xe0
20:34:29:[ 6925.358743] [<ffffffff810b0580>] ? kthread+0x0/0xe0
20:34:29:[ 6925.360250] [<ffffffff81696958>] ret_from_fork+0x58/0x90
20:34:29:[ 6925.361794] [<ffffffff810b0580>] ? kthread+0x0/0xe0
20:34:29:[ 6925.363307]
20:34:29:[ 6925.364557] Kernel panic - not syncing: LBUG
20:34:29:[ 6925.365550] CPU: 0 PID: 6289 Comm: ptlrpcd_00_00 Tainted: G OE ------------ 3.10.0-514.6.1.el7_lustre.x86_64 #1
20:34:29:[ 6925.365550] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
20:34:29:[ 6925.365550] ffffffffa0705d8c 00000000cf1e6ef2 ffff880077eb7bc8 ffffffff816863f8
20:34:29:[ 6925.365550] ffff880077eb7c48 ffffffff8167f823 ffffffff00000008 ffff880077eb7c58
20:34:29:[ 6925.365550] ffff880077eb7bf8 00000000cf1e6ef2 00000000cf1e6ef2 ffff88007fc0f838
20:34:29:[ 6925.365550] Call Trace:
20:34:29:[ 6925.365550] [<ffffffff816863f8>] dump_stack+0x19/0x1b
20:34:29:[ 6925.365550] [<ffffffff8167f823>] panic+0xe3/0x1f2
20:34:29:[ 6925.365550] [<ffffffffa06e7879>] lbug_with_loc+0x59/0xb0 [libcfs]
20:34:29:[ 6925.365550] [<ffffffffa0fb99b3>] osp_sync_interpret+0x363/0x520 [osp]
20:34:29:[ 6925.365550] [<ffffffffa0a490b5>] ptlrpc_check_set.part.23+0x425/0x1dd0 [ptlrpc]
20:34:29:[ 6925.365550] [<ffffffffa0a4aabb>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
20:34:29:[ 6925.365550] [<ffffffffa0a76b8b>] ptlrpcd_check+0x4db/0x5d0 [ptlrpc]
20:34:29:[ 6925.365550] [<ffffffffa0a76f3b>] ptlrpcd+0x2bb/0x560 [ptlrpc]
20:34:29:[ 6925.365550] [<ffffffff810c4fd0>] ? wake_up_state+0x20/0x20
20:34:29:[ 6925.365550] [<ffffffffa0a76c80>] ? ptlrpcd_check+0x5d0/0x5d0 [ptlrpc]
20:34:29:[ 6925.365550] [<ffffffff810b064f>] kthread+0xcf/0xe0
20:34:29:[ 6925.365550] [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140
20:34:29:[ 6925.365550] [<ffffffff81696958>] ret_from_fork+0x58/0x90
20:34:29:[ 6925.365550] [<ffffffff810b0580>] ? kthread_create_on_node+0x140/0x140

Info required for matching: sanity 313 |
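For readers triaging the console log, the following is a minimal sketch (not Lustre code) that evaluates the asserted condition against the values printed in the failure message. The `SyncReply` class and `assertion_holds` function are hypothetical names introduced only for illustration; the condition itself and the numbers are copied from the log line above. It assumes nothing about how osp_sync_interpret() reaches the assertion, only what the message reports.

```python
from dataclasses import dataclass


@dataclass
class SyncReply:
    """Hypothetical stand-in for the fields named in the assertion message."""
    transno: int         # req->rq_transno
    req_generation: int  # req->rq_import_generation
    imp_generation: int  # imp->imp_generation


def assertion_holds(r: SyncReply) -> bool:
    # Same shape as the logged condition:
    #   req->rq_transno == 0 || req->rq_import_generation < imp->imp_generation
    return r.transno == 0 or r.req_generation < r.imp_generation


# Values from the log: "transno 21474848133, rc -5, gen: req 1, imp 1"
logged = SyncReply(transno=21474848133, req_generation=1, imp_generation=1)
print(assertion_holds(logged))
```

With the logged values, neither side of the condition is true: the transno is non-zero, and the request and import generations are equal (1 and 1), so the check fails and the LBUG follows. The rc of -5 in the message corresponds to EIO on Linux.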
| Comments |
| Comment by Oleg Drokin [ 24/Jul/17 ] |
|
just hit this on my new testbed in master. |
| Comment by Bob Glossman (Inactive) [ 16/Aug/17 ] |
|
another on b2_10: |
| Comment by Bob Glossman (Inactive) [ 29/Aug/17 ] |
|
another on master: |
| Comment by Jian Yu [ 01/Oct/17 ] |
|
More failure instances on master branch: |
| Comment by Bob Glossman (Inactive) [ 10/Oct/17 ] |
|
failure on master: Not 100% sure this is the same failure. The console log is cut off mid-line, [ 4835.796965] LustreError: 4565:0:(osp_sync.c:578:osp[ 0.000000], followed by normal reboot logs. |
| Comment by Bob Glossman (Inactive) [ 13/Oct/17 ] |
|
another on master: |
| Comment by Emoly Liu [ 16/Oct/17 ] |
|
+1 on master: |
| Comment by Andreas Dilger [ 07/Nov/17 ] |
|
Again on master: https://testing.hpdd.intel.com/test_sets/97931696-b5e8-11e7-9d39-52540065bddc |
| Comment by Mikhail Pershin [ 14/Nov/17 ] |
|
Was seen several times in master: https://testing.hpdd.intel.com/test_sets/8e069408-c935-11e7-8027-52540065bddc https://testing.hpdd.intel.com/test_sets/87888c5e-c8e9-11e7-9c63-52540065bddc |
| Comment by Oleg Drokin [ 14/Nov/17 ] |
|
This still fails for me all the time. I discussed it with Alex for a long time. I have hundreds of crash dumps of this if anybody wants to take a look. |
| Comment by Jinshan Xiong (Inactive) [ 15/Nov/17 ] |
|
https://testing.hpdd.intel.com/sub_tests/bc8cf07c-c978-11e7-a066-52540065bddc |
| Comment by Gerrit Updater [ 16/Nov/17 ] |
|
Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: https://review.whamcloud.com/30129 |
| Comment by Gerrit Updater [ 29/Nov/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30129/ |
| Comment by Peter Jones [ 29/Nov/17 ] |
|
Landed for 2.11 |
| Comment by Andreas Dilger [ 05/Dec/17 ] |
|
This test was added in patch https://review.whamcloud.com/21398 " |
| Comment by Alex Zhuravlev [ 05/Dec/17 ] |
|
iirc, Oleg had a very long (more than a year?) history of hitting that. The assertion was brought in with the initial OSP code, iirc. |
| Comment by Gerrit Updater [ 25/Jan/18 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/31013 |
| Comment by Gerrit Updater [ 12/Mar/18 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/31013/ |