Details
Bug
Resolution: Cannot Reproduce
Critical
None
Lustre 2.8.0, Lustre 2.9.0
None
autotest review-dne-part-2
3
9223372036854775807
Description
replay-single test_70b times out. In the console logs of MDS 2, MDS 3, and MDS 4, we see:
13:15:37:LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on. Opts:
13:15:37:LustreError: 25429:0:(mgc_request.c:995:mgc_blocking_ast()) ASSERTION( atomic_read(&cld->cld_refcount) > 0 ) failed:
13:15:37:LustreError: 25429:0:(mgc_request.c:995:mgc_blocking_ast()) LBUG
13:15:37:Pid: 25429, comm: ldlm_bl_01
13:15:37:
13:15:37:Call Trace:
13:15:37: [<ffffffffa0467875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
13:15:37: [<ffffffffa0467e77>] lbug_with_loc+0x47/0xb0 [libcfs]
13:15:37: [<ffffffffa0cff9d9>] mgc_blocking_ast+0x6e9/0x810 [mgc]
13:15:37: [<ffffffffa0758b57>] ldlm_cancel_callback+0x87/0x280 [ptlrpc]
13:15:37: [<ffffffffa07779ba>] ldlm_cli_cancel_local+0x8a/0x470 [ptlrpc]
13:15:37: [<ffffffffa077c55c>] ldlm_cli_cancel+0x9c/0x3e0 [ptlrpc]
13:15:37: [<ffffffffa0cff3db>] mgc_blocking_ast+0xeb/0x810 [mgc]
13:15:37: [<ffffffffa0cff2f0>] ? mgc_blocking_ast+0x0/0x810 [mgc]
13:15:37: [<ffffffffa0780c90>] ldlm_handle_bl_callback+0x130/0x400 [ptlrpc]
13:15:37: [<ffffffffa0781ba1>] ldlm_bl_thread_main+0x481/0x710 [ptlrpc]
13:15:37: [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
13:15:37: [<ffffffffa0781720>] ? ldlm_bl_thread_main+0x0/0x710 [ptlrpc]
13:15:37: [<ffffffff810a0fce>] kthread+0x9e/0xc0
13:15:37: [<ffffffff8100c28a>] child_rip+0xa/0x20
13:15:37: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
13:15:37: [<ffffffff8100c280>] ? child_rip+0x0/0x20
13:15:37:
13:15:37:Kernel panic - not syncing: LBUG
In the past month, I could find only two occurrences of this error for test_70b. Logs are at:
2016-01-28 15:21:30 - https://testing.hpdd.intel.com/test_sets/c296d92c-c620-11e5-b4e1-5254006e85c2
2016-02-03 19:34:24 - https://testing.hpdd.intel.com/test_sets/e4674cb8-caf7-11e5-be8d-5254006e85c2