Details
Bug
Resolution: Won't Fix
Blocker
None
Lustre 2.1.6
RHEL6
3
11384
Description
This bug was originally hit during Xyratex testing.
BUG: spinlock lockup on CPU#3, tgt_recov/8159, ffff880099c1ca90 (Tainted: G W ---------------- )
Pid: 8159, comm: tgt_recov Tainted: G W ---------------- 2.6.32-131.17.1-lustre #0
Call Trace:
 [<ffffffff8128c2da>] ? _raw_spin_lock+0x16a/0x180
 [<ffffffff81500ff6>] ? _spin_lock+0x56/0x70
 [<ffffffffa056668a>] ? class_export_recovery_cleanup+0x3a/0x230 [obdclass]
 [<ffffffffa03ea572>] ? cfs_hash_del+0xa2/0x1d0 [libcfs]
 [<ffffffffa056668a>] ? class_export_recovery_cleanup+0x3a/0x230 [obdclass]
 [<ffffffffa056875d>] ? class_disconnect+0x15d/0x3d0 [obdclass]
 [<ffffffffa06bfd17>] ? server_disconnect_export+0x37/0x1a0 [ptlrpc]
 [<ffffffffa0c9630f>] ? filter_disconnect+0xbf/0x380 [obdfilter]
 [<ffffffffa056db97>] ? class_disconnect_export_list+0x347/0x680 [obdclass]
 [<ffffffffa056e027>] ? class_disconnect_stale_exports+0x157/0x380 [obdclass]
 [<ffffffffa06bc180>] ? exp_connect_healthy+0x0/0x20 [ptlrpc]
 [<ffffffffa06bc490>] ? check_for_clients+0x0/0x80 [ptlrpc]
 [<ffffffffa06bf04b>] ? target_recovery_overseer+0x15b/0x2d0 [ptlrpc]
 [<ffffffffa06bc180>] ? exp_connect_healthy+0x0/0x20 [ptlrpc]
 [<ffffffff81091a80>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa06c4b90>] ? target_recovery_thread+0x460/0x15d0 [ptlrpc]
 [<ffffffff810563bd>] ? finish_task_switch+0x7d/0x110
 [<ffffffffa06c4730>] ? target_recovery_thread+0x0/0x15d0 [ptlrpc]
 [<ffffffff8100c2ca>] ? child_rip+0xa/0x20
 [<ffffffff81500d50>] ? _spin_unlock_irq+0x30/0x40
 [<ffffffff8100bc10>] ? restore_args+0x0/0x30
 [<ffffffffa06c4730>] ? target_recovery_thread+0x0/0x15d0 [ptlrpc]
 [<ffffffff8100c2c0>] ? child_rip+0x0/0x20
Investigation traced the bug to the commit that backported LU-1522.
A quick look suggests that the same bug also exists in the target_handle_connect() function on b2_1:
    cfs_spin_lock(&target->obd_recovery_task_lock);
    if (target->obd_recovering && !export->exp_in_recovery &&
        !export->exp_disconnected) {
            cfs_spin_lock(&export->exp_lock);
            /* possible race with class_disconnect_stale_exports,
             * export may be already in the eviction process */
            if (export->exp_failed) {
                    cfs_spin_unlock(&export->exp_lock);
                    GOTO(out, rc = -ENODEV);
            }
So if we race with class_disconnect_stale_exports(), we jump to out with obd_recovery_task_lock still held, which kills recovery and, ultimately, the whole node.