[LU-2591] race between mount/umount and lov_notify Created: 09/Jan/13 Updated: 05/Mar/13 Resolved: 25/Feb/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.1.5, Lustre 1.8.8 |
| Fix Version/s: | Lustre 2.4.0, Lustre 2.1.5 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Hiroya Nozaki | Assignee: | Jian Yu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | LB, patch |
| Severity: | 3 |
| Rank (Obsolete): | 6039 |
| Description |
|
I've found a race between mount/umount and lov_notify().
1) mount/umount: mount runs and fails to communicate with some OSTs, so the import objects are registered on the pinger list. The ptlrpc_rcv thread then waits for the import state to change to a non-recovery state, but ptlrpc_rcv is itself the thread that is supposed to move the import from a recovery state to a non-recovery state. So ptlrpc_rcv must hang, and this node can no longer use the ptlrpc_rcv thread.
-------------
Jan 8 14:00:05 rx200-088 kernel: Lustre: Lustre: Build Version: 2.1.56-gf394dce-CHANGED-2.6.32-279.5.1.el6_lustre.gb16fe80.x86_64
Jan 8 14:00:06 rx200-088 kernel: LNet: Added LNI 192.168.128.88@o2ib [8/256/0/180]
Jan 8 14:00:06 rx200-088 kernel: Lustre: MGC192.168.128.86@o2ib: Reactivating import
----- This is because of my reproducer -----
Jan 8 14:00:06 rx200-088 kernel: LustreError: 2726:0:(llite_lib.c:562:client_common_fill_super()) lustre: can't make root dentry
--------------------------------------------
Jan 8 14:00:11 rx200-088 kernel: LustreError: 2773:0:(lov_obd.c:465:lov_set_osc_active()) ===== ACTIVE WAIT =====
Jan 8 14:00:16 rx200-088 kernel: LustreError: 2753:0:(lu_object.c:1114:lu_device_fini()) ASSERTION( cfs_atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
Jan 8 14:00:16 rx200-088 kernel: LustreError: 2773:0:(lov_obd.c:469:lov_set_osc_active()) ===== ACTIVE END =====
Jan 8 14:00:16 rx200-088 kernel: LustreError: 2773:0:(lov_obd.c:503:lov_notify()) event(2) of lustre-OST0001_UUID failed: -22
Jan 8 14:00:16 rx200-088 kernel: LustreError: 2753:0:(lu_object.c:1114:lu_device_fini()) LBUG
Jan 8 14:00:16 rx200-088 kernel: Pid: 2753, comm: obd_zombid
Jan 8 14:00:16 rx200-088 kernel:
Jan 8 14:00:16 rx200-088 kernel: Call Trace:
Jan 8 14:00:16 rx200-088 kernel: [<ffffffffa03b5905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Jan 8 14:00:16 rx200-088 kernel: [<ffffffffa03b5f17>] lbug_with_loc+0x47/0xb0 [libcfs]
Jan 8 14:00:16 rx200-088 kernel: [<ffffffffa0533ecc>] lu_device_fini+0xcc/0xd0 [obdclass]
Jan 8 14:00:16 rx200-088 kernel: [<ffffffffa0948b9e>] osc_device_free+0x6e/0x220 [osc]
Jan 8 14:00:16 rx200-088 kernel: [<ffffffffa0511d8d>] class_decref+0x46d/0x590 [obdclass]
Jan 8 14:00:17 rx200-088 kernel: [<ffffffffa04e9c78>] ? class_import_destroy+0x208/0x450 [obdclass]
Jan 8 14:00:17 rx200-088 kernel: [<ffffffffa04ede29>] obd_zombie_impexp_cull+0x309/0x610 [obdclass]
Jan 8 14:00:17 rx200-088 kernel: [<ffffffffa04ee1f5>] obd_zombie_impexp_thread+0xc5/0x1c0 [obdclass]
Jan 8 14:00:17 rx200-088 kernel: [<ffffffff81060250>] ? default_wake_function+0x0/0x20
Jan 8 14:00:17 rx200-088 kernel: [<ffffffffa04ee130>] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
Jan 8 14:00:17 rx200-088 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Jan 8 14:00:17 rx200-088 kernel: [<ffffffffa04ee130>] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
Jan 8 14:00:17 rx200-088 kernel: [<ffffffffa04ee130>] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
Jan 8 14:00:17 rx200-088 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Jan 8 14:00:17 rx200-088 kernel:
Jan 8 14:00:17 rx200-088 kernel: Kernel panic - not syncing: LBUG
-----------------------------
To fix the problems, I've added a new rw_semaphore to struct lov_obd so that lov_notify() and lov_del_target() exclude each other. I think this is one of the right ways to fix them, because I have not seen any of the problems since I applied the patch. However, I'm not sure whether I'm allowed to add a new member to a basic object. I'll upload the patch soon, so could you please review it? |
| Comments |
| Comment by Hiroya Nozaki [ 09/Jan/13 ] |
|
patch for the master branch |
| Comment by Jian Yu [ 21/Feb/13 ] |
|
Hi Hiroya, Are you going to port the patch to Lustre b2_1 and b1_8 branches? |
| Comment by Hiroya Nozaki [ 21/Feb/13 ] |
|
Hi Jian. |
| Comment by Hiroya Nozaki [ 25/Feb/13 ] |
|
patch for b1_8 |
| Comment by Peter Jones [ 25/Feb/13 ] |
|
Landed for 2.4 |
| Comment by Hiroya Nozaki [ 25/Feb/13 ] |
|
Hi, Jian. |
| Comment by Jian Yu [ 25/Feb/13 ] |
Hi Hiroya, |
| Comment by Hiroya Nozaki [ 25/Feb/13 ] |
|
Thank you for your advice, Jian. |
| Comment by Jian Yu [ 25/Feb/13 ] |
|
Thank you, Hiroya. |