[LU-292] Test failure on test suite recovery-small Created: 07/May/11 Updated: 23/Aug/12 Resolved: 10/Aug/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0 |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Liang Zhen (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 4922 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/4f87c578-77b4-11e0-9b1b-52540025f9af. |
| Comments |
| Comment by Mikhail Pershin [ 07/May/11 ] |
|
This is quite similar to bug ORI-125, which I am currently working on. If this issue can be reproduced, I already have a possible fix to check. UPD: no, this is quite different; the LBUG occurs before exports and zombie barriers. |
| Comment by Jian Yu [ 29/Jul/11 ] |
|
|
| Comment by Jian Yu [ 01/Aug/11 ] |
|
Lustre Clients:
Lustre Servers:
After running sanity-benchmark, unmounting MDS hit the same LBUG:
Aug 1 04:54:30 fat-amd-1 rshd[12437]: root@fat-amd-3-ib.lab.whamcloud.com as root: cmd='(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; sh -c "umount -d -f /mnt/mds");echo XXRETCODE:$?'
Aug 1 04:54:31 fat-amd-1 kernel: LustreError: 2568:0:(service.c:2704:ptlrpc_unregister_service()) ASSERTION(service->srv_n_queued_reqs == 0) failed
Aug 1 04:54:31 fat-amd-1 kernel: LustreError: 2568:0:(service.c:2704:ptlrpc_unregister_service()) LBUG
Aug 1 04:54:31 fat-amd-1 kernel: Pid: 2568, comm: obd_zombid
Aug 1 04:54:31 fat-amd-1 kernel:
Aug 1 04:54:31 fat-amd-1 kernel: Call Trace:
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa0370855>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa0370e95>] lbug_with_loc+0x75/0xe0 [libcfs]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa037bcb6>] libcfs_assertion_failed+0x66/0x70 [libcfs]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa057ea03>] ptlrpc_unregister_service+0xb83/0xc20 [ptlrpc]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff8105dc72>] ? default_wake_function+0x12/0x20
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff8104af29>] ? __wake_up_common+0x59/0x90
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff8104f843>] ? __wake_up+0x53/0x70
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa08a6acc>] mgs_cleanup+0x4c/0x220 [mgs]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa046375a>] class_decref+0x19a/0x610 [obdclass]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff810dbebe>] ? call_rcu+0xe/0x10
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa08a632f>] ? mgs_destroy_export+0x3f/0x110 [mgs]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa044eac5>] obd_zombie_impexp_cull+0x335/0x5a0 [obdclass]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff8108e51c>] ? remove_wait_queue+0x3c/0x50
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa044ee35>] obd_zombie_impexp_thread+0x105/0x270 [obdclass]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff8105dc60>] ? default_wake_function+0x0/0x20
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff8100c1ca>] child_rip+0xa/0x20
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa044ed30>] ? obd_zombie_impexp_thread+0x0/0x270 [obdclass]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Aug 1 04:54:31 fat-amd-1 kernel:
Aug 1 04:54:31 fat-amd-1 kernel: Kernel panic - not syncing: LBUG
Aug 1 04:54:31 fat-amd-1 kernel: Pid: 2568, comm: obd_zombid Tainted: G ---------------- T 2.6.32-131.2.1.el6_lustre.x86_64 #1
Aug 1 04:54:31 fat-amd-1 kernel: Call Trace:
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff814db1b8>] ? panic+0x78/0x143
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa0370eeb>] ? lbug_with_loc+0xcb/0xe0 [libcfs]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa037bcb6>] ? libcfs_assertion_failed+0x66/0x70 [libcfs]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa057ea03>] ? ptlrpc_unregister_service+0xb83/0xc20 [ptlrpc]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff8105dc72>] ? default_wake_function+0x12/0x20
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff8104af29>] ? __wake_up_common+0x59/0x90
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff8104f843>] ? __wake_up+0x53/0x70
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa08a6acc>] ? mgs_cleanup+0x4c/0x220 [mgs]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa046375a>] ? class_decref+0x19a/0x610 [obdclass]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff810dbebe>] ? call_rcu+0xe/0x10
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa08a632f>] ? mgs_destroy_export+0x3f/0x110 [mgs]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa044eac5>] ? obd_zombie_impexp_cull+0x335/0x5a0 [obdclass]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff8108e51c>] ? remove_wait_queue+0x3c/0x50
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa044ee35>] ? obd_zombie_impexp_thread+0x105/0x270 [obdclass]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff8105dc60>] ? default_wake_function+0x0/0x20
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffffa044ed30>] ? obd_zombie_impexp_thread+0x0/0x270 [obdclass]
Aug 1 04:54:31 fat-amd-1 kernel: [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
sanity-quota test 29 also hit the above LBUG: https://maloo.whamcloud.com/test_sets/8f500b08-bc3c-11e0-8bdf-52540025f9af |
| Comment by Liang Zhen (Inactive) [ 01/Aug/11 ] |
|
I've posted a patch for this: |
| Comment by Peter Jones [ 03/Aug/11 ] |
|
Liang has worked on this |
| Comment by Liang Zhen (Inactive) [ 04/Aug/11 ] |
|
The test failed because of this:
It seems to be a userspace tool issue, and I don't think it has anything to do with this patch. |
| Comment by Build Master (Inactive) [ 10/Aug/11 ] |
|
Integrated in Oleg Drokin : 90444d82f5bbeeca44a71809f88aed71515a2da8
|
| Comment by Peter Jones [ 10/Aug/11 ] |
|
Landed for 2.1 |
| Comment by Nathan Rutman [ 22/Aug/12 ] |
|
bz23289 screwed this up. The fix in
diff --git a/lustre/ptlrpc/service.c b/lustre/ptlrpc/service.c
index a0bd65b..7d37f0c 100644
--- a/lustre/ptlrpc/service.c
+++ b/lustre/ptlrpc/service.c
@@ -1516,7 +1516,6 @@ ptlrpc_server_handle_req_in(struct ptlrpc_service *svc)
req = cfs_list_entry(svc->srv_req_in_queue.next,
struct ptlrpc_request, rq_list);
cfs_list_del_init (&req->rq_list);
- svc->srv_n_queued_reqs--;
/* Consider this still a "queued" request as far as stats are
concerned */
cfs_spin_unlock(&svc->srv_lock);
@@ -1631,6 +1630,7 @@ ptlrpc_server_handle_req_in(struct ptlrpc_service *svc)
err_req:
cfs_spin_lock(&svc->srv_rq_lock);
+ svc->srv_n_queued_reqs--;
svc->srv_n_active_reqs++;
cfs_spin_unlock(&svc->srv_rq_lock);
ptlrpc_server_finish_request(svc, req);
@@ -1691,6 +1691,7 @@ ptlrpc_server_handle_request(struct ptlrpc_service *svc,
}
cfs_list_del_init(&request->rq_list);
+ svc->srv_n_queued_reqs--;
svc->srv_n_active_reqs++;
if (request->rq_hp)
svc->srv_n_active_hpreq++;
@@ -2749,6 +2750,7 @@ int ptlrpc_unregister_service(struct ptlrpc_service *service)
req = ptlrpc_server_request_get(service, 1);
cfs_list_del(&req->rq_list);
+ service->srv_n_queued_reqs--;
service->srv_n_active_reqs++;
ptlrpc_server_finish_request(service, req);
}
|
| Comment by Liang Zhen (Inactive) [ 23/Aug/12 ] |
|
Nathan, svc::srv_n_queued_reqs is protected by srv_lock, not svc::srv_rq_lock, so we can't fix it your way. |
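The locking point in the last comment can be sketched in isolation. The following is a minimal userspace illustration, not Lustre code: `struct demo_service` and every `demo_*` function are hypothetical names, and pthread mutexes stand in for the kernel spinlocks. It assumes only what the thread states: the queued counter is guarded by `srv_lock`, the active counter is updated under `srv_rq_lock`, and unregistering expects the queued counter to be zero.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for struct ptlrpc_service; only the fields
 * discussed in this ticket are modelled. */
struct demo_service {
    pthread_mutex_t srv_lock;    /* guards srv_n_queued_reqs */
    pthread_mutex_t srv_rq_lock; /* guards srv_n_active_reqs */
    int srv_n_queued_reqs;
    int srv_n_active_reqs;
};

void demo_service_init(struct demo_service *svc)
{
    pthread_mutex_init(&svc->srv_lock, NULL);
    pthread_mutex_init(&svc->srv_rq_lock, NULL);
    svc->srv_n_queued_reqs = 0;
    svc->srv_n_active_reqs = 0;
}

/* A new request arrives and is counted as queued. */
void demo_enqueue(struct demo_service *svc)
{
    pthread_mutex_lock(&svc->srv_lock);
    svc->srv_n_queued_reqs++;
    pthread_mutex_unlock(&svc->srv_lock);
}

/* A request moves from queued to active. Each counter is changed only
 * while holding the lock that guards it; decrementing the queued
 * counter under srv_rq_lock would race with demo_enqueue(). */
void demo_start_request(struct demo_service *svc)
{
    pthread_mutex_lock(&svc->srv_lock);
    svc->srv_n_queued_reqs--;
    pthread_mutex_unlock(&svc->srv_lock);

    pthread_mutex_lock(&svc->srv_rq_lock);
    svc->srv_n_active_reqs++;
    pthread_mutex_unlock(&svc->srv_rq_lock);
}

/* An active request completes. */
void demo_finish_request(struct demo_service *svc)
{
    pthread_mutex_lock(&svc->srv_rq_lock);
    svc->srv_n_active_reqs--;
    pthread_mutex_unlock(&svc->srv_rq_lock);
}

/* Mirrors the condition from the failed assertion: the service may be
 * torn down only when no request is still counted as queued. */
int demo_can_unregister(struct demo_service *svc)
{
    int idle;

    pthread_mutex_lock(&svc->srv_lock);
    idle = (svc->srv_n_queued_reqs == 0);
    pthread_mutex_unlock(&svc->srv_lock);
    return idle;
}
```

In this sketch, a path that removes a request from the queue without the matching decrement under `srv_lock` would leave `demo_can_unregister()` returning 0 forever, which is the userspace analogue of the LBUG at unmount.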