[LU-15123] sanity-quota: test_7a Error: 'reintegration failed' Created: 18/Oct/21 Updated: 02/Aug/23 Resolved: 26/Apr/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.16.0, Lustre 2.15.3 |
| Fix Version/s: | Lustre 2.16.0, Lustre 2.15.4 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for paf <pfarrell@whamcloud.com>. It relates to the following test suite run: https://testing.whamcloud.com/test_sets/ded8a9a9-b77c-41ca-bb53-105290b2709e |
| Comments |
| Comment by Chris Horn [ 02/Dec/21 ] |
|
+1 on master - https://testing.whamcloud.com/test_sets/2d408dbe-88f6-428a-ae7e-0c8796fb3207 |
| Comment by Sergey Cheremencev [ 22/Dec/21 ] |
|
+1 on master - https://testing.whamcloud.com/test_sets/f07f8d61-0faa-41ad-9217-c238bd4c2bb0
I guess it fails because OST0000 waits for client 1 during recovery, which blocks reintegration from starting (and finishing). Finally it evicts this client:
[15386.910732] Lustre: lustre-OST0000: Will be in recovery for at least 1:00, or until 5 clients reconnect
...
[15485.424725] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-quota test_7a: @@@@@@ FAIL: reintegration failed
[15485.858740] Lustre: DEBUG MARKER: sanity-quota test_7a: @@@@@@ FAIL: reintegration failed
[15486.354991] Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /autotest/autotest-1/2021-12-15/lustre-reviews_review-dne-zfs-part-4_85112_1_13_4b065c95-2177-4a6a-b5c8-32b025199627//sanity-quota.test_7a.debug_log.$(hostname -s).1639605011.log;
dmesg > /autotest/autotest-1/2021-12-15/lustre-review
[15488.876821] Lustre: lustre-OST0000: recovery is timed out, evict stale exports
[15488.878099] Lustre: lustre-OST0000: disconnecting 1 stale clients
[15489.069798] Lustre: lustre-OST0000: Recovery over after 1:42, of 5 clients 4 recovered and 1 was evicted.
Logs from client1 dmesg:
[15387.000236] Lustre: lustre-OST0000-osc-ffff9247ca727000: Connection to lustre-OST0000 (at 10.240.26.143@tcp) was lost; in progress operations using this service will wait for recovery to complete
[15407.479024] Lustre: lustre-OST0001-osc-ffff9247ca727000: disconnect after 23s idle
[15409.071515] Lustre: lustre-OST0000-osc-ffff9247ca727000: Connection restored to 10.240.26.143@tcp (at 10.240.26.143@tcp)
[15474.038827] LustreError: 11-0: lustre-OST0000-osc-ffff9247ca727000: operation ost_disconnect to node 10.240.26.143@tcp failed: rc = -107
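The ordering of events in the OST log can be checked with a short script. This is a minimal Python sketch, not part of the Lustre test suite: the names `OST_DMESG`, `LINE_RE`, and `first_timestamp` are hypothetical, the three log lines are copied from the OST dmesg excerpt quoted in this comment, and it assumes the bracketed values are seconds since boot.

```python
import re

# Three lines copied from the OST dmesg excerpt quoted in this comment.
# Assumption: the bracketed value is seconds since boot.
OST_DMESG = """\
[15386.910732] Lustre: lustre-OST0000: Will be in recovery for at least 1:00, or until 5 clients reconnect
[15485.858740] Lustre: DEBUG MARKER: sanity-quota test_7a: @@@@@@ FAIL: reintegration failed
[15489.069798] Lustre: lustre-OST0000: Recovery over after 1:42, of 5 clients 4 recovered and 1 was evicted.
"""

# Matches "[<seconds>] <message>".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def first_timestamp(log: str, needle: str) -> float:
    """Return the timestamp of the first log line whose message contains needle."""
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match and needle in match.group(2):
            return float(match.group(1))
    raise ValueError(f"no log line containing {needle!r}")


recovery_start = first_timestamp(OST_DMESG, "Will be in recovery")
test_failed = first_timestamp(OST_DMESG, "FAIL: reintegration failed")
recovery_over = first_timestamp(OST_DMESG, "Recovery over")

print(f"test failed {test_failed - recovery_start:.1f}s after recovery started")
print(f"recovery ended {recovery_over - recovery_start:.1f}s after it started")
print(f"failure preceded the end of recovery by {recovery_over - test_failed:.1f}s")
```

On these three lines the test is marked failed a few seconds before the OST reports that recovery is over, which is consistent with the guess that the test gave up while the OST was still waiting for the stale client.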
|
| Comment by Andreas Dilger [ 24/Mar/22 ] |
|
+3 on master, all on the same patch: |
| Comment by Andreas Dilger [ 18/Oct/22 ] |
|
Still being hit on master, 14/310 runs in the past week. |
| Comment by Qian Yingjin [ 17/Nov/22 ] |
|
+1 on master: |
| Comment by Alexander Zarochentsev [ 14/Dec/22 ] |
|
+1 on master: |
| Comment by Nikitas Angelinas [ 22/Dec/22 ] |
|
+1 on master: https://testing.whamcloud.com/test_sets/dc29fa4d-ef8c-4838-a182-7a544385f4cc |
| Comment by Nikitas Angelinas [ 17/Jan/23 ] |
|
+1 on master: https://testing.whamcloud.com/test_sets/3d837ba0-73c7-4737-9894-5bbca2c9b479 |
| Comment by Jian Yu [ 18/Apr/23 ] |
|
+1 on b2_15 branch: https://testing.whamcloud.com/test_sets/150ed79f-6d70-4048-b875-56a9bccc54cf |
| Comment by Alex Zhuravlev [ 19/Apr/23 ] |
|
[13964.128411] Lustre: lustre-OST0000: Imperative Recovery enabled, recovery window shrunk from 60-180 down to 60-180
Reintegration starts only when recovery is over. In this case the recovery process was stuck waiting for a missing client (which was evicted in the end), so recovery took 102 seconds, while test 7a waits 90s at most. |
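The arithmetic behind this diagnosis can be sketched as follows. This is an illustrative Python snippet, not code from sanity-quota.sh: `TEST_WAIT_LIMIT_S`, `recovery_seconds`, and `recovery_fits_in_wait` are hypothetical names, the 90-second limit is taken from the comment above, and the duration in the "Recovery over after" message is assumed to be in minutes:seconds form.

```python
import re

# Assumption taken from the comment above: test_7a waits at most 90 seconds.
TEST_WAIT_LIMIT_S = 90

# Assumption: the duration in "Recovery over after M:SS" is minutes:seconds.
RECOVERY_RE = re.compile(r"Recovery over after (\d+):(\d{2})")


def recovery_seconds(line: str) -> int:
    """Convert the M:SS duration of a 'Recovery over' line to seconds."""
    match = RECOVERY_RE.search(line)
    if match is None:
        raise ValueError("not a 'Recovery over after M:SS' line")
    return int(match.group(1)) * 60 + int(match.group(2))


def recovery_fits_in_wait(line: str, limit_s: int = TEST_WAIT_LIMIT_S) -> bool:
    """True if recovery alone finished within the test's wait limit.

    This is only a necessary condition: reintegration starts after
    recovery ends and needs some time of its own.
    """
    return recovery_seconds(line) <= limit_s


line = ("Lustre: lustre-OST0000: Recovery over after 1:42, "
        "of 5 clients 4 recovered and 1 was evicted.")
print(recovery_seconds(line), recovery_fits_in_wait(line))
```

With the log line from the earlier failure, recovery alone already exceeds the wait limit, so reintegration could not have completed in time no matter how quickly it ran once started.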
| Comment by Gerrit Updater [ 19/Apr/23 ] |
|
"Alex Zhuravlev <bzzz@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50688 |
| Comment by Gerrit Updater [ 26/Apr/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50688/ |
| Comment by Peter Jones [ 26/Apr/23 ] |
|
Landed for 2.16 |
| Comment by Gerrit Updater [ 06/Jun/23 ] |
|
"Xing Huang <hxing@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51233 |
| Comment by Gerrit Updater [ 02/Aug/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51233/ |