[LU-2458] Device MGC already exists, won't add Created: 10/Dec/12 Updated: 19/Jun/13 Resolved: 19/Jun/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.4, Lustre 2.1.6 |
| Fix Version/s: | Lustre 2.1.6 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Bob Glossman (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 5801 |
| Description |
|
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/d34560e2-42a0-11e2-8dba-52540035b04c.

The sanity.sh test_77i, conf-sanity.sh test_6, recovery-small.sh test_62 failed with the following error in the console logs:

class_newdev()) Device MGC10.10.4.160@tcp already exists, won't add
class_attach()) Cannot create device MGC10.10.4.160@tcp of type mgc : -17
lustre_start_simple()) MGC10.10.4.160@tcp attach error -17
lustre_fill_super()) Unable to mount (-17)
DEBUG MARKER: conf-sanity test_6: @@@@@@ FAIL: test_6 failed with 87

Info required for matching: mount failed |
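The -17 in the log is -EEXIST: class_newdev()/class_attach() refuse to create a second device named MGC10.10.4.160@tcp because one is already registered, and the error then propagates through lustre_start_simple() and lustre_fill_super() as the mount failure. The user-space sketch below is purely illustrative (it is not the Lustre obdclass code); the registry layout and function names are invented for the example, and it only mimics that name-collision check and the resulting error chain.

/*
 * Illustrative sketch only, not the actual Lustre obdclass code.  It mimics
 * the failure chain in the log: a device-name registry where a second attempt
 * to add "MGC10.10.4.160@tcp" is rejected with -EEXIST (-17), which a caller
 * would then report as the "attach error -17" / "Unable to mount (-17)" pair.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEVICES 8
#define MAX_NAME    64

static char device_table[MAX_DEVICES][MAX_NAME];

/* Register a named device; fail with -EEXIST if the name is already taken. */
static int register_device(const char *name)
{
    int free_slot = -1;
    int i;

    for (i = 0; i < MAX_DEVICES; i++) {
        if (device_table[i][0] == '\0') {
            if (free_slot < 0)
                free_slot = i;
        } else if (strcmp(device_table[i], name) == 0) {
            fprintf(stderr, "Device %s already exists, won't add\n", name);
            return -EEXIST;                 /* -17, as seen in the console log */
        }
    }
    if (free_slot < 0)
        return -ENOMEM;

    snprintf(device_table[free_slot], MAX_NAME, "%s", name);
    return 0;
}

int main(void)
{
    const char *mgc = "MGC10.10.4.160@tcp";

    /* First mount sets up the MGC device and succeeds (rc = 0). */
    printf("first mount:  rc = %d\n", register_device(mgc));

    /* A second setup attempt, e.g. one that finds a device left behind by an
     * earlier mount, hits the duplicate check and fails with rc = -17. */
    printf("second mount: rc = %d\n", register_device(mgc));
    return 0;
}

Compiled and run, the second call prints rc = -17, matching the errno chain shown in the console log above.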
| Comments |
| Comment by Andreas Dilger [ 11/Dec/12 ] |
|
This looks like it may have started due to the introduction of "jitter" on the test nodes. Testing for recovery-small.sh had 5/8 failures since 12/10 05:30, but none before 12/08 06:26 (the previous b2_1 test run). http://review.whamcloud.com/4482 was the base for both passing and failing tests, but it had landed on 12/06, so it is questionable whether a code change caused the problem. |
| Comment by Jian Yu [ 12/Dec/12 ] |
|
This is preventing the patches on b2_1 from passing the review test group, and so on. |
| Comment by Jian Yu [ 12/Dec/12 ] |
|
This also affected the test results of b2_1 build #148 on 2.1.3<->2.1.4 interop combinations. |
| Comment by Bob Glossman (Inactive) [ 12/Dec/12 ] |
|
A similar problem was seen a few days earlier, also on a b2_1 test run: |
| Comment by Peter Jones [ 12/Dec/12 ] |
|
Bob, could you please look into this one? Thanks, Peter |
| Comment by Bob Glossman (Inactive) [ 13/Dec/12 ] |
|
It now appears that these failures are side effects of the increased number of virtual CPUs in our test VMs, a change made for TT-928/TT-968. With that change backed out of our test environment, repeating the test run for http://review.whamcloud.com/#change,4516 (build 11037) now passes where it failed before. This begs the question of whether our testing is really appropriate: with more and more multicore servers and clients in the real world, maybe our test framework should work correctly on multicore test nodes. |
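If the failures really are a concurrency side effect, one plausible picture (an assumption, not something confirmed in this ticket) is a window between an earlier mount's MGC cleanup and the next mount's setup: on slower nodes the cleanup tends to finish first, while on nodes with more virtual CPUs the new mount's duplicate check can run while the old entry is still registered and fail with -EEXIST. The pthread sketch below is not Lustre code; the single-slot registry, names, and artificial delay are invented only to illustrate such a timing window.

/*
 * Hedged illustration of a timing window, not Lustre code.  A "teardown"
 * thread removes the MGC entry left by a previous mount after a short delay,
 * while a "new mount" thread checks for a duplicate at roughly the same time.
 * If the check runs before the cleanup, the new mount sees -EEXIST (-17);
 * the usleep() only stands in for scheduling differences between test nodes.
 */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static char registered[64] = "MGC10.10.4.160@tcp";   /* left by previous mount */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *teardown(void *arg)
{
    (void)arg;
    usleep(1000);                         /* cleanup finishing "late" */
    pthread_mutex_lock(&lock);
    registered[0] = '\0';                 /* remove the stale entry */
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *new_mount(void *arg)
{
    int rc = 0;

    (void)arg;
    pthread_mutex_lock(&lock);
    if (strcmp(registered, "MGC10.10.4.160@tcp") == 0)
        rc = -EEXIST;                     /* old entry still present: -17 */
    else
        snprintf(registered, sizeof(registered), "MGC10.10.4.160@tcp");
    pthread_mutex_unlock(&lock);

    printf("new mount: rc = %d\n", rc);
    return NULL;
}

int main(void)
{
    pthread_t a, b;

    pthread_create(&a, NULL, teardown, NULL);
    pthread_create(&b, NULL, new_mount, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}

Built with -pthread, this usually prints rc = -17 because the artificial delay makes the cleanup lose the race; removing the delay makes it pass, which is the kind of timing sensitivity described above.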
| Comment by Jian Yu [ 21/Dec/12 ] |
|
Lustre Tag: v2_1_4_RC1
While running replay-dual tests, the same issue occurred: |
| Comment by Jian Yu [ 05/Mar/13 ] |
|
Lustre Branch: b2_1
sanity test_77i, conf-sanity, and recovery-small test_62 failed with this issue again: |
| Comment by Jian Yu [ 06/Mar/13 ] |
|
Hi Chris, did you turn on "jitter" on the test nodes again? |
| Comment by Jian Yu [ 23/Mar/13 ] |
|
Lustre Client: v2_1_5_RC1
conf-sanity, recovery-small, and replay-dual failed with this issue again: |
| Comment by Jian Yu [ 05/Jun/13 ] |
|
Lustre Tag: v2_1_6_RC1
sanity, conf-sanity, and recovery-small tests failed with this issue again: |
| Comment by Jian Yu [ 05/Jun/13 ] |
|
Lustre Tag: v2_1_6_RC1
sanity, conf-sanity, recovery-small, and replay-dual tests failed with this issue again: |
| Comment by Bob Glossman (Inactive) [ 13/Jun/13 ] |
|
In b2_1: sanity, conf-sanity, and recovery-small failed with this issue again: |
| Comment by Bob Glossman (Inactive) [ 14/Jun/13 ] |
|
More b2_1 failures: |
| Comment by Bruno Faccini (Inactive) [ 17/Jun/13 ] |
|
Bob, according to the Lustre log taken during the failure, this problem could be the b2_1 duplicate of |