[LU-2458] Device MGC already exists, won't add

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.1.6
    • Lustre 2.1.4, Lustre 2.1.6
    • None
    • 3
    • 5801

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/d34560e2-42a0-11e2-8dba-52540035b04c.

      The sanity.sh test_77i, conf-sanity.sh test_6, recovery-small.sh test_62 failed with the following error in the console logs:

      class_newdev()) Device MGC10.10.4.160@tcp already exists, won't add
      class_attach()) Cannot create device MGC10.10.4.160@tcp of type mgc : -17
      lustre_start_simple()) MGC10.10.4.160@tcp attach error -17
      lustre_fill_super()) Unable to mount  (-17)
      DEBUG MARKER: conf-sanity test_6: @@@@@@ FAIL: test_6 failed with 87
      

      Info required for matching: mount failed
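      The -17 in the console lines above is the negated Linux errno EEXIST ("File exists"), which is consistent with the "already exists" message from class_newdev(). The following is a minimal sketch, not part of the Lustre test framework, that pulls the trailing return codes out of those lines and maps them to symbolic errno names. The regular expression and the helper name decode_return_codes are assumptions made for illustration.

```python
import errno
import re

# Console lines copied from the description above.
CONSOLE_LOG = """\
class_newdev()) Device MGC10.10.4.160@tcp already exists, won't add
class_attach()) Cannot create device MGC10.10.4.160@tcp of type mgc : -17
lustre_start_simple()) MGC10.10.4.160@tcp attach error -17
lustre_fill_super()) Unable to mount  (-17)
"""

# Hypothetical pattern: a negative integer at the end of a line,
# optionally followed by a closing parenthesis.
RC_PATTERN = re.compile(r"(-\d+)\)?\s*$")


def decode_return_codes(log_text):
    """Return (function, return code, errno name) for lines ending in a code."""
    results = []
    for line in log_text.splitlines():
        match = RC_PATTERN.search(line)
        if match is None:
            continue  # e.g. the class_newdev() line carries no numeric code
        rc = int(match.group(1))
        # Kernel code returns negated errno values, so flip the sign to look it up.
        name = errno.errorcode.get(-rc, "UNKNOWN")
        func = line.split("(", 1)[0]
        results.append((func, rc, name))
    return results


if __name__ == "__main__":
    for func, rc, name in decode_return_codes(CONSOLE_LOG):
        print(f"{func}: {rc} ({name})")
```

      All three lines that carry a code (class_attach, lustre_start_simple, lustre_fill_super) decode to EEXIST, so the mount failure is the duplicate-device error propagating up the call chain rather than three separate problems.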

        Activity

          bfaccini Bruno Faccini (Inactive) added a comment - Bob, according to the Lustre log taken during the failure, this problem could be the b2_1 duplicate of LU-639. So I decided to back-port the associated change (http://review.whamcloud.com/1896) to b2_1 and see if it fixes the problem. I am giving it a try at http://review.whamcloud.com/6670.
          bogl Bob Glossman (Inactive) added a comment - more b2_1: https://maloo.whamcloud.com/test_sessions/fb0ca642-d479-11e2-9e73-52540035b04c
          bogl Bob Glossman (Inactive) added a comment - in b2_1, sanity, conf-sanity, recovery-small: https://maloo.whamcloud.com/test_sessions/e3e05362-d43b-11e2-9e73-52540035b04c
          yujian Jian Yu added a comment -

          Lustre Tag: v2_1_6_RC1
          Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/208
          Distro/Arch: RHEL5.9/i686 (client), RHEL5.9/x86_64 (server)

          sanity, conf-sanity, recovery-small, replay-dual tests failed with this issue again:
          https://maloo.whamcloud.com/test_sessions/dad32c08-cda7-11e2-ba28-52540035b04c

          yujian Jian Yu added a comment -

          Lustre Tag: v2_1_6_RC1
          Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/208
          Distro/Arch: RHEL5.9/x86_64

          sanity, conf-sanity, recovery-small tests failed with this issue again:
          https://maloo.whamcloud.com/test_sessions/8e59afbc-cd68-11e2-a1e0-52540035b04c

          yujian Jian Yu added a comment -

          Lustre Client: v2_1_5_RC1
          Lustre Server: 2.1.4

          conf-sanity, recovery-small and replay-dual failed with this issue again:
          https://maloo.whamcloud.com/test_sets/0f16b3ba-9327-11e2-b06e-52540035b04c
          https://maloo.whamcloud.com/test_sets/17874dc8-9329-11e2-b06e-52540035b04c
          https://maloo.whamcloud.com/test_sets/7ff9bc14-932a-11e2-b06e-52540035b04c

          yujian Jian Yu added a comment -

          Hi Chris,

          Did you turn on "jitter" on the test nodes again?

          yujian Jian Yu added a comment -

          Lustre Branch: b2_1
          Lustre Build: http://build.whamcloud.com/job/lustre-reviews/13578/
          Distro/Arch: RHEL5.9/x86_64

          sanity test_77i, conf-sanity, recovery-small test_62 failed with this issue again:
          https://maloo.whamcloud.com/test_sets/d511b2f0-834f-11e2-98f5-52540035b04c
          https://maloo.whamcloud.com/test_sets/243a6366-8352-11e2-98f5-52540035b04c
          https://maloo.whamcloud.com/test_sets/681304c4-8354-11e2-98f5-52540035b04c

          yujian Jian Yu added a comment -

          Lustre Tag: v2_1_4_RC1
          Lustre Build: http://build.whamcloud.com/job/lustre-b2_1/159/
          Distro/Arch: RHEL5.8/x86_64 + RHEL5.8/i686 (Server + Client)

          While running replay-dual tests, the same issue occurred:
          https://maloo.whamcloud.com/test_sets/df2f801c-4ad0-11e2-b87e-52540035b04c

          bogl Bob Glossman (Inactive) added a comment - - edited

          It now appears that these failures are side effects of the increased number of virtual CPUs in our test VMs, a change made for TT-928/TT-968. With that change backed out of our test environment, repeating the test run for http://review.whamcloud.com/#change,4516 (build 11037) now passes where it failed before.

          This raises the question of whether our testing is really appropriate. With more and more multicore servers and clients in the real world, maybe our test framework should work correctly on multicore test nodes.


          People

            bogl Bob Glossman (Inactive)
            maloo Maloo