[LU-8089] MGT/MDT mount fails on secondary HA node

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.9.0
    • Environment: lola
      build: master commit 71d2ea0fde17ecde0bf237f486d4bafb5d54fe3f + patches
    • Severity: 3

    Description

      The error happens during soak testing of build '20160427' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160427). DNE is enabled. The OSTs were formatted with ZFS, the MDTs with ldiskfs as storage backend. There is 1 MDT per MDS and 4 OSTs per OSS. The OSS and MDS nodes are configured in an active-active HA failover configuration.

      The configuration, especially the mapping of nodes to roles, can be found here: https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-Configuration.
      For this ticket the MDS nodes lola-[8,9], which form an HA cluster, are of interest.

      After a hard failover of the primary node (lola-8) had been initiated using pm -c, mounting the MGT/MDT (a single device) on the secondary node (lola-9) failed with the following error messages:

      Apr 29 14:00:36 lola-9 kernel: LDISKFS-fs warning (device dm-9): ldiskfs_multi_mount_protect: MMP interval 42 higher than expected, please wait.
      ...
      ...
      Apr 29 14:19:06 lola-9 kernel: Lustre: Skipped 2 previous similar messages
      Apr 29 14:19:06 lola-9 kernel: LustreError: 32539:0:(genops.c:334:class_newdev()) Device MGC192.168.1.109@o2ib10 already exists at 2, won't add
      Apr 29 14:19:06 lola-9 kernel: LustreError: 32539:0:(obd_config.c:370:class_attach()) Cannot create device MGC192.168.1.109@o2ib10 of type mgc : -17
      Apr 29 14:19:06 lola-9 kernel: LustreError: 32539:0:(obd_mount.c:198:lustre_start_simple()) MGC192.168.1.109@o2ib10 attach error -17
      Apr 29 14:19:06 lola-9 kernel: LustreError: 32539:0:(obd_mount_server.c:1512:server_put_super()) no obd soaked-MDT0000
      Apr 29 14:19:06 lola-9 kernel: LustreError: 32539:0:(obd_mount_server.c:140:server_deregister_mount()) soaked-MDT0000 not registered
      Apr 29 14:19:06 lola-9 kernel: Lustre: 7204:0:(service.c:2096:ptlrpc_server_handle_request()) Skipped 9 previous similar messages
      Apr 29 14:19:06 lola-9 kernel: Lustre: server umount soaked-MDT0000 complete
      Apr 29 14:19:06 lola-9 kernel: LustreError: 32539:0:(obd_mount.c:1450:lustre_fill_super()) Unable to mount  (-17)
      

      The exact sequence of events was:

      • 2016-04-29 13:53:59,419:fsmgmt.fsmgmt:INFO triggering fault mds_failover for lola-8
      • 2016-04-29 14:00:25,257:fsmgmt.fsmgmt:INFO lola-8 is up!!!
      • 2016-04-29 14:00:36,270:fsmgmt.fsmgmt:INFO Mounting soaked-MDT0000 on lola-9 ...
      • 2016-04-29 14:19:06 mount failed with the messages above

      Attached files: lola-9 messages, console files, debug log files lustre-log.1461964594.5065 and lustre-log.1461964707.7204
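      In the log excerpt above, the first warning is the ldiskfs multi-mount protection (MMP) guard delaying the mount after the hard power-cycle of lola-8; the subsequent -17 (EEXIST) errors on the MGC device are the actual failure. As a side note, the MMP parameters of the shared ldiskfs target can be inspected from the standby node with a standard e2fsprogs call (sketch only; the device path is the one used in the transcripts below):

      # Sketch: show the MMP settings of the shared MGT/MDT device (the MMP fields are
      # printed by dumpe2fs when the mmp feature is enabled on the ldiskfs target).
      dumpe2fs -h /dev/mapper/360080e50002ff4f00000026952013088p1 2>/dev/null | grep -i mmp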

      The effect can be reproduced manually.
      The device could not be mounted until mdt-1 (the primary resource of lola-9) had been unmounted and the Lustre modules had been reloaded. A condensed sketch of this workaround is given below; the full transcripts follow.
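      Minimal workaround sketch, assembled from the transcripts; device paths and mount points are the ones used on lola-9:

      # 1. The plain failover mount of the MGT/MDT fails with EEXIST (-17):
      mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0
      #    mount.lustre: mount ... at /mnt/soaked-mdt0 failed: File exists

      # 2. Unmount the MDT that lola-9 already serves and unload the Lustre modules:
      umount /mnt/soaked-mdt1
      lustre_rmmod

      # 3. Remount both targets; both mounts now succeed:
      mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0
      mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026d52013098p1 /mnt/soaked-mdt1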

      Try manually

      [root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/disk/by-id/dm-name-360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0/
      mount.lustre: mount /dev/mapper/360080e50002ff4f00000026952013088p1 at /mnt/soaked-mdt0 failed: File exists
      [root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/disk/by-id/dm-name-360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0/
      mount.lustre: mount /dev/mapper/360080e50002ff4f00000026952013088p1 at /mnt/soaked-mdt0 failed: File exists
      [root@lola-9 ~]# date
      Mon May  2 00:39:25 PDT 2016
      [root@lola-9 ~]# date
      Mon May  2 00:39:52 PDT 2016
      [root@lola-9 ~]# lctl debug_kernel /tmp/lustre-log.2016-05-02-0040
      Debug log: 2321 lines, 2321 kept, 0 dropped, 0 bad.
      

      ---> attached file: lustre-log.2016-05-02-0040

      umount + reload Lustre modules on lola-9

      ##### Umount and remove lustre mods
      [root@lola-9 ~]# umount  /mnt/soaked-mdt1
      [root@lola-9 ~]# lustre_rmmod
      [root@lola-9 ~]# echo $?
      0
      [root@lola-9 ~]# lustre_rmmod
      [root@lola-9 ~]# lctl dl
      [root@lola-9 ~]# echo $?
      2
      [root@lola-9 ~]# date
      Mon May  2 00:41:10 PDT 2016
      [root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/disk/by-id/dm-name-360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0
      [root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026d52013098p1 /mnt/soaked-mdt1
      [root@lola-9 ~]# ^Ctl debug_kernel 
      [root@lola-9 ~]# date
      Mon May  2 00:43:52 PDT 2016
      [root@lola-9 ~]# lctl debug_kernel /tmp/lustre-log.2016-05-02.0044
      Debug log: 177060 lines, 177060 kept, 0 dropped, 0 bad.
      [root@lola-9 ~]# mount
      /dev/sdk1 on / type ext3 (rw)
      proc on /proc type proc (rw)
      sysfs on /sys type sysfs (rw)
      devpts on /dev/pts type devpts (rw,gid=5,mode=620)
      tmpfs on /dev/shm type tmpfs (rw)
      none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
      sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
      10.4.0.1:/export/scratch on /scratch type nfs (rw,addr=10.4.0.1)
      10.4.0.1:/home on /home type nfs (rw,addr=10.4.0.1)
      nfsd on /proc/fs/nfsd type nfsd (rw)
      /dev/mapper/360080e50002ff4f00000026952013088p1 on /mnt/soaked-mdt0 type lustre (rw,user_xattr)
      /dev/mapper/360080e50002ff4f00000026d52013098p1 on /mnt/soaked-mdt1 type lustre (rw,user_xattr)
      
      ### one more time
      [root@lola-9 ~]# umount /mnt/soaked-mdt1
      [root@lola-9 ~]# umount /mnt/soaked-mdt0
      [root@lola-9 ~]# lustre_rmmod
      [root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0
      [root@lola-9 ~]# echo $?
      0
      [root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026d52013098p1 /mnt/soaked-mdt1
      [root@lola-9 ~]# echo $?
      0
      [root@lola-9 ~]# mount
      /dev/sdk1 on / type ext3 (rw)
      proc on /proc type proc (rw)
      sysfs on /sys type sysfs (rw)
      devpts on /dev/pts type devpts (rw,gid=5,mode=620)
      tmpfs on /dev/shm type tmpfs (rw)
      none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
      sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
      10.4.0.1:/export/scratch on /scratch type nfs (rw,addr=10.4.0.1)
      10.4.0.1:/home on /home type nfs (rw,addr=10.4.0.1)
      nfsd on /proc/fs/nfsd type nfsd (rw)
      /dev/mapper/360080e50002ff4f00000026952013088p1 on /mnt/soaked-mdt0 type lustre (rw,user_xattr)
      /dev/mapper/360080e50002ff4f00000026d52013098p1 on /mnt/soaked-mdt1 type lustre (rw,user_xattr)
      [root@lola-9 ~]# date
      Mon May  2 00:49:25 PDT 2016
      [root@lola-9 ~]# lctl debug_kernel /tmp/lustre-log.2016-05-02-0049
      Debug log: 78514 lines, 78514 kept, 0 dropped, 0 bad.
      

      --> attached lustre-log.2016-05-02.0044, lustre-log.2016-05-02-0049

      mount mdt-0 on lola-8 and do failover again

      # Do failover back to lola-8
      [root@lola-8 ~]# mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0
      mount.lustre: increased /sys/block/dm-8/queue/max_sectors_kb from 1024 to 16383
      mount.lustre: increased /sys/block/dm-4/queue/max_sectors_kb from 1024 to 16383
      mount.lustre: increased /sys/block/sdh/queue/max_sectors_kb from 1024 to 16383
      mount.lustre: increased /sys/block/sdc/queue/max_sectors_kb from 1024 to 16383
      [root@lola-8 ~]# echo $?
      0
      [root@lola-8 ~]# lctl dl
        0 UP osd-ldiskfs soaked-MDT0000-osd soaked-MDT0000-osd_UUID 35
        1 UP mgs MGS MGS 7
        2 UP mgc MGC192.168.1.108@o2ib10 31b8074d-c86c-18f5-40f6-bbc2e7cf6f72 5
        3 UP mds MDS MDS_uuid 3
        4 UP lod soaked-MDT0000-mdtlov soaked-MDT0000-mdtlov_UUID 4
        5 UP mdt soaked-MDT0000 soaked-MDT0000_UUID 45
        6 UP mdd soaked-MDD0000 soaked-MDD0000_UUID 4
        7 UP qmt soaked-QMT0000 soaked-QMT0000_UUID 4
        8 UP osp soaked-MDT0001-osp-MDT0000 soaked-MDT0000-mdtlov_UUID 5
        9 UP osp soaked-MDT0002-osp-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       10 UP osp soaked-MDT0003-osp-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       11 UP osp soaked-OST0000-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       12 UP osp soaked-OST0001-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       13 UP osp soaked-OST0002-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       14 UP osp soaked-OST0003-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       15 UP osp soaked-OST0004-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       16 UP osp soaked-OST0005-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       17 UP osp soaked-OST0006-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       18 UP osp soaked-OST0007-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       19 UP osp soaked-OST0008-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       20 UP osp soaked-OST0009-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       21 UP osp soaked-OST000a-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       22 UP osp soaked-OST000b-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       23 UP osp soaked-OST000c-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       24 UP osp soaked-OST000d-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       25 UP osp soaked-OST000e-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       26 UP osp soaked-OST000f-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       27 UP osp soaked-OST0010-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       28 UP osp soaked-OST0011-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       29 UP osp soaked-OST0012-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       30 UP osp soaked-OST0013-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       31 UP osp soaked-OST0014-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       32 UP osp soaked-OST0015-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       33 UP osp soaked-OST0016-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       34 UP osp soaked-OST0017-osc-MDT0000 soaked-MDT0000-mdtlov_UUID 5
       35 UP lwp soaked-MDT0000-lwp-MDT0000 soaked-MDT0000-lwp-MDT0000_UUID 5
      
      
      --->  now do manual hard failover again
      [root@lola-9 ~]#  mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0
      mount.lustre: mount /dev/mapper/360080e50002ff4f00000026952013088p1 at /mnt/soaked-mdt0 failed: File exists
      [root@lola-9 ~]# date
      Mon May  2 01:05:13 PDT 2016
      [root@lola-9 ~]# lctl debug_kernel /tmp/lustre-log.2016-05-02-0105
      Debug log: 560 lines, 560 kept, 0 dropped, 0 bad.
      

      --> attached file lustre-log.2016-05-02-0105

      umount mdt-1 and restart Lustre again

      # ---> now restart lustre again on secondary node:
      [root@lola-9 ~]# umount /mnt/soaked-mdt1
      [root@lola-9 ~]# lustre_rmmod
      [root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026d52013098p1 /mnt/soaked-mdt1
      [root@lola-9 ~]# echo $?
      0
      [root@lola-9 ~]# mount -t lustre -o rw,user_xattr /dev/mapper/360080e50002ff4f00000026952013088p1 /mnt/soaked-mdt0
      [root@lola-9 ~]# echo $?
      0
      [root@lola-9 ~]# mount
      /dev/sdk1 on / type ext3 (rw)
      proc on /proc type proc (rw)
      sysfs on /sys type sysfs (rw)
      devpts on /dev/pts type devpts (rw,gid=5,mode=620)
      tmpfs on /dev/shm type tmpfs (rw)
      none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
      sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
      10.4.0.1:/export/scratch on /scratch type nfs (rw,addr=10.4.0.1)
      10.4.0.1:/home on /home type nfs (rw,addr=10.4.0.1)
      nfsd on /proc/fs/nfsd type nfsd (rw)
      /dev/mapper/360080e50002ff4f00000026d52013098p1 on /mnt/soaked-mdt1 type lustre (rw,user_xattr)
      /dev/mapper/360080e50002ff4f00000026952013088p1 on /mnt/soaked-mdt0 type lustre (rw,user_xattr)
      [root@lola-9 ~]# date
      Mon May  2 01:09:20 PDT 2016
      [root@lola-9 ~]# lctl debug_kernel /tmp/lustre-log.2016-05-02-0109
      Debug log: 119979 lines, 119979 kept, 0 dropped, 0 bad.
      

      --> attached file lustre-log.2016-05-02-0109

      Activity

            pjones Peter Jones added a comment -

            Landed for 2.9


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13726/
            Subject: LU-8089 lwp: change lwp export only at first connect
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 00ffad7af3bc7295797efd21eb16e7deaa715f45

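            For reference, the landed fix can be inspected in a local checkout of fs/lustre-release; a minimal sketch, assuming the usual Whamcloud clone URL (the commit hash is the one quoted above):

            # Assumed clone URL for the fs/lustre-release repository.
            git clone git://git.whamcloud.com/fs/lustre-release.git
            cd lustre-release
            # Show the landed LU-8089 change and the files it touches:
            git show --stat 00ffad7af3bc7295797efd21eb16e7deaa715f45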

            heckes Frank Heckes (Inactive) added a comment -

            During the night the error (or at least the symptom) happened again for the build mentioned in the previous comment (lustre-master #3406).
            Timing of events:

            • 2016-07-20 08:08:17 - failover started for MDS node lola-8 (MDT0000 == mgs)
            • 2016-07-20 08:14:57 - MDT0000 mount command started on secondary node (lola-9)
            • 2016-07-21 01:46 - mount command still not finished on lola-9; the node was rebooted after creating a stack trace via sysrq-trigger

            The messages inside the debug log files look different from those of the previous event, so this may be a new bug.

            Attached files:

            • messages, console, debug log files of MDS lola-9

            heckes Frank Heckes (Inactive) added a comment -

            The error did not occur during the soak test of build https://build.hpdd.intel.com/job/lustre-master/3406 (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160713) in a test session that is still ongoing and has already lasted 7 days.


            adilger Andreas Dilger added a comment -

            Moved the patch here from LU-4214 since that ticket was closed but the patch was never landed to master:

            Mike Pershin (mike.pershin@intel.com) uploaded a new patch: http://review.whamcloud.com/13726
            Subject: LU-8089 lwp: fix LWP client connect logic
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 8abaa93afd61f6c28e15e035a1a06ecf7f6d748e


            jgmitter Joseph Gmitter (Inactive) added a comment -

            Hi Mike,

            We are moving patch http://review.whamcloud.com/#/c/13726/ to this ticket since it was added to LU-4214 after that ticket was already closed, and it is directly related to the issue seen here. Can you please have a look?

            Thanks.
            Joe

            di.wang Di Wang added a comment -

            In lustre-log.2016-05-02-0105, it fails to start the new MGC on lola-9 because an MGC with the same name already exists. But the original MGC should be MGC192.168.1.108@o2ib10, because the original MGS is 108. Frank, could you please list the devices on lola-9 before doing the failover on lola-8? Thanks. (Or is it a failover MGC?)

            00000020:00000080:16.0:1462176165.502813:0:219876:0:(obd_config.c:362:class_attach()) attach type mgc name: MGC192.168.1.109@o2ib10 uuid: 523ef026-efaa-85c3-2ecd-2c5b9b575258
            00000020:00020000:16.0:1462176165.502820:0:219876:0:(genops.c:334:class_newdev()) Device MGC192.168.1.109@o2ib10 already exists at 2, won't add
            00000020:00020000:16.0:1462176165.514006:0:219876:0:(obd_config.c:370:class_attach()) Cannot create device MGC192.168.1.109@o2ib10 of type mgc : -17
            
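            For reference, the device listing Di Wang asks for can be captured on lola-9 before the failover of lola-8 is triggered; a minimal sketch (the grep filter is only illustrative):

            # Record the full obd device list, then check whether an MGC named after
            # the failover NID (192.168.1.109@o2ib10) already exists before the failover.
            lctl dl
            lctl dl | grep -i mgc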
            heckes Frank Heckes (Inactive) added a comment - edited

            This seems to affect the MGT only. Failovers of the other MDSes (MDTs) run smoothly.


            People

              Assignee: tappro Mikhail Pershin
              Reporter: heckes Frank Heckes (Inactive)