[LU-7938] Recovery on secondary OSS node stalled

Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Blocker
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0, Lustre 2.9.0
    • Environment: lola
      build: 2.8 GA + patches
    • Severity: 3

    Description

      The error happened during soak testing of build '20160324' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160324). DNE is enabled. The MDTs were formatted with ldiskfs, the OSTs with zfs. The MDS and OSS nodes are configured in an HA active-active failover configuration, with 1 MDT per MDS and 4 OSTs per OSS.
      Nodes lola-4 and lola-5 form a HA cluster.

      Event history

      • 2016-03-29 07:48:04,307:fsmgmt.fsmgmt:INFO triggering fault oss_failover of node lola-4
        power-cycle node
      • 2016-03-29 07:52:27,876:fsmgmt.fsmgmt:INFO lola-4 is up
      • 2016-03-29 07:53:09,557:fsmgmt.fsmgmt:INFO zpool import and mount of lola-4's OSTs complete
      • Recovery did not complete even after recovery_time reached zero (see the monitoring sketch after this list)
      • 2016-03-29 08:03 (approximately) Aborted recovery manually (lctl --device ... abort_recovery)
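
      For reference, a minimal sketch of this failover sequence and of how recovery can be monitored and aborted on the surviving OSS. The pool, dataset, and target names are illustrative, borrowed from the format output quoted in a later comment; they are not the exact names used on lola-4:

        # import the failed-over pool and mount the OST on the surviving node
        zpool import soaked-ost18
        mount -t lustre soaked-ost18/ost18 /mnt/soaked-ost18

        # watch recovery progress; status should move from RECOVERING to COMPLETE
        lctl get_param obdfilter.soaked-OST0012.recovery_status

        # if the status stays RECOVERING after recovery_time reaches zero,
        # recovery can be aborted manually
        lctl --device soaked-OST0012 abort_recovery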

      Attached files:
      messages, console and debug logs from before (lustre-log-20160329-0759-recovery-stalled) and after recovery was aborted (lustre-log-20160329-0803-recovery-aborted)


        Activity


          heckes Frank Heckes (Inactive) added a comment -
          Failover works after the reformat of the Lustre FS. The ticket can be closed.


          heckes Frank Heckes (Inactive) added a comment -
          failover.node is specified for the mkfs.lustre command via the --servicenode option.

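
          For illustration, a sketch of how such a format command might look for OST0012. The options are real mkfs.lustre options, but the exact command used on lola is an assumption reconstructed from the property dump quoted in a later comment; passing both NIDs via --servicenode marks either node as a valid server for the target:

            # format OST index 18 (OST0012) on an existing zpool, with both
            # lola-2 and lola-3 registered as service nodes
            mkfs.lustre --ost --backfstype=zfs \
                --fsname=soaked --index=18 \
                --mgsnode=192.168.1.108@o2ib10 --mgsnode=192.168.1.109@o2ib10 \
                --servicenode=192.168.1.102@o2ib10 \
                --servicenode=192.168.1.103@o2ib10 \
                soaked-ost18/ost18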

          heckes Frank Heckes (Inactive) added a comment -
          Reformat is ongoing. I'm going to start soak again once it has completed.


          heckes Frank Heckes (Inactive) added a comment -
          I set the parameter 'lustre:failover.node' on all OSTs to the appropriate values for the failover nodes (see attached file ost-failover-setttings-20160726_0702).
          However, this didn't solve the problem: recovery still stalls on the failover node.

          I also reformatted the soak FS. The parameters are populated (e.g.) with the following content:

          192.168.1.102: mkfs_cmd = zfs create -o canmount=off -o xattr=sa soaked-ost18/ost18
          192.168.1.102: Writing soaked-ost18/ost18 properties
          192.168.1.102:   lustre:version=1
          192.168.1.102:   lustre:flags=4194
          192.168.1.102:   lustre:index=18
          192.168.1.102:   lustre:fsname=soaked
          192.168.1.102:   lustre:svname=soaked:OST0012
          192.168.1.102:   lustre:mgsnode=192.168.1.108@o2ib10:192.168.1.109@o2ib10
          192.168.1.102:   lustre:failover.node=192.168.1.102@o2ib10:192.168.1.103@o2ib10
          

          Additionally I used the option '--force-nohostid' for the mkfs.lustre command. I think the patch enabling hostid support hasn't landed yet, and SPL fails during reformat if /etc/hostid isn't available.

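          As an aside, a sketch of the two possible workarounds for a missing /etc/hostid, assuming a RHEL-style system where the stock genhostid utility is available:

            # either generate /etc/hostid so SPL no longer complains ...
            genhostid

            # ... or tell mkfs.lustre to skip the hostid check entirely
            mkfs.lustre --ost --backfstype=zfs --force-nohostid ... soaked-ost18/ost18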

          heckes Frank Heckes (Inactive) added a comment -
          Indeed it's a configuration error. It looks like the NID of the secondary node (192.168.1.103@o2ib) wasn't written during the FS format.
          I set it explicitly again on all targets using zfs set lustre:failover.node, and will check whether this resolves the issue.

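
          A minimal sketch of that repair, using the dataset name and NIDs from the property dump quoted above; the property value is the colon-separated NID list shown in that dump:

            # write both failover NIDs into the target's ZFS user property, then verify
            zfs set lustre:failover.node="192.168.1.102@o2ib10:192.168.1.103@o2ib10" soaked-ost18/ost18
            zfs get lustre:failover.node soaked-ost18/ost18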

          yong.fan nasf (Inactive) added a comment -
          Thanks Frank for the detailed logs. According to the client-side log, the client always tried to connect to lola-2 for the failed OSTs (OST0012 OST0000 OST0006 OST000c):

          00000100:00000001:30.0:1469462043.991030:0:125723:0:(import.c:508:import_select_connection()) Process entered
          00000100:00080000:30.0:1469462043.991033:0:125723:0:(import.c:523:import_select_connection()) soaked-OST0000-osc-ffff8810717d2000: connect to NID 192.168.1.102@o2ib10 last attempt 5189761512
          00000100:00080000:30.0:1469462043.991036:0:125723:0:(import.c:567:import_select_connection()) soaked-OST0000-osc-ffff8810717d2000: tried all connections, increasing latency to 50s
          00000100:00080000:30.0:1469462043.991099:0:125723:0:(import.c:601:import_select_connection()) soaked-OST0000-osc-ffff8810717d2000: import ffff880ff6706000 using connection 192.168.1.102@o2ib10/192.168.1.102@o2ib10
          00000100:00000001:30.0:1469462043.991104:0:125723:0:(import.c:605:import_select_connection()) Process leaving (rc=0 : 0 : 0)
          ...

          That means only one candidate can be selected for connecting to OST0000, but according to your description there should be another candidate, 192.168.1.103@o2ib, for the connection. So I am wondering: how did you configure lola-2 and lola-3 as a failover pair?

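          For what it's worth, the candidate NID list the client chooses from can be inspected directly on a client; a sketch, with the OSC name taken from the log excerpt above:

            # the "connection" section of the import lists every known failover NID;
            # with a correct configuration both 192.168.1.102@o2ib10 and
            # 192.168.1.103@o2ib10 should appear here
            lctl get_param osc.soaked-OST0000-osc-*.import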

          heckes Frank Heckes (Inactive) added a comment -
          Debug files have been uploaded. Roles:
          lola-8: MGS
          lola-3: secondary OSS
          lola-30: Lustre client

          heckes Frank Heckes (Inactive) added a comment - edited

          Resources of lola-2 (OST0012 OST0000 OST0006 OST000c) failed over to lola-3:

          • 2016-07-25 08:49:12,572:fsmgmt.fsmgmt:INFO Wait for recovery to complete
          • 2016-07-25 08:55 created debug logs
          • 2016-07-25 08:57:51 aborted recovery
          • 2016-07-25 08:59 created debug logs

          Upload of debug logs has been started, but it might take some time.

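          A sketch of how such debug logs are typically captured on each node; the output path is illustrative:

            # raise debug verbosity, reproduce the problem, then dump the
            # kernel debug buffer to a file for upload
            lctl set_param debug=-1
            lctl dk /tmp/lustre-log-$(date +%Y%m%d-%H%M)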

          People

            Assignee: yong.fan nasf (Inactive)
            Reporter: heckes Frank Heckes (Inactive)
            Votes: 0
            Watchers: 6
