Lustre / LU-11853

Automatically update peer NID state when a client changes from multi-rail to non-multi-rail


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Versions: 2.12 and master

    Description

      Currently, if a client's LNet configuration is changed from multi-rail to non-multi-rail, the client cannot mount the filesystem until that client's peer NID state is removed on the servers.

      options lnet networks="o2ib10(ib0,ib2)"
      
      [root@s184 ~]# mount -t lustre 10.0.11.90@o2ib10:/cache1 /cache1
      [root@s184 ~]# lnetctl net show
      net:
          - net type: lo
            local NI(s):
              - nid: 0@lo
                status: up
          - net type: o2ib10
            local NI(s):
              - nid: 10.0.10.184@o2ib10
                status: up
                interfaces:
                    0: ib0
              - nid: 10.2.10.184@o2ib10
                status: up
                interfaces:
                    0: ib2
      
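      At this point every server records the client as a multi-rail peer with both NIDs. As a quick check (a sketch; the --nid filter needs a reasonably recent lnetctl, and the NID below is this reproducer's client), the entry can be listed on a server directly:

      # run on a server; prints only the peer entry for this client's primary NID
      [root@es14k-vm1 ~]# lnetctl peer show --nid 10.0.10.184@o2ib10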

      If the NID configuration is changed in this way, remounting Lustre on the client fails until all of that client's peer state is cleared on every server.

      options lnet networks="o2ib10(ib0)"
      
      [root@s184 ~]# umount -t lustre -a
      [root@s184 ~]# lustre_rmmod 
      [root@s184 ~]# mount -t lustre 10.0.11.90@o2ib10:/cache1 /cache1
      mount.lustre: mount 10.0.11.90@o2ib10:/cache1 at /cache1 failed: Input/output error
      Is the MGS running?
      
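      A couple of checks from the client can help confirm this is a peer-state problem rather than a fabric one (a sketch; the exact console errors vary by Lustre version, so no output is reproduced here):

      # run on the client after the failed mount:
      # check basic LNet reachability of the MGS NID, then look for LNet/LustreError
      # messages from the mount attempt in the kernel log
      [root@s184 ~]# lnetctl ping 10.0.11.90@o2ib10
      [root@s184 ~]# dmesg | tail -n 20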

      On the server side, the client's peer state is still multi-rail.

      [root@es14k-vm1 ~]# lnetctl peer show
      peer:
          - primary nid: 0@lo
            Multi-Rail: False
            peer ni:
              - nid: 0@lo
                state: NA
          - primary nid: 10.0.11.92@o2ib10
            Multi-Rail: True
            peer ni:
              - nid: 10.0.11.92@o2ib10
                state: NA
              - nid: 10.1.11.92@o2ib10
                state: NA
          - primary nid: 10.0.11.91@o2ib10
            Multi-Rail: True
            peer ni:
              - nid: 10.0.11.91@o2ib10
                state: NA
              - nid: 10.1.11.91@o2ib10
                state: NA
          - primary nid: 10.0.11.93@o2ib10
            Multi-Rail: True
            peer ni:
              - nid: 10.0.11.93@o2ib10
                state: NA
              - nid: 10.1.11.93@o2ib10
                state: NA
          - primary nid: 10.0.10.184@o2ib10
            Multi-Rail: True <------ Still Multi-rail
            peer ni:
              - nid: 10.0.10.184@o2ib10
                state: NA
              - nid: 10.2.10.184@o2ib10
                state: NA
      
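      Before deleting anything, the server's current LNet state, including the peer table, can be dumped to YAML for reference (a sketch; lnetctl export writes the whole configuration, and the grep pattern is just this reproducer's client NID):

      # run on a server; snapshot the peer entries before cleaning them up
      [root@es14k-vm1 ~]# lnetctl export > /tmp/lnet-peer-state.yaml
      [root@es14k-vm1 ~]# grep -B 2 -A 4 "10.0.10.184@o2ib10" /tmp/lnet-peer-state.yaml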

      A workaround is to remove the client's peer NID state on all servers and then mount again. That works, but an automated peer state update would be preferable; a scripted version of the cleanup is sketched after the commands below.

      [root@es14k-vm1 ~]# clush -g oss lnetctl peer del --prim_nid 10.0.10.184@o2ib10 --nid 10.0.10.184@o2ib10
      [root@es14k-vm1 ~]# clush -g oss lnetctl peer del --prim_nid 10.0.10.184@o2ib10 --nid 10.2.10.184@o2ib10
      
      [root@s184 ~]# mount -t lustre 10.0.11.90@o2ib10:/cache1 /cache1
      [root@s184 ~]# 
      
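      Until the server-side peer state is refreshed automatically, the workaround can be scripted. A minimal sketch, assuming clush host groups named mds and oss cover all servers and that the two NIDs below match the client's old multi-rail configuration:

      #!/bin/bash
      # Sketch: remove a client's stale multi-rail peer entry on every server.
      # PRIM is the client's primary NID; STALE is the NID dropped from its config.
      PRIM="10.0.10.184@o2ib10"
      STALE="10.2.10.184@o2ib10"
      for group in mds oss; do
          clush -g "$group" lnetctl peer del --prim_nid "$PRIM" --nid "$PRIM"
          clush -g "$group" lnetctl peer del --prim_nid "$PRIM" --nid "$STALE"
      done

      Once the stale entries are gone, the client mounts normally, as shown above.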

People

    Assignee: ashehata Amir Shehata (Inactive)
    Reporter: sihara Shuichi Ihara
