Lustre / LU-3946

Frequent client eviction


Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.4.0
    • None
    • 3
    • 10466

    Description

      Dear support,

      Our customer is experiencing eviction problems on the login/client nodes of their Lustre cluster.
      The evictions happen too often, and since there seems to be no way to recover other than rebooting the node, they severely disrupt users' work.
      The cause seems to be that the login nodes can be stalled by various applications for fractions of a second, and as a consequence, according to the logs, the OSS does not hear from the client for minutes.
      The customer tried booting the login nodes with the kernel options "notsc" and "clocksource=hpet", to no avail.
      They also mounted Lustre over TCP instead of IB, which did not help either.

      Infrastructure info:
      2 MDS/MGS nodes in a pacemaker/corosync cluster
      8 OSS nodes, arranged as 2-node pacemaker/corosync clusters, with one Dot Hill storage controller per pair
      ~1000 clients

      We have attached the messages log files covering several days.
      For example, we found about 280 client evictions in 5 days. Is this normal?

      For clarity: the login nodes are named brutus[1-4], with IB IPs 10.201.32.31-34 and Ethernet IPs 10.201.0.31-34.
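To check whether the roughly 280 evictions are concentrated on the login nodes, one could tally eviction messages per client NID in the attached messages logs. Below is a minimal sketch under stated assumptions: it assumes server-side eviction lines contain either "evicting client" or "I am evicting it", and that client NIDs appear as IP@o2ib or IP@tcp. Those message shapes are assumptions, not taken from this report, and should be checked against the actual logs. The function names are hypothetical; only the login node names and IP ranges come from the report.

```python
import re
from collections import Counter

# ASSUMPTION: NIDs look like "10.201.32.32@o2ib" or "10.201.0.32@tcp".
NID_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})@(?:o2ib\d*|tcp\d*)")
# ASSUMPTION: eviction lines contain one of these substrings.
EVICT_MARKERS = ("evicting client", "I am evicting it")

# Login nodes brutus[1-4], as listed in the report (IB and Ethernet IPs).
LOGIN_NODES = {
    **{f"10.201.32.{31 + i}": f"brutus{1 + i}" for i in range(4)},  # IB
    **{f"10.201.0.{31 + i}": f"brutus{1 + i}" for i in range(4)},   # Ethernet
}


def count_evictions(lines):
    """Count eviction lines per client IP, using the first NID on each line."""
    counts = Counter()
    for line in lines:
        if not any(marker in line for marker in EVICT_MARKERS):
            continue  # not an eviction message
        match = NID_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def login_node_evictions(counts):
    """Merge IB and Ethernet counts into one total per login node."""
    per_node = Counter()
    for ip, n in counts.items():
        host = LOGIN_NODES.get(ip)
        if host:
            per_node[host] += n
    return per_node


def summarize(path):
    """Print the total and the per-login-node breakdown for one log file."""
    with open(path, errors="replace") as f:
        counts = count_evictions(f)
    print(f"{sum(counts.values())} evictions of {len(counts)} distinct clients")
    for host, n in sorted(login_node_evictions(counts).items()):
        print(f"{host}: {n}")
```

For example, `summarize("messages")` could be run on each OSS log file. IB and Ethernet counts are merged per host because the customer mounted over both transports at different times.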

      Is it possible that this issue arises from the different Lustre versions running on the clients and the servers?

      Many thanks in advance for your help.

      Attachments

        1. brutus2_eviction.txt (3 kB)
        2. logs_timeout_200s.tar.bz2 (353 kB)
        3. lustre_and_login_nodes_logs.tar.bz2aa (0.3 kB)
        4. lustre_and_login_nodes_logs.tar.bz2ab (0.3 kB)
        5. messages_brutus2.txt (103 kB)
        6. requested_outputs.tar.bz2 (1 kB)
        7. vmcore-dmesg.txt (138 kB)
        8. vmcore-dmesg.txt (134 kB)


          People

            Bruno Faccini (bfaccini, Inactive)
            Matteo Piccinini (matteo.piccinini, Inactive)
            Votes: 0
            Watchers: 8
