Lustre / LU-1503

Client application I/O errors and overloaded system messages


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • None
    • Lustre 2.2.0
    • None
    • 1
    • 4042

      Dear Support,
      we have experienced some problems with our Lustre file system.
      Our users complained that, at the following times, their jobs were killed due to I/O errors:

      Saturday 09 June 05:58
      Monday 11 June 12:47

      We collected the logs from the server and client sides, and we saw many messages/errors that we have trouble "decoding".
      Could you please help us understand why this problem arises?
      Specifically, we cannot tell whether it is really an overload problem related to the hardware or configuration we use, or rather a congestion issue or a bug.

      Usually the file system seems to work correctly, but suddenly the logs fill up with these messages.

      Regards
      Nicola

        1. cluster.log (5.71 MB)
        2. cluster.log-2012-06-09_05 (2.28 MB)
        3. cluster.log-2012-06-11_12 (1.13 MB)
        4. craylog-2012-06-09.log (7.07 MB)
        5. craylog-2012-06-11.log (4.53 MB)
        6. debug_lustre (35 kB)
        7. drop_conn.log (3.64 MB)
        8. ganglia-load-2012-06-09.pdf (172 kB)
        9. ganglia-load-2012-06-11.pdf (173 kB)
        10. log_11_jul (99 kB)
        11. log1 (1.13 MB)
        12. messages_router_node.log (154 kB)

            Assignee: Cliff White (Inactive) (cliffw)
            Reporter: Nicola Bianchi (nbianchi)
            Votes: 0
            Watchers: 9
