Lustre / LU-1503

Client application I/O errors and overloaded system messages


Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • Lustre 2.2.0
    • None
    • 1
    • 4042

    Description

      Dear Support,
      we have experienced some problems with our Lustre file system.
      Our users reported that at the following times their jobs were killed due to I/O errors:

      Saturday 09 June 05:58
      Monday 11 June 12:47

      We collected the logs from both the server and client sides and found many messages/errors that we have trouble "decoding".
      Could you please help us understand why this problem arises?
      Specifically, we cannot tell whether this is really an overload problem caused by the hardware or configuration we use, or rather a congestion issue or a bug.

      Usually the file system seems to work correctly, but then suddenly the logs fill up with these messages.

      Regards
      Nicola

      Attachments

        1. cluster.log
          5.71 MB
        2. cluster.log-2012-06-09_05
          2.28 MB
        3. cluster.log-2012-06-11_12
          1.13 MB
        4. craylog-2012-06-09.log
          7.07 MB
        5. craylog-2012-06-11.log
          4.53 MB
        6. debug_lustre
          35 kB
        7. drop_conn.log
          3.64 MB
        8. ganglia-load-2012-06-09.pdf
          172 kB
        9. ganglia-load-2012-06-11.pdf
          173 kB
        10. log_11_jul
          99 kB
        11. log1
          1.13 MB
        12. messages_router_node.log
          154 kB


          People

            cliffw Cliff White (Inactive)
            nbianchi Nicola Bianchi
            Votes: 0
            Watchers: 9
