  1. Lustre
  2. LU-16497

Various Lustre errors on clients and servers

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.15.2
    • None
    • None
    • 3

    Description

      We're seeing quite a few errors on clients, OSSes and the MDS.

      For example on clients:

      Jan 16 09:53:09 juliet2 kernel: LustreError: 49499:0:(mdc_request.c:1441:mdc_read_page()) juliet-MDT0000-mdc-ffff99a3723aa800: [0x200001b3e:0x5f66:0x0] lock enqueue fails: rc = -4
      Jan 16 21:30:41 juliet2 kernel: LustreError: 11-0: juliet-OST002a-osc-ffff99a3723aa800: operation ldlm_enqueue to node 10.29.22.93@tcp failed: rc = -107
      Jan 16 21:30:41 juliet2 kernel: Lustre: juliet-OST002a-osc-ffff99a3723aa800: Connection to juliet-OST002a (at 10.29.22.93@tcp) was lost; in progress operations using this service will wait for recovery to complete
      Jan 16 21:30:41 juliet2 kernel: LustreError: 167-0: juliet-OST002a-osc-ffff99a3723aa800: This client was evicted by juliet-OST002a; in progress operations using this service will fail.
      Jan 16 21:30:41 juliet2 kernel: Lustre: 4193:0:(llite_lib.c:2762:ll_dirty_page_discard_warn()) juliet: dirty page discard: 10.29.22.90@tcp:/juliet/fid: [0x20002dd8a:0x16daa:0x0]/ may get corrupted (rc -108)
      Jan 16 21:30:41 juliet2 kernel: Lustre: 4191:0:(llite_lib.c:2762:ll_dirty_page_discard_warn()) juliet: dirty page discard: 10.29.22.90@tcp:/juliet/fid: [0x20002dd8a:0x16cb1:0x0]/ may get corrupted (rc -108)
      

      OSS:

      Jan 16 06:17:54 joss1 kernel: LustreError: 6496:0:(events.c:455:server_bulk_callback()) event type 3, status -5, desc ffff92ef5dbb3000
      Jan 16 06:17:54 joss1 kernel: LustreError: 16260:0:(ldlm_lib.c:3363:target_bulk_io()) @@@ network error on bulk WRITE  req@ffff92f3dfcbb850 x1760556572171776/t0(0) o4->bd9b8fe9-b80f-7114-7b35-663a8e9d48db@10.29.22.97@tcp:446/0 lens 488/448 e 0 to 0 dl 1673867911 ref 1 fl Interpret:/0/0 rc 0/0
      Jan 16 06:17:54 joss1 kernel: Lustre: juliet-OST0009: Client bd9b8fe9-b80f-7114-7b35-663a8e9d48db (at 10.29.22.97@tcp) reconnecting
      Jan 16 06:17:54 joss1 kernel: Lustre: juliet-OST0009: Connection restored to 3d01cce1-cfce-5103-0db6-32c1aa8f728c (at 10.29.22.97@tcp)
      Jan 16 06:17:54 joss1 kernel: Lustre: juliet-OST0009: Bulk IO write error with bd9b8fe9-b80f-7114-7b35-663a8e9d48db (at 10.29.22.97@tcp), client will retry: rc = -110
      Jan 16 06:17:54 joss1 kernel: Lustre: Skipped 1 previous similar message
      Jan 16 06:17:54 joss1 kernel: LustreError: 16218:0:(ldlm_lib.c:3357:target_bulk_io()) @@@ Reconnect on bulk WRITE  req@ffff92eb76e54050 x1760556572184448/t0(0) o4->bd9b8fe9-b80f-7114-7b35-663a8e9d48db@10.29.22.97@tcp:452/0 lens 488/448 e 0 to 0 dl 1673867917 ref 1 fl Interpret:/0/0 rc 0/0
      

      MDS:

      Jan 16 19:52:10 jmds1 kernel: LustreError: 47609:0:(ldlm_lib.c:3357:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff995a6c544850 x1760579715652736/t0(0) o37->bd9b8fe9-b80f-7114-7b35-663a8e9d48db@10.29.22.97@tcp:220/0 lens 448/440 e 1 to 0 dl 1673916760 ref 1 fl Interpret:/0/0 rc 0/0
      Jan 19 12:11:29 jmds1 kernel: LustreError: 15481:0:(mgs_handler.c:282:mgs_revoke_lock()) MGS: can't take cfg lock for 0x736d61726170/0x3 : rc = -11
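
A side note on the last MDS line: the resource in `can't take cfg lock for 0x736d61726170/0x3` looks like ASCII text packed into an integer. The sketch below unpacks it under the assumption that the name is stored little-endian in a 64-bit value; the helper name `decode_packed_name` is hypothetical and not part of any Lustre tool. Whether the decoded string names the configuration log the lock protects is an assumption for the Lustre developers to confirm.

```python
def decode_packed_name(hex_value):
    """Unpack an ASCII name stored little-endian in a 64-bit integer.

    Assumption: the value is a short name packed into one 64-bit word,
    padded with zero bytes. This is a reading aid, not a Lustre API.
    """
    value = int(hex_value, 16)           # accepts the leading "0x"
    raw = value.to_bytes(8, "little")    # least significant byte first
    return raw.rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    print(decode_packed_name("0x736d61726170"))  # prints "params"
```
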
      

      Could you give us an idea of what these errors might indicate (e.g., network issues, misconfiguration, or load) so we can narrow the focus of our investigation? Let us know what extra details (logs, cluster settings) you need.
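
For reference while triaging, the `rc` and `status` values in the excerpts above are negated Linux errno codes. The sketch below pulls them out of a log line and names them. The table is hard-coded from the standard Linux errno numbering and covers only the codes seen in this ticket; the helper `decode_rc` and the regular expression are illustrative assumptions, not part of Lustre. It names the codes only and does not say which underlying cause applies.

```python
import re

# Linux errno values, hard-coded so the result does not depend on the host OS.
# Only the codes that appear in the log excerpts of this ticket are listed.
LINUX_ERRNO = {
    4: ("EINTR", "Interrupted system call"),
    5: ("EIO", "Input/output error"),
    11: ("EAGAIN", "Resource temporarily unavailable"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

# Matches "rc = -4", "rc -108" and "status -5"; ignores "rc 0/0".
RC_PATTERN = re.compile(r"\b(?:rc|status)(?: =)? (-\d+)")


def decode_rc(line):
    """Return (code, errno name, description) for each negative rc in a line."""
    results = []
    for match in RC_PATTERN.finditer(line):
        code = -int(match.group(1))
        name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in table"))
        results.append((-code, name, text))
    return results


if __name__ == "__main__":
    sample = "operation ldlm_enqueue to node 10.29.22.93@tcp failed: rc = -107"
    for code, name, text in decode_rc(sample):
        print(f"{code}: {name} ({text})")
```

Running each attached messages file through `decode_rc` line by line and counting the names per host would show whether connection-related codes (ENOTCONN, ESHUTDOWN, ETIMEDOUT) cluster around particular nodes or times.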

      Attachments

        1. jmds1-messages.gz
          438 kB
        2. jmds1-messages-20230115.gz
          2.69 MB
        3. joss1-messages.gz
          415 kB
        4. joss1-messages-20230115.gz
          2.74 MB
        5. joss2-messages.gz
          443 kB
        6. joss2-messages-20230115.gz
          2.78 MB
        7. joss3-messages-20230115.gz
          2.74 MB
        8. joss4-messages-20230115.gz
          2.73 MB
        9. joss5-messages-20230115.gz
          2.73 MB
        10. joss6-messages-20230115.gz
          2.76 MB
        11. juliet1-messages.gz
          511 kB
        12. juliet1-messages-20230115.gz
          3.15 MB
        13. juliet1-messages-20230115 (1).gz
          3.15 MB
        14. juliet2-messages-20230115.gz
          1.79 MB
        15. juliet2-messages-20230115 (1).gz
          1.79 MB

            People

              cfaber Colin Faber
              dneg Dneg (Inactive)
