Lustre / LU-9939

Bad checksums from clients using SR-IOV

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.9.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      A Lustre client on a KVM hypervisor using SR-IOV for IB has started to generate the following errors:

      OSS (oak-io1-s1 10.0.2.101@o2ib5):

      Aug 31 11:27:04 oak-io1-s1 kernel: LustreError: 168-f: BAD WRITE CHECKSUM: oak-OST001a from 12345-10.0.2.225@o2ib5 inode [0x200002f84:0x6b6a:0x0] object 0x0:4413301 extent [726925312-727973887]: client csum 4ecd330, server csum 5610e5e5

      The second OSS in production also has the same errors.

      SR-IOV based client (oak-gw06 10.0.2.225@o2ib5):

      Aug 31 11:27:05 oak-gw06 kernel: LustreError: 132-0: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.0.2.101@o2ib5 inode [0x200002f84:0x6b6a:0x0] object 0x0:4413301 extent [726925312-727973887]

      The client also gets some read checksum errors later:

      Aug 31 11:37:42 oak-gw06 kernel: LustreError: 133-1: oak-OST001a-osc-ffff88041b99c000: BAD READ CHECKSUM: from 10.0.2.101@o2ib5 inode [0x0:0x0:0x0] object 0x0:4413301 extent [1581252608-1582301183]

      I will attach kernel logs of both.

      In this particular case, the client is a Globus endpoint using Lustre as the backend. This is the second time we have seen this: the same issue occurred on another VM running rsnapshot jobs. Rebooting the affected VM does fix the issue.

      Are you aware of such issues when using SR-IOV? Any idea how we could troubleshoot this?
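
      As a possible first troubleshooting step, the OSS logs could be scanned to see whether every BAD WRITE CHECKSUM error comes from the same SR-IOV client and how large the affected extents are. The sketch below is only an illustration: the regular expression is modelled on the single OSS log line quoted above (other Lustre versions may format the message differently), and the names parse_bad_write and errors_per_client are hypothetical helpers, not part of any Lustre tool.

```python
import re
from collections import Counter

# Pattern modelled on the single OSS log line quoted in this ticket.
# Assumption: other Lustre versions may word this message differently.
BAD_WRITE_RE = re.compile(
    r"BAD WRITE CHECKSUM: (?P<ost>\S+) from (?P<client>\S+) "
    r"inode \[(?P<fid>[^\]]+)\] object (?P<object>\S+) "
    r"extent \[(?P<start>\d+)-(?P<end>\d+)\]: "
    r"client csum (?P<client_csum>[0-9a-f]+), "
    r"server csum (?P<server_csum>[0-9a-f]+)"
)


def parse_bad_write(line):
    """Return a dict describing one OSS-side BAD WRITE CHECKSUM line, or None."""
    match = BAD_WRITE_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Extents are logged as inclusive byte ranges, hence the +1.
    record["length"] = record["end"] - record["start"] + 1
    return record


def errors_per_client(lines):
    """Count BAD WRITE CHECKSUM errors per client NID string."""
    counts = Counter()
    for line in lines:
        record = parse_bad_write(line)
        if record is not None:
            counts[record["client"]] += 1
    return counts


# The OSS log line from this ticket, used as sample input.
SAMPLE = (
    "Aug 31 11:27:04 oak-io1-s1 kernel: LustreError: 168-f: "
    "BAD WRITE CHECKSUM: oak-OST001a from 12345-10.0.2.225@o2ib5 "
    "inode [0x200002f84:0x6b6a:0x0] object 0x0:4413301 "
    "extent [726925312-727973887]: client csum 4ecd330, "
    "server csum 5610e5e5"
)

if __name__ == "__main__":
    rec = parse_bad_write(SAMPLE)
    print(rec["client"], rec["object"], rec["length"])
    print(errors_per_client([SAMPLE]))
```

      For the line above, the extent [726925312-727973887] is 1048576 bytes long, i.e. exactly 1 MiB. Feeding the full kernel log of each OSS through errors_per_client would show whether clients other than 10.0.2.225@o2ib5 are affected.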

      Thanks!
      Stephane Thiell

          People

            Assignee: bfaccini Bruno Faccini (Inactive)
            Reporter: sthiell Stephane Thiell