Lustre / LU-521

lnet-selftest add_test failure

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.3.0, Lustre 2.4.0
    • Affects Version/s: Lustre 2.1.0, Lustre 2.3.0
    • Labels: None
    • Environment: RHEL6.1 between a PPC64 node and an x86_64 node, QDR InfiniBand, o2iblnd
    • Severity: 3
    • Rank (Obsolete): 4453

    Description

      Running the simple lnet-selftest script below, the "lst add_test" line fails with:

      add test RPC failed on 12345-172.20.203.24@o2ib1: Unknown error 18446744073709551506
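
The large number in this message looks like a negative return code printed as an unsigned 64-bit integer. A minimal sketch of the conversion follows, assuming two's-complement wraparound and the standard Linux errno numbering. The helper names `as_signed` and `describe` are hypothetical and exist only for this illustration.

```python
# Linux errno values relevant to this report (asm-generic numbering).
# Only the two codes that appear in the console output are listed here.
LINUX_ERRNO_NAMES = {61: "ENODATA", 110: "ETIMEDOUT"}


def as_signed(value, bits=64):
    """Reinterpret an unsigned integer as a two's-complement signed value."""
    if value >= 1 << (bits - 1):
        return value - (1 << bits)
    return value


def describe(value):
    """Return the signed code plus its Linux errno name, if listed above."""
    rc = as_signed(value)
    name = LINUX_ERRNO_NAMES.get(-rc, "unlisted errno")
    return f"{rc} ({name})"


print(describe(18446744073709551506))  # -110 (ETIMEDOUT)
```

Read this way, the "Unknown error" is -110 (ETIMEDOUT) on Linux. The same helper maps the "status -61" in the ion console message further down to ENODATA.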
      

      I am running the script on the "server1" node, which is an x86_64 architecture RHEL6.1 system. The "ion" node is a ppc64 architecture system, also with RHEL6.1. Note that ppc64 now has a 64k page size by default.

      server1 has this message on the console:

      LustreError: 21942:0:(lib-move.c:110:lnet_try_match_md()) Matching packet from 12345-172.20.203.24@o2ib1, match 11222407321460450620 length 65536 too big: 4096 left, 4096 allowed
      

      and the ion has this on the console:

      LustreError: 14210:0:(framework.c:1298:sfw_bulk_ready()) Bulk transfer failed for RPC: service test service, peer 12345-172.20.250.1@o2ib1, status -61
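
The sizes in the server1 console message above (length 65536, 4096 left, 4096 allowed) line up with the two page sizes involved: 64k on ppc64, as noted in the description, and 4k on x86_64, which is an assumption here since the report does not state it. The sketch below is a deliberately simplified model of a "does the incoming fragment fit" check, not the actual `lnet_try_match_md()` logic. The function `fragment_fits` and the constants are hypothetical names for illustration.

```python
PPC64_PAGE_SIZE = 64 * 1024    # 64k default page size noted in the report
X86_64_PAGE_SIZE = 4 * 1024    # assumed x86_64 page size (not stated in the report)


def fragment_fits(incoming_length, bytes_left, max_allowed):
    """Simplified model: a fragment fits only if it exceeds neither limit."""
    return incoming_length <= bytes_left and incoming_length <= max_allowed


# Values taken from the server1 console message.
ok = fragment_fits(incoming_length=65536, bytes_left=4096, max_allowed=4096)
print(ok)  # False

# How many 4k-sized buffers one 64k fragment would span.
print(PPC64_PAGE_SIZE // X86_64_PAGE_SIZE)  # 16
```

Under these assumptions, a single page-sized fragment from the ppc64 node is 16 times larger than a page-sized buffer on the x86_64 node, which would be consistent with the "too big" rejection. The report itself does not confirm this as the root cause.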
      

      I have attached lustre kernel logs with "+ net rpctrace" added.

      Script:

      lst new_session read/write
      lst add_group ion 172.20.203.24@o2ib1
      lst add_group server1 172.20.250.1@o2ib1
      
      lst add_batch bulk_rw
      lst add_test --batch bulk_rw --concurrency 16 --from ion --to server1 brw write size=1M
      lst run bulk_rw
      lst stat ion & sleep 30; kill $!
      lst end_session
      

      Attachments

        1. server1.log
          68 kB
        2. ion.log
          18 kB

          People

            Assignee: Oleg Drokin (green)
            Reporter: Christopher Morrone (morrone) (Inactive)
            Votes: 0
            Watchers: 11
