  1. Lustre
  2. LU-15831

Lustre 2.15 client breaks DGXA100 MOFED


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major

    Description

      When trying to run GPUDirect, the following failure was found at the "install required software" step:

      $ sudo ./mlnxofedinstall
      ...
      Installation passed successfully
      To load the new driver, run:
      /etc/init.d/openibd restart
      
      $ sudo /etc/init.d/openibd restart
      Unloading ib_uverbs [FAILED]
      rmmod: ERROR: Module ib_uverbs is in use by: nv_peer_mem
      
      $ sudo rmmod nv_peer_mem
      
      $ sudo /etc/init.d/openibd restart
      Unloading HCA driver:[ OK ]
      Loading Mellanox MLX5_IB HCA driver:  [FAILED]
      Loading Mellanox MLX5 HCA driver: [FAILED]
      Loading HCA driver and Access Layer:  [FAILED]
      Please run /usr/sbin/sysinfo-snapshot.py to collect the debug information
      and open an issue in the http://support.mellanox.com/SupportWeb/service_center/SelfService
      
      $ sudo modprobe nv_peer_mem
      modprobe: FATAL: Module nv_peer_mem not found in directory /lib/modules/5.4.0-109-generic
      
      $ sudo modprobe lustre
      modprobe: ERROR: could not insert 'lustre': Invalid argument
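
The first failure in the log above is `rmmod` refusing to unload `ib_uverbs` because `nv_peer_mem` still holds it. As a minimal sketch (not part of the original report), the helper below pulls the holder names out of such an error line so they can be unloaded before `openibd` is restarted. The function name `holders_from_rmmod_error` is hypothetical, and the parsing assumes the exact message format shown in the log.

```shell
#!/bin/sh
# Hypothetical helper: read rmmod error lines on stdin and print the modules
# listed after "is in use by:", one per line.
holders_from_rmmod_error() {
    sed -n 's/^.*Module [^ ]* is in use by: *//p' | tr -s ' ' '\n' | sed '/^$/d'
}

# Example with the line taken from the log above.
printf '%s\n' 'rmmod: ERROR: Module ib_uverbs is in use by: nv_peer_mem' \
    | holders_from_rmmod_error
```

Each printed module would then be a candidate for `rmmod` before the restart, as was done by hand with `nv_peer_mem` in the log.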

       

      $ sudo ./mlnxofedinstall
      
      Checking SW Requirements...Removing old packages...
      Installing new packages
      Installing ofed-scripts-5.5...
      Installing mlnx-tools-5.2.0...
      Installing mlnx-ofed-kernel-utils-5.5...
      Installing mlnx-ofed-kernel-dkms-5.5...Error: mlnx-ofed-kernel-dkms installation failed!
      Problem: mlx5_ib: module file: /lib/modules/5.4.0-105-generic/kernel/drivers/infiniband/hw/mlx5/mlx5_ib.ko, from package: linux-modules-extra-5.4.0-105-generic.
      Collecting debug info...
      See:
          /tmp/MLNX_OFED_LINUX.1302312.logs/mlnx-ofed-kernel-dkms.debinstall.log
      Removing newly installed packages...
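
The two logs above name two different kernel trees: `modprobe` searched `/lib/modules/5.4.0-109-generic`, while the installer reported a conflict under `/lib/modules/5.4.0-105-generic`. A minimal sketch for listing every kernel version a log refers to is shown below; the function name `kernel_versions_in_log` is hypothetical, and the sketch only recognises the `/lib/modules/<version>` path form seen in this report (the last such path on each line).

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print each distinct kernel
# version that appears as /lib/modules/<version>, sorted.
kernel_versions_in_log() {
    sed -n 's#.*/lib/modules/\([^/ ]*\).*#\1#p' | sort -u
}

# Example with two lines taken from the logs above.
kernel_versions_in_log <<'EOF'
modprobe: FATAL: Module nv_peer_mem not found in directory /lib/modules/5.4.0-109-generic
Problem: mlx5_ib: module file: /lib/modules/5.4.0-105-generic/kernel/drivers/infiniband/hw/mlx5/mlx5_ib.ko, from package: linux-modules-extra-5.4.0-105-generic.
EOF
```

Comparing the result with `uname -r` shows which of the referenced trees belongs to the running kernel.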

      As a result, the GDS tests cannot run properly; most of them fail because the nvidia-fs driver is not loaded:

      =========================
       Platform verification error :
      nvidia-fs driver is not loadedSUCCESS
      FILESYSTEM VERSION CHECK:
      ofed_info:
      current version: MLNX_OFED_LINUX-5.5-1.0.3.2: (Supported)
      min version supported: MLNX_OFED_LINUX-4.6-1.0.1.1
      SUCCESS
      nvidia-fs driver is not loadedSUCCESS
      usage: gdscheck.py [-h] [-p] [-f FILE] [-v] [-V]
      GPUDirectStorage platform checker
      optional arguments:
        -h, --help  show this help message and exit
        -p          gds platform check
        -f FILE     gds file check
        -v          gds version checks
        -V          gds fs checks
      SUCCESS
      gdscheck.py python2 tests
      =========================
       Platform verification error :
      nvidia-fs driver is not loadedSUCCESS
      FILESYSTEM VERSION CHECK:
      ofed_info:
      current version: MLNX_OFED_LINUX-5.5-1.0.3.2: (Supported)
      min version supported: MLNX_OFED_LINUX-4.6-1.0.1.1
      SUCCESS
      nvidia-fs driver is not loadedSUCCESS
      usage: gdscheck.py [-h] [-p] [-f FILE] [-v] [-V]
      GPUDirectStorage platform checker
      optional arguments:
        -h, --help  show this help message and exit
        -p          gds platform check
        -f FILE     gds file check
        -v          gds version checks
        -V          gds fs checks
      SUCCESS
      gdscheck.py current running python tests
      =========================
       Platform verification error :
      nvidia-fs driver is not loadedSUCCESS
      FILESYSTEM VERSION CHECK:
      ofed_info:
      current version: MLNX_OFED_LINUX-5.5-1.0.3.2: (Supported)
      min version supported: MLNX_OFED_LINUX-4.6-1.0.1.1
      SUCCESS
      nvidia-fs driver is not loadedSUCCESS
      usage: gdscheck.py [-h] [-p] [-f FILE] [-v] [-V]
      GPUDirectStorage platform checker
      optional arguments:
        -h, --help  show this help message and exit
        -p          gds platform check
        -f FILE     gds file check
        -v          gds version checks
        -V          gds fs checks
      SUCCESS
      **************************************************
      gdscheck.py test results : 12 /  12 tests passed
      **************************************************
      Starting basic gdsio Tests
      /usr/local/gds/tools/gdsio -f  /data/sanity/tests//sparse1G -d 0 -f /data/sanity/tests//sparse1G -d 0 -s 128K -i 4k
      SUCCESS
      /usr/local/gds/tools/gdsio -f  /data/sanity/tests//sparse1G -d 0 -f /data/sanity/tests//sparse1G -d 0 -s 128K -i 3k
      SUCCESS
      /usr/local/gds/tools/gdsio -f  /data/sanity/tests//sparse1G -d 0 -f /data/sanity/tests//sparse1G -d 0 -s 128K -i 3k -o 1
      SUCCESS
      /usr/local/gds/tools/gdsio -f  /data/sanity/tests//sparse1G -d 0 -f /data/sanity/tests//sparse1G -d 0 -s 128K -i 2k
      SUCCESS
      /usr/local/gds/tools/gdsio -f  /data/sanity/tests//sparse1G -d 0 -f /data/sanity/tests//sparse1G -d 0 -s 128K -i 2k -o 1
      SUCCESS
      /usr/local/gds/tools/gdsio -f  /data/sanity/tests//sparse1G -d 0 -f /data/sanity/tests//sparse1G -d 0 -s 128K -i 1k
      SUCCESS
      /usr/local/gds/tools/gdsio -f  /data/sanity/tests//sparse1G -d 0 -f /data/sanity/tests//sparse1G -d 0 -s 128K -i 1k -o 1
      SUCCESS
      /usr/local/gds/tools/gdsio -V -f /data/sanity/tests//sparse1G -d 0  -w 8 -s 1G -i 32K:1024K:1K -x 0 -I 1 -o 1
      Verifying data 
      SUCCESS
      /usr/local/gds/tools/gdsio -V -f /data/sanity/tests//sparse1G -d 0  -w 8 -s 1G -i 32K:1024K:1K -x 0 -I 3 -k 1234 -o 1
      Verifying data 
      SUCCESS
      /usr/local/gds/tools/gdsio -V -D /data/sanity/tests// -d 0  -w 8 -s 1G -i 32K:1024K:1K -x 0 -I 1 -o 1
      Verifying data 
      SUCCESS
      /usr/local/gds/tools/gdsio -V -D /data/sanity/tests// -d 0  -w 8 -s 1G -i 32K:1024K:1K -x 0 -I 3 -k 1234 -o 1
      Verifying data 
      SUCCESS
      /usr/local/gds/tools/gdsio -D /data/sanity/tests// -d 0  -w 8 -s 1G -i 32K:1024K:1K -x 0 -I 3 -k 1234 -o 1 -F -R
      SUCCESS
      /usr/local/gds/tools/gdsio -V -f /data/sanity/tests//sparse1G -d 0  -w 8 -s 1G -i 32K:1024K:1K -x 0 -I 1 -o 1 -b
      Verifying data 
      SUCCESS
      /usr/local/gds/tools/gdsio -V -f /data/sanity/tests//sparse1G -d 0  -w 8 -s 1G -i 8M:32M -x 0 -I 1 -o 1 -b
      Verifying data 
      SUCCESS
      /usr/local/gds/tools/gdsio -V -f /data/sanity/tests//sparse1G -d 0  -w 8 -s 1G -i 32K:1024K:1K -x 0 -I 0 -o 1 -b
      SUCCESS
      /usr/local/gds/tools/gdsio -V -f /data/sanity/tests//sparse1G -d 0  -w 8 -s 1G -i 8M:32M -x 0 -I 0 -o 1 -b
      SUCCESS
      /usr/local/gds/tools/gdsio -f /data/sanity/tests//sparse1G -d 0  -w 8 -s 1G -i 32K:1024K:1K -x 0 -I 0 -o 0 -b
      SUCCESS
      /usr/local/gds/tools/gdsio -f /data/sanity/tests//sparse1G -d 0  -w 8 -s 1G -i 8M:32M -x 0 -I 0 -o 0 -b
      SUCCESS
      **************************************************
      gdiso tests : 18 /  18 tests passed
      **************************************************
      Starting Offset Tests
      TestCase:Read odd offset
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485760 -o 616  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 616  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read odd gpu offsets 1, 2, 3, 4, 4K-1, 4K, 4K+1, 60K, 64K, 68K
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 4096 -o 0  -d 0 -t 1 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 4096 -o 0  -d 0 -t 1 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 0  -d 0 -t 1 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 0  -d 0 -t 2 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 0  -d 0 -t 3 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 0  -d 0 -t 4 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 0  -d 0 -t 4095 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 0  -d 0 -t 4096 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 0  -d 0 -t 4097 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 0  -d 0 -t 4097 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 0  -d 0 -t 61440 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 0  -d 0 -t 65536 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 0  -d 0 -t 69632 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485760 -o 0  -d 0 -t 1 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485760 -o 0  -d 0 -t 2 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485760 -o 0  -d 0 -t 3 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485760 -o 0  -d 0 -t 4 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485760 -o 0  -d 0 -t 4095 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485760 -o 0  -d 0 -t 4096 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485760 -o 0  -d 0 -t 4097 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485760 -o 0  -d 0 -t 61440 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485760 -o 0  -d 0 -t 65536 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485760 -o 0  -d 0 -t 69632 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read/write odd size - sync
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485761 -o 4096  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read/write odd size - async
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485761 -o 4096  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:odd offset and odd size - sync
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485748 -o 119  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:odd offset and odd size - async
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485748 -o 119  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read/write 1 byte from offset 0 (sync and async)
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 1 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 1 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read/write 1 byte from offset 3 (sync and async)
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 1 -o 3  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 1 -o 3  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read/write big file 10G (odd size) - sync
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 220201060 -o 4096  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read/write big file 10G (odd size) - async
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 220201060 -o 4096  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read beyond EOF (read 2G on a 1G file - async)
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1G -n 1 -m 0 -s 209714688 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read beyond EOF (read 2G on a 1G file - sync)
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1G -n 1 -m 1 -s 209714688 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read beyond EOF - odd size (read 2G on a 1G file - sync and async)
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1G -n 1 -m 0 -s 209714689 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1G -n 1 -m 1 -s 209714689 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read beyond EOF odd offset (read 2G on a 1G file - sync and async)
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1G -n 1 -m 1 -s 209714688 -o 616  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1G -n 1 -m 0 -s 209714688 -o 616  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read size beyond EOF (small file)
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1K -n 1 -m 1 -s 1099 -o 1  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1K -n 1 -m 0 -s 1099 -o 1  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1K -n 1 -m 1 -s 1099 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1K -n 1 -m 0 -s 1099 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read just short of EOF
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1K -n 1 -m 0 -s 1000 -o 1  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1K -n 1 -m 1 -s 1000 -o 1  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1K -n 1 -m 0 -s 1000 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1K -n 1 -m 1 -s 1000 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read offset from EOF
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1K -n 1 -m 1 -s 999 -o 1024  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1K -n 1 -m 0 -s 999 -o 1024  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1K -n 1 -m 0 -s 1 -o 1024  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1K -n 1 -m 1 -s 1 -o 1024  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read offset beyond EOF (sync and async)
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1K -n 1 -m 1 -s 1 -o 1025  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparse1K -n 1 -m 0 -s 1 -o 1025  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read with odd gpu_offset
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485760 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read with odd gpu_offset and odd file offset
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485760 -o 617  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 617  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read at 128k GPU page offset
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485760 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Read beyond 64k (odd gpu offset)
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 10485760 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 10485760 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Overwrite an existing file within EOF
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 52428800 -o 1  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 52428800 -o 1  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 52428805 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 52428805 -o 0  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:Offset beyond EOF writes
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 4096 -o 4099  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 4096 -o 4099  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:offset just short of EOF writes
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 3 -o 4094  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 3 -o 4094  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      TestCase:offset from EOF writes
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 0 -s 4096 -o 3  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      /usr/local/gds/tools/gdsio_verify  -f /data/sanity/tests//sparserandom_big -n 1 -m 1 -s 4096 -o 3  -d 0 -t 0 -p 0
      file register error: nvidia-fs driver is not loaded
      FAILED
      **************************************************
      File offset and GPU Buffer offset Tests : 0 /  73 tests passed
      **************************************************
      running cufile sample tests
      sample 1
      FAILED
      sample 2
      opening file /data/sanity/tests//sparse1G_sample2
      FAILED
      sample 3
      FAILED
      sample 4
      FAILED
      sample 5
      FAILED
      sample 6
      FAILED
      sample 7
      FAILED
      sample 8
      PASS: cufile success status:Success
      SUCCESS
      sample 14
      opening file /data/sanity/tests//sparse1G
      FAILED
      sample 15
      FAILED
      **************************************************
      cufile sample tests : 1 /  10 tests passed
      **************************************************
      Testing gdscp functionality
      /usr/local/gds/tools/gdscp /data/sanity/tests//sparse1G /data/sanity/tests//sparse1G_copy 0 -v
      file register error: nvidia-fs driver is not loaded
      FAILED
      **************************************************
      gdscp tests : 0 /  1 tests passed
      **************************************************
      Testing Batch State Machine
      /usr/local/gds/tools//tests/cufile_batch_test_state_machine /data/sanity/tests//sparse1G 0 && pass || fail
      FAILED
      /usr/local/gds/tools//tests/cufile_batch_test_state_machine /data/sanity/tests//sparse1G 1 && pass || fail
      FAILED
      /usr/local/gds/tools//tests/cufile_batch_test_state_machine /data/sanity/tests//sparse1G 2 && pass || fail
      FAILED
      /usr/local/gds/tools//tests/cufile_batch_test_state_machine /data/sanity/tests//sparse1G 3 && pass || fail
      FAILED
      /usr/local/gds/tools//tests/cufile_batch_test_state_machine /data/sanity/tests//sparse1G 4 && pass || fail
      FAILED
      **************************************************
      Batch State Machine Tests : 0 /  5 tests passed
      **************************************************
      Performing cufile API tests
      /usr/local/gds/tools//api_tests/cufile_testbufregister 0
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufregister 1
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufregister 2
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufregister 3
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufregister 4
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufregister 5
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufregister 6
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufregister 7
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufregister 8
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufregister 9
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufregister 10
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufderegister 0
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufderegister 1
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufderegister 2
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufderegister 3
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufderegister 4
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testbufderegister 5
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testclosefd /data/sanity/tests//sparse1G 0
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testclosefd /data/sanity/tests//sparse1G 1
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testclosefd /data/sanity/tests//sparse1G 2
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testclosefd /data/sanity/tests//sparse1G 3
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testdriver 0
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testdriver 1
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testdriver 2
      cufile driver close: nvidia-fs driver is not loaded
      SUCCESS
      /usr/local/gds/tools//api_tests/cufile_testopenfd  /data/sanity/tests//sparse1G 0
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testopenfd  /data/sanity/tests//sparse1G 1
      SUCCESS
      /usr/local/gds/tools//api_tests/cufile_testopenfd  /data/sanity/tests//sparse1G 2
      SUCCESS
      /usr/local/gds/tools//api_tests/cufile_testopenfd  /data/sanity/tests//sparse1G 3
      SUCCESS
      /usr/local/gds/tools//api_tests/cufile_testopenfd  /data/sanity/tests//sparse1G 4
      SUCCESS
      /usr/local/gds/tools//api_tests/cufile_testopenfd  /data/sanity/tests//sparse1G 5
      SUCCESS
      /usr/local/gds/tools//api_tests/cufile_rw  /data/sanity/tests//sparse1G /data/sanity/tests//sparse1G_VERIFY 0 
      FAILED
      /usr/local/gds/tools//api_tests/cufile_rwmanaged  /data/sanity/tests//sparse1G 0 
      FAILED
      /usr/local/gds/tools//api_tests/cufile_rw_unreg  /data/sanity/tests//sparse1G /data/sanity/tests//sparse1G_VERIFY 0 1
      FAILED
      /usr/local/gds/tools//api_tests/cufile_rw_unreg  /data/sanity/tests//sparse1G /data/sanity/tests//sparse1G_VERIFY 0 2
      FAILED
      /usr/local/gds/tools//api_tests/cufile_rw_unreg  /data/sanity/tests//sparse1G /data/sanity/tests//sparse1G_VERIFY 0 3
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testopenflags /data/sanity/tests//sparse1G 0
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testopenflags /data/sanity/tests//sparse1G 1
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testopenflags /data/sanity/tests//sparse1G 2
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testopenflags /data/sanity/tests//sparse1G 3
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testopenflags /data/sanity/tests//sparse1G 4
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testopenflags /data/sanity/tests//sparse1G 5
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testopenflags /data/sanity/tests//sparse1G 6
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testopenflags /data/sanity/tests//sparse1G 7
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testopenflags /data/sanity/tests//sparse1G 8
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testopenflags /data/sanity/tests//sparse1G 9
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testdriverprops -p 8
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testdriverprops -b 8
      SUCCESS
      /usr/local/gds/tools//api_tests/cufile_testdriverprops -d 8
      SUCCESS
      /usr/local/gds/tools//api_tests/cufile_testdriverprops -c 8
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testdriverprops -b 1024
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testdriverprops -d 1024
      FAILED
      /usr/local/gds/tools//api_tests/cufile_driver_close /data/sanity/tests//sparse1G /data/sanity/tests//sparse1G 0
      FAILED
      /usr/local/gds/tools//api_tests/cufile_driver_close /data/sanity/tests//sparse1G /data/sanity/tests//sparse1G 1
      FAILED
      /usr/local/gds/tools//api_tests/cufile_driver_close /data/sanity/tests//sparse1G /data/sanity/tests//sparse1G 2
      FAILED
      /usr/local/gds/tools//api_tests/cufile_io_race /data/sanity/tests//sparse1G 
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testvalidnvbuf  /data/sanity/tests//sparse1G 0 
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testvalidnvbuf  /data/sanity/tests//sparse1G 0 
      FAILED
      /usr/local/gds/tools//api_tests/cufile_driver_close /data/sanity/tests//sparse1G /data/sanity/tests//sparse1G 3
      FAILED
      /usr/local/gds/tools//api_tests/cufile_io_race /data/sanity/tests//sparse1G 
      FAILED
      /usr/local/gds/tools//api_tests/cufile_invalid_write /data/sanity/tests//sparse1G 0 0
      SUCCESS
      /usr/local/gds/tools//api_tests/cufile_invalid_write /data/sanity/tests//sparse1G 0 1
      SUCCESS
      /usr/local/gds/tools//api_tests/cufile_invalid_offsets /data/sanity/tests//sparse1G 0 0
      FAILED
      /usr/local/gds/tools//api_tests/cufile_testcudacontext_switch /data/sanity/tests//sparse_CTX_VERIFY 0
      FAILED
      End: nvidia-fs:
      GDS Version: 1.2.1.4 
      NVFS statistics(ver: 4.0)
      NVFS Driver(version: 2.11.0)
      Mellanox PeerDirect Supported: True
      IO stats: Disabled, peer IO stats: Disabled
      Logging level: info
      Active Shadow-Buffer (MiB): 0
      Active Process: 0
      Reads                : err=0 io_state_err=0
      Sparse Reads                : n=0 io=0 holes=0 pages=0 
      Writes                : err=0 io_state_err=0 pg-cache=0 pg-cache-fail=0 pg-cache-eio=0
      Mmap                : n=0 ok=0 err=0 munmap=0
      Bar1-map            : n=0 ok=0 err=0 free=0 callbacks=0 active=0
      Error                : cpu-gpu-pages=0 sg-ext=0 dma-map=0 dma-ref=0
      Ops                : Read=0 Write=0 BatchIO=0
      **************************************************
      API Tests, : 10 /  63 tests passed
      **************************************************
      Testsuite : 41 / 182 tests passed
      done tests:Mon May 2 16:48:42 UTC 2022
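
The per-section totals in the log above can be cross-checked against the final `Testsuite` line. The sketch below is one way to do that, assuming the `<passed> / <total> tests passed` line format seen in this log; the function name `sum_section_results` is hypothetical and is not part of the GDS tools.

```shell
#!/bin/sh
# Hypothetical helper: sum the per-section "<passed> / <total> tests passed"
# lines read from stdin, skip the final "Testsuite" line, and print
# "<passed>/<total>".
sum_section_results() {
    awk '
        /tests passed/ && $1 != "Testsuite" {
            if (match($0, /[0-9]+ *\/ *[0-9]+ tests passed/)) {
                split(substr($0, RSTART, RLENGTH), parts, "/")
                passed += parts[1] + 0   # number before the slash
                total  += parts[2] + 0   # number after the slash
            }
        }
        END { printf "%d/%d\n", passed, total }
    '
}

# Summary lines copied from the log above; prints 41/182.
sum_section_results <<'EOF'
gdscheck.py test results : 12 /  12 tests passed
gdiso tests : 18 /  18 tests passed
File offset and GPU Buffer offset Tests : 0 /  73 tests passed
cufile sample tests : 1 /  10 tests passed
gdscp tests : 0 /  1 tests passed
Batch State Machine Tests : 0 /  5 tests passed
API Tests, : 10 /  63 tests passed
Testsuite : 41 / 182 tests passed
EOF
```

The sum matches the reported `Testsuite : 41 / 182`, so the failures are concentrated in the sections that need the nvidia-fs driver (offset, cufile sample, gdscp, batch and API tests).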
      

      It seems that the patch https://review.whamcloud.com/#/c/45327/ needs to be applied to master.

      With MLNX_OFED_LINUX-5.6-1.0.3.3-ubuntu20.04-x86_64 and MLNX_OFED_LINUX-5.5-1.0.3.2-ubuntu20.04-x86_64 the result is the same.

       

       

      Attachments

        1. mlnx_logs.txt
          14 kB
          Oleg Kulachenko

            People

              gtapase Gaurang Tapase
              okulachenko Oleg Kulachenko (Inactive)