Lustre / LU-18263

Several applications fail with POSIX_Xfer: Assertion `rc >= 0' failed


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • None
    • Lustre 2.16.0
    • None
    • lustre tag=2.15.91
    • 3

      Several applications (ior, mdtest, fio) failed for a similar reason. For example, with ior:

      + srun ior -o=/lustre/sfa18k03/client/ior-ssf/1971/ior-ssf -a=POSIX -E -r -w -D=451 -t=8M -b=8G
      IOR-4.0.0: MPI Coordinated Test of Parallel I/O
      Began               : Tue Sep 24 14:10:18 2024
      Command line        : /usr/bin/ior -o=/lustre/sfa18k03/client/ior-ssf/1971/ior-ssf -a=POSIX -E -r -w -D=451 -t=8M -b=8G
      Machine             : Linux wr-es-25.wr-es.datadirectnet.com
      TestID              : 0
      StartTime           : Tue Sep 24 14:10:18 2024
      Path                : /lustre/sfa18k03/client/ior-ssf/1971/ior-ssf
      FS                  : 2055.5 TiB   Used FS: 71.1%   Inodes: 2048.0 Mi   Used Inodes: 15.0%
      
      Options: 
      api                 : POSIX
      apiVersion          : 
      test filename       : /lustre/sfa18k03/client/ior-ssf/1971/ior-ssf
      access              : single-shared-file
      type                : independent
      segments            : 1
      ordering in a file  : sequential
      ordering inter file : no tasks offsets
      nodes               : 5
      tasks               : 100
      clients per node    : 20
      repetitions         : 1
      xfersize            : 8 MiB
      blocksize           : 8 GiB
      aggregate filesize  : 800 GiB
      stonewallingTime    : 451
      stoneWallingWearOut : 0
      
      Results: 
      WARNING: task 34, partial write(), 1048576 of 8388608 bytes at offset 292183605248
      
      WARNING: write(22, 0x7f7e88554000, 7340032) failed Input/output error
      WARNING: task 34, partial write(), -1 of 7340032 bytes at offset 292184653824
      
      ior: aiori-POSIX.c:769: POSIX_Xfer: Assertion `rc >= 0' failed.
      [wr-es-27:970634] *** Process received signal ***
      [wr-es-27:970634] Signal: Aborted (6)
      [wr-es-27:970634] Signal code:  (-6)
      [wr-es-27:970634] [ 0] /usr/lib64/libpthread.so.0(+0x12cf0)[0x7f7e8bce3cf0]
      [wr-es-27:970634] [ 1] /usr/lib64/libc.so.6(gsignal+0x10f)[0x7f7e8b95aacf]
      [wr-es-27:970634] [ 2] /usr/lib64/libc.so.6(abort+0x127)[0x7f7e8b92dea5]
      [wr-es-27:970634] [ 3] /usr/lib64/libc.so.6(+0x21d79)[0x7f7e8b92dd79]
      [wr-es-27:970634] [ 4] /usr/lib64/libc.so.6(+0x47426)[0x7f7e8b953426]
      [wr-es-27:970634] [ 5] /usr/bin/ior(+0x8c77)[0x55df64188c77]
      [wr-es-27:970634] [ 6] /usr/bin/ior(+0xc0f3)[0x55df6418c0f3]
      [wr-es-27:970634] [ 7] /usr/bin/ior(+0xea6d)[0x55df6418ea6d]
      [wr-es-27:970634] [ 8] /usr/bin/ior(+0xfae3)[0x55df6418fae3]
      [wr-es-27:970634] [ 9] /usr/bin/ior(+0x11158)[0x55df64191158]
      [wr-es-27:970634] [10] /usr/lib64/libc.so.6(__libc_start_main+0xe5)[0x7f7e8b946d85]
      [wr-es-27:970634] [11] /usr/bin/ior(+0x47ee)[0x55df641847ee]
      [wr-es-27:970634] *** End of error message ***
      WARNING: task 33, partial write(), 5242880 of 8388608 bytes at offset 284382199808
      
      WARNING: write(23, 0x7ff4cecff000, 3145728) failed Input/output error
      WARNING: task 33, partial write(), -1 of 3145728 bytes at offset 284387442688
      
      ior: aiori-POSIX.c:769: POSIX_Xfer: Assertion `rc >= 0' failed.
      [wr-es-27:970633] *** Process received signal ***
      

      Lustre logs from the server VMs are available; I can upload them if needed.
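
      For triage, it may help to confirm that each failing write() is the retry of the remainder left by the preceding short write on the same task, so that the EIO hits a continuation of the same transfer and not an unrelated offset. The following is a minimal sketch in Python, not part of ior or Lustre. The function names (parse_partial_writes, check_retry_chain) are hypothetical, and the regular expression assumes the warning format shown in the log above.

```python
import re

# Matches IOR warnings of the form seen in the log, e.g.:
#   WARNING: task 34, partial write(), 1048576 of 8388608 bytes at offset 292183605248
# (format assumed from the log in this report)
PARTIAL_RE = re.compile(
    r"WARNING: task (\d+), partial write\(\), (-?\d+) of (\d+) bytes at offset (\d+)"
)


def parse_partial_writes(log_text):
    """Return a list of (task, written, requested, offset) integer tuples."""
    return [tuple(int(g) for g in m.groups()) for m in PARTIAL_RE.finditer(log_text)]


def check_retry_chain(events):
    """For each task with a failed write (-1), report whether that write is
    the retry of the remainder left by the task's preceding short write."""
    results = {}
    last_short = {}
    for task, written, requested, offset in events:
        if written >= 0:
            # Remember the most recent short write for this task.
            last_short[task] = (written, requested, offset)
        elif task in last_short:
            prev_written, prev_requested, prev_offset = last_short[task]
            results[task] = (
                requested == prev_requested - prev_written
                and offset == prev_offset + prev_written
            )
    return results


# Warning lines copied from the log in this report.
SAMPLE = """\
WARNING: task 34, partial write(), 1048576 of 8388608 bytes at offset 292183605248
WARNING: task 34, partial write(), -1 of 7340032 bytes at offset 292184653824
WARNING: task 33, partial write(), 5242880 of 8388608 bytes at offset 284382199808
WARNING: task 33, partial write(), -1 of 3145728 bytes at offset 284387442688
"""

if __name__ == "__main__":
    print(check_retry_chain(parse_partial_writes(SAMPLE)))
    # prints {34: True, 33: True}
```

      For both tasks in the log the check holds: task 34 wrote 1 MiB of the 8 MiB transfer and the retry of the remaining 7 MiB failed, and task 33 wrote 5 MiB and the retry of the remaining 3 MiB failed. The same script could be run over the full ior output from all nodes to see which offsets are affected.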

            wc-triage WC Triage
            sarah Sarah Liu