Lustre / LU-10236

Files under /mnt/lustre/xxx get corrupted while running fio


Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.11.0
    • None
    • Kraken cluster,
      2 OSS, 8 OSTs
      2 MDS, 4 MDTs
      1 client

      lustre version - 2.10.55 + dom
      branch: lustre-reviews
      build - 52057
    • 3
    • 9223372036854775807

    Description

      While running fio on the setup described above with the command below, files in the directory /mnt/lustre/xxx became corrupted. With nrfiles=256 instead of nrfiles=512, the same run works fine.

      fio --name=smallio --ioengine=posixaio --iodepth=32 --directory=/mnt/lustre/dom3 --nrfiles=512 --openfiles=10000 --numjobs=8 --filesize=64k --lockfile=readwrite --bs=4k --rw=randread --buffered=1 --bs_unaligned=1 --file_service_type=random --randrepeat=0 --norandommap --group_reporting=1 --loops=4
      
      [root@kapollo04 lustre]# rm -rf dom3
      rm: cannot remove ‘dom3’: Directory not empty
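
      To put the two nrfiles values in perspective, the sketch below estimates how many files and how much data each run creates. It assumes that fio applies nrfiles and filesize per job, so the totals scale with numjobs. The helper name workload_totals is made up for illustration and is not part of fio.

```python
# Rough size of the workload described by the fio command line.
# Assumption: nrfiles and filesize are per-job values, so the directory
# ends up holding numjobs * nrfiles files of filesize each.

def workload_totals(numjobs: int, nrfiles: int, filesize_kib: int) -> dict:
    """Return the total file count and data volume (MiB) for one fio run."""
    total_files = numjobs * nrfiles
    total_mib = total_files * filesize_kib / 1024
    return {"files": total_files, "data_mib": total_mib}


# Values taken from the command line in this report.
failing = workload_totals(numjobs=8, nrfiles=512, filesize_kib=64)
working = workload_totals(numjobs=8, nrfiles=256, filesize_kib=64)

print("nrfiles=512:", failing)
print("nrfiles=256:", working)
```

      Under that assumption the failing run fills the directory with 4096 files (256 MiB) and the working run with 2048 files (128 MiB), so the two cases differ in the number of directory entries as well as in data volume.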
      

      client dmesg

      [227470.685094] LustreError: 15069:0:(events.c:199:client_bulk_callback()) event type 2, status -90, desc ffff880eaafd7c00
      [227470.686839] LustreError: 15067:0:(events.c:199:client_bulk_callback()) event type 2, status -90, desc ffff880eaafd7c00
      [227470.688803] LustreError: 15069:0:(events.c:199:client_bulk_callback()) event type 2, status -90, desc ffff880eaafd7c00
      [227470.690502] LustreError: 15070:0:(events.c:199:client_bulk_callback()) event type 2, status -90, desc ffff880eaafd7c00
      [227470.692567] LustreError: 15068:0:(events.c:199:client_bulk_callback()) event type 2, status -90, desc ffff880eaafd7c00
      [227470.694514] LustreError: 15067:0:(events.c:199:client_bulk_callback()) event type 2, status -90, desc ffff880eaafd7c00
      [227470.696363] LustreError: 15070:0:(events.c:199:client_bulk_callback()) event type 2, status -90, desc ffff880eaafd7c00
      [227470.698380] LustreError: 15069:0:(events.c:199:client_bulk_callback()) event type 2, status -90, desc ffff880eaafd7c00
      [227470.700589] LustreError: 15068:0:(events.c:199:client_bulk_callback()) event type 2, status -90, desc ffff880eaafd7c00
      [227470.702449] LustreError: 15067:0:(events.c:199:client_bulk_callback()) event type 2, status -90, desc ffff880eaafd7c00
      [227470.704257] LustreError: 15068:0:(events.c:199:client_bulk_callback()) event type 2, status -90, desc ffff880eaafd7c00
      [227470.706338] LustreError: 15069:0:(events.c:199:client_bulk_callback()) event type 2, status -90, desc ffff880eaafd7c00
      [227470.708125] LustreError: 15067:0:(events.c:199:client_bulk_callback()) event type 2, status -90, desc ffff880eaafd7c00
      [227470.710179] LustreError: 15069:0:(events.c:199:client_bulk_callback()) event type 2, status -90, desc ffff880eaafd7c00
      [227470.712075] LustreError: 15068:0:(events.c:199:client_bulk_callback()) event type 2, status -90, desc ffff880eaafd7c00
      [227471.546843] LustreError: 12768:0:(mdc_request.c:944:mdc_getpage()) lustre-MDT0000-mdc-ffff88105e0f6800: too many resend retries: rc = -5
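
      The status and rc values in these lines are negated Linux errno numbers: 90 is EMSGSIZE and 5 is EIO. A minimal sketch for summarizing such lines by reporting function and error name follows. The regular expression is fitted only to the two message shapes shown above, and the errno table is deliberately limited to the two codes that occur here. The function name summarize is hypothetical.

```python
import re
from collections import Counter

# Linux errno values that appear in the client dmesg excerpt.
LINUX_ERRNO = {5: "EIO", 90: "EMSGSIZE"}

# Matches "LustreError: pid:0:(file.c:line:func()) ... status -N" or "... rc = -N".
LINE_RE = re.compile(
    r"LustreError: \d+:\d+:\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)"
    r".*?(?:status|rc =) (?P<code>-?\d+)"
)


def summarize(lines):
    """Count LustreError lines per (function, errno name)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            code = abs(int(m.group("code")))
            name = LINUX_ERRNO.get(code, str(code))
            counts[(m.group("func"), name)] += 1
    return counts


# Two lines copied from the client dmesg output in this report.
sample = [
    "[227470.685094] LustreError: 15069:0:(events.c:199:client_bulk_callback()) "
    "event type 2, status -90, desc ffff880eaafd7c00",
    "[227471.546843] LustreError: 12768:0:(mdc_request.c:944:mdc_getpage()) "
    "lustre-MDT0000-mdc-ffff88105e0f6800: too many resend retries: rc = -5",
]

summary = summarize(sample)
for (func, err), n in summary.items():
    print(f"{func}: {err} x{n}")
```

      Feeding the full dmesg output to summarize() in the same way gives a per-function count of the bulk callback failures and the final mdc_getpage() error.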
      

      MDS dmesg

      [259415.913026] LustreError: 137-5: nvmefs-MDT0001_UUID: not available for connect from 192.168.213.233@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
      [259415.913667] LustreError: Skipped 71 previous similar messages
      [259502.137146] LustreError: 20014:0:(ldlm_lib.c:3208:target_bulk_io()) @@@ timeout on bulk READ after 100+0s  req@ffff881029e1f450 x1583747470242320/t0(0) o37->24b31bec-af52-1a41-a067-af1c7d84e837@192.168.213.218@o2ib:597/0 lens 568/440 e 3 to 0 dl 1510613657 ref 1 fl Interpret:/2/0 rc 0/0
      [260015.863227] LustreError: 137-5: nvmefs-MDT0000_UUID: not available for connect from 192.168.213.233@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
      [260015.863971] LustreError: Skipped 71 previous similar messages
      [260643.179888] LustreError: 137-5: nvmefs-MDT0000_UUID: not available for connect from 192.168.213.126@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
      [260643.180541] LustreError: Skipped 73 previous similar messages
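
      The bulk READ timeout line carries a "dl 1510613657" field. Assuming this is the request deadline expressed in Unix epoch seconds (an interpretation of the log format, not something stated in this ticket), it can be converted to a wall-clock time to help line up the client and MDS logs:

```python
from datetime import datetime, timezone

# "dl" value copied from the target_bulk_io() line in the MDS dmesg.
# Assumption: the field is a deadline in seconds since the Unix epoch.
deadline_epoch = 1510613657

deadline_utc = datetime.fromtimestamp(deadline_epoch, tz=timezone.utc)
deadline_text = deadline_utc.strftime("%Y-%m-%d %H:%M:%S %Z")
print(deadline_text)
```

      The bracketed dmesg timestamps are seconds since boot on each node, so they cannot be compared directly between the client and the MDS. An absolute time like this one is a more useful anchor.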
      

      lustre version - 2.10.55 + dom
      branch: lustre-reviews
      build - 52057
      This should be the same as lustre-master build 3671.

      This needs to be investigated.

            People

              Assignee: tappro Mikhail Pershin
              Reporter: standan Saurabh Tandan (Inactive)