Lustre / LU-10356

CLIO simplification sometimes breaks direct I/O


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.4
    • Affects Version/s: Lustre 2.10.1
    • Environment: RHEL7 / 2.10; RHEL7 / master; # ./diotest5 -b 65536 -i 1000 -f /mnt/lustre/file2; test taken from the LTP suite (for me, full debug has to be enabled to hit it easily)
    • Severity: 3

    Description

      Kernel 3.10 and some earlier kernels (likely RHEL6 as well) have a
      buffered-write fallback if direct I/O fails after some offset:

                      written = generic_file_direct_write(iocb, iov, &nr_segs, pos,
                                                              ppos, count, ocount);
                      /*
                       * If the write stopped short of completing, fall back to
                       * buffered writes.  Some filesystems do this for writes to
                       * holes, for example.  For DAX files, a buffered write will
                       * not succeed (even if it did, DAX does not handle dirty
                       * page-cache pages correctly).
                       */
                      if (written < 0 || written == count || IS_DAX(inode))
                              goto out;
                      pos += written;
                      count -= written;
                      written_buffered = generic_file_buffered_write(iocb, iov,
                                                      nr_segs, pos, ppos, count,
                                                      written);
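
      For reference, the access pattern of the diotest5 reproducer from the Environment field can be sketched roughly as below. This is only a minimal illustration (sequential 64 KiB aligned O_DIRECT writes to a Lustre-mounted file), not the actual LTP source:

      #define _GNU_SOURCE                              /* needed for O_DIRECT */
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>

      int main(void)
      {
              const size_t bufsize = 65536;            /* -b 65536 */
              const int iters = 1000;                  /* -i 1000 */
              const char *path = "/mnt/lustre/file2";  /* -f /mnt/lustre/file2 */
              void *buf;
              int fd, i;

              /* O_DIRECT requires an aligned user buffer */
              if (posix_memalign(&buf, 4096, bufsize))
                      return 1;
              memset(buf, 0x5a, bufsize);

              fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
              if (fd < 0) {
                      perror("open");
                      return 1;
              }

              /*
               * Sequential 64 KiB records. If the direct path completes a
               * write only partially, the VFS falls back to buffered writes
               * for the remainder, which is the fallback shown in the
               * excerpt above.
               */
              for (i = 0; i < iters; i++) {
                      ssize_t rc = write(fd, buf, bufsize);

                      if (rc != (ssize_t)bufsize)
                              fprintf(stderr, "short/failed write, iter %d: %zd\n",
                                      i, rc);
              }

              close(fd);
              free(buf);
              return 0;
      }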

      This leads to a situation where the buffered I/O is sent long after the direct I/O for that segment has finished:

      00000080:00200000:1.0:1512485492.000543:0:6427:0:(rw26.c:496:ll_direct_IO()) VFS Op:inode=[0x200000407:0x15:0x0](ffff88007a96b690), size=65536 (max 1426063360), offset=393216=60000  pages 16 (max 348160)
      00000080:00008000:1.0:1512485492.003959:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 1.
      00000080:00008000:1.0:1512485492.003984:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 2.
      00000080:00008000:1.0:1512485492.004008:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 3.
      00000080:00008000:1.0:1512485492.004031:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 4.
      00000080:00008000:1.0:1512485492.004054:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 5.
      00000080:00008000:1.0:1512485492.004077:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 6.
      00000080:00008000:1.0:1512485492.004147:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 7.
      00000080:00008000:1.0:1512485492.004172:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 8.
      00000080:00008000:1.0:1512485492.004195:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 9.
      00000080:00008000:1.0:1512485492.004218:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 10.
      00000080:00008000:1.0:1512485492.004241:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 11.
      00000080:00008000:1.0:1512485492.004264:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 12.
      00000080:00008000:1.0:1512485492.004287:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 13.
      00000080:00008000:1.0:1512485492.004311:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 14.
      00000080:00008000:1.0:1512485492.004327:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 15.
      00000080:00008000:1.0:1512485492.004343:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 16.
      00000080:00008000:1.0:1512485492.004491:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 17.
      00000080:00008000:1.0:1512485492.004506:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 18.
      00000080:00008000:1.0:1512485492.004522:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 19.
      00000080:00008000:1.0:1512485492.004537:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 20.
      00000080:00008000:1.0:1512485492.004553:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 21.
      00000080:00008000:1.0:1512485492.004568:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 22.
      00000080:00008000:1.0:1512485492.004583:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 23.
      00000080:00008000:1.0:1512485492.004599:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 24.
      00000080:00008000:1.0:1512485492.004614:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 25.
      00000080:00008000:1.0:1512485492.004629:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 26.
      00000080:00008000:1.0:1512485492.004653:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 27.
      00000080:00008000:1.0:1512485492.004676:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 28.
      00000080:00008000:1.0:1512485492.004699:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 29.
      00000080:00008000:1.0:1512485492.004722:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 30.
      00000080:00008000:1.0:1512485492.004745:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 31.
      00000080:00008000:1.0:1512485492.004767:0:6427:0:(rw26.c:770:ll_write_end()) queued page: 32.
      00000080:00200000:1.0:1512485492.004969:0:6427:0:(rw26.c:496:ll_direct_IO()) VFS Op:inode=[0x200000407:0x15:0x0](ffff88007a96b690), size=65536 (max 1426063360), offset=589824=90000, pages 16 (max 348160)
      00000080:00200000:1.0:1512485492.007839:0:6427:0:(rw26.c:496:ll_direct_IO()) VFS Op:inode=[0x200000407:0x15:0x0](ffff88007a96b690), size=65536 (max 1426063360), offset=655360=a0000, pages 16 (max 348160)
      00000080:00200000:1.0:1512485492.010532:0:6427:0:(rw26.c:496:ll_direct_IO()) VFS Op:inode=[0x200000407:0x15:0x0](ffff88007a96b690), size=65536 (max 1426063360), offset=720896=b0000, pages 16 (max 348160)
      00000080:00200000:1.0:1512485492.013680:0:6427:0:(rw26.c:496:ll_direct_IO()) VFS Op:inode=[0x200000407:0x15:0x0](ffff88007a96b690), size=65536 (max 1426063360), offset=786432=c0000, pages 16 (max 348160)
      00000080:00200000:1.0:1512485492.016534:0:6427:0:(rw26.c:496:ll_direct_IO()) VFS Op:inode=[0x200000407:0x15:0x0](ffff88007a96b690), size=65536 (max 1426063360), offset=851968=d0000, pages 16 (max 348160)
      00000080:00000001:1.0:1512485492.024648:0:6427:0:(file.c:3093:cl_sync_file_range()) Process entered
      00000080:00200000:1.0:1512485492.024686:0:6427:0:(vvp_io.c:943:vvp_io_write_commit()) commit ffff8800792bd438/ffff88003cbd14a0 async pages: 128, from 0, to 4096
      

      This caused two problems:
      1) A suboptimal RPC is generated for the buffered part: the DIO segment was sent and committed to the OST while the buffered part was still in the client cache and was only sent later as part of cl_sync_file_range(), so an extra seek is needed.
      2) Potential data corruption, since the first page in this queue is not at the offset expected for this operation.
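
      A simple way to illustrate problem 2 is a read-back check along the following lines; again only a sketch, assuming the same file, record size, and fill byte as in the write sketch above:

      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>

      int main(void)
      {
              const size_t bufsize = 65536;             /* same record size */
              const char *path = "/mnt/lustre/file2";   /* same file as above */
              static char expected[65536], got[65536];
              off_t off = 0;
              ssize_t rc;
              int fd;

              fd = open(path, O_RDONLY);
              if (fd < 0) {
                      perror("open");
                      return 1;
              }
              memset(expected, 0x5a, sizeof(expected)); /* fill byte the writer used */

              /*
               * Walk the file record by record and flag any record whose
               * content does not match what was written at that offset.
               */
              while ((rc = pread(fd, got, bufsize, off)) == (ssize_t)bufsize) {
                      if (memcmp(got, expected, bufsize) != 0)
                              fprintf(stderr, "mismatch at offset %lld\n",
                                      (long long)off);
                      off += bufsize;
              }

              close(fd);
              return 0;
      }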


          People

            Assignee: Zhenyu Xu (bobijam)
            Reporter: Alexey Lyashkov (shadow)
            Votes: 0
            Watchers: 7
