  1. Lustre
  2. LU-6732

Cannot pick up EDQUOT from ll_write_begin and ll_write_end


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.8.0
    • Lustre 2.8.0
    • 3

    Description

      When -EDQUOT is returned from ll_write_begin() or ll_write_end(), write(2) may return 0 with no errno set. This follows from how generic_perform_write() is implemented:

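      static ssize_t generic_perform_write( ... )
      {
              ...
              do {
                      ...
                      /* returns 0 on success, a negative errno on failure */
                      status = a_ops->write_begin( ... );
                      if (unlikely(status))
                              break;
                      ...
                      /* returns the bytes copied on success, a negative errno on failure */
                      status = a_ops->write_end( ... );
                      if (unlikely(status < 0))
                              break;
                      copied = status;
                      ...
                      written += copied;
                      ...
              } while (iov_iter_count(i));
              /* status is reported only when nothing was written */
              return written ? written : status;
      }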
      

      When "written" is already nonzero and EDQUOT occurs in ll_write_begin() or ll_write_end(), generic_perform_write() returns "written" bytes and discards "status". As a result, vvp_io_write_start() has no way to learn about the error.
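
      The return rule described above can be modeled in userspace. This is only a minimal sketch, not kernel or Lustre code: fake_write_begin(), model_perform_write(), and the last_status out-parameter are hypothetical names introduced for illustration, and the failure point is chosen by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a_ops->write_begin(): succeeds for the first
 * ok_chunks chunks, then fails with -EDQUOT. */
static long fake_write_begin(int chunk, int ok_chunks)
{
        return chunk < ok_chunks ? 0 : -EDQUOT;
}

/* Userspace model of the loop and return rule in generic_perform_write().
 * last_status is an illustrative out-parameter exposing the status that
 * the return value hides; the kernel function has no such parameter. */
static ssize_t model_perform_write(size_t total, size_t chunk_size,
                                   int ok_chunks, long *last_status)
{
        ssize_t written = 0;
        long status = 0;
        int chunk = 0;

        assert(chunk_size > 0);
        while (total > 0) {
                size_t bytes = total < chunk_size ? total : chunk_size;

                status = fake_write_begin(chunk, ok_chunks);
                if (status < 0)
                        break;
                written += (ssize_t)bytes;
                total -= bytes;
                chunk++;
        }
        *last_status = status;
        /* Same rule as the kernel: the error is visible only if written == 0. */
        return written ? written : (ssize_t)status;
}
```

      With a 16384-byte request in 4096-byte chunks and a failure on the third chunk, the model returns 8192 while last_status holds -EDQUOT, so a caller that looks only at the return value never sees the quota error.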

      The issue can be reproduced with quotas enabled, as follows:

      bash-4.1$ id
      uid=60000(quota_usr) gid=60000(quota_usr) groups=60000(quota_usr)
      bash-4.1$ lfs quota /mnt/lustre
      Disk quotas for user quota_usr (uid 60000):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
          /mnt/lustre 1048580*  58617   59617       -      10       0       0       -
      Disk quotas for group quota_usr (gid 60000):
           Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
          /mnt/lustre 1048580       0       0       -      10       0       0       -
      bash-4.1$ dd if=/dev/zero of=/mnt/lustre/quota bs=4M count=1 (<--- PTLRPC_MAX_BRW_PAGES)
      dd: writing `/mnt/lustre/quota': No space left on device
      1+0 records in
      0+0 records out
      0 bytes (0 B) copied, 0.028993 s, 0.0 kB/s
      

      The following is the strace log of the same command:

      strace dd if=/dev/zero of=/mnt/lustre/quota bs=4M count=1
      ...
      read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 4194304
      write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 0
      ...
      write(2, ": No space left on device", 25: No space left on device) = 25
      write(2, "\n", 1
      )                       = 1
      close(0)                                = 0
      close(1)                                = 0
      write(2, "1+0 records in\n0+0 records out\n", 311+0 records in
      0+0 records out
      ) = 31
      write(2, "0 bytes (0 B) copied", 200 bytes (0 B) copied)    = 20
      write(2, ", 0.0365546 s, 0.0 kB/s\n", 24, 0.0365546 s, 0.0 kB/s
      ) = 24
      close(2)                                = 0
      exit_group(1)                           = ?
      

      write(2) should have returned -1 with errno set to EDQUOT but, as the trace shows, it actually returned 0 with no errno.
      (The ENOSPC message comes from dd itself, which sets it when write(2) returns 0; check the dd source.)
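
      The dd behavior noted above can be illustrated with a small write loop. This is a hedged sketch of the general pattern, not the coreutils code: write_fully(), the write_fn type, and the zero_write() stub are hypothetical names, and the stub merely simulates a write that returns 0 without setting errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Signature compatible with write(2), so the real call or a stub fits. */
typedef ssize_t (*write_fn)(int fd, const void *buf, size_t count);

/* Hypothetical stub: simulates a write that returns 0 with no errno. */
static ssize_t zero_write(int fd, const void *buf, size_t count)
{
        (void)fd; (void)buf; (void)count;
        return 0;
}

/* Write the whole buffer; a zero-byte return is reported as ENOSPC because
 * the caller has no other information about why nothing was written. */
static size_t write_fully(write_fn do_write, int fd,
                          const char *buf, size_t size)
{
        size_t total = 0;

        while (total < size) {
                ssize_t n = do_write(fd, buf + total, size - total);

                if (n < 0) {
                        if (errno == EINTR)
                                continue;       /* interrupted: retry */
                        break;                  /* errno set by the call */
                }
                if (n == 0) {
                        errno = ENOSPC;         /* caller-side guess */
                        break;
                }
                total += (size_t)n;
        }
        return total;
}
```

      Calling write_fully(zero_write, 1, buf, len) returns 0 with errno set to ENOSPC, which is why the user sees "No space left on device" even though the underlying condition was a quota limit. Passing write itself as the first argument gives the normal behavior.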

          People

            wc-triage WC Triage
            nozaki Hiroya Nozaki (Inactive)