Details
Type: Bug
Resolution: Fixed
Priority: Critical
Version: Lustre 2.8.0
Severity: 3
Description
When ll_write_begin() or ll_write_end() fails with -EDQUOT, write(2) may return 0 without setting errno. The cause is the way generic_perform_write() (mm/filemap.c) propagates errors:
    static ssize_t generic_perform_write( ... )
    {
        ...
        do {
            ...
            status = a_ops->write_begin( ... );
            if (unlikely(status))
                break;
            ...
            status = a_ops->write_end( ... );
            if (unlikely(status < 0))
                break;
            copied = status;
            ...
            written += copied;
            ...
        } while (iov_iter_count(i));

        /* status is discarded whenever written != 0 */
        return written ? written : status;
    }
When "written" is already non-zero and -EDQUOT occurs in ll_write_begin() or ll_write_end(), generic_perform_write() returns the "written" byte count and discards "status". As a result, vvp_io_write_start() has no way to learn about the error.
The issue can be confirmed with a user quota, as follows:
    bash-4.1$ id
    uid=60000(quota_usr) gid=60000(quota_usr) groups=60000(quota_usr)
    bash-4.1$ lfs quota /mnt/lustre
    Disk quotas for user quota_usr (uid 60000):
         Filesystem   kbytes   quota   limit   grace   files   quota   limit   grace
        /mnt/lustre 1048580*   58617   59617       -      10       0       0       -
    Disk quotas for group quota_usr (gid 60000):
         Filesystem   kbytes   quota   limit   grace   files   quota   limit   grace
        /mnt/lustre  1048580       0       0       -      10       0       0       -
    bash-4.1$ dd if=/dev/zero of=/mnt/lustre/quota bs=4M count=1   (<--- PTLRPC_MAX_BRW_PAGES)
    dd: writing `/mnt/lustre/quota': No space left on device
    1+0 records in
    0+0 records out
    0 bytes (0 B) copied, 0.028993 s, 0.0 kB/s
The corresponding strace log:
    strace dd if=/dev/zero of=/mnt/lustre/quota bs=4M count=1
    ...
    read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 4194304
    write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4194304) = 0
    ...
    write(2, ": No space left on device", 25) = 25
    write(2, "\n", 1) = 1
    close(0) = 0
    close(1) = 0
    write(2, "1+0 records in\n0+0 records out\n", 31) = 31
    write(2, "0 bytes (0 B) copied", 20) = 20
    write(2, ", 0.0365546 s, 0.0 kB/s\n", 24) = 24
    close(2) = 0
    exit_group(1) = ?
write(2) should have returned -1 with errno set to EDQUOT, but, as the log shows, it returned 0 without setting errno.
(The ENOSPC reported above comes from dd itself: when write(2) returns 0, dd's iwrite() sets errno to ENOSPC. See the coreutils dd source.)