RHEL 6.5, kernel 2.6.32-431.29.2.el6.Bull.58.x86_64, Lustre 2.5.3 (w/ and w/o Bullpatches)
RHEL 6.6, kernel 2.6.32-504.3.3.el6.Bull.68.x86_64, Lustre 2.5.3 w/ Bullpatches
We are currently hitting an issue on several Lustre filesystems. When we do POSIX I/O to or from a Lustre filesystem, we see a lot of short writes and short reads.
This is easily reproducible with IOR, with one client and one thread, using a transferSize > lustreStripeSize and a lustreStripeCount > 1. However, we were only able to reproduce it on 4 of 7 Lustre filesystems.
Cluster 1: 480 OST / default stripecount=2 stripesize=1M
Cluster 2: 144 OST / default stripecount=2 stripesize=1M
It looks really close to LU-6389. We ran its reproducer successfully.
I ran some tests with the vfstrace, dlmtrace and inode debug flags enabled. Logs will be added to this ticket later (the customer is a blacksite).
- clients = 1 (1 per node)
- xfersize = 2 MiB
- blocksize = 2 GiB
- aggregate filesize = 2 GiB
- Lustre stripe size = 1 MiB
- Lustre stripe count = 2
WARNING: Task 0, partial write(), 1048576 of 2097152 bytes at offset 987758592
The short write seems to occur when we lose the layout lock for the file while writing the first stripe. Afterwards, the I/O cannot continue with the second stripe and the write ends after only 1 MiB of the 2 MiB transfer.
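
For anyone who wants to check for the symptom without IOR, here is a minimal sketch in Python that issues fixed-size pwrite() calls and records every call that returns fewer bytes than requested. It is not the IOR code path. The function name, the default sizes and the file path in the usage note are assumptions made for illustration. os.pwrite() is used because it maps directly to pwrite(2), so a short write is visible in the return value.

```python
import os

MIB = 1024 * 1024


def find_short_writes(path, xfer_size=2 * MIB, total_size=8 * MIB):
    """Write total_size bytes in xfer_size transfers and report short writes.

    Returns a list of (offset, bytes_written, bytes_requested) tuples, one
    per pwrite() call that wrote less than it was asked to.
    """
    short = []
    buf = b"\0" * xfer_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        offset = 0
        while offset < total_size:
            # One pwrite(2) per transfer, with no retry, so that a partial
            # write is reported instead of being hidden by a loop.
            written = os.pwrite(fd, buf, offset)
            if written < len(buf):
                short.append((offset, written, len(buf)))
            offset += xfer_size
    finally:
        os.close(fd)
    return short
```

Usage, assuming a hypothetical file on a Lustre mount with stripe count 2 and stripe size 1 MiB: find_short_writes("/mnt/lustre/testfile", 2 * MIB, 2048 * MIB). An empty list means no short write was seen during that run.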
->> seek to offset 987758592
->> io range 987758592, 989855744 = 2M (transfer size)
->> stripe 942, chunk 987758592, 988807168 = 1M (stripe size)
->> 942 * 1M = 987758592, everything is fine
->> vvp_io_write_start 987758592, 988807168
->> n * vvp_io_commit_write() commits the 4k pages
During the commit, we can observe ldlm_cli_cancel_local(), followed by the message "vvp_conf_set() [0x298c28d7a:0xac:0x0]: losing layout lock". Then comes the next stripe.
->> stripe 943, chunk 988807168, 989855744 = 1M (stripe size)
->> no vvp_io_write_start() after because of lock cancellation
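
The offsets in the trace above can be checked with a short Python sketch that splits one transfer into stripe-sized chunks. This is only the arithmetic, not the Lustre client code. The function name split_by_stripe is hypothetical, and "stripe" here means the chunk index (offset divided by stripe size) as shown in the trace, not an OST index.

```python
MIB = 1024 * 1024


def split_by_stripe(offset, length, stripe_size):
    """Split the byte range [offset, offset + length) at stripe boundaries.

    Returns a list of (chunk_index, start, end) tuples, where chunk_index
    is start // stripe_size and end is exclusive.
    """
    chunks = []
    end = offset + length
    pos = offset
    while pos < end:
        chunk_index = pos // stripe_size
        # A chunk stops at the next stripe boundary or at the end of the
        # transfer, whichever comes first.
        chunk_end = min((chunk_index + 1) * stripe_size, end)
        chunks.append((chunk_index, pos, chunk_end))
        pos = chunk_end
    return chunks


# Values from the failing write: offset 987758592, 2 MiB transfer, 1 MiB stripes.
for chunk_index, start, stop in split_by_stripe(987758592, 2 * MIB, MIB):
    print(chunk_index, start, stop)
# prints:
# 942 987758592 988807168
# 943 988807168 989855744
```

The first chunk is the 1048576 bytes that IOR reports as written. The second chunk is the one for which no vvp_io_write_start() appears after the lock cancellation.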