[LU-6585] Virtual block device (lloop) Created: 08/May/15  Updated: 19/Jun/18  Resolved: 10/Jan/17

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: New Feature Priority: Minor
Reporter: Andreas Dilger Assignee: James A Simmons
Resolution: Won't Do Votes: 0
Labels: None

Issue Links:
Related
is related to LU-2706 blockdev_attach fails Closed
is related to LU-4000 Fix build failure on ppc64 w/ 64k pages Closed
is related to LU-2707 blockdev_attach can trigger LBUG Closed
is related to LU-3481 To remove CPT_TRANSIENT completely Resolved

 Description   

Tracking bug for fixing the Lustre lloop driver. There are a number of improvements to be made internally to better integrate with the loop driver in the upstream kernel, which will allow removal of a lot of code that is just copied directly from the existing loop.c file.

While most applications deal with files, in a number of cases it is desirable to export a block device interface on a client in an efficient manner. These include making loopback images for VM hosting, containers for very small files, swap, etc. A prototype block device was created for Lustre, based on the Linux loop.c driver, but was never completed and has become outdated as kernel APIs have evolved. The goal of this project is to update or rewrite the Lustre lloop driver so that it can be used for high-performance block device access in a reliable manner.
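
For comparison, the workaround available today is to place a regular file on a Lustre mount and attach it with the stock kernel loop driver, which routes every request back through the client VFS; a minimal sketch of that setup follows (the image path, size, and ext4 formatting are hypothetical examples):

# Create a sparse backing file on the Lustre mount (path and size are examples)
truncate -s 100G /mnt/lustrefs/images/vm0.img
# Attach it with the in-kernel loop driver, then format and mount it
losetup --find --show /mnt/lustrefs/images/vm0.img   # prints the device, e.g. /dev/loop0
mkfs.ext4 /dev/loop0
mount /dev/loop0 /mnt/vm0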

A further goal would be to investigate and resolve deadlocks in the lloop IO path by using preallocation or memory pools to avoid allocation under memory pressure. This could be used for swapping on the client, which is useful on HPC systems where the clients do not have any disks. When running on an RDMA network (which is typical for Lustre) the space for replies is reserved in advance, so no memory allocation is needed to receive replies from the server, unlike with TCP-based networks.
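
As a rough illustration of the swap use case, a client swap area backed by a Lustre file could in principle be attached through a loop device as follows; the path and size are hypothetical, and this is only safe once the allocation deadlocks described above are addressed:

# Hypothetical client swap area backed by a Lustre file via a loop device
truncate -s 16G /mnt/lustrefs/swap/client01.swap
losetup --find --show /mnt/lustrefs/swap/client01.swap   # e.g. /dev/loop1
mkswap /dev/loop1
swapon /dev/loop1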

  • Salvage/replace existing prototype block device driver
  • High performance loop driver for Lustre files
  • Avoid memory allocation deadlocks under load
  • Bypass kernel VFS for efficient network IO
  • Stretch Goal: swap on Lustre on RDMA network


 Comments   
Comment by Robert Read (Inactive) [ 08/May/15 ]

Another interesting use case would be to combine multiple devices with AUFS or OverlayFS to provide a COW-style filesystem, similar to what Docker does. The lower, read-only device could be a common, shared filesystem image, and the upper layer could be a writable, private filesystem stored on local storage or perhaps even on Lustre as well.
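
A minimal sketch of that layering, assuming the shared image is stored on Lustre and the writable layer lives on local storage (all paths and image names here are hypothetical):

# Read-only lower layer: shared filesystem image stored on Lustre
mount -o ro,loop /mnt/lustrefs/images/base.img /mnt/base
# Writable upper layer and overlayfs work dir on local storage
mkdir -p /local/upper /local/work /mnt/merged
mount -t overlay overlay \
      -o lowerdir=/mnt/base,upperdir=/local/upper,workdir=/local/work \
      /mnt/merged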

Comment by Gabriele Paciucci (Inactive) [ 09/May/15 ]

In the life sciences, many applications are not able to run in parallel across different nodes on the same data (especially in the genomics sector). These applications can benefit from local NVM/SSD devices, but the space available is limited.
I measured the performance of a very recent PCIe NVM device versus Lustre versus a loopback device hosted on Lustre using compilebench; these are the results:

LUSTRE

[cwuser2@cw-1-00 compilebench-0.6]$ ./compilebench -D /mnt/lustrefs/cwuser2/ -i 2 -r 2 --makej -n
using working directory /mnt/lustrefs/cwuser2/, 2 intial dirs 2 runs
native unpatched native-0 222MB in 69.37 seconds (3.21 MB/s)
native patched native-0 109MB in 20.61 seconds (5.32 MB/s)
native patched compiled native-0 691MB in 10.98 seconds (62.99 MB/s)
create dir kernel-0 222MB in 70.10 seconds (3.17 MB/s)
create dir kernel-1 222MB in 69.68 seconds (3.19 MB/s)
compile dir kernel-1 680MB in 12.31 seconds (55.29 MB/s)
compile dir kernel-0 680MB in 12.19 seconds (55.84 MB/s)
read dir kernel-1 in 30.01 30.09 MB/s
read dir kernel-0 in 30.09 30.01 MB/s
read dir kernel-1 in 22.83 39.55 MB/s
delete kernel-1 in 14.75 seconds
delete kernel-0 in 15.10 seconds

INTEL NVM

[cwuser2@cw-1-00 compilebench-0.6]$ ./compilebench -D /mnt/intel-nvm/ -i 2 -r 2 --makej -n
using working directory /mnt/intel-nvm/, 2 intial dirs 2 runs
native unpatched native-0 222MB in 0.83 seconds (267.92 MB/s)
native patched native-0 109MB in 0.30 seconds (365.57 MB/s)
native patched compiled native-0 691MB in 0.66 seconds (1047.87 MB/s)
create dir kernel-0 222MB in 0.82 seconds (271.19 MB/s)
create dir kernel-1 222MB in 0.98 seconds (226.91 MB/s)
compile dir kernel-1 680MB in 0.68 seconds (1000.93 MB/s)
compile dir kernel-0 680MB in 0.68 seconds (1000.93 MB/s)
read dir kernel-1 in 0.48 1881.27 MB/s
read dir kernel-0 in 0.47 1921.29 MB/s
read dir kernel-1 in 0.43 2100.02 MB/s
delete kernel-1 in 0.55 seconds
delete kernel-0 in 0.54 seconds

LUSTRE LOOPBACK

[cwuser2@cw-1-00 compilebench-0.6]$ ./compilebench -D /mnt/lustre-loopback/ -i 2 -r 2 --makej -n
using working directory /mnt/lustre-loopback/, 2 intial dirs 2 runs
native unpatched native-0 222MB in 0.70 seconds (317.68 MB/s)
native patched native-0 109MB in 0.23 seconds (476.83 MB/s)
native patched compiled native-0 691MB in 0.44 seconds (1571.81 MB/s)
create dir kernel-0 222MB in 0.68 seconds (327.02 MB/s)
create dir kernel-1 222MB in 0.68 seconds (327.02 MB/s)
compile dir kernel-1 680MB in 0.46 seconds (1479.64 MB/s)
compile dir kernel-0 680MB in 0.47 seconds (1448.16 MB/s)
read dir kernel-1 in 0.45 2006.69 MB/s
read dir kernel-0 in 0.46 1963.06 MB/s
read dir kernel-1 in 0.43 2100.02 MB/s
delete kernel-1 in 0.43 seconds
delete kernel-0 in 0.43 seconds

Comment by James A Simmons [ 10/Jan/17 ]

The llite_lloop device is no longer supported, so we can close this ticket.

Comment by Jinshan Xiong [ 12/Mar/18 ]

We probably have to bring the llite_loop driver back because of some obvious drawbacks in the kernel loopback device. Even though the kernel's loopback device already has direct I/O support, there seems to be no way to increase the I/O size.
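
For reference, the stock loop driver's direct I/O mode can be toggled per device with recent util-linux (2.30+), though that alone does not raise the request size; a hedged sketch with hypothetical paths:

# Attach a Lustre-backed file with direct I/O enabled on the loop device
losetup --direct-io=on --find --show /mnt/lustrefs/images/test.img   # e.g. /dev/loop0
# Confirm direct I/O is active for the device
losetup --list --output NAME,BACK-FILE,DIO
# The request size seen by the backing file is still capped by the loop
# device's queue limits, e.g.:
cat /sys/block/loop0/queue/max_sectors_kb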

Comment by James A Simmons [ 16/Mar/18 ]

One of the main reasons for deleting llite_loop was that its performance was so much worse than the upstream loop driver's. Before we start reinventing the wheel, we should find out why this limitation exists. Also, I strongly suggest that we contribute to the Linux kernel to improve the loopback device and earn credit with the kernel community. Plus, it saves us the cost of maintenance in the long term.

Comment by Jinshan Xiong [ 16/Mar/18 ]

What mode did the kernel loop driver use, direct or cached I/O? I would appreciate it if you could share any test data (I know it was a long time ago).

Comment by James A Simmons [ 20/Mar/18 ]

I did both, but I don't have those numbers anymore. I sent the results to Andreas, so he might have that email.
