[LU-682] optimization for Lustre-tar on completely sparse files. Created: 12/Sep/11  Updated: 03/Jan/13  Resolved: 03/Jan/13

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Richard Henwood (Inactive) Assignee: Richard Henwood (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None

Issue Links:
Related
is related to LU-417 block usage is reported as zero by st... Resolved
Rank (Obsolete): 9728

 Description   

Kit Westneat commented:

"Older versions of tar have to read in the entire file to figure out
what parts are sparse. Newer versions should skip that if the # of
blocks are 0, but I'm not sure if that made it into lustre-tar yet.

Here's the patch:
http://lists.gnu.org/archive/html/bug-tar/2010-08/msg00043.html
"

This hasn't made it into lustre-tar yet, and may be worth looking into.



 Comments   
Comment by Richard Henwood (Inactive) [ 20/Oct/11 ]

RHEL5 tar is now being built with this patch.

A patch for RHEL6 tar will ideally be received upstream.

Comment by Richard Henwood (Inactive) [ 20/Oct/11 ]

The patch looks something like this:

--- tar-1.19/orig/src/sparse.c
+++ tar-1.19/src/sparse.c
@@ -216,15 +216,17 @@
   struct tar_stat_info *st = file->stat_info;
   int fd = file->fd;
   char buffer[BLOCKSIZE];
-  size_t count;
+  size_t count = 0;
   off_t offset = 0;
   struct sp_array sp = {0, 0};
 
-  if (!lseek_or_error (file, 0))
-    return false;
-
   st->archive_file_size = 0;
   
+  if (ST_NBLOCKS (st->stat) == 0)
+    offset = st->stat.st_size;
+  else
+    {
+
   if (!tar_sparse_scan (file, scan_begin, NULL))
     return false;
 
@@ -254,6 +256,7 @@
 
       offset += count;
     }
+  }
 
   if (sp.numbytes == 0)
     sp.offset = offset;
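
The control flow of the patch can be shown outside of tar with a simplified, standalone sketch. It assumes POSIX read() and lseek(). The names scan_data_bytes, block_is_zero and SKETCH_BLOCKSIZE are hypothetical, and the function only counts data bytes instead of building tar's sparse map. When the block count is zero it jumps the offset to the file size without reading; otherwise it falls back to reading every block.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#define SKETCH_BLOCKSIZE 512   /* assumed block size for this sketch */

/* True when the first n bytes of buf are all zero. */
static bool
block_is_zero (const char *buf, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (buf[i] != 0)
      return false;
  return true;
}

/* Hypothetical stand-in for the scan: reports how many bytes sit in
   non-zero blocks and the offset at which the scan ended.  Returns false
   on an I/O error. */
static bool
scan_data_bytes (int fd, const struct stat *st,
                 off_t *data_bytes, off_t *end_offset)
{
  char buffer[SKETCH_BLOCKSIZE];

  *data_bytes = 0;
  *end_offset = 0;

  if (st->st_blocks == 0)
    {
      /* Shortcut from the patch: no blocks allocated, so the whole file
         is a hole and the file descriptor is never touched. */
      *end_offset = st->st_size;
      return true;
    }

  /* Slow path: read the file block by block, looking for data. */
  if (lseek (fd, 0, SEEK_SET) == (off_t) -1)
    return false;

  for (;;)
    {
      ssize_t count = read (fd, buffer, sizeof buffer);
      if (count < 0)
        return false;
      if (count == 0)
        break;
      if (!block_is_zero (buffer, (size_t) count))
        *data_bytes += count;
      *end_offset += count;
    }
  return true;
}
```

On the shortcut path the cost is independent of file size, which is what matters when there are millions of such files to archive.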

Comment by Richard Henwood (Inactive) [ 29/Nov/11 ]

Use Case

An administrator wishes to perform a file-level backup of an MDT.

An MDT on a production file system may hold many millions of files. Each of these files is completely sparse (ST_NBLOCKS is zero). Without this patch, tar reads each file in full to look for holes, even though the block count is already zero. Scanning large, completely sparse files is time-consuming.

Comment by Richard Henwood (Inactive) [ 29/Nov/11 ]

Andreas adds: "This is useful for 1.8.x MDTs right now, and once Fan Yong has implemented OI Scrub it will also be useful for 2.x MDTs."

Comment by Richard Henwood (Inactive) [ 29/Nov/11 ]

It seems that this patch is upstream in GNU tar, starting with version 1.24.

Comment by Richard Henwood (Inactive) [ 07/Dec/11 ]

I've been told that RHEL 6.3 will include the completely sparse file optimization patch.

Users on 6.0, 6.1 and 6.2 will be able to use tar from 6.3 when it is available.

Comment by Andreas Dilger [ 13/Dec/11 ]

This is available in the RHEL5 patched lustre-tar, and will be available in RHEL6.3 as well.

Comment by Nathan Rutman [ 06/Jun/12 ]

Does this mean WC's tar should be replaced by mainstream tar 1.24?

Comment by Nathan Rutman [ 06/Jun/12 ]

These patches are in RHEL 6.3 beta tar 1.23-7

  1. optimization for -c --sparse for completely sparse files (#760665)
    Patch12: tar-1.23-optimize-packing-entirely-sparse-files.patch
  2. fix for filename corruption when --sparse and --posix options are used. (#656834)
    Patch9: tar-1.23-long-name-corruption.patch

Comment by Richard Henwood (Inactive) [ 07/Jun/12 ]

I haven't seen the beta, but I have had this confirmed by Red Hat: tar will be 1.23 plus patches for sparse files (among others, I assume).

Once 6.3 is available, RHEL6 users will have more choices: use vanilla Red Hat tar from 6.3, use WC tar, or roll your own GNU tar >= 1.24.

WC tar was created specifically to target RHEL5 with the requirement: we want to build on the same platform we run on.

WC tar achieves this. If I remember correctly, the problem with bumping the GNU tar version is that more recent versions (>1.23), which include the sparse and other patches, require a version of autoconf (>2.60) that is not readily available on RHEL5.

Comment by Richard Henwood (Inactive) [ 03/Jan/13 ]

No longer relevant: tar with --sparse is used for file-level backup of an MDT. Restoring an MDT from a file-level backup is only supported on 2.3 and later. 2.3 only supports RHEL6, and the RHEL6 tar distribution includes the sparse patch.

Generated at Sat Feb 10 01:09:24 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.