[LU-417] block usage is reported as zero by stat call for tens of seconds after creating a file - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: Lustre 2.2.0, Lustre 2.1.1
Affects Version/s: Lustre 2.1.0, Lustre 2.2.0, Lustre 1.8.6
Labels:
None
Environment:

- Lustre version
  1.8.5 release from Oracle

- MDS, OSS
  CentOS 5.5
  kernel: 2.6.18-194.17.1.el5_lustre.1.8.5

- Client
  SLES11SP1
  kernel: 2.6.32.19-0.3-default


Severity:
3
Epic:
- metadata
Rank (Obsolete):
4797

Description

If a file is written on a Lustre filesystem and immediately copied to a
local (xfs) filesystem, the copied file becomes a sparse file.

For example:

sgiadm@recca01:~> df /work /data
Filesystem              1K-blocks        Used   Available Use% Mounted on
10.0.1.2@o2ib:/lustre
                      38446862208 25530740868 10963120932  70% /work
/dev/lxvm/IS5000-File-1
                     123036116992 41805493792 81230623200  34% /data

sgiadm@recca01:/data/sgi> cat test.sh
#!/bin/sh
SRC=/work/sgi
DST=/data/sgi

rm $SRC/file* $DST/file*

dd if=/dev/zero of=$SRC/file0 bs=1024k count=100
cp $SRC/file0 $DST/file0
dd if=/dev/zero of=$SRC/file1 bs=1024k count=100 oflag=direct
cp $SRC/file1 $DST/file1
sync
wait

ls -sl $SRC
ls -sl $DST
sgiadm@recca01:/data/sgi> ./test.sh
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.282088 s, 372 MB/s
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1.13752 s, 92.2 MB/s
total 204804
102404 -rw-r--r-- 1 sgiadm users 104857600 2011-06-13 16:02 file0
102404 -rw-r--r-- 1 sgiadm users 104857600 2011-06-13 16:02 file1
total 102404
0 -rw-r--r-- 1 sgiadm users 104857600 2011-06-13 16:02 file0
102400 -rw-r--r-- 1 sgiadm users 104857600 2011-06-13 16:02 file1
4 -rwxr-xr-x 1 sgiadm users 338 2011-06-13 16:01 test.sh

In the above case, file0 was copied as a sparse file.

One minute later, the problem no longer happens.

sgiadm@recca01:~> cp /work/sgi/file0 /data/sgi/file0-2
sgiadm@recca01:~> ls -sl /data/sgi
total 204804
0 -rw-r--r-- 1 sgiadm users 104857600 2011-06-13 16:02 file0
102400 -rw-r--r-- 1 sgiadm users 104857600 2011-06-13 16:51 file0-2
102400 -rw-r--r-- 1 sgiadm users 104857600 2011-06-13 16:02 file1
4 -rwxr-xr-x 1 sgiadm users 338 2011-06-13 16:01 test.sh

It looks like the problem happens when the data is still in the client
cache and does not happen when direct I/O is used.
Also, I noticed that the stat command reports 0 blocks for about 30
seconds after a file is written.

sgiadm@recca01:/work/sgi> dd if=/dev/zero of=file0 bs=1024k count=1; stat file0; sleep 60; stat file0
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00106648 s, 983 MB/s
File: `file0'
Size: 1048576 Blocks: 0 IO Block: 2097152 regular file
Device: 2c54f966h/743766374d Inode: 5801177 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 501/ sgiadm) Gid: ( 100/ users)
Access: 2011-06-13 19:13:27.000000000 +0900
Modify: 2011-06-13 19:15:06.000000000 +0900
Change: 2011-06-13 19:15:06.000000000 +0900
File: `file0'
Size: 1048576 Blocks: 2048 IO Block: 2097152 regular file
Device: 2c54f966h/743766374d Inode: 5801177 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 501/ sgiadm) Gid: ( 100/ users)
Access: 2011-06-13 19:13:27.000000000 +0900
Modify: 2011-06-13 19:15:06.000000000 +0900
Change: 2011-06-13 19:15:06.000000000 +0900
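
The zero block count matters because a copy tool that guesses sparseness from stat fields would see this file as consisting entirely of holes. The sketch below only illustrates that arithmetic and is not confirmed anywhere in this ticket: it assumes the tool treats a file as sparse when its allocated bytes (Blocks times 512) are less than Size. The helper name `looks_sparse` is hypothetical.

```shell
#!/bin/sh
# Sketch: classify a file from its stat fields the way a sparse-aware
# copy tool might. Assumption: "Blocks" is counted in 512-byte units,
# which matches the stat output above (2048 * 512 = 1048576).

# looks_sparse SIZE_BYTES BLOCKS_512   (hypothetical helper name)
looks_sparse() {
    size=$1
    blocks=$2
    allocated=$(( blocks * 512 ))
    if [ "$allocated" -lt "$size" ]; then
        echo "sparse"
    else
        echo "dense"
    fi
}

# Values taken from the two stat calls in this report.
looks_sparse 1048576 0      # right after the write: prints "sparse"
looks_sparse 1048576 2048   # 60 seconds later:      prints "dense"
```

With the values from this report, the same 1 MiB file is classified differently depending only on when stat is called, which would be consistent with file0 being copied as a sparse file while file0-2 was not.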

I guess the problem happens when the file is copied before its data
blocks are allocated on the OSTs.
LU-274, which is a file size issue on the MDS, has already been reported.
This problem, however, is a block usage issue on the OSS. I think the two
are very similar but might be different problems.
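
If the guess above is right, one way to narrow down the timing is to poll stat until the block count becomes non-zero before copying. The following is a minimal sketch for reproduction purposes only, not a fix and not something taken from this ticket. It assumes a stat that supports `-c %b` (as in GNU coreutils); `wait_for_blocks` is a hypothetical helper name.

```shell
#!/bin/sh
# wait_for_blocks FILE TIMEOUT_SECONDS   (hypothetical helper)
# Polls about once per second until stat reports a non-zero block count.
# Returns 0 once blocks are reported, 1 on timeout or if stat fails.
wait_for_blocks() {
    file=$1
    timeout=$2
    elapsed=0
    while :; do
        # %b is the number of allocated blocks; fall back to 0 on error.
        blocks=$(stat -c %b "$file" 2>/dev/null) || blocks=0
        if [ "${blocks:-0}" -gt 0 ]; then
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$(( elapsed + 1 ))
    done
}

# Example use with the paths from test.sh (left commented out):
# wait_for_blocks "$SRC/file0" 60 && cp "$SRC/file0" "$DST/file0"
```

Timing how long the helper waits after the dd would show whether the delay matches the roughly 30 seconds observed with stat.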

Attachments


cp.strace.tgz
318 kB
17/Jun/11 12:12 AM

Issue Links

is related to

LU-2580 cp with FIEMAP support creates completely sparse file

Resolved

is related to

LU-682 optimization for Lustre-tar on completely sparse files.

Closed

