[LU-6460] LLIF_FILE_RESTORING is not cleared at end of restore
Created: 13/Apr/15  Updated: 31/Jul/16  Resolved: 31/Jul/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.8.0

Type: Bug
Priority: Critical
Reporter: John Hammond
Assignee: Bruno Faccini (Inactive)
Resolution: Fixed
Votes: 0
Labels: hsm, medium

Issue Links:
Related
is related to LU-4727 Lhsmtool_posix process stuck in ll_la... Resolved
is related to LU-7040 Interop 2.7.0<->master sanity-hsm tes... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

The LLIF_FILE_RESTORING flag is not cleared until I/O is performed on the inode. This may cause stale file attributes to be cached if stat() is called on the file during a restore. To reproduce:

export MOUNT_2=y
llmount.sh
lctl conf_param lustre-MDT0000.mdt.hsm_control=enabled
mkdir -p /mnt/lustre-hsm
mount $HOSTNAME@tcp:/lustre /mnt/lustre-hsm -t lustre -o user_xattr,flock
mkdir -p /tmp/arc1
lhsmtool_posix -vvvv --hsm_root=/tmp/arc1 --daemon /mnt/lustre-hsm 2> /tmp/hsm.log
echo XXX > /mnt/lustre/f0
lfs hsm_archive /mnt/lustre/f0
sleep 1
lfs hsm_release /mnt/lustre/f0
killall lhsmtool_posix
cat /mnt/lustre/f0 &
sleep 1
stat /mnt/lustre2/f0
lhsmtool_posix -vvvv --hsm_root=/tmp/arc1 --daemon /mnt/lustre-hsm 2> /tmp/hsm.log
wait
dd if=/dev/zero of=/mnt/lustre/f0 count=1
stat /mnt/lustre/f0
stat /mnt/lustre2/f0
sleep 60
stat /mnt/lustre/f0
stat /mnt/lustre2/f0
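
The reproducer above uses a fixed `sleep 1` between `lfs hsm_archive` and `lfs hsm_release`, which may be too short on a slow test node. As a rough sketch only, a polling helper could replace the fixed sleep. The function name `wait_for_word` is hypothetical and is not part of Lustre or its test framework, and the example call assumes that `lfs hsm_state` prints the word "archived" once the archive copy is complete.

```shell
# Hypothetical helper (not part of Lustre): run a command once per second
# until its output contains the given whole word, or give up after
# the given number of attempts.
# Usage: wait_for_word WORD ATTEMPTS COMMAND [ARGS...]
wait_for_word() {
    word=$1
    attempts=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # grep -w matches WORD only as a whole word; -q suppresses output.
        if "$@" 2>/dev/null | grep -qw -- "$word"; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Example call (assumption: hsm_state output contains "archived" when done):
#   wait_for_word archived 30 lfs hsm_state /mnt/lustre/f0
```

The helper returns 0 as soon as the word is seen and 1 if it never appears, so it can be chained with `&&` in front of the `lfs hsm_release` step.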

Output

...
t:~# lhsmtool_posix -vvvv --hsm_root=/tmp/arc1 --daemon /mnt/lustre-hsm 2> /tmp/hsm.log
t:~# echo XXX > /mnt/lustre/f0
t:~# lfs hsm_archive /mnt/lustre/f0
t:~# sleep 1
t:~# lfs hsm_release /mnt/lustre/f0
t:~# killall lhsmtool_posix
t:~# cat /mnt/lustre/f0 &
[1] 10620
t:~# sleep 1
t:~# stat /mnt/lustre2/f0
  File: `/mnt/lustre2/f0'
  Size: 4         	Blocks: 1          IO Block: 4194304 regular file
Device: 2c54f966h/743766374d	Inode: 144115205255725063  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-04-13 16:58:17.000000000 -0500
Modify: 2015-04-13 16:58:17.000000000 -0500
Change: 2015-04-13 16:58:17.000000000 -0500
t:~# lhsmtool_posix -vvvv --hsm_root=/tmp/arc1 --daemon /mnt/lustre-hsm 2> /tmp/hsm.log
t:~# wait
XXX
[1]+  Done                    cat /mnt/lustre/f0
t:~# dd if=/dev/zero of=/mnt/lustre/f0 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000714119 s, 717 kB/s
t:~# stat /mnt/lustre/f0
  File: `/mnt/lustre/f0'
  Size: 512       	Blocks: 8          IO Block: 4194304 regular file
Device: 2c54f966h/743766374d	Inode: 144115205255725063  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-04-13 16:58:17.000000000 -0500
Modify: 2015-04-13 16:58:28.000000000 -0500
Change: 2015-04-13 16:58:28.000000000 -0500
t:~# stat /mnt/lustre2/f0
  File: `/mnt/lustre2/f0'
  Size: 4         	Blocks: 1          IO Block: 4194304 regular file
Device: 2c54f966h/743766374d	Inode: 144115205255725063  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-04-13 16:58:17.000000000 -0500
Modify: 2015-04-13 16:58:28.000000000 -0500
Change: 2015-04-13 16:58:28.000000000 -0500
t:~# sleep 60
t:~# stat /mnt/lustre/f0
  File: `/mnt/lustre/f0'
  Size: 512       	Blocks: 8          IO Block: 4194304 regular file
Device: 2c54f966h/743766374d	Inode: 144115205255725063  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-04-13 16:58:17.000000000 -0500
Modify: 2015-04-13 16:58:28.000000000 -0500
Change: 2015-04-13 16:58:28.000000000 -0500
t:~# stat /mnt/lustre2/f0
  File: `/mnt/lustre2/f0'
  Size: 4         	Blocks: 1          IO Block: 4194304 regular file
Device: 2c54f966h/743766374d	Inode: 144115205255725063  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-04-13 16:58:17.000000000 -0500
Modify: 2015-04-13 16:58:28.000000000 -0500
Change: 2015-04-13 16:58:28.000000000 -0500
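
The discrepancy in the output above is that the two client mounts report different sizes for the same file after the restore and the write. A minimal sketch of an automated check for this follows. The function name `sizes_agree` is hypothetical, and the `-c %s` format option assumes GNU coreutils `stat`.

```shell
# Hypothetical helper: compare the size that stat reports for the same
# file as seen through two paths (for example, two client mounts).
# Returns 0 if the sizes match, 1 on mismatch, 2 if stat fails.
sizes_agree() {
    local a b
    a=$(stat -c %s "$1") || return 2
    b=$(stat -c %s "$2") || return 2
    if [ "$a" -eq "$b" ]; then
        echo "OK: both paths report $a bytes"
    else
        echo "MISMATCH: $1 reports $a bytes, $2 reports $b bytes"
        return 1
    fi
}
```

Called as `sizes_agree /mnt/lustre/f0 /mnt/lustre2/f0` after the final `sleep 60`, it would flag the sizes shown above (512 on one mount, 4 on the other) as a mismatch.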


Comments
Comment by Peter Jones [ 21/Apr/15 ]

Bruno

Could you please look into this one?

Peter

Comment by Andreas Dilger [ 21/Apr/15 ]

John, what is the impact of this bug? What does userspace do with LLIF_FILE_RESTORING?

Comment by John Hammond [ 21/Apr/15 ]

> John, what is the impact of this bug? What does userspace do with LLIF_FILE_RESTORING?

The impact is that the wrong size and attributes may be reported by stat. A reproducer is shown in the description.

Userspace cannot see LLIF_FILE_RESTORING. llite uses this flag to determine if the attributes from the MDT are sufficient for stat().

Comment by Gerrit Updater [ 27/Apr/15 ]

Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/14609
Subject: LU-6460 llite: clear LLIF_FILE_RESTORING when done
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8a3f8dc28924784f3c06467c25a692779e65dfde

Comment by Gerrit Updater [ 31/Jul/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/14609/
Subject: LU-6460 llite: clear LLIF_FILE_RESTORING when done
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a2c4b0583a84165b867b18e1446c187d18335879

Comment by Peter Jones [ 31/Jul/15 ]

Landed for 2.8
