[LU-13030] pcc: auto_attach doesn't work after client cache cleared Created: 28/Nov/19  Updated: 05/Dec/19  Resolved: 05/Dec/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: Lustre 2.13.0

Type: Bug Priority: Minor
Reporter: Shuichi Ihara Assignee: Qian Yingjin
Resolution: Fixed Votes: 0
Labels: PCC
Environment: pcc
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The PCC auto_attach feature does not work: files are not re-attached after the caches are cleared on the client.
Here is a reproducer.

Enable HSM
[root@c01 ~]# clush -w mds[01-02] lctl set_param mdt.*.hsm_control=enabled
mds01: mdt.ai400-MDT0000.hsm_control=enabled
mds02: mdt.ai400-MDT0001.hsm_control=enabled

Create PCC cache space
[root@c01 ~]# mkfs.ext4 /dev/sdb 
[root@c01 ~]# mount /dev/sdb /pcc 

Start HSM copytool
[root@c01 ~]# lhsmtool_posix --daemon --hsm-root /pcc --archive=1 /ai400
[root@c01 ~]# ps -ef | grep hsm
root      2872     1  0 15:09 ?        00:00:00 lhsmtool_posix --daemon --hsm-root /pcc --archive=1 /ai400
root      2874  2340  0 15:09 pts/0    00:00:00 grep --color=auto hsm

Enable PCC with projid=100
[root@c01 ~]# lctl pcc add /ai400 /pcc -p "projid={100} auto_attach=1 rwid=1"
[root@c01 ~]# lctl pcc list /ai400
/pcc:
  rwid: 1
  flags: 1f
  autocache: projid={100}

Reproducer
[root@c01 ~]# mkdir /ai400/lpcc
[root@c01 ~]# lfs project -sp 100 /ai400/lpcc/
[root@c01 ~]# echo "QQQ" > /ai400/lpcc/test
[root@c01 ~]# lfs pcc state /ai400/lpcc/test
file: /ai400/lpcc/test, type: readwrite, PCC file: /0082/0000/0403/0000/0002/0000/0x200000403:0x82:0x0, user number: 0, flags: 1c
[root@c01 ~]# echo 3 > /proc/sys/vm/drop_caches

[root@c01 ~]# lfs pcc state /ai400/lpcc/test
file: /ai400/lpcc/test, type: none
[root@c01 ~]# lfs pcc state /ai400/lpcc/test
file: /ai400/lpcc/test, type: none

After dropping caches on the client, 'lfs pcc state' should trigger auto_attach, but it does not.
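The check above can be scripted. The following is a minimal sketch, not part of the original report: the function name pcc_unattached is hypothetical, and the parsing assumes the 'lfs pcc state' output format shown in the transcript above ("file: PATH, type: TYPE, ...").

```shell
# Hypothetical helper: read `lfs pcc state` output on stdin and print the
# path of every file whose reported type is "none" (that is, not attached).
# Assumes each line looks like "file: PATH, type: TYPE[, ...]" as in the
# transcript; paths containing ", type: " are not handled.
pcc_unattached() {
    awk -F', type: ' '
        /^file: / && $2 == "none" {
            sub(/^file: /, "", $1)   # strip the leading "file: " label
            print $1
        }'
}
```

For example, 'lfs pcc state /ai400/lpcc/test | pcc_unattached' would print the path only while the file is detached.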



 Comments   
Comment by Gerrit Updater [ 28/Nov/19 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/36892
Subject: LU-13030 pcc: auto attach not work after chient cache clear
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 248b0baf1b9c87e97350884e8a8e7944c4fc65d0

Comment by Gerrit Updater [ 03/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36892/
Subject: LU-13030 pcc: auto attach not work after client cache clear
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a5ef2d6e068eae5a055746b0f79ce6749f4c9a6d

Comment by Peter Jones [ 03/Dec/19 ]

Landed for 2.13

Comment by Shuichi Ihara [ 04/Dec/19 ]

The patch does not seem to be enough: the behavior is still wrong when "drop caches" is triggered multiple times on the client. Here is a case.
Create 8 x 100M files with fio.

# /work/tools/bin/fio -name=randread -ioengine=sync -rw=randread -blocksize=4k -iodepth=1 -direct=1 -size=100m -runtime=10 -numjobs=8 -group_reporting -directory=/ai400/proj100 -create_serialize=0 -filename_format='f.$jobnum.$filenum'

[root@c01 ~]# lfs pcc state /ai400/proj100/f.*
file: /ai400/proj100/f.0.0, type: readwrite, PCC file: /0008/0000/0401/0000/0002/0000/0x200000401:0x8:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.1.0, type: readwrite, PCC file: /0006/0000/0401/0000/0002/0000/0x200000401:0x6:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.2.0, type: readwrite, PCC file: /0003/0000/0401/0000/0002/0000/0x200000401:0x3:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.3.0, type: readwrite, PCC file: /0005/0000/0401/0000/0002/0000/0x200000401:0x5:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.4.0, type: readwrite, PCC file: /0004/0000/0401/0000/0002/0000/0x200000401:0x4:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.5.0, type: readwrite, PCC file: /0007/0000/0401/0000/0002/0000/0x200000401:0x7:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.6.0, type: readwrite, PCC file: /0009/0000/0401/0000/0002/0000/0x200000401:0x9:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.7.0, type: readwrite, PCC file: /000a/0000/0401/0000/0002/0000/0x200000401:0xa:0x0, user number: 0, flags: 0

state is fine here.

[root@c01 ~]# echo 3 > /proc/sys/vm/drop_caches 
[root@c01 ~]# lfs pcc state /ai400/proj100/f.*
file: /ai400/proj100/f.0.0, type: readwrite, PCC file: /0008/0000/0401/0000/0002/0000/0x200000401:0x8:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.1.0, type: readwrite, PCC file: /0006/0000/0401/0000/0002/0000/0x200000401:0x6:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.2.0, type: readwrite, PCC file: /0003/0000/0401/0000/0002/0000/0x200000401:0x3:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.3.0, type: readwrite, PCC file: /0005/0000/0401/0000/0002/0000/0x200000401:0x5:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.4.0, type: readwrite, PCC file: /0004/0000/0401/0000/0002/0000/0x200000401:0x4:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.5.0, type: readwrite, PCC file: /0007/0000/0401/0000/0002/0000/0x200000401:0x7:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.6.0, type: readwrite, PCC file: /0009/0000/0401/0000/0002/0000/0x200000401:0x9:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.7.0, type: readwrite, PCC file: /000a/0000/0401/0000/0002/0000/0x200000401:0xa:0x0, user number: 0, flags: 0

The state is still fine after the first drop caches.

However, if caches are dropped on the client again, all states become none and the files are not re-attached, even after running the 'lfs pcc state' command.

[root@c01 ~]# echo 3 > /proc/sys/vm/drop_caches 
[root@c01 ~]# lfs pcc state /ai400/proj100/f.*
file: /ai400/proj100/f.0.0, type: none
file: /ai400/proj100/f.1.0, type: none
file: /ai400/proj100/f.2.0, type: none
file: /ai400/proj100/f.3.0, type: none
file: /ai400/proj100/f.4.0, type: none
file: /ai400/proj100/f.5.0, type: none
file: /ai400/proj100/f.6.0, type: none
file: /ai400/proj100/f.7.0, type: none

[root@c01 ~]# lfs pcc state /ai400/proj100/f.*
file: /ai400/proj100/f.0.0, type: none
file: /ai400/proj100/f.1.0, type: none
file: /ai400/proj100/f.2.0, type: none
file: /ai400/proj100/f.3.0, type: none
file: /ai400/proj100/f.4.0, type: none
file: /ai400/proj100/f.5.0, type: none
file: /ai400/proj100/f.6.0, type: none
file: /ai400/proj100/f.7.0, type: none
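
Because the failure only shows up on the second drop, it helps to repeat the drop-and-check cycle in a loop. The following is a rough sketch under stated assumptions: the function names (drop_caches, pcc_state, check_reattach) are hypothetical, and only the 'echo 3 > /proc/sys/vm/drop_caches' and 'lfs pcc state' commands come from the transcript above. The two wrapper functions can be redefined to exercise the loop without a Lustre client.

```shell
# Hypothetical wrappers around the two commands used in the transcript.
# Redefine them to test the loop on a machine without Lustre.
drop_caches() { echo 3 > /proc/sys/vm/drop_caches; }
pcc_state()   { lfs pcc state "$@"; }

# check_reattach ROUNDS FILE...
# Drop caches ROUNDS times; after each drop, query the PCC state of the
# given files. Returns 1 as soon as any file reports "type: none",
# 0 if every file is still attached after all rounds.
check_reattach() {
    rounds=$1
    shift
    round=1
    while [ "$round" -le "$rounds" ]; do
        drop_caches
        if pcc_state "$@" | grep -q ', type: none$'; then
            echo "round $round: at least one file is detached"
            return 1
        fi
        round=$((round + 1))
    done
    echo "all files attached after $rounds rounds"
    return 0
}
```

Running 'check_reattach 3 /ai400/proj100/f.*' as root on the client would be expected to fail at round 2 with the behavior described here.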

Both the server and the client are running the RC2 build.

[root@c01 ~]# clush -w c01,es400nv-vm1 lctl get_param version
es400nv-vm1: version=2.13.0_RC2
c01: version=2.13.0_RC2

Comment by Gerrit Updater [ 04/Dec/19 ]

Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/36923
Subject: LU-13030 pcc: Init saved dataset flags wrongly
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a1f3b70af6c5b400df8e16219748b0ad2cc777da

Comment by Shuichi Ihara [ 04/Dec/19 ]

Yingjin found the root cause and suggested the following fix.

diff --git a/lustre/llite/llite_lib.c b/lustre/llite/llite_lib.c
index 36e34ed092..54a0db1fb6 100644
--- a/lustre/llite/llite_lib.c
+++ b/lustre/llite/llite_lib.c
@@ -1007,7 +1007,7 @@ void ll_lli_init(struct ll_inode_info *lli)
                mutex_init(&lli->lli_pcc_lock);
                lli->lli_pcc_state = PCC_STATE_FL_NONE;
                lli->lli_pcc_inode = NULL;
-               lli->lli_pcc_dsflags = PCC_DATASET_NONE;
+               lli->lli_pcc_dsflags = PCC_DATASET_INVALID;
                lli->lli_pcc_generation = 0;
                mutex_init(&lli->lli_group_mutex);
                lli->lli_group_users = 0;

I have just confirmed that this fix finally solves the original problem. Here are the results: even after multiple drop caches on the client, the PCC state comes back automatically after the "lfs pcc state" command.

/work/tools/bin/fio -name=randread -ioengine=sync -rw=randread -blocksize=4k -iodepth=1 -direct=1 -size=100m -runtime=10 -numjobs=8 -group_reporting -directory=/ai400/proj100 -create_serialize=0 -filename_format='f.$jobnum.$filenum'

[root@c01 ~]# lfs pcc state /ai400/proj100/f.*
file: /ai400/proj100/f.0.0, type: readwrite, PCC file: /0009/0000/0401/0000/0002/0000/0x200000401:0x9:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.1.0, type: readwrite, PCC file: /000a/0000/0401/0000/0002/0000/0x200000401:0xa:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.2.0, type: readwrite, PCC file: /0003/0000/0401/0000/0002/0000/0x200000401:0x3:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.3.0, type: readwrite, PCC file: /0004/0000/0401/0000/0002/0000/0x200000401:0x4:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.4.0, type: readwrite, PCC file: /0008/0000/0401/0000/0002/0000/0x200000401:0x8:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.5.0, type: readwrite, PCC file: /0005/0000/0401/0000/0002/0000/0x200000401:0x5:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.6.0, type: readwrite, PCC file: /0006/0000/0401/0000/0002/0000/0x200000401:0x6:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.7.0, type: readwrite, PCC file: /0007/0000/0401/0000/0002/0000/0x200000401:0x7:0x0, user number: 0, flags: 0
[root@c01 ~]# echo 3 > /proc/sys/vm/drop_caches 
[root@c01 ~]# lfs pcc state /ai400/proj100/f.*
file: /ai400/proj100/f.0.0, type: readwrite, PCC file: /0009/0000/0401/0000/0002/0000/0x200000401:0x9:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.1.0, type: readwrite, PCC file: /000a/0000/0401/0000/0002/0000/0x200000401:0xa:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.2.0, type: readwrite, PCC file: /0003/0000/0401/0000/0002/0000/0x200000401:0x3:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.3.0, type: readwrite, PCC file: /0004/0000/0401/0000/0002/0000/0x200000401:0x4:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.4.0, type: readwrite, PCC file: /0008/0000/0401/0000/0002/0000/0x200000401:0x8:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.5.0, type: readwrite, PCC file: /0005/0000/0401/0000/0002/0000/0x200000401:0x5:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.6.0, type: readwrite, PCC file: /0006/0000/0401/0000/0002/0000/0x200000401:0x6:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.7.0, type: readwrite, PCC file: /0007/0000/0401/0000/0002/0000/0x200000401:0x7:0x0, user number: 0, flags: 0
[root@c01 ~]# echo 3 > /proc/sys/vm/drop_caches 
[root@c01 ~]# lfs pcc state /ai400/proj100/f.*
file: /ai400/proj100/f.0.0, type: readwrite, PCC file: /0009/0000/0401/0000/0002/0000/0x200000401:0x9:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.1.0, type: readwrite, PCC file: /000a/0000/0401/0000/0002/0000/0x200000401:0xa:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.2.0, type: readwrite, PCC file: /0003/0000/0401/0000/0002/0000/0x200000401:0x3:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.3.0, type: readwrite, PCC file: /0004/0000/0401/0000/0002/0000/0x200000401:0x4:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.4.0, type: readwrite, PCC file: /0008/0000/0401/0000/0002/0000/0x200000401:0x8:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.5.0, type: readwrite, PCC file: /0005/0000/0401/0000/0002/0000/0x200000401:0x5:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.6.0, type: readwrite, PCC file: /0006/0000/0401/0000/0002/0000/0x200000401:0x6:0x0, user number: 0, flags: 0
file: /ai400/proj100/f.7.0, type: readwrite, PCC file: /0007/0000/0401/0000/0002/0000/0x200000401:0x7:0x0, user number: 0, flags: 0

Thanks, Yingjin, for your prompt fix!

Comment by Gerrit Updater [ 05/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36923/
Subject: LU-13030 pcc: Init saved dataset flags properly
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e467a421c7aa175b9316f4ee78f392050d2f587f

Comment by Peter Jones [ 05/Dec/19 ]

Landed for 2.13

Generated at Sat Feb 10 02:57:45 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.