[LU-12839] ‘lctl pcc add …’ fails with “: Not a directory (20)” Created: 08/Oct/19  Updated: 08/Oct/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: PCC

Severity: 3

 Description   

Running the first manual test in the PCC test plan, “Pressure Test via compilebench”, I get an error when adding a PCC backend on a client (trevis-62vm8):

# lctl pcc add /lustre/scratch/lpcc /dev/vda3 -p "projid={100} rwid=2"
lctl pcc pcc: error: setting llite.scratch-ffff9383a119e000.pcc="add /dev/vda3 projid={100} rwid=2"
: Not a directory (20)

On the client (vm8) dmesg, I see many of these messages:

[68485.718305] LustreError: 11-0: scratch-OST0003-osc-ffff9383f9c47800: operation ost_connect to node 10.9.6.176@tcp failed: rc = -30

Running ‘lfs df’ on the client (vm8) hangs before printing the information for OST0003, yet the same command completes successfully on other clients (vm7).

Since there seems to be a problem on an OSS, we look at dmesg on the OSS (vm4) and see the following messages:

[68313.219321] LDISKFS-fs (dm-1): error count since last fsck: 54
[68313.220003] LDISKFS-fs (dm-1): initial error at time 1570550293: ldiskfs_mb_check_ondisk_bitmap:3719
[68313.220929] LDISKFS-fs (dm-1): last error at time 1570559969: ldiskfs_lookup:1816: inode 399

and then many of the following messages:

[68368.364485] LustreError: 24267:0:(tgt_lastrcvd.c:1027:tgt_client_new()) scratch-OST0003: Failed to write client lcd at idx 3, rc -30
[68368.365722] LustreError: 24267:0:(tgt_lastrcvd.c:1027:tgt_client_new()) Skipped 477 previous similar messages
[68630.939863] LustreError: 25860:0:(tgt_grant.c:248:tgt_grant_sanity_check()) ofd_destroy_export: tot_granted 277312 != fo_tot_granted 8715072
[68630.941251] LustreError: 25860:0:(tgt_grant.c:248:tgt_grant_sanity_check()) Skipped 232 previous similar messages 
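
For reference, the two numeric codes in the output above are standard Linux errno values: 20 is ENOTDIR (the lctl failure) and 30 is EROFS, which the kernel messages print negated as rc = -30. A read-only OST0003 would be consistent with the LDISKFS errors logged on dm-1, though that is an inference and not something confirmed here. A minimal sketch of the mapping, where errno_name is a hypothetical helper (not part of Lustre) that covers only these two values:

```shell
# Hypothetical helper: decode the two errno values seen in this report.
# Values are the standard Linux ones (ENOTDIR = 20, EROFS = 30).
errno_name() {
    # Kernel logs print rc as a negative errno, so drop a leading minus sign.
    code="${1#-}"
    case "$code" in
        20) echo "ENOTDIR (Not a directory)" ;;
        30) echo "EROFS (Read-only file system)" ;;
        *)  echo "unknown ($code)" ;;
    esac
}
```

For example, `errno_name -30` decodes the rc value from the tgt_client_new() messages, and `errno_name 20` decodes the code printed by lctl.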

What did we do to get here? We set project quotas on the MDTs and OSTs and enabled HSM coordinators on all (4) MDTs. Then, on the client (vm8), we mounted Lustre and ran the following commands:

# mount | grep lustre
10.9.6.172@tcp:/scratch on /lustre/scratch type lustre (rw,flock,user_xattr,lazystatfs)
10.9.6.172@tcp:/scratch on /lustre/scratch2 type lustre (rw,flock,user_xattr,lazystatfs)
# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda    253:0    0  120G  0 disk 
├─vda1 253:1    0   20G  0 part /
├─vda2 253:2    0    2G  0 part [SWAP]
└─vda3 253:3    0   98G  0 part 
# lhsmtool_posix --daemon --hsm-root /dev/vda3 --archive=2 /lustre/scratch < /dev/null >
-bash: syntax error near unexpected token `newline'
# lhsmtool_posix --daemon --hsm-root /dev/vda3 --archive=2 /lustre/scratch
lhsmtool_posix: 1570566464.446881 lhsmtool_posix[1256]: action=0 src=(null) dst=(null) mount_point=/lustre/scratch
# lhsmtool_posix: 1570566464.459653 lhsmtool_posix[1257]: waiting for message from kernel

# ps -aux | grep hsm
root      1257  0.0  0.0  18500  1464 ?        Ss   20:27   0:00 lhsmtool_posix --daemon --hsm-root /dev/vda3 --archive=2 /lustre/scratch
# 
# /tmp/copytool_log 2>&1
-bash: /tmp/copytool_log: No such file or directory
# mkdir /lustre/scratch/lpcc
# lfs project -sp 100 /lustre/scratch/lpcc
# lctl pcc add /lustre/scratch /dev/vda3 -p "projid={100} rwid=2"
lctl pcc pcc: error: setting llite.scratch-ffff9383a119e000.pcc="add /dev/vda3 projid={100} rwid=2"
: Not a directory (20)
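
One thing worth checking, given the ENOTDIR result: the lsblk output above shows vda3 with no mount point, and /dev/vda3 is a block device node, not a directory. Assuming the PCC backend argument (and --hsm-root) is expected to be a directory on a mounted local file system, which is an assumption here and not verified against the 2.13 code, a quick pre-check could look like the sketch below. classify_path is a hypothetical helper, not a Lustre command:

```shell
# Hypothetical helper: report what kind of object a path is before
# passing it to 'lctl pcc add' as the backend path.
classify_path() {
    if [ -d "$1" ]; then
        echo "directory"
    elif [ -b "$1" ]; then
        echo "block device"
    elif [ -e "$1" ]; then
        echo "other"
    else
        echo "missing"
    fi
}
```

On vm8, `classify_path /dev/vda3` should report a block device; if that is the cause, creating a file system on vda3, mounting it, and passing the mount point instead would be the next thing to try.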

