Details
Bug
Resolution: Unresolved
Minor
None
Lustre 2.13.0
3
Description
Running the first manual test in the PCC test plan, “Pressure Test via compilebench”, I get an error when adding a PCC backend on a client (trevis-62vm8):
# lctl pcc add /lustre/scratch/lpcc /dev/vda3 -p "projid={100} rwid=2"
lctl pcc pcc: error: setting llite.scratch-ffff9383a119e000.pcc="add /dev/vda3 projid={100} rwid=2" : Not a directory (20)
In dmesg on the client (vm8), I see many of these messages:
[68485.718305] LustreError: 11-0: scratch-OST0003-osc-ffff9383f9c47800: operation ost_connect to node 10.9.6.176@tcp failed: rc = -30
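The numeric codes in the two messages above, `(20)` from lctl and `rc = -30` from ost_connect, look like standard Linux errno values. A minimal sketch for decoding them, assuming it is run on a Linux host; the helper name `decode_rc` is hypothetical and not part of any Lustre tool:

```python
import errno
import os


def decode_rc(rc):
    """Map a kernel-style return code (positive or negative) to (name, message)."""
    code = abs(rc)  # kernel messages report errno values negated
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# 20 comes from the lctl error, -30 from the ost_connect failure.
for rc in (20, -30):
    name, message = decode_rc(rc)
    print(f"rc={rc}: {name} ({message})")
```

Under that assumption, 20 decodes to ENOTDIR and 30 to EROFS, which would mean the OST0003 connection is being refused because something on the server side is read-only.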
Running ‘lfs df’ on the client (vm8) hangs before printing the information for OST0003, yet running the same command on another client (vm7) completes successfully.
Since it seems like there’s a problem on an OSS, we look at dmesg on the OSS (vm4) and see the following messages:
[68313.219321] LDISKFS-fs (dm-1): error count since last fsck: 54
[68313.220003] LDISKFS-fs (dm-1): initial error at time 1570550293: ldiskfs_mb_check_ondisk_bitmap:3719
[68313.220929] LDISKFS-fs (dm-1): last error at time 1570559969: ldiskfs_lookup:1816: inode 399
and then many of the following messages:
[68368.364485] LustreError: 24267:0:(tgt_lastrcvd.c:1027:tgt_client_new()) scratch-OST0003: Failed to write client lcd at idx 3, rc -30
[68368.365722] LustreError: 24267:0:(tgt_lastrcvd.c:1027:tgt_client_new()) Skipped 477 previous similar messages
[68630.939863] LustreError: 25860:0:(tgt_grant.c:248:tgt_grant_sanity_check()) ofd_destroy_export: tot_granted 277312 != fo_tot_granted 8715072
[68630.941251] LustreError: 25860:0:(tgt_grant.c:248:tgt_grant_sanity_check()) Skipped 232 previous similar messages
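To get a rough sense of how often each error fires on the OSS, the messages above can be tallied per reporting function. This is only a sketch: the regular expressions are my own guess at the LustreError line layout, and counting each "Skipped N previous similar messages" line as N further occurrences is an assumption about how to read the rate-limited output.

```python
import re
from collections import Counter

# The four OSS dmesg lines quoted above.
OSS_DMESG = """\
[68368.364485] LustreError: 24267:0:(tgt_lastrcvd.c:1027:tgt_client_new()) scratch-OST0003: Failed to write client lcd at idx 3, rc -30
[68368.365722] LustreError: 24267:0:(tgt_lastrcvd.c:1027:tgt_client_new()) Skipped 477 previous similar messages
[68630.939863] LustreError: 25860:0:(tgt_grant.c:248:tgt_grant_sanity_check()) ofd_destroy_export: tot_granted 277312 != fo_tot_granted 8715072
[68630.941251] LustreError: 25860:0:(tgt_grant.c:248:tgt_grant_sanity_check()) Skipped 232 previous similar messages
"""

# Matches "(file.c:LINE:function()) message".
LINE_RE = re.compile(r"\((?P<src>[\w.]+:\d+):(?P<func>\w+)\(\)\)\s+(?P<msg>.*)")
SKIP_RE = re.compile(r"Skipped (\d+) previous similar messages")


def tally(dmesg_text):
    """Count occurrences per function, adding the 'Skipped N' totals."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        skipped = SKIP_RE.match(m.group("msg"))
        counts[m.group("func")] += int(skipped.group(1)) if skipped else 1
    return counts


for func, n in tally(OSS_DMESG).most_common():
    print(f"{func}: {n}")
```

Read this way, the failed writes in tgt_client_new() are about twice as frequent as the grant sanity check failures in this excerpt.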
What did we do to get here? We set project quotas on the MDTs and OSTs and enabled HSM coordinators on all (4) MDTs. Then, on the client (vm8), we mounted Lustre and ran the following commands:
# mount | grep lustre
10.9.6.172@tcp:/scratch on /lustre/scratch type lustre (rw,flock,user_xattr,lazystatfs)
10.9.6.172@tcp:/scratch on /lustre/scratch2 type lustre (rw,flock,user_xattr,lazystatfs)
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 120G 0 disk
├─vda1 253:1 0 20G 0 part /
├─vda2 253:2 0 2G 0 part [SWAP]
└─vda3 253:3 0 98G 0 part
# lhsmtool_posix --daemon --hsm-root /dev/vda3 --archive=2 /lustre/scratch < /dev/null >
-bash: syntax error near unexpected token `newline'
# lhsmtool_posix --daemon --hsm-root /dev/vda3 --archive=2 /lustre/scratch
lhsmtool_posix: 1570566464.446881 lhsmtool_posix[1256]: action=0 src=(null) dst=(null) mount_point=/lustre/scratch
# lhsmtool_posix: 1570566464.459653 lhsmtool_posix[1257]: waiting for message from kernel
# ps -aux | grep hsm
root 1257 0.0 0.0 18500 1464 ? Ss 20:27 0:00 lhsmtool_posix --daemon --hsm-root /dev/vda3 --archive=2 /lustre/scratch
#
# /tmp/copytool_log 2>&1
-bash: /tmp/copytool_log: No such file or directory
# mkdir /lustre/scratch/lpcc
# lfs project -sp 100 /lustre/scratch/lpcc
# lctl pcc add /lustre/scratch /dev/vda3 -p "projid={100} rwid=2"
lctl pcc pcc: error: setting llite.scratch-ffff9383a119e000.pcc="add /dev/vda3 projid={100} rwid=2" : Not a directory (20)
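To see how the OSS errors line up with the steps above, the epoch timestamps in the logs can be converted to wall-clock time. A small sketch, assuming the LDISKFS "error at time" values and the lhsmtool_posix log prefix are all seconds since the Unix epoch; the event labels and the `to_utc` helper are my own naming:

```python
from datetime import datetime, timezone

# Epoch timestamps copied from the OSS dmesg and the copytool output.
EVENTS = {
    "ldiskfs initial error (OSS)": 1570550293,
    "ldiskfs last error (OSS)": 1570559969,
    "lhsmtool_posix started (client)": 1570566464,
}


def to_utc(epoch_seconds):
    """Convert seconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


for label, ts in EVENTS.items():
    print(f"{to_utc(ts).isoformat()}  {label}")
```

Under that assumption, the first ldiskfs error was recorded at 15:58:13 UTC on 2019-10-08, roughly four and a half hours before the copytool started at 20:27:44 UTC (which matches the 20:27 start time in the ps output). That would suggest the OST was already in trouble before any of the PCC commands were run.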