[LU-6574] chown doesn't update object ownership on ost Created: 06/May/15  Updated: 08/May/15  Resolved: 08/May/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.3
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Mahmoud Hanafi Assignee: Niu Yawei (Inactive)
Resolution: Duplicate Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Running chown does not update the object ownership on the OST. This has caused our quota accounting to be way off.
Here is an example:

pfe22 ~ #  stat  /nobacku....180.3d.vtk
  File: `/nobacku...180.3d.vtk'
  Size: 65820817  	Blocks: 128568     IO Block: 4194304 regular file
Device: 48b492b4h/1219793588d	Inode: 144118607926790745  Links: 1
Access: (0644/-rw-r--r--)  Uid: (11467/  XXXX)   Gid: (30730/   XXXXX)
Access: 2014-01-17 17:09:36.000000000 -0800
Modify: 2014-01-17 17:09:36.000000000 -0800
Change: 2015-05-06 11:33:29.000000000 -0700
 Birth: -
pfe22 ~ #  chown root:root /noback....180.3d.vtk

lfs getstripe /nobackup....180.3d.vtk
/nobacku....180.3d.vtk
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_layout_gen:     0
lmm_stripe_offset:  0
	obdidx		 objid		 objid		 group
	     0	       7286499	     0x6f2ee3	             0

debugfs:  pwd
[pwd]   INODE: 6815751  PATH: /O/0/d3
[root]  INODE:      2  PATH: /
debugfs:  stat 7286499
Inode: 117   Type: regular    Mode:  0666   Flags: 0x80000
Generation: 2012045754    Version: 0x00000009:04f812d0
User: 11467   Group:     0   Size: 65820817
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 128568
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x52d9d450:00000000 -- Fri Jan 17 17:09:36 2014
 atime: 0x00000000:00000000 -- Wed Dec 31 16:00:00 1969
 mtime: 0x52d9d450:00000000 -- Fri Jan 17 17:09:36 2014
crtime: 0x52d9c7d5:451115c8 -- Fri Jan 17 16:16:21 2014
Size of extra inode fields: 28
Extended attributes stored in inode body: 
  lma = "00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 e3 2e 6f 00 00 00 00 00 " (24)
  lma: fid=[0x100000000:0x6f2ee3:0x0] compat=0 incompat=0
  fid = "3f 1c 03 00 02 00 00 00 59 0a 00 00 00 00 00 00 " (16)
  fid: parent=[0x200031c3f:0xa59:0x0] stripe=0
EXTENTS:
(ETB0):170790783, (0-511):170587392-170587903, (512-1023):170558464-170558975, (1024-2047):170557440-170558463, (2048-6143):170784768-170788863, (6144-16069):170801152-170811077
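
For readers following the debugfs session above, here is a minimal sketch of how the objid reported by `lfs getstripe` maps to the object's location on the OST backend. It assumes the usual ldiskfs layout `O/<group>/d<objid mod 32>/<objid>`, which is consistent with the `/O/0/d3` directory shown by `pwd`. The helper name `ost_object_path` is hypothetical and not part of any Lustre tool.

```python
# Hypothetical helper (not a Lustre API): builds the backend path of an
# OST object, ASSUMING the O/<group>/d<objid % 32>/<objid> layout.
def ost_object_path(objid: int, group: int = 0, subdirs: int = 32) -> str:
    return f"O/{group}/d{objid % subdirs}/{objid}"


# objid and group columns from the `lfs getstripe` output in this ticket.
objid = 0x6f2ee3
group = 0

# The decimal objid is the name passed to `stat` inside debugfs.
print(objid)
print(ost_object_path(objid, group))
```

The decimal value is the one passed to `stat` in debugfs, and the directory component matches the working directory reported there.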

I will upload debug logs from the MDS, captured while I did a chown to root:root and then back to the user.
The file FID is [0x200031c3f:0xa59:0x0].
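
The FID quoted above can be cross-checked against the raw xattr bytes in the debugfs output. The sketch below assumes the byte layouts implied by that output: little-endian fields, an `lma` xattr holding two 32-bit flag words followed by a FID (64-bit sequence, 32-bit object id, 32-bit version), and a 16-byte `fid` xattr holding the parent FID with the stripe index in the last 32-bit word. The function names are hypothetical.

```python
import struct


def parse_hex(dump: str) -> bytes:
    """Turn a debugfs-style hex dump ("00 01 ...") into bytes."""
    return bytes.fromhex(dump.replace(" ", ""))


def decode_lma(raw: bytes):
    # ASSUMED layout: compat(u32) incompat(u32) seq(u64) oid(u32) ver(u32)
    compat, incompat, seq, oid, ver = struct.unpack("<IIQII", raw)
    return compat, incompat, f"[{seq:#x}:{oid:#x}:{ver:#x}]"


def decode_parent_fid(raw: bytes):
    # ASSUMED layout: seq(u64) oid(u32) stripe(u32); the FID is printed
    # with a zero version field, as debugfs does in this ticket.
    seq, oid, stripe = struct.unpack("<QII", raw)
    return f"[{seq:#x}:{oid:#x}:0x0]", stripe


# Hex dumps copied from the debugfs output in this ticket.
LMA_HEX = "00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 e3 2e 6f 00 00 00 00 00 "
FID_HEX = "3f 1c 03 00 02 00 00 00 59 0a 00 00 00 00 00 00 "

print(decode_lma(parse_hex(LMA_HEX)))
print(decode_parent_fid(parse_hex(FID_HEX)))
```

Under these assumptions, the decoded parent FID agrees with the file FID reported here, which confirms that the inspected OST object belongs to the file being chowned.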



 Comments   
Comment by Mahmoud Hanafi [ 06/May/15 ]

Uploaded debug logs to ftp.whamcloud.com:/uploads/LU6574/out.2.bz2

Comment by Mahmoud Hanafi [ 06/May/15 ]

It looks like this is where it is failing:

00000004:00000001:13.0:1430953684.928752:0:5251:0:(osp_sync.c:481:osp_sync_new_setattr_job()) Process entered
00000004:00020000:13.0:1430953684.928753:0:5251:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp7-OST0000-osc-MDT0000: invalid setattr record, lsr_valid:100
00000004:00000001:13.0:1430953684.951012:0:5251:0:(osp_sync.c:494:osp_sync_new_setattr_job()) Process leaving (rc=0 : 0 : 0)
00000004:00080000:13.0:1430953684.951013:0:5251:0:(osp_sync.c:666:osp_sync_process_record()) found record 10692401, 64, idx 64371, id 89608251: 0
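
As an illustration of how the failing record can be picked out of a larger debug log, here is a minimal sketch that splits lines of the form shown above into fields and keeps only those with the error bit set in the debug mask. The field names (`subsys`, `mask`, `cpu`, `stack`, `pid`, `extpid`) are assumptions based on the visible line format, and the `D_ERROR` value `0x00020000` is taken from the mask on the "invalid setattr record" line; neither is confirmed by this ticket.

```python
import re

# ASSUMED field layout, inferred from the log lines quoted in this ticket.
LOG_RE = re.compile(
    r"^(?P<subsys>[0-9a-f]{8}):(?P<mask>[0-9a-f]{8}):(?P<cpu>\d+\.\d+):"
    r"(?P<time>\d+\.\d+):(?P<stack>\d+):(?P<pid>\d+):(?P<extpid>\d+):"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)$"
)

# ASSUMPTION: this mask bit marks error-level messages.
D_ERROR = 0x00020000


def parse_line(line: str):
    """Return a dict of fields, or None if the line does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["mask"] = int(rec["mask"], 16)
    rec["line"] = int(rec["line"])
    return rec


def error_records(lines):
    """Keep only the records whose debug mask has the error bit set."""
    parsed = (parse_line(l) for l in lines)
    return [r for r in parsed if r is not None and r["mask"] & D_ERROR]


# The four log lines quoted in this comment.
SAMPLE = [
    "00000004:00000001:13.0:1430953684.928752:0:5251:0:(osp_sync.c:481:osp_sync_new_setattr_job()) Process entered",
    "00000004:00020000:13.0:1430953684.928753:0:5251:0:(osp_sync.c:487:osp_sync_new_setattr_job()) nbp7-OST0000-osc-MDT0000: invalid setattr record, lsr_valid:100",
    "00000004:00000001:13.0:1430953684.951012:0:5251:0:(osp_sync.c:494:osp_sync_new_setattr_job()) Process leaving (rc=0 : 0 : 0)",
    "00000004:00080000:13.0:1430953684.951013:0:5251:0:(osp_sync.c:666:osp_sync_process_record()) found record 10692401, 64, idx 64371, id 89608251: 0",
]

for rec in error_records(SAMPLE):
    print(f'{rec["file"]}:{rec["line"]} {rec["func"]}: {rec["msg"]}')
```

Filtering on the mask rather than on message text keeps the sketch usable for other error messages in the same log.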

Comment by Peter Jones [ 07/May/15 ]

Niu

Could you please look into this issue?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 07/May/15 ]

Are any patches applied to your 2.4.3? The messages in osp_sync_new_setattr_job() should not be in the 2.4.3 tree.

Can this be reproduced on any file? On a file created by 'lfs setstripe'? On an existing file? On a newly created file? Thanks.

Comment by Peter Jones [ 07/May/15 ]

NASA's tree is at https://github.com/jlan/lustre-nas

Comment by Niu Yawei (Inactive) [ 07/May/15 ]

Thank you, Peter.

Mahmoud,
The error message "osp_sync_new_setattr_job()) nbp7-OST0000-osc-MDT0000: invalid setattr record, lsr_valid:100" looks strange to me; I don't see how 'valid' could be 100 from the code. The log you uploaded to the FTP site doesn't contain such error messages; perhaps it was truncated? Could you send me a complete log if possible? Do you have a reproducer, or can this only be reproduced on certain files? Thank you.

Comment by Jay Lan (Inactive) [ 08/May/15 ]

The affected MDSes were running 2.4.3-14.2nasS. Mahmoud has verified that 2.4.3-14.3nasS fixes the problem. We are planning an upgrade.

Comment by Peter Jones [ 08/May/15 ]

Jay

Does this mean that there is a particular fix that you can point to that probably has resolved this issue? Can we close out this ticket as a duplicate?

Peter

Comment by Jay Lan (Inactive) [ 08/May/15 ]

Peter,

It was LU-4345.

While dealing with LU-6269 about 2 months ago, Oleg Drokin pointed me to LU-4345, and we (very) unhappily found out that we had an earlier version of the LU-4345 patch (I think it was patchset #1). It seemed to have worked for us back then, and we did not know that LU-4345 had gradually evolved to patchset #7.

I upgraded lustre-nas 2.4.3 from 14.2nasS to 14.3nasS by picking up the LU-4345 patchset #7 back then, but our servers stayed at 14.2nasS.

Mahmoud is going to upgrade the servers today.

Comment by Mahmoud Hanafi [ 08/May/15 ]

We can close this ticket.

Comment by Peter Jones [ 08/May/15 ]

ok - thanks!

Generated at Sat Feb 10 02:01:23 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.