[LU-324] chgrp can't be successful to many files Created: 13/May/11  Updated: 31/Aug/12  Resolved: 31/Aug/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Shuichi Ihara (Inactive) Assignee: Lai Siyao
Resolution: Cannot Reproduce Votes: 0
Labels: None

Attachments: File t2a006166.0417     File t2a006166.0427     File t2a006169.0417    
Severity: 3
Rank (Obsolete): 10458

 Description   

When a user ran chgrp recursively on a directory containing a very large number of files (4-5 million), the client hung and became unresponsive. They tried several times (including from another client), but the command never completed; each attempt failed with the following errors:

Apr 27 10:16:25 t2a006166 kernel: [1641802.777783] Lustre: work0-MDT0000-mdc-ffff880d8ae6c800: Connection to service work0-MDT0000 via nid 10.1.7.17@o2ib was lost; in progress operations using this service will wait for recovery to complete.
Apr 27 10:16:25 t2a006166 kernel: [1641802.777991] LustreError: 167-0: This client was evicted by work0-MDT0000; in progress operations using this service will fail.
Apr 27 10:16:25 t2a006166 kernel: [1641802.778116] LustreError: 16507:0:(llite_lib.c:1520:ll_setattr_raw()) mdc_setattr fails: rc = -4
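The last log line shows the failure surfacing in `ll_setattr_raw()` via `mdc_setattr`, so on the Lustre client each group change is a setattr sent to the MDS. On Linux, `rc = -4` is `-EINTR`. To make the per-file cost concrete, here is a minimal Python sketch of what a recursive group change amounts to: one `chown(path, -1, gid)` per entry, so 4-5 million files means millions of sequential setattr calls. The function name `recursive_chgrp` is hypothetical, and this is not how GNU `chgrp -R` is implemented. It records per-file failures instead of aborting, which is a design choice for diagnosis.

```python
import os
from typing import List, Tuple


def recursive_chgrp(root: str, gid: int) -> Tuple[int, List[Tuple[str, int]]]:
    """Set the group of `root` and every entry below it to `gid`.

    Hypothetical helper. Returns (entries changed, [(path, errno), ...]) so
    that one failed setattr is recorded instead of aborting the whole walk.
    """
    changed = 0
    failures: List[Tuple[str, int]] = []

    def change(path: str) -> None:
        nonlocal changed
        try:
            # uid=-1 leaves the owner untouched; follow_symlinks=False
            # changes a symlink itself, never the file it points to.
            os.chown(path, -1, gid, follow_symlinks=False)
            changed += 1
        except OSError as exc:
            failures.append((path, exc.errno))

    change(root)
    # Top-down walk: each directory is changed when its parent lists it;
    # symlinked directories are not descended into (followlinks=False).
    for dirpath, dirnames, filenames in os.walk(
        root, onerror=lambda exc: failures.append((exc.filename, exc.errno))
    ):
        for name in dirnames + filenames:
            change(os.path.join(dirpath, name))
    return changed, failures
```

For example, `recursive_chgrp("/mnt/lustre/somedir", 1234)` (hypothetical path and gid) returns the number of entries changed plus any `(path, errno)` pairs; an errno of 4 there would be the same EINTR seen in the log.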



 Comments   
Comment by Peter Jones [ 16/May/11 ]

Lai

Could you please look into this one?

Thanks

Peter

Comment by Lai Siyao [ 16/May/11 ]

Okay!

Comment by Shuichi Ihara (Inactive) [ 05/Jun/11 ]

Lai, any progress?

Comment by Lai Siyao [ 05/Jun/11 ]

Shuichi, I haven't had a machine available to test this over the past week.

All of the logs are from the client side, and they show a network disconnect every few minutes. Could you get the logs from the MDS?

Comment by Lai Siyao [ 06/Jun/11 ]

Hi Shuichi, I tested chgrp on a directory with 1M files, and it succeeded.

What's the average file size on your setup? And the stripe count? After the client stalls, can you get the console messages?
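The stripe-count question matters if, as Lai notes in his 15/Jun/11 comment, chgrp also sends SETATTR RPCs to the OSS while chmod does not. A rough back-of-the-envelope model, assuming one MDS setattr per file plus one OST setattr per stripe object for chgrp (an assumption, not a measured Lustre behavior), can be sketched as follows. Directories are ignored for simplicity, and the function name is hypothetical.

```python
def estimate_setattr_rpcs(n_files: int, stripe_count: int, op: str) -> int:
    """Rough setattr RPC count for a recursive chgrp/chmod (hypothetical model).

    Assumes one MDS setattr per regular file, plus, for chgrp only, one OST
    setattr per stripe object. chmod is assumed to touch the MDS only.
    """
    if op not in ("chgrp", "chmod"):
        raise ValueError(f"unsupported op: {op}")
    ost_rpcs = n_files * stripe_count if op == "chgrp" else 0
    return n_files + ost_rpcs
```

Under these assumptions, 5 million files with a stripe count of 4 gives 25,000,000 setattr RPCs for chgrp but 5,000,000 for chmod, which is why comparing the two (and knowing the stripe count) could help locate the bottleneck.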

I will test with more files, scaling up to 4-5M.
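Scaling the reproduction from 1M toward 4-5M files needs a way to populate a test directory and measure the group-change rate. The following is a minimal sketch, assuming a single flat directory; the names `populate` and `timed_group_change` and the `f0000000` naming scheme are hypothetical. Running `chgrp -R` itself on the populated directory remains the more faithful test, since this loop only approximates it.

```python
import os
import time
from typing import Tuple


def populate(directory: str, n_files: int) -> None:
    """Create `n_files` empty files named f0000000, f0000001, ... in `directory`."""
    os.makedirs(directory, exist_ok=True)
    for i in range(n_files):
        # O_CREAT | O_EXCL fails loudly if a previous run left files behind.
        fd = os.open(os.path.join(directory, f"f{i:07d}"),
                     os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        os.close(fd)


def timed_group_change(directory: str, gid: int) -> Tuple[int, float]:
    """Change the group of every entry in a flat directory.

    Returns (entries changed, elapsed wall-clock seconds).
    """
    start = time.monotonic()
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # uid=-1 keeps the owner; only the group is changed.
            os.chown(entry.path, -1, gid, follow_symlinks=False)
            count += 1
    return count, time.monotonic() - start
```

Dividing the count by the elapsed time gives a files-per-second rate; watching whether that rate degrades as the file count grows may be more informative than a single pass/fail result.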

Comment by Lai Siyao [ 15/Jun/11 ]

Hi Ihara,

Do you have more information on this now? If possible, could you also try chmod recursively on the same directory? chgrp sends a SETATTR RPC to the OSS, while chmod does not, so comparing the two could help narrow down the cause of this hang.

Thanks,

Lai
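The chmod-versus-chgrp experiment suggested above can be scripted so that both commands run against the same tree and are timed the same way. This is a minimal sketch, assuming GNU coreutils `chmod` and `chgrp` are on the PATH; the helper names and the default mode `g+r` are hypothetical choices, and `group` may be a group name or a numeric gid passed as a string.

```python
import subprocess
import time
from typing import Dict, List


def time_command(argv: List[str]) -> float:
    """Run a command to completion and return elapsed wall-clock seconds."""
    start = time.monotonic()
    subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return time.monotonic() - start


def compare_chmod_chgrp(root: str, group: str, mode: str = "g+r") -> Dict[str, float]:
    """Time `chmod -R` and then `chgrp -R` over the same tree.

    If only chgrp hangs or is much slower, the extra OSS SETATTR traffic
    that chgrp generates becomes the prime suspect.
    """
    return {
        "chmod": time_command(["chmod", "-R", mode, root]),
        "chgrp": time_command(["chgrp", "-R", group, root]),
    }
```

Because the first pass may warm client caches for the second, it is probably worth alternating the order between runs (or using fresh copies of the tree) before drawing conclusions from the timings.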
Comment by Kit Westneat (Inactive) [ 01/Aug/12 ]

We have been unable to reproduce this, and the customer has since upgraded, so this issue can be closed.

Generated at Sat Feb 10 01:05:54 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.