[LU-324] chgrp can't be successful to many files Created: 13/May/11 Updated: 31/Aug/12 Resolved: 31/Aug/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Shuichi Ihara (Inactive) | Assignee: | Lai Siyao |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 10458 |
| Description |
|
When a user ran chgrp recursively on a directory containing a large number of files (4-5 million), the client hung and became unresponsive. They tried several times (including from another client), but the command never completed and errors such as the following were logged. Apr 27 10:16:25 t2a006166 kernel: [1641802.777783] Lustre: work0-MDT0000-mdc-ffff880d8ae6c800: Connection to service work0-MDT0000 via nid 10.1.7.17@o2ib was lost; in progress operations using this service will wait for recovery to complete. |
| Comments |
| Comment by Peter Jones [ 16/May/11 ] |
|
Lai, could you please look into this one? Thanks, Peter |
| Comment by Lai Siyao [ 16/May/11 ] |
|
Okay! |
| Comment by Shuichi Ihara (Inactive) [ 05/Jun/11 ] |
|
Lai, any progress? |
| Comment by Lai Siyao [ 05/Jun/11 ] |
|
Shuichi, I haven't had a machine to test this on over the past week. All the logs I have seen are from the client side, and they show a network disconnect every few minutes. Could you get the logs from the MDS? |
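To put a number on "every few minutes", the client syslog can be scanned for the disconnect message quoted in the description. The following is a minimal sketch, not a Lustre tool: the function names are hypothetical, the pattern is written against the single log line shown in this ticket, and the year is an assumption because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Pattern modelled on the one client log line quoted in the description;
# other Lustre versions may word the message differently.
LOST_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: .*?"
    r"Lustre: (?P<dev>\S+): Connection to service (?P<svc>\S+) "
    r"via nid (?P<nid>\S+) was lost"
)


def parse_disconnects(lines, year=2011):
    """Return (timestamp, host, service, nid) for each 'Connection ... was lost' line."""
    events = []
    for line in lines:
        m = LOST_RE.search(line)
        if m is None:
            continue
        # The year is assumed: syslog timestamps omit it.
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        events.append((ts, m["host"], m["svc"], m["nid"]))
    return events


def gaps_in_seconds(events):
    """Seconds between consecutive disconnects, in chronological order."""
    times = sorted(e[0] for e in events)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Feeding it the lines of the client's messages file and printing `gaps_in_seconds(...)` would show whether the disconnects are periodic, which could then be lined up against the MDS log for the same window.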
| Comment by Lai Siyao [ 06/Jun/11 ] |
|
Hi Shuichi, I tested chgrp on a directory with 1M files, and it succeeded. What's the average file size on your setup? And the stripe count? After the client stalls, can you get the console messages? I will test with more files up to 4-5M. |
| Comment by Lai Siyao [ 15/Jun/11 ] |
|
Hi Ihara, Do you have any more information on this now? And if possible, could you try running chmod recursively on a directory? chgrp sends a SETATTR RPC to the OSS, while chmod doesn't, so this could help narrow down the cause of the hang. Thanks,
|
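The chmod-versus-chgrp comparison suggested above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names are hypothetical, it issues plain POSIX calls through Python's `os` module and knows nothing about Lustre itself, and the chmod pass re-applies each entry's existing mode so that permissions are not actually changed.

```python
import os
import stat
import time


def apply_recursively(root, op):
    """Call op(path) on root and every directory and file below it.

    Returns (entries_touched, elapsed_seconds).
    """
    start = time.monotonic()
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        op(dirpath)
        count += 1
        for name in filenames:
            op(os.path.join(dirpath, name))
            count += 1
    return count, time.monotonic() - start


def chmod_pass(root):
    """Re-apply the current mode of each entry (symlinks are skipped)."""
    def op(path):
        st = os.lstat(path)
        if not stat.S_ISLNK(st.st_mode):
            os.chmod(path, stat.S_IMODE(st.st_mode))
    return apply_recursively(root, op)


def chgrp_pass(root, gid):
    """Set the group of each entry to gid; uid -1 leaves the owner unchanged."""
    def op(path):
        os.lchown(path, -1, gid)
    return apply_recursively(root, op)
```

Running `chmod_pass` and then `chgrp_pass` on the same test directory, and noting which one (if either) stalls, would give the data point requested in the comment above. Starting on a small subtree before the full 4-5 million file directory seems prudent.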
| Comment by Kit Westneat (Inactive) [ 01/Aug/12 ] |
|
We have been unable to reproduce this, and the customer has since upgraded, so this can be closed.