[LU-1206] Executing unwriteable, recently touched file returns ETXTBSY (Text file busy) Created: 12/Mar/12 Updated: 27/Sep/12 Resolved: 08/Apr/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.1 |
| Fix Version/s: | Lustre 2.2.0, Lustre 2.3.0, Lustre 2.1.2 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Prakash Surya (Inactive) | Assignee: | Sarah Liu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 4688 | ||||||||
| Description |
|
An issue with Lustre 2.1.1 (I haven't tested other versions) is causing execve to fail with ETXTBSY. There appear to be two keys to reproducing the problem: 1) The file must not be writable
Here's a simple reproducer that breaks under Lustre 2.1 on hype:
$ touch foo # create it the first time
It doesn't matter whether the file has real contents.
NOTE: I haven't been able to reproduce this on my VM in a single-node configuration. I only see it on our test cluster. |
| Comments |
| Comment by Peter Jones [ 12/Mar/12 ] |
|
Sarah will try to reproduce this on Toro. |
| Comment by Prakash Surya (Inactive) [ 12/Mar/12 ] |
|
Using stap, I see two calls to ll_file_open when trying to execute the file:
     0 bash(20065):->ll_file_open inode=0xffff8801c8ad73b8 file=0xffff88021929c9c0
    59 bash(20065):<-ll_file_open return=0xffffffffffffffe6
     0 bash(20065):->ll_file_open inode=0xffff8801c8ad73b8 file=0xffff8801c7394e00
    78 bash(20065):<-ll_file_open return=0x0
The first call returns the error; the second returns 0. Digging into the first call, the debug log shows:
00000080:00000001:0.0:1331584249.384384:0:18125:0:(file.c:509:ll_file_open()) Process entered
00000080:00200000:0.0:1331584249.384384:0:18125:0:(file.c:512:ll_file_open()) VFS Op:inode=144219945415213110/33578822(ffff8801d679aaf8), flags 100040
00000080:00000010:0.0:1331584249.384386:0:18125:0:(file.c:61:ll_file_data_get()) slab-alloced '(fd)': 192 at ffff8801eda9ab00.
00000080:00000010:0.0:1331584249.384387:0:18125:0:(file.c:620:ll_file_open()) kmalloced '*och_p': 40 at ffff8802192cb8c0.
00000080:00000001:0.0:1331584249.384388:0:18125:0:(file.c:633:ll_file_open()) Process leaving via out_och_free (rc=18446744073709551590 : -26 : 0xffffffffffffffe6)
00000080:00000010:0.0:1331584249.384389:0:18125:0:(file.c:672:ll_file_open()) kfreed '*och_p': 40 at ffff8802192cb8c0.
This shows it failing the following check (file.c, lines 626-633):
        /* md_intent_lock() didn't get a request ref if there was an
         * open error, so don't do cleanup on the request here
         * (bug 3430) */
        /* XXX (green): Should not we bail out on any error here, not
         * just open error? */
        rc = it_open_error(DISP_OPEN_OPEN, it);
        if (rc)
                GOTO(out_och_free, rc); |
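The return value 0xffffffffffffffe6 in the trace above is a negative errno printed as an unsigned 64-bit number. The following is a minimal sketch of how it can be decoded; the helper names decode_rc and rc_to_string are hypothetical and not part of Lustre, and the errno value assumes Linux numbering.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a raw 64-bit return value (as printed by stap or the
 * debug log) as a signed return code. Hypothetical helper. */
static int64_t decode_rc(uint64_t raw)
{
	int64_t rc;

	assert(sizeof(rc) == sizeof(raw));
	/* Copy the bit pattern instead of casting, so the result does not
	 * depend on implementation-defined conversion rules. */
	memcpy(&rc, &raw, sizeof(rc));
	return rc;
}

/* Return the errno description for a negative return code, or NULL
 * when the code is zero or positive (no error). Hypothetical helper. */
static const char *rc_to_string(int64_t rc)
{
	return rc < 0 ? strerror((int)-rc) : NULL;
}
```

Passing 0xffffffffffffffe6 to decode_rc gives -26, which is -ETXTBSY on Linux and matches the "-26" field in the debug log line for file.c:633.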
| Comment by Prakash Surya (Inactive) [ 12/Mar/12 ] |
|
Backtracking to the MDS, here is where I believe the ETXTBSY error originates:
00000004:00000001:1.0:1331602989.995422:0:5278:0:(mdt_open.c:536:mdt_write_deny()) Process entered
00000004:00000001:1.0:1331602989.995422:0:5278:0:(mdt_open.c:543:mdt_write_deny()) Process leaving (rc=18446744073709551590 : -26 : ffffffffffffffe6) |
| Comment by Prakash Surya (Inactive) [ 13/Mar/12 ] |
|
From what I can tell, mdt_write_put is not called when touching an unwriteable file. In this case, mdt_write_get is called, which increments the mdt_object's mot_writecount field, but mdt_write_put is never called to decrement it. Thus, mot_writecount increases by one with each invocation of touch. This raises the question: why isn't mdt_write_put being called when the file is unwriteable? |
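The leak described in the comment above can be illustrated with a toy model. This is only a sketch, not Lustre code: the toy_* names are hypothetical, the -EACCES failure path is an assumption about why the open fails, and the release_on_error flag stands in for one possible fix rather than for what the posted patch actually does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the MDT object; only the counter matters. */
struct toy_object {
	int writecount;	/* models mot_writecount */
};

/* Models mdt_write_get: take a write reference. */
static void toy_write_get(struct toy_object *o)
{
	assert(o != NULL);
	o->writecount++;
}

/* Models mdt_write_put: drop a write reference. */
static void toy_write_put(struct toy_object *o)
{
	assert(o != NULL);
	o->writecount--;
}

/* Models the check behind mdt_write_deny: an open for execution is
 * refused while any write reference is outstanding. */
static int toy_write_deny(const struct toy_object *o)
{
	return o->writecount > 0 ? -ETXTBSY : 0;
}

/* Models a write-intent open such as the one issued by touch.
 * ASSUMPTION: the open fails with -EACCES after the reference was
 * taken. If release_on_error is false, the reference is leaked. */
static int toy_open_for_write(struct toy_object *o, bool writable,
			      bool release_on_error)
{
	toy_write_get(o);
	if (!writable) {
		if (release_on_error)
			toy_write_put(o);
		return -EACCES;
	}
	return 0;	/* success: the caller drops the reference on close */
}
```

In this model, one failed toy_open_for_write call with release_on_error set to false leaves writecount at 1, so every later toy_write_deny returns -ETXTBSY. With release_on_error set to true, the counter returns to 0 and execution is allowed again.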
| Comment by Sarah Liu [ 13/Mar/12 ] |
|
I cannot reproduce it on Toro. The configuration is one MDS (client-17), six OSTs (fat-amd-3), and two clients.
[root@client-19 lustre]# mount |
| Comment by Prakash Surya (Inactive) [ 13/Mar/12 ] |
|
See Patch: http://review.whamcloud.com/2300 |
| Comment by Prakash Surya (Inactive) [ 13/Mar/12 ] |
|
Sarah, did you 'touch' the file *after* removing its write permission? That is key to reproducing the issue. |
| Comment by Prakash Surya (Inactive) [ 13/Mar/12 ] |
|
Also, this needs to be run as a non-root user. |
| Comment by Sarah Liu [ 13/Mar/12 ] |
|
I followed the same steps you listed in the description. I will try again as a non-root user. |
| Comment by Sarah Liu [ 14/Mar/12 ] |
|
I tried this again as a non-root user, and it can be reproduced on Toro.
[root@client-18 ~]# su sanityusr |
| Comment by Prakash Surya (Inactive) [ 14/Mar/12 ] |
|
Perfect! Can you apply the patch I posted earlier and try the same test? That patch fixes the issue for me in my very limited testing. |
| Comment by Christopher Morrone [ 14/Mar/12 ] |
|
Please also assign reviewers for Prakash's patch. |
| Comment by Build Master (Inactive) [ 16/Mar/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Build Master (Inactive) [ 20/Mar/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Build Master (Inactive) [ 08/Apr/12 ] |
|
Integrated in Result = SUCCESS
|
| Comment by Build Master (Inactive) [ 02/May/12 ] |
|
Integrated in Result = SUCCESS
|