[LU-1673] Locking issue with 1.8.x clients talking to 2.2 Servers Created: 25/Jul/12 Updated: 19/Nov/12 Resolved: 19/Nov/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.2.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | ETHz Support (Inactive) | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | client, locking, server | ||
| Environment: |
client 1.8.x server 2.2 |
||
| Severity: | 2 |
| Epic: | client, locking, server |
| Rank (Obsolete): | 4007 |
| Description |
|
We noticed that clients running Lustre 1.8.x seem to have trouble locking files hosted on 2.2 Servers. |
| Comments |
| Comment by ETHz Support (Inactive) [ 25/Jul/12 ] |
|
This simple C code is enough to reproduce the problem:

$ cat test.c
int main() {
    printf("-- starting --\n");
    fd = open("locktest.txt", O_RDWR);
    r = flock(fd, LOCK_EX|LOCK_NB);
    flock(fd, LOCK_UN);

Creating 'locktest.txt' on a 2.2 server (while using the 1.8 client) and starting the application ~2-3 times causes flock() to fail:

rm -f locktest.txt ; touch locktest.txt ; for x in {0..5} ; do ./a.out ; sleep 1 ; done

The 'EAGAIN' error will be gone after a couple of seconds (I suppose that's when the leaked

Note that exactly the same code works fine on:

The 1.8 client in my test is running:
The 2.2 servers are on:
bash-4.1$ uname -r
bash-4.1$ cat /proc/fs/lustre/version
The filesystem is mounted via: |
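For anyone who wants to rerun the test, a self-contained version of the reproducer might look like the sketch below. The snippet in the comment above is truncated, so the includes, the error reporting, the helper name try_flock_once and the FLOCK_REPRO_MAIN guard are all assumptions added for illustration, not the reporter's original code. Only the open()/flock() sequence is taken from the ticket.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper (not from the ticket): take and release an exclusive,
 * non-blocking flock() on an existing file.
 * Returns 0 on success, or the errno of the call that failed.
 * EAGAIN/EWOULDBLOCK means a conflicting lock is still held. */
int try_flock_once(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return errno;

    int err = 0;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0)
        err = errno;            /* lock attempt failed */
    else if (flock(fd, LOCK_UN) != 0)
        err = errno;            /* unlock failed */

    close(fd);
    return err;
}

/* Build with -DFLOCK_REPRO_MAIN to get a standalone program that mirrors
 * the sequence in the ticket (the guard itself is an assumption). */
#ifdef FLOCK_REPRO_MAIN
int main(void)
{
    printf("-- starting --\n");
    int err = try_flock_once("locktest.txt");
    if (err != 0) {
        printf("flock failed: %s\n", strerror(err));
        return 1;
    }
    printf("flock ok\n");
    return 0;
}
#endif
```

Built with something like `cc -DFLOCK_REPRO_MAIN test.c` and run through the shell loop quoted above, this version prints the errno text of any failing call, so an EAGAIN from the non-blocking lock attempt can be told apart from an open() failure.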
| Comment by Peter Jones [ 25/Jul/12 ] |
|
Oleg is looking into this one |
| Comment by Oleg Drokin [ 25/Jul/12 ] |
|
Hm, I was under the impression that the fix for this landed in time for 2.2, but alas. The patch that fixes this can be found here: http://review.whamcloud.com/#change,2193 |
| Comment by ETHz Support (Inactive) [ 26/Jul/12 ] |
|
Would we have to patch only the MDS or also all OSTs? Could you provide us the rpms patched for 2.2 servers? Thanks in advance. |
| Comment by Oleg Drokin [ 27/Jul/12 ] |
|
flocks are only taken on the MDS, so updating just the MDS is fine. |
| Comment by ETHz Support (Inactive) [ 27/Jul/12 ] |
|
Could you provide us patched rpms? Thanks in advance |
| Comment by Peter Jones [ 27/Jul/12 ] |
|
Yangsheng is working on creating patched RPMs |
| Comment by James A Simmons [ 27/Jul/12 ] |
|
Don't forget the patch for http://review.whamcloud.com/#change,3008 as well, since it is needed for 2.3 <-> 2.2 interop testing. |
| Comment by Peter Jones [ 27/Jul/12 ] |
| Comment by Oleg Drokin [ 28/Jul/12 ] |
|
James, as you correctly mention, that change is only needed for 2.3 interop, which is not the case here, so there is no rush to get it included, esp. since there is no official 2.3 release and there won't be one for some time.

You can get RPMs here: http://build.whamcloud.com/job/lustre-reviews/7975/arch=x86_64,build_type=server,distro=el6,ib_stack=inkernel/artifact/artifacts/RPMS/ |
| Comment by Yang Sheng [ 19/Nov/12 ] |
|
As Iurii commented in gerrit:

Iurii Golovach Nov 2
Patch Set 1: I would prefer that you didn't submit this
This patch looks obsolete since there are already number of patches which cover this issue:
http://review.whamcloud.com/#change,3722
http://review.whamcloud.com/#change,3202
http://review.whamcloud.com/#change,3203
http://review.whamcloud.com/#change,3725
http://review.whamcloud.com/#change,3727

So close this bug. |