[LU-11948] parallel-scale-nfsv4 test connectathon fails "connectathon failed: 1" Created: 08/Feb/19 Updated: 23/Nov/22 |
|
| Status: | In Progress |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Alex Deiter |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
DNE/ZFS |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
parallel-scale-nfsv4 test_connectathon fails with "connectathon failed: 1". So far, we have seen this issue only once, in a DNE with ZFS test session: https://testing.whamcloud.com/test_sets/a1fe9392-2b96-11e9-90fb-52540065bddc

Looking at the client test_log, we see a segmentation fault:

Congratulations, you passed the basic tests!
...
Pass 4
...
Starting BASIC tests: test directory /mnt/lustre/d0.parallel-scale-nfs/d0.connectathon (arg: -f)
./test1: File and directory creation test
./test1: (/opt/connectathon/basic)
runtests: line 28: 8630 Segmentation fault ./test1 $TESTARG
basic tests failed
parallel-scale-nfsv4 test_connectathon: @@@@@@ FAIL: connectathon failed: 1

Looking at the MDS1/MDS3 (vm11) console log, we see the following errors:

[124786.514802] Lustre: DEBUG MARKER: ./runtests -N 10 -l -f /mnt/lustre/d0.parallel-scale-nfs/d0.connectathon
[124859.432032] LustreError: 28274:0:(file.c:3941:ll_file_flock()) unknown fcntl lock command: 1029
[124928.294500] LustreError: 28274:0:(file.c:3941:ll_file_flock()) unknown fcntl lock command: 1029
[125035.913851] LustreError: 28274:0:(file.c:3941:ll_file_flock()) unknown fcntl lock command: 1029
[125065.916330] LustreError: 28274:0:(file.c:3941:ll_file_flock()) unknown fcntl lock command: 1029
[125105.547691] LustreError: 28274:0:(file.c:3941:ll_file_flock()) unknown fcntl lock command: 1029
[125135.550497] LustreError: 28274:0:(file.c:3941:ll_file_flock()) unknown fcntl lock command: 1029
[125175.230096] LustreError: 28274:0:(file.c:3941:ll_file_flock()) unknown fcntl lock command: 1029
[125244.986669] LustreError: 28274:0:(file.c:3941:ll_file_flock()) unknown fcntl lock command: 1029
[125244.987704] LustreError: 28274:0:(file.c:3941:ll_file_flock()) Skipped 1 previous similar message
[125251.814692] Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null
|
| Comments |
| Comment by Andreas Dilger [ 08/Feb/19 ] |
|
Looking at the kernel code, it looks like 1029 is:

#define FL_POSIX    1
#define FL_DELEG    4     /* NFSv4 delegation */
#define FL_OFDLCK   1024  /* lock is "owned" by struct file */

i.e. FL_POSIX | FL_DELEG | FL_OFDLCK (1 + 4 + 1024). FL_OFDLCK was "added" in v3.15-rc1-15-gcff2fce58b2b, but that is really just a rename of FL_FILE_PVT, added in v3.14-rc1-10-gc918d42a27a9. This is a new type of lock that is attached to the open file (struct file) rather than to the process's file descriptor table (which may be shared between tasks). This avoids problems with file locks being accidentally dropped if a file descriptor is cloned due to fork/exec and the file is then closed by the second process. |
| Comment by Andreas Dilger [ 08/Feb/19 ] |
|
Andriy, it looks like you have been working in this area most recently? |
| Comment by James Nunez (Inactive) [ 05/Jan/21 ] |
|
We still see the "unknown fcntl lock command: 1029" messages when running connectathon for parallel-scale-nfsv*, but they do not cause test_connectathon to fail. One example is at https://testing.whamcloud.com/test_sets/bc5183ad-2cad-4b97-aba4-604b73b9765f, where the messages can be seen in the MDS console log. There are several other examples. |
| Comment by Andreas Dilger [ 05/Jan/21 ] |
|
It would be good to add support for this type of lock, since I expect userspace servers like Samba and Ganesha are using, or will use, it to avoid complexities in flock lock handling. However, I have no idea how easy or hard it is to implement. It might be as simple as adding a flag to quiet the error message, or it might need a complete protocol change because the MDS needs different locking semantics. I'm not really sure, but I hope/suspect it is on the easier side. |