[LU-9940] posix no sub tests failed: Created: 01/Sep/17 Updated: 19/Nov/20 Resolved: 18/May/20 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0, Lustre 2.10.2, Lustre 2.12.0, Lustre 2.10.3, Lustre 2.10.5, Lustre 2.13.0, Lustre 2.10.7, Lustre 2.12.1, Lustre 2.12.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Casper | Assignee: | WC Triage |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Trevis, full |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
https://testing.whamcloud.com/test_sessions/0cca9fbf-af0d-4ad7-ad37-7797a1864e19

From suite_log:

Setup mgs, mdt, osts
CMD: trevis-10vm4 mkdir -p /mnt/lustre-mds1
CMD: trevis-10vm4 test -b /dev/lvm-Role_MDS/P1
CMD: trevis-10vm4 e2label /dev/lvm-Role_MDS/P1
Starting mds1: /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
CMD: trevis-10vm4 mkdir -p /mnt/lustre-mds1; mount -t lustre /dev/lvm-Role_MDS/P1 /mnt/lustre-mds1
trevis-10vm4: mount.lustre: according to /etc/mtab /dev/mapper/lvm--Role_MDS-P1 is already mounted on /mnt/lustre-mds1

Note: The "already mounted" message was also present in |
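The "already mounted" refusal above comes from mount.lustre checking the mount table before mounting. A minimal sketch of that kind of pre-mount check (the `is_mounted` helper name is hypothetical and the use of /proc/mounts rather than /etc/mtab is my assumption, not Lustre's actual implementation):

```shell
# Hypothetical helper: succeed if something is already mounted on the
# given mount point, mirroring the check mount.lustre reports against
# /etc/mtab before refusing to mount the MDT device again.
is_mounted() {
    local mnt=$1
    # /proc/mounts is authoritative; /etc/mtab may be a stale copy.
    # Field 2 of each line is the mount point.
    awk -v m="$mnt" '$2 == m { found = 1 } END { exit !found }' /proc/mounts
}

if is_mounted /mnt/lustre-mds1; then
    echo "already mounted, skipping mount"
else
    echo "not mounted"
fi
```

A test harness could call such a check in cleanup to decide whether a previous suite left the target mounted, instead of failing later inside mount.lustre.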
| Comments |
| Comment by James Nunez (Inactive) [ 01/Sep/17 ] |
|
If you look at the posix.test_complete.stack_trace.trevis-10vm1.log for this test suite, you'll see multiple "errors":

23:16:40:[10995.167125] nfs: server trevis-10vm4 not responding, timed out
23:16:40:[10995.168097] nfs: server trevis-10vm4 not responding, timed out

So, something is wrong with the NFS server upon entering this test suite. If you look at the preceding test suite, parallel-scale-nfsv4, in parallel-scale-nfsv4.suite_stdout.trevis-10vm1.log, we see an issue unmounting the Lustre NFS mount:

22:15:35:PASS racer_on_nfs (307s)
22:15:35:== parallel-scale-nfsv4 test complete, duration 2509 sec ============================================= 22:15:26 (1503526526)
22:15:35:
22:15:35:Unmounting NFS clients...
22:15:35:CMD: trevis-10vm1.trevis.hpdd.intel.com,trevis-10vm2 umount -f /mnt/lustre
22:15:35:trevis-10vm1: umount.nfs4: /mnt/lustre: device is busy
22:15:35:
22:15:35:Unexporting Lustre filesystem...
22:15:35:CMD: trevis-10vm1.trevis.hpdd.intel.com,trevis-10vm2 chkconfig --list rpcidmapd 2>/dev/null |
22:15:35: grep -q rpcidmapd && service rpcidmapd stop ||
22:15:35: true
22:15:35:CMD: trevis-10vm4 chkconfig --list nfsserver > /dev/null 2>&1 &&
22:15:35: service nfsserver stop || service nfs stop
22:15:35:trevis-10vm4: Redirecting to /bin/systemctl stop nfs.service
22:15:35:CMD: trevis-10vm4 exportfs -u *:/mnt/lustre
22:15:35:trevis-10vm4: exportfs: Could not find '*:/mnt/lustre' to unexport. |
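The `umount -f` failure with "device is busy" above is the usual symptom of a process still holding files open on the NFS mount, which then leaves the export half torn down. A hedged cleanup sketch (the `cleanup_nfs_mount` helper is hypothetical, not part of the Lustre test framework; it uses only standard `umount` and `fuser` options):

```shell
# Hypothetical cleanup sketch: if a forced unmount fails because the
# NFS mount is busy (as in the log above), report which processes are
# holding it, then fall back to a lazy unmount so server-side teardown
# (exportfs -u, stopping nfs.service) can still proceed.
cleanup_nfs_mount() {
    local mnt=$1
    if ! umount -f "$mnt" 2>/dev/null; then
        # Diagnostic only: list PIDs with open files on the mount.
        fuser -m "$mnt" 2>/dev/null || true
        # Detach the mount point now; the kernel releases it for real
        # once the last holder goes away.
        umount -l "$mnt"
    fi
}
```

With a lazy unmount the mount point disappears from the namespace immediately, which avoids the later suite inheriting a stale "nfs: server ... not responding" mount, at the cost of the underlying references lingering until the holders exit.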
| Comment by James Nunez (Inactive) [ 25/Apr/19 ] |
|
We're seeing a similar issue for ARM architectures with Lustre 2.12.1 RC1: https://testing.whamcloud.com/test_sets/4fdace66-66c7-11e9-bd0e-52540065bddc . In the console log for client 2 (vm26), we see:

[79692.698703] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test racer_on_nfs: racer on NFS client ======================================= 18:16:30 (1556129790)
[79693.576177] Lustre: DEBUG MARKER: MDSCOUNT=4 OSTCOUNT=8 LFS=/usr/bin/lfs /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/d0.parallel-scale-nfs
[79796.948911] 16[32169]: unhandled level 3 translation fault (11) at 0x00000008, esr 0x92000007, in ld-2.17.so[ffffa3810000+20000]
[79796.964022] CPU: 1 PID: 32169 Comm: 16 Kdump: loaded Tainted: G OE ------------ 4.14.0-115.2.2.el7a.aarch64 #1
[79796.970552] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[79796.974656] task: ffff8000389dcc00 task.stack: ffff000010320000
[79796.978150] PC is at 0xffffa381ab3c
[79796.980218] LR is at 0xffffa3813ce4
[79796.982295] pc : [<0000ffffa381ab3c>] lr : [<0000ffffa3813ce4>] pstate: 60000000
[79796.986812] sp : 0000ffffd4d2c3c0
[79796.988760] x29: 0000ffffd4d2c3c0 x28: 0000ffffa3840ff8
[79796.991864] x27: 0000000000000000 x26: 0000000000000000
[79796.995014] x25: 0000ffffa3840000 x24: 0000ffffa3840a20
[79796.998168] x23: 0000000000000000 x22: 0000ffffa3841168
[79797.001282] x21: 0000ffffa383f000 x20: 0000000000000001
[79797.004501] x19: 0000000000000001 x18: 0000000000000000
[79797.007655] x17: 0000ffffa382587c x16: 0000ffffa383ff80
[79797.010772] x15: 0000ffffa38252e4 x14: 0000ffffa3840000
[79797.013994] x13: 0000000000010000 x12: 0000000400000006
[79797.017088] x11: 756e694c00000000 x10: 00000078756e694c
[79797.020238] x9 : 0000000000000000 x8 : 0000ffffa3840000
[79797.023462] x7 : 000000000000001c x6 : 0000ffffa383fc70
[79797.026778] x5 : 0000ffffa3842260 x4 : 0000ffffa3840000
[79797.029896] x3 : 0000000000000000 x2 : 0000ffffa383f000
[79797.033018] x1 : 0000000000000000 x0 : 0000000000000000
[79994.698005] NFS: server trevis-54vm11 error: fileid changed for racer_on_nfs in parallel-scale-nfsv3 and nfs-v4.

We see a similar message in the client 1 console log. |
| Comment by James Nunez (Inactive) [ 18/May/20 ] |
|
We will not fix this issue because we’ve replaced the POSIX test suite with pjdfstest. |