[LU-11803] sanity test 255c fails with 'Ladvise test 12, bad lock count, returned 1, actual 0' Created: 18/Dec/18 Updated: 01/May/20 Resolved: 23/Jan/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | James Nunez (Inactive) | Assignee: | James A Simmons |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ubuntu | ||
| Environment: |
Ubuntu 18.04 clients |
||
| Issue Links: |
|
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
sanity test_255c fails with 'Ladvise test 12, bad lock count, returned 1, actual 0'. It looks like this only affects Ubuntu 18.04 client testing. Looking at the client test_log from https://testing.whamcloud.com/test_sets/d1b23f66-fdd4-11e8-a97c-52540065bddc , one problem is that we can’t find lock_unused_count:

== sanity test 255c: suite of ladvise lockahead tests ================================================ 19:46:58 (1544471218)
CMD: trevis-19vm3 /usr/sbin/lctl get_param -n version 2>/dev/null || /usr/sbin/lctl lustre_build_version 2>/dev/null || /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
Starting test test10 at 1544471219
Finishing test test10 at 1544471219
CMD: trevis-19vm3 /usr/sbin/lctl get_param -n ost.OSS.ost.stats
Starting test test11 at 1544471219
Finishing test test11 at 1544471219
CMD: trevis-19vm3 /usr/sbin/lctl get_param -n ost.OSS.ost.stats
error: get_param: param_path 'ldlm/namespaces/lustre-OST0000*osc-f*/lock_unused_count': No such file or directory
Starting test test12 at 1544471220
Finishing test test12 at 1544471220
error: get_param: param_path 'ldlm/namespaces/lustre-OST0000*osc-f*/lock_unused_count': No such file or directory

There are several examples of this failure, all for Ubuntu, at |
| Comments |
| Comment by James Nunez (Inactive) [ 18/Dec/18 ] |
|
Since this is a similar failure, we are also seeing sanity-flr tests 0g and 31 fail with similar issues:

== sanity-flr test 0g: lfs mirror create flags support =============================================== 00:45:20 (1544489120)
osc.lustre-OST0000-osc- (ptrval).stats=clear
osc.lustre-OST0001-osc- (ptrval).stats=clear
osc.lustre-OST0002-osc- (ptrval).stats=clear
osc.lustre-OST0003-osc- (ptrval).stats=clear
osc.lustre-OST0004-osc- (ptrval).stats=clear
osc.lustre-OST0005-osc- (ptrval).stats=clear
osc.lustre-OST0006-osc- (ptrval).stats=clear
error: get_param: param_path 'osc/lustre-OST0000-osc-ffff*/stats': No such file or directory
sanity-flr test_0g: @@@@@@ FAIL: read was not provided by OST1

and

== sanity-flr test 31: make sure glimpse request can be retried ====================================== 00:46:46 (1544489206)
fail_loc=0x1A00
CMD: trevis-19vm3 grep -c /mnt/lustre-ost1' ' /proc/mounts || true
Stopping /mnt/lustre-ost1 (opts:) on trevis-19vm3
CMD: trevis-19vm3 umount -d /mnt/lustre-ost1
CMD: trevis-19vm3 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST ' || true
CMD: trevis-19vm1.trevis.whamcloud.com lctl get_param -n at_min
can't get osc.lustre-OST0000-osc-ffff*.ost_server_uuid in 40 secs
error: get_param: param_path 'ldlm/namespaces/lustre-OST0000-osc-ffff*/lock_count': No such file or directory
error: get_param: param_path 'ldlm/namespaces/lustre-OST0001-osc-ffff*/lock_count': No such file or directory
sanity-flr test_31: @@@@@@ FAIL: OST 1: no glimpse request was sent |
| Comment by Andreas Dilger [ 18/Dec/18 ] |
|
simmonsja, this looks like fallout from moving parameters from /proc to /sys? The Ubuntu kernel is 4.15, which is definitely the newest one we have running, so it may be more likely to be affected by these changes. |
| Comment by James A Simmons [ 18/Dec/18 ] |
|
Let me look |
| Comment by James A Simmons [ 18/Dec/18 ] |
|
It's the UUID issue. Newer kernels no longer allow you to expose a pointer address to userland. We need to create a new UUID method that does not use the pointer of an internal kernel object. |
| Comment by James A Simmons [ 19/Dec/18 ] |
|
I see the issue. It's a test problem in this case. The people who wrote these tests assumed the obd device name has the format $FSNAME-OST0000-osc-ffff*, but that is not always the case. In newer kernels the ffff is filtered out. This also affects older kernels running on non-x86 platforms. For example, on Power8 the name is lustre-OST0000-osc-c000001e4de63000. I will push a patch to fix this. |
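To illustrate the naming assumption described above, here is a minimal Python sketch that applies shell-style glob matching to a few device names. The Power8 name is the one quoted in the comment. The x86_64 name, the MDT-suffixed name, the relaxed pattern, and the helper `select_names` are hypothetical and chosen only for illustration; this is not the pattern used by the actual patch.

```python
from fnmatch import fnmatchcase

# Pattern the tests assumed, as seen in the failing get_param calls.
ASSUMED_PATTERN = "lustre-OST0000-osc-ffff*"

# Hypothetical relaxed pattern: accept any suffix that starts with a
# lowercase hex digit instead of requiring a literal "ffff" prefix.
RELAXED_PATTERN = "lustre-OST0000-osc-[0-9a-f]*"

DEVICE_NAMES = [
    "lustre-OST0000-osc-ffff8800b7e5c000",  # hypothetical x86_64-style name
    "lustre-OST0000-osc-c000001e4de63000",  # Power8 name from the comment
    "lustre-OST0000-osc-MDT0000",           # hypothetical non-client name
]


def select_names(names, pattern):
    """Return the names that match a case-sensitive shell-style glob."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    print("assumed:", select_names(DEVICE_NAMES, ASSUMED_PATTERN))
    print("relaxed:", select_names(DEVICE_NAMES, RELAXED_PATTERN))
```

With the assumed pattern only the name that begins with ffff is selected, so the Power8 name is missed. The relaxed pattern selects both pointer-style names while still skipping the name whose suffix starts with an uppercase letter.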
| Comment by Gerrit Updater [ 19/Dec/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33894 |
| Comment by Gerrit Updater [ 26/Dec/18 ] |
|
James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33916 |
| Comment by Gerrit Updater [ 18/Feb/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33894/ |
| Comment by Gerrit Updater [ 30/Apr/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34783 |
| Comment by Gerrit Updater [ 08/May/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33916/ |
| Comment by James A Simmons [ 08/May/19 ] |
|
Some more work is needed. |
| Comment by Gerrit Updater [ 10/May/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34783/ |
| Comment by Gerrit Updater [ 11/Jul/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35480 |
| Comment by James A Simmons [ 23/Jan/20 ] |
|
We can migrate the remaining work to LU-13118 |