[LU-11803] sanity test 255c fails with 'Ladvise test 12, bad lock count, returned 1, actual 0' Created: 18/Dec/18  Updated: 01/May/20  Resolved: 23/Jan/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Critical
Reporter: James Nunez (Inactive) Assignee: James A Simmons
Resolution: Fixed Votes: 0
Labels: ubuntu
Environment:

Ubuntu 18.04 clients


Issue Links:
Duplicate
is duplicated by LU-11806 sanity-hsm, sanity-dom, racer fails t... Resolved
is duplicated by LU-11808 sanity-flr test 0b and 0c fails with ... Resolved
is duplicated by LU-11823 sanity-flr test 0b and 0c fails with ... Resolved
is duplicated by LU-12247 sanityn test 34 fails with 'test_34 r... Resolved
Related
is related to LU-8066 Move lustre procfs handling to sysfs ... Open
is related to LU-11838 Support linux kernel version 4.18 Resolved
is related to LU-13499 client UUID is truncated Resolved
is related to LU-11809 conf-sanity test 28A hangs on file sy... Resolved
is related to LU-13118 change client instance to respect ASLR Open
is related to LU-12521 print_instance() incorrect if fsname ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity test_255c fails with 'Ladvise test 12, bad lock count, returned 1, actual 0'. It looks like this is only impacting Ubuntu 18.04 client testing.

Looking at the client test_log from https://testing.whamcloud.com/test_sets/d1b23f66-fdd4-11e8-a97c-52540065bddc, one problem is that get_param cannot find lock_unused_count:

== sanity test 255c: suite of ladvise lockahead tests ================================================ 19:46:58 (1544471218)
CMD: trevis-19vm3 /usr/sbin/lctl get_param -n version 2>/dev/null ||
				/usr/sbin/lctl lustre_build_version 2>/dev/null ||
				/usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
Starting test test10 at 1544471219
Finishing test test10 at 1544471219
CMD: trevis-19vm3 /usr/sbin/lctl get_param -n ost.OSS.ost.stats
Starting test test11 at 1544471219
Finishing test test11 at 1544471219
CMD: trevis-19vm3 /usr/sbin/lctl get_param -n ost.OSS.ost.stats
error: get_param: param_path 'ldlm/namespaces/lustre-OST0000*osc-f*/lock_unused_count': No such file or directory
Starting test test12 at 1544471220
Finishing test test12 at 1544471220
error: get_param: param_path 'ldlm/namespaces/lustre-OST0000*osc-f*/lock_unused_count': No such file or directory

There are several examples of this failure, all on Ubuntu, at:
https://testing.whamcloud.com/test_sets/bd048afa-f713-11e8-b67f-52540065bddc
https://testing.whamcloud.com/test_sets/b223ccee-f778-11e8-b67f-52540065bddc
https://testing.whamcloud.com/test_sets/50f5970e-fa16-11e8-8a18-52540065bddc
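
In the log above, get_param reports "No such file or directory" and the test still goes on to compare lock counts. One plausible reading, not confirmed in this ticket, is that the empty output was treated as a count of 0. The sketch below shows how a test helper could tell "parameter not found" apart from "count is 0". The function name sum_counts is hypothetical and is not part of the Lustre test framework; the printf calls stand in for the output of lctl get_param -n.

```bash
#!/bin/bash
# Hypothetical helper, not taken from the Lustre test framework.
# Sums the numeric lines on stdin and prints the total. Returns non-zero
# and prints nothing if no numeric line was seen, so that a missing
# parameter is not mistaken for a count of 0.
sum_counts() {
    local line total=0 seen=0
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|*[!0-9]*) continue ;;   # skip blank and non-numeric lines
        esac
        total=$((total + 10#$line))    # 10# forces base 10 for values like "08"
        seen=1
    done
    [ "$seen" -eq 1 ] || return 1
    echo "$total"
}

# Stand-ins for get_param output: two matching namespaces, then no match.
printf '1\n0\n' | sum_counts
printf ''       | sum_counts || echo "parameter not found"
```

With a helper of this shape, a test could report the missing parameter directly instead of failing later with a misleading lock count.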



 Comments   
Comment by James Nunez (Inactive) [ 18/Dec/18 ]

We are also seeing sanity-flr tests 0g and 31 fail in a similar way:

== sanity-flr test 0g: lfs mirror create flags support =============================================== 00:45:20 (1544489120)
osc.lustre-OST0000-osc-        (ptrval).stats=clear
osc.lustre-OST0001-osc-        (ptrval).stats=clear
osc.lustre-OST0002-osc-        (ptrval).stats=clear
osc.lustre-OST0003-osc-        (ptrval).stats=clear
osc.lustre-OST0004-osc-        (ptrval).stats=clear
osc.lustre-OST0005-osc-        (ptrval).stats=clear
osc.lustre-OST0006-osc-        (ptrval).stats=clear
error: get_param: param_path 'osc/lustre-OST0000-osc-ffff*/stats': No such file or directory
 sanity-flr test_0g: @@@@@@ FAIL: read was not provided by OST1 

and

== sanity-flr test 31: make sure glimpse request can be retried ====================================== 00:46:46 (1544489206)
fail_loc=0x1A00
CMD: trevis-19vm3 grep -c /mnt/lustre-ost1' ' /proc/mounts || true
Stopping /mnt/lustre-ost1 (opts:) on trevis-19vm3
CMD: trevis-19vm3 umount -d /mnt/lustre-ost1
CMD: trevis-19vm3 lsmod | grep lnet > /dev/null &&
lctl dl | grep ' ST ' || true
CMD: trevis-19vm1.trevis.whamcloud.com lctl get_param -n at_min
can't get osc.lustre-OST0000-osc-ffff*.ost_server_uuid in 40 secs
error: get_param: param_path 'ldlm/namespaces/lustre-OST0000-osc-ffff*/lock_count': No such file or directory
error: get_param: param_path 'ldlm/namespaces/lustre-OST0001-osc-ffff*/lock_count': No such file or directory
 sanity-flr test_31: @@@@@@ FAIL: OST 1: no glimpse request was sent 
Comment by Andreas Dilger [ 18/Dec/18 ]

simmonsja, this looks like fallout from moving parameters from /proc to /sys? The Ubuntu kernel is 4.15, which is definitely the newest one we have running, so it may be more likely to be affected by these changes.

Comment by James A Simmons [ 18/Dec/18 ]

Let me look

Comment by James A Simmons [ 18/Dec/18 ]

It's the UUID issue. Newer kernels no longer allow you to expose a pointer address to userland. We need to create a new UUID method that does not use the pointer of an internal kernel object.

Comment by James A Simmons [ 19/Dec/18 ]

I see the issue. It's a test problem in this case. The people who wrote these tests assumed the obd device name has the format $FSNAME-OST0000-osc-ffff*, but that is not always the case. In newer kernels the ffff is filtered out. This also impacts older kernels running on non-x86 platforms. For example, on Power8 the name is lustre-OST0000-osc-c000001e4de63000. I will push a patch to fix this.
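
As a rough illustration of the naming problem described above, the sketch below compares a check that assumes the ffff prefix with one that accepts any suffix after -osc-. It is not the patch pushed for this ticket. The function names are hypothetical, the first sample name is made up, and the MDT exclusion is an assumption meant to skip MDS-side devices such as lustre-OST0000-osc-MDT0000 on a combined node.

```bash
#!/bin/bash
# Hypothetical helpers, not taken from the Lustre test framework.

# Succeeds if NAME is a client-side OSC device for TARGET, whatever
# characters follow "-osc-". Names ending in -osc-MDT* are rejected
# (assumption: those belong to the MDS, not to a client mount).
is_client_osc() {
    local name=$1 target=$2
    case "$name" in
        "$target"-osc-MDT*) return 1 ;;
        "$target"-osc-?*)   return 0 ;;
        *)                  return 1 ;;
    esac
}

# The assumption the failing tests relied on, kept for comparison.
matches_ffff() {
    local name=$1 target=$2
    case "$name" in
        "$target"-osc-ffff*) return 0 ;;
        *)                   return 1 ;;
    esac
}

# Sample names: a made-up x86_64 style name, the Power8 name from the
# comment above, the (ptrval) form from the sanity-flr log, and an
# MDS-side device name.
for name in \
    "lustre-OST0000-osc-ffff8800a1b2c000" \
    "lustre-OST0000-osc-c000001e4de63000" \
    "lustre-OST0000-osc-        (ptrval)" \
    "lustre-OST0000-osc-MDT0000"
do
    old=no; new=no
    matches_ffff "$name" lustre-OST0000 && old=yes
    is_client_osc "$name" lustre-OST0000 && new=yes
    printf '%-40s ffff*=%-3s any-suffix=%s\n' "$name" "$old" "$new"
done
```

Only the first sample passes the ffff check, while the first three pass the any-suffix check and the MDT name is rejected by both.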

Comment by Gerrit Updater [ 19/Dec/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33894
Subject: LU-11803 tests: don't assume obd device name
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ce075a8726fe822803f24d1b336301ad87e9ce12

Comment by Gerrit Updater [ 26/Dec/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33916
Subject: LU-11803 llite: use sget() to find client super block
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 9df5133ac346b807bcb671fdea0adb7ccb628a5e

Comment by Gerrit Updater [ 18/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33894/
Subject: LU-11803 tests: don't assume obd device name
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b8e6c8bdca9bd0e12d78cd4a06800c13f4293325

Comment by Gerrit Updater [ 30/Apr/19 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34783
Subject: LU-11803 tests: don't assume obd device name
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 2ab9df22c74f2404ac0d8e2632073f11d167eeba

Comment by Gerrit Updater [ 08/May/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33916/
Subject: LU-11803 obd: replace class_uuid with linux kernel version.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 604c266a175b72500ef99793652b64ed4f842b2c

Comment by James A Simmons [ 08/May/19 ]

Some more work is needed.

Comment by Gerrit Updater [ 10/May/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34783/
Subject: LU-11803 tests: don't assume obd device name
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 0909a1c8d7aa4d9e4f95077da4b3b31e3d4c392f

Comment by Gerrit Updater [ 11/Jul/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35480
Subject: LU-11803 obd: replace class_uuid with linux kernel version.
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: b4f322de10fbaf05c564643e8f42da770d2987c8

Comment by James A Simmons [ 23/Jan/20 ]

We can migrate the remaining work to LU-13118
