[LU-7802] set_param lru_size fails with 'error: set_param: setting /proc/fs/lustre/ldlm/namespaces/lustre-OST0000-osc-*/lru_size=clear: Invalid argument' Created: 22/Feb/16 Updated: 24/Oct/17 Resolved: 22/Sep/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0, Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.11.0, Lustre 2.10.2 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Patrick Farrell (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Environment: |
autotest and manual testing |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||
| Description |
|
lctl set_param -n ldlm.namespaces.*$1*.lru_size=clear fails with the error message:

error: set_param: setting /proc/fs/lustre/ldlm/namespaces/lustre-OST0000-osc-ffff880077f04000/lru_size=clear: Invalid argument

I've seen this error message in the test_log for a few sanity tests. The error does not seem to make the test fail (should it?), and it is intermittent: a test can hit the error on one run and not on the next. Here are a few instances of this error I've come across:

The error comes from a call to 'cancel_lru_locks osc'. From tests/test-framework.sh, we see:

cancel_lru_locks() {
    #$LCTL mark "cancel_lru_locks $1 start"
    $LCTL set_param -n ldlm.namespaces.*$1*.lru_size=clear
    $LCTL get_param ldlm.namespaces.*$1*.lock_unused_count | grep -v '=0'
    #$LCTL mark "cancel_lru_locks $1 stop"
}

It's not clear what is causing this error. Since the error does not cause the test to fail, it's hard to find other occurrences of it or to tell when it first started. |
| Comments |
| Comment by Oleg Drokin [ 22/Feb/16 ] |
if (strncmp(dummy, "clear", 5) == 0) {
        CDEBUG(D_DLMTRACE, "dropping all unused locks from namespace %s\n",
               ldlm_ns_name(ns));
        if (ns_connect_lru_resize(ns)) {
                int canceled, unused = ns->ns_nr_unused;

                /* Try to cancel all @ns_nr_unused locks. */
                canceled = ldlm_cancel_lru(ns, unused, 0,
                                           LDLM_LRU_FLAG_PASSED);
                if (canceled < unused) {
                        CDEBUG(D_DLMTRACE,
                               "not all requested locks are canceled, "
                               "requested: %d, canceled: %d\n",
                               unused, canceled);
                        return -EINVAL;
                }

This seems racy; perhaps there were other cancellers running in parallel? We probably need to revisit that code. |
| Comment by James A Simmons [ 09/Feb/17 ] |
|
With the migration to sysfs I can take a look at it. |
| Comment by Sarah Liu [ 27/Mar/17 ] |
|
another instance on master branch: https://testing.hpdd.intel.com/test_sets/b24483ae-0a02-11e7-9053-5254006e85c2 |
| Comment by Bob Glossman (Inactive) [ 14/Jul/17 ] |
|
another on master: |
| Comment by James Nunez (Inactive) [ 28/Jul/17 ] |
|
It looks like sanity test 101g also suffers from this issue and, from the test log, fails with:

...
error: set_param: setting /sys/fs/lustre/ldlm/namespaces/lustre-OST0000-osc-ffff880068917800/lru_size=clear: Invalid argument
ldlm.namespaces.lustre-OST0000-osc-ffff880068917800.lock_unused_count=1
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.00800729 s, 5.2 GB/s
sanity test_101g: @@@@@@ FAIL: 0 != 10 read RPCs

Logs are at: |
| Comment by Bob Glossman (Inactive) [ 07/Aug/17 ] |
|
another on master: |
| Comment by James A Simmons [ 14/Aug/17 ] |
|
Removed LU-8066 link since this is a race condition and not a sysfs issue. What I do see is a potential patch from |
| Comment by Steve Guminski (Inactive) [ 15/Aug/17 ] |
|
Another on master: https://testing.hpdd.intel.com/test_sessions/d7870a08-73b3-4f95-898b-f4f0908c9214 |
| Comment by Patrick Farrell (Inactive) [ 15/Aug/17 ] |
|
This isn't racy so much as just wrong. Sometimes locks are in use, so we don't cancel them. That's intended behavior. The fix for this is just not to return -EINVAL. This isn't a condition that should generate that sort of error. I'll push a patch. |
| Comment by Gerrit Updater [ 15/Aug/17 ] |
|
Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/28560 |
| Comment by Bob Glossman (Inactive) [ 24/Aug/17 ] |
|
another on master: |
| Comment by Sebastien Buisson (Inactive) [ 30/Aug/17 ] |
|
another on master: |
| Comment by Gerrit Updater [ 13/Sep/17 ] |
|
Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/28975 |
| Comment by Gerrit Updater [ 22/Sep/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28560/ |
| Comment by Peter Jones [ 22/Sep/17 ] |
|
Landed for 2.11 |
| Comment by Gerrit Updater [ 24/Oct/17 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28975/ |