[LU-9966] sanity test_411: fail to trigger a memory allocation error Created: 09/Sep/17 Updated: 06/Oct/18 Resolved: 06/Oct/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.12.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/32b0aa4c-9502-11e7-ba84-5254006e85c2.
The sub-test test_411 failed with the following error:
fail to trigger a memory allocation error
test_411 is very new. It has been failing since 9/1.
more:
Info required for matching: sanity 411 |
| Comments |
| Comment by Bob Glossman (Inactive) [ 09/Sep/17 ] |
|
This may be a 100% failure on any sles12. It may never have worked except on RHEL 7. |
| Comment by Andreas Dilger [ 09/Sep/17 ] |
|
Bob, it would be helpful if you linked this (and other regressions) to the Jira ticket and patch that added this new test. |
| Comment by Bob Glossman (Inactive) [ 09/Sep/17 ] |
|
The patch that added test 411 was https://review.whamcloud.com/21745, " |
| Comment by Bob Glossman (Inactive) [ 10/Sep/17 ] |
|
In sles12 there is no /sys/fs/cgroup/memory/memory.kmem.limit_in_bytes |
| Comment by Peter Jones [ 10/Sep/17 ] |
|
Is there an equivalent function that could be used instead or should we just skip the test for sles12 (and presumably any other newer kernels)? |
| Comment by Bob Glossman (Inactive) [ 10/Sep/17 ] |
The author of the test needs to answer that question. If the solution is to skip the test when the needed /sys entry isn't there, I can push a patch for that. There is already some skip logic there; I would just need to extend it a bit. |
| Comment by Yang Sheng [ 13/Sep/17 ] |
|
https://testing.hpdd.intel.com/test_sets/57f429a2-97f8-11e7-b9c6-5254006e85c2
Looks like this test failed because of a permission issue.
== sanity test 411: Slab allocation error with cgroup does not LBUG ================================== 01:03:59 (1505203439)
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 1.78497 s, 58.7 MB/s
/usr/lib64/lustre/tests/sanity.sh: line 16400: /sys/fs/cgroup/memory/osc_slab_alloc/memory.kmem.limit_in_bytes: Permission denied
204800+0 records in
204800+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 23.5257 s, 4.5 MB/s
sanity test_411: @@@@@@ FAIL: fail to trigger a memory allocation error
Trace dump:
The 'osc_slab_alloc/memory.kmem.limit_in_bytes' value cannot be changed, so the trigger action fails. I'll try to find the cause.
Thanks, |
| Comment by Bob Glossman (Inactive) [ 13/Sep/17 ] |
|
It reports as "permission denied", but I'm pretty sure it's due to the entry not existing. It is an easy fix to check for the entry and skip if it doesn't exist, but I'm not sure that's the right approach. I can push a mod that does that for inspection. |
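The check described above could look roughly like the sketch below. This is an illustration only, not the contents of the patch that was later uploaded: the function name `kmem_limit_available` is hypothetical, and the default path is the one quoted earlier in this ticket. The directory is a parameter so the check can be exercised against a scratch directory.

```shell
# Hypothetical helper (not taken from sanity.sh): succeed only if the
# kmem limit control file exists under the given memory cgroup directory.
kmem_limit_available() {
    local cgroup_dir=${1:-/sys/fs/cgroup/memory}
    [ -e "$cgroup_dir/memory.kmem.limit_in_bytes" ]
}

# Decide whether to run or skip before touching the cgroup at all.
if kmem_limit_available; then
    echo "kmem limit file present, test can run"
else
    echo "SKIP: no memory.kmem.limit_in_bytes on this kernel"
fi
```

Testing for the file up front avoids relying on the misleading "Permission denied" message that the shell prints when the redirect target does not exist inside the cgroup filesystem.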
| Comment by Gerrit Updater [ 13/Sep/17 ] |
|
Bob Glossman (bob.glossman@intel.com) uploaded a new patch: https://review.whamcloud.com/28974 |
| Comment by Yang Sheng [ 13/Sep/17 ] |
|
'CONFIG_MEMCG_KMEM' is disabled by default in sles12, so kmem.limit_in_bytes is absent. Skipping is therefore the right solution. |
| Comment by Bob Glossman (Inactive) [ 13/Sep/17 ] |
|
I see that 'CONFIG_MEMCG_KMEM' is enabled in rhel7 by default. That totally explains why test 411 works on rhel7 and doesn't work on sles12. |
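One way to compare the two distributions is to look for the option in the kernel build configuration. The sketch below assumes the config is available as a plain text file at `/boot/config-$(uname -r)`, which is common but not guaranteed on every distribution; the function name `memcg_kmem_enabled` is hypothetical.

```shell
# Hypothetical helper: succeed if CONFIG_MEMCG_KMEM=y appears in the given
# kernel config file. The default location is an assumption; some systems
# keep the config elsewhere or do not ship it at all.
memcg_kmem_enabled() {
    local config=${1:-/boot/config-$(uname -r)}
    [ -r "$config" ] && grep -q '^CONFIG_MEMCG_KMEM=y' "$config"
}

if memcg_kmem_enabled; then
    echo "CONFIG_MEMCG_KMEM is enabled"
else
    echo "CONFIG_MEMCG_KMEM is not enabled (or the config file is not readable)"
fi
```

Checking for the /sys entry directly, as proposed in this ticket, is the more robust test at run time; the config lookup is mainly useful for explaining why the entry is missing.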
| Comment by Gerrit Updater [ 18/Sep/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28974/ |
| Comment by Peter Jones [ 18/Sep/17 ] |
|
Landed for 2.11 |
| Comment by Joseph Gmitter (Inactive) [ 21/Nov/17 ] |
|
Seeing this failure on the flr branch: |
| Comment by Jian Yu [ 01/Feb/18 ] |
|
Two failure instances occurred on master branch yesterday: |
| Comment by Bruno Faccini (Inactive) [ 08/Feb/18 ] |
|
Looks like some allocation errors did occur anyway during these failed test sessions:
[ 6308.394164] Lustre: DEBUG MARKER: == sanity test 411: Slab allocation error with cgroup does not LBUG ================================== 22:34:17 (1517438057)
[ 6311.637464] SLUB: Unable to allocate memory on node -1 (gfp=0x8050)
[ 6311.638238] cache: kmalloc-512(0:osc_slab_alloc), object size: 512, buffer size: 512, default order: 1, min order: 0
[ 6311.638238] node 0: slabs: 13, objs: 208, free: 0
[ 6311.670203] SLUB: Unable to allocate memory on node -1 (gfp=0x0)
[ 6311.670957] cache: kmalloc-192(0:osc_slab_alloc), object size: 192, buffer size: 192, default order: 0, min order: 0
[ 6311.670957] node 0: slabs: 1, objs: 21, free: 0
[ 6360.020975] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_411: @@@@@@ FAIL: fail to trigger a memory allocation error
[ 6360.203970] Lustre: DEBUG MARKER: sanity test_411: @@@@@@ FAIL: fail to trigger a memory allocation error
But they did not cause the "dd" command to fail, as sanity/test_411 expects:
== sanity test 411: Slab allocation error with cgroup does not LBUG ================================== 22:34:17 (1517438057)
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 3.13092 s, 33.5 MB/s
204800+0 records in
204800+0 records out
104857600 bytes (105 MB) copied, 48.1542 s, 2.2 MB/s
sanity test_411: @@@@@@ FAIL: fail to trigger a memory allocation error
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:5718:error()
= /usr/lib64/lustre/tests/sanity.sh:17667:test_411()
= /usr/lib64/lustre/tests/test-framework.sh:5994:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:6033:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:5880:run_test()
= /usr/lib64/lustre/tests/sanity.sh:17673:main()
Dumping lctl log to /home/autotest/autotest/logs/test_logs/2018-01-31/lustre-reviews-el7-x86_64--review-ldiskfs--1_8_1__54107___60bc072b-48d1-4e5e-bb15-747752d7c9b7/sanity.test_411.*.1517438110.log
CMD: trevis-10vm10,trevis-10vm11,trevis-10vm12,trevis-10vm9.trevis.hpdd.intel.com /usr/sbin/lctl dk > /home/autotest/autotest/logs/test_logs/2018-01-31/lustre-reviews-el7-x86_64--review-ldiskfs--1_8_1__54107___60bc072b-48d1-4e5e-bb15-747752d7c9b7/sanity.test_411.debug_log.\$(hostname -s).1517438110.log;
dmesg > /home/autotest/autotest/logs/test_logs/2018-01-31/lustre-reviews-el7-x86_64--review-ldiskfs--1_8_1__54107___60bc072b-48d1-4e5e-bb15-747752d7c9b7/sanity.test_411.dmesg.\$(hostname -s).1517438110.log
Resetting fail_loc on all nodes...CMD: trevis-10vm10,trevis-10vm11,trevis-10vm12,trevis-10vm9.trevis.hpdd.intel.com lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null
done.
Since as per my |
| Comment by Bruno Faccini (Inactive) [ 19/Feb/18 ] |
|
+1 on master review for |
| Comment by Yang Sheng [ 23/Feb/18 ] |
|
Hi Bruno, I think test_411 is only intended to verify that we do not hit an LBUG, so we can avoid checking whether dd succeeds. Is that check necessary? Thanks, |
| Comment by Bob Glossman (Inactive) [ 24/Mar/18 ] |
|
another on master: |
| Comment by Bruno Faccini (Inactive) [ 30/Mar/18 ] |
|
> I think test_411 is only intended to verify that we do not hit an LBUG, so we can avoid checking whether dd succeeds.
YangSheng, yes, we may do that, but we could also try to find a way to set up the conditions that cause dd to fail. |
| Comment by Gerrit Updater [ 04/May/18 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: https://review.whamcloud.com/32293 |
| Comment by Chris Horn [ 29/May/18 ] |
|
Just a note: This issue is also seen with Lustre 2.11 on SLES15 RC4 |
| Comment by James A Simmons [ 20/Aug/18 ] |
|
We see a different but related bug with Ubuntu18. |
| Comment by John Hammond [ 05/Sep/18 ] |
|
Logs show that some allocations are failing but dd is succeeding. Perhaps we should weaken the test to just check that we don't crash. |
| Comment by Yang Sheng [ 05/Sep/18 ] |
|
Yes, whether the allocation fails really depends on the situation, so we should avoid verifying whether dd succeeds. I'll update the patch accordingly. |
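The weakened check discussed above could be sketched as follows. This is not the merged patch: both function names are hypothetical, and the sketch assumes that an LBUG would leave the string "LBUG" in the kernel log being scanned. The workload's exit status is reported but deliberately ignored, and only the log decides pass or fail.

```shell
# Hypothetical helper: run the workload and report its status without
# propagating a failure, since allocation failures may or may not make
# the command itself fail.
run_tolerating_failure() {
    if "$@"; then
        echo "workload succeeded"
    else
        echo "workload failed with status $? (tolerated)"
    fi
    return 0
}

# Hypothetical helper: succeed if the given log file mentions an LBUG.
log_has_lbug() {
    grep -q 'LBUG' "$1"
}

# Example: a failing workload does not fail the check by itself.
run_tolerating_failure sh -c 'exit 1'
```

In a real run the workload would be the dd invocation inside the limited cgroup, and the log would be a saved dmesg capture from after the run.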
| Comment by Jian Yu [ 26/Sep/18 ] |
|
+1 on master branch: |
| Comment by Gerrit Updater [ 05/Oct/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32293/ |
| Comment by Peter Jones [ 06/Oct/18 ] |
|
Landed for 2.12 |