[LU-6956] multiple test failures on el6.7 Created: 04/Aug/15 Updated: 10/Oct/21 Resolved: 10/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Bob Glossman (Inactive) | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I have now seen the exact same test failures repeat on more than a single test run on el6.7 client/server. Similar failures aren't seen in similar runs on el6.6. This suggests that the root cause is in the kernel update, but I can't see it. https://testing.hpdd.intel.com/test_sessions/0f3acfdc-3a25-11e5-8f15-5254006e85c2 failed tests include |
| Comments |
| Comment by Bob Glossman (Inactive) [ 06/Aug/15 ] |
|
On yet another retry the failures in sanity did not reproduce, but some of the failures in sanity-sec did. https://testing.hpdd.intel.com/test_sessions/cc30db56-3c30-11e5-af2f-5254006e85c2 |
| Comment by Andreas Dilger [ 07/Aug/15 ] |
|
Bob, is this failure always related to the nodemap feature, or only in the sanity-sec tests? |
| Comment by Andreas Dilger [ 07/Aug/15 ] |
|
Also, is the problem related to the 6.7 client or the 6.7 server? |
| Comment by Bob Glossman (Inactive) [ 07/Aug/15 ] |
|
Andreas, the "Error: 'adding fops nodemaps failed 1'" message is seen on the sanity-sec failures, not on the other failures. It is not clear to me whether they are due to the client or the servers. All 6.7 runs reported have 6.7 on both client and server. Sometimes the error text is "Error: 'nodemap_add failed with 1'" instead. |
| Comment by Kit Westneat [ 07/Aug/15 ] |
|
Did something change with proc fs? They all seem to fail after lctl get_param. |
| Comment by Yang Sheng [ 10/Aug/15 ] |
|
Most of the failed tests relate to proc fs. There isn't a significant difference between the 6.6 and 6.7 fs code, but I suspect some scheduler changes may affect proc fs behavior. For example, the sanity-sec test_7 failure looks like a proc entry being read right after it is created; maybe some create action has not yet finished when the read comes in. This test case operates only on the MGS node locally, so no network or locking is involved. I have submitted a debug patch (http://review.whamcloud.com/15916) and hope to hit an instance. |
| Comment by Bob Glossman (Inactive) [ 11/Aug/15 ] |
|
There's now at least 1 counterexample. Here is a test run with el6.7 on master with no failures in either sanity or sanity-sec: https://testing.hpdd.intel.com/test_sessions/e835d3fe-4074-11e5-8abf-5254006e85c2 As far as I know there were no significant changes between all those repeated failures and this one good run. |
| Comment by Brian Murrell (Inactive) [ 14/Aug/15 ] |
|
bogl: And now there is this errata kernel. Maybe you will get lucky and it will resolve your issues here. |
| Comment by Bob Glossman (Inactive) [ 14/Aug/15 ] |
|
Brian: thanks, I'm on it. See latest comment in