[LU-6956] multiple test failures on el6.7 Created: 04/Aug/15  Updated: 10/Oct/21  Resolved: 10/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Bob Glossman (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None

Issue Links:
Related
is related to LU-6894 Kernel update for RHEL6.7 [2.6.32-573... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I have now seen the exact same test failures repeat across more than a single test run on el6.7 client/server. Similar failures aren't seen in comparable runs on el6.6. This suggests the root cause is in the kernel update, but I can't see it.

https://testing.hpdd.intel.com/test_sessions/0f3acfdc-3a25-11e5-8f15-5254006e85c2
https://testing.hpdd.intel.com/test_sessions/8d2ae154-3ae1-11e5-9384-5254006e85c2

Failed tests include:
sanity: 77i, 120a, 127a, 133c, 151, 154f, 156, 224c
sanity-sec: 7-24, all with Error: 'adding fops nodemaps failed 1'



 Comments   
Comment by Bob Glossman (Inactive) [ 06/Aug/15 ]

On yet another retry the failures in sanity did not reproduce, but some of the failures in sanity-sec did.

https://testing.hpdd.intel.com/test_sessions/cc30db56-3c30-11e5-af2f-5254006e85c2

Comment by Andreas Dilger [ 07/Aug/15 ]

Bob, is this failure always related to the nodemap feature, or only in the sanity-sec tests?

Comment by Andreas Dilger [ 07/Aug/15 ]

Also, is the problem related to the 6.7 client or the 6.7 server?

Comment by Bob Glossman (Inactive) [ 07/Aug/15 ]

Andreas, the Error: 'adding fops nodemaps failed 1' is seen on the sanity-sec failures, not on the others. It's not clear to me whether they are due to the client or the servers; all 6.7 runs reported have 6.7 on both client and server.

Sometimes the error text is Error: 'nodemap_add failed with 1' instead, but it's always in sanity-sec.
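
For context, a minimal sketch of the kind of sequence that would produce that message, assuming the sanity-sec helper is simply checking the exit status of lctl nodemap_add (the nodemap name and parameter below are illustrative, not taken from the test script):

    # Run on the MGS. "testmap" is an illustrative name, not one from the suite.
    lctl nodemap_add testmap || echo "nodemap_add failed with $?"
    # The test then reads the new entry back via get_param, e.g.:
    lctl get_param nodemap.testmap.id
    lctl nodemap_del testmap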

Comment by Kit Westneat [ 07/Aug/15 ]

Did something change with proc fs? They all seem to fail after lctl get_param.
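
One quick way to check that, assuming the nodemap entries on this branch are still exposed under /proc/fs/lustre/nodemap (that path and the parameter name are assumptions, not confirmed from the logs):

    # With a nodemap "testmap" just created as above, compare what lctl reports
    # with what is visible under proc immediately after creation.
    lctl get_param -n nodemap.testmap.id; echo "get_param rc=$?"
    ls /proc/fs/lustre/nodemap/testmap 2>/dev/null || echo "proc entry not visible"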

Comment by Yang Sheng [ 10/Aug/15 ]

Most of the failed tests relate to proc fs. There isn't a significant difference between the 6.6 and 6.7 proc fs code, but I suspect some scheduler changes may affect proc fs behavior. For example, the sanity-sec test_7 failure looks like a read of a proc entry right after it is created; maybe the create action has not finished by the time the read arrives. This test case only operates locally on the MGS node, so no network or locking is involved. I have submitted a debug patch (http://review.whamcloud.com/15916) and hope to hit an instance.
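
If that create-then-read race really is the cause, a hedged way to confirm it from the test side (separate from the debug patch above) would be to poll for the new entry instead of reading it immediately; the nodemap name and parameter here are illustrative:

    # Wait up to ~5 seconds for the new nodemap to become visible before reading it.
    lctl nodemap_add testmap
    for i in $(seq 1 50); do
        lctl get_param -n nodemap.testmap.id >/dev/null 2>&1 && break
        sleep 0.1
    done
    lctl get_param -n nodemap.testmap.id || echo "nodemap entry still missing after wait"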

Comment by Bob Glossman (Inactive) [ 11/Aug/15 ]

There's now at least one counterexample. Here is a test run with el6.7 on master with no failures in either sanity or sanity-sec:

https://testing.hpdd.intel.com/test_sessions/e835d3fe-4074-11e5-8abf-5254006e85c2

As far as I know there were no significant changes between all those repeated failures and this one good run.

Comment by Brian Murrell (Inactive) [ 14/Aug/15 ]

bogl: And now there is this errata kernel. Maybe you will get lucky and it will resolve your issues here.

Comment by Bob Glossman (Inactive) [ 14/Aug/15 ]

Brian: thanks, I'm on it. See the latest comment in LU-6894. The new kernel arrived in CentOS late yesterday; revised 6.7 mods are in flight.
