[LU-9893] replay-single test_70c: test failed to respond and timed out Created: 18/Aug/17 Updated: 24/Aug/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.1, Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Casper | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Environment: | Trevis2, failover |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
https://testing.hpdd.intel.com/test_sessions/9b7c7e8e-7b5a-4f4d-af09-400c586a8340 Looks like another mds hang on umount issue for this build. It may be related to these tickets:
However, this one does not show this message in the MDS console log:
BUG: unable to handle kernel NULL pointer dereference at (null)
What this shares with the above three is that they all have an MDS umount at the end of the suite_log. No further activity is seen.
From suite_log:
test_70c fail mds1 1 times
Failing mds1 on trevis-41vm3
CMD: trevis-41vm3 grep -c /mnt/lustre-mds1' ' /proc/mounts
Stopping /mnt/lustre-mds1 (opts:) on trevis-41vm3
CMD: trevis-41vm3 umount -d /mnt/lustre-mds1
(end of log) |
| Comments |
| Comment by John Hammond [ 23/Aug/17 ] |
|
Jim, can you grab /kdumproot/scratch//dumps/trevis-41vm3.trevis.hpdd.intel.com/10.9.5.239-2017-08-04-12:28:28/vmcore-dmesg.txt. See https://testing.hpdd.intel.com/test_logs/f0c4415a-799c-11e7-8e1f-5254006e85c2/show_text |
| Comment by James Casper [ 23/Aug/17 ] |
|
Looks like that directory no longer exists: [root@trevis-41 trevis-41vm3.trevis.hpdd.intel.com]# pwd |
| Comment by James Nunez (Inactive) [ 23/Aug/17 ] |
|
I looked in Maloo for all replay-single test 70c timeouts (hangs) this year. I found 14 occurrences of this test hanging, but none of them are hanging on umount. If we see this issue again, we need to look for the vmcore-dmesg.txt file as early as possible. |
| Comment by John Hammond [ 24/Aug/17 ] |
|
James, maybe try something like
find /scratch/dumps -name vmcore-dmesg.txt -exec grep --with-filename test_70c {} \;
to find other instances of this crash. |
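A variant of the search suggested above can be sketched as a small shell function. It is only an illustration: the function name find_dumps_mentioning is hypothetical, and the /scratch/dumps root in the usage note is taken from the comment, not verified against the test cluster. It uses grep -l so each matching vmcore-dmesg.txt is listed once, instead of printing every matching line.

```shell
# Hypothetical helper (not part of the Lustre test framework):
# list every vmcore-dmesg.txt under a dump root that mentions a pattern.
#   $1 = directory to search (assumed here to be something like /scratch/dumps)
#   $2 = fixed string or basic regex to look for, e.g. test_70c
find_dumps_mentioning() {
    # -exec ... {} + batches many files into one grep call;
    # grep -l prints only the names of files containing a match;
    # -- stops option parsing in case the pattern starts with a dash.
    find "$1" -name vmcore-dmesg.txt -exec grep -l -- "$2" {} +
}
```

For example, find_dumps_mentioning /scratch/dumps test_70c would print one path per crash dump whose dmesg mentions test_70c, assuming the dumps are still on disk.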
| Comment by Andreas Dilger [ 26/Jan/22 ] |
|
+1 on master: https://testing.whamcloud.com/test_sets/079de990-f2fe-47b8-a70f-cb455e084ec8 |
| Comment by Qian Yingjin [ 24/Aug/22 ] |
|
+1 on master: https://testing.whamcloud.com/test_sets/102cd218-cda2-4092-b4f1-991fa8aeda2e |