[LU-10361] Ubuntu1404 client sanity test_205: FAIL: No jobstats for id.205.mkdir.19052 found on mds Created: 07/Oct/16 Updated: 17/Mar/20 Resolved: 17/Mar/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | Emoly Liu |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
server: EE3.1 tag-2.7.18.2 build#115 RHEL7.2 ldiskfs |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
client info
root@onyx-23vm1:/tmp/test_logs/2016-10-06/171222# uname -a
Linux onyx-23vm1.onyx.hpdd.intel.com 3.19.0-33-generic #38~14.04.1-Ubuntu SMP Fri Nov 6 18:17:28 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
root@onyx-23vm1:/tmp/test_logs/2016-10-06/171222# lctl get_param version
version=
lustre: 2.7.18.2
kernel: patchless_client
build: jenkins-arch=x86_64,build_type=client,distro=ubuntu1404,ib_stack=inkernel-115--PRISTINE-3.19.0-33-generic
root@onyx-23vm1:/tmp/test_logs/2016-10-06/171222#
test log
== sanity test 205: Verify job stats ============================================ ====================== 18:26:39 (1475803599)
Waiting 90 secs for update
Updated after 9s: wanted 'nodelocal' got 'nodelocal'
Registered as changelog user cl5
mdt.lustre-MDT0000.job_cleanup_interval=5
jobid_name=id.205.mkdir.19052
Test: mkdir /mnt/lustre/d205.sanity
Using JobID environment variable nodelocal=id.205.mkdir.19052
onyx-27: error: get_param: *//job_stats: Found no match
onyx-27: error: get_param: *//job_stats: Found no match
sanity test_205: @@@@@@ FAIL: No jobstats for id.205.mkdir.19052 found on mds::: *..job_stats
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4805:error_noexit() |
| Comments |
| Comment by Sarah Liu [ 07/Oct/16 ] |
|
please see attached for logs |
| Comment by Evan D. Chen (Inactive) [ 07/Oct/16 ] |
|
Emoly, can you take a look at this issue? Thanks! |
| Comment by Emoly Liu [ 14/Oct/16 ] |
|
This issue was caused by incorrect output from "$(convert_facet2label $facet)"; that's why we got "*..job_stats" there. Sarah, I built an Ubuntu 14.04 VM with kernel 3.13.0-32-generic locally. I can mount a client but can't run any test scripts due to some issues. Could I access your Ubuntu environment? Or could you reproduce this issue with "sh -x"? I want to see what happened in convert_facet2label. Thanks. |
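A minimal sketch of how an empty label would yield the "*..job_stats" pattern seen in the log. It assumes the pattern is assembled as "*.<label>.job_stats"; facet2label_stub and job_stats_pattern are hypothetical stand-ins, not the real convert_facet2label from test-framework.sh, and the facet name mds1 is also an assumption.

```shell
#!/bin/bash
# Hypothetical stand-in for convert_facet2label; the real helper lives in
# test-framework.sh and is not reproduced here.
facet2label_stub() {
    case "$1" in
        mds1) echo "lustre-MDT0000" ;;  # label taken from the test log
        *)    echo "" ;;                # simulate the failure: empty label
    esac
}

# Assumed pattern layout: "*.<label>.job_stats".
job_stats_pattern() {
    local label
    label=$(facet2label_stub "$1")
    echo "*.${label}.job_stats"
}

job_stats_pattern mds1      # prints *.lustre-MDT0000.job_stats
job_stats_pattern unknown   # prints *..job_stats
```

If the real helper returns an empty string on this client, the middle component disappears in the same way, which would match the "Found no match" errors from onyx-27.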
| Comment by Sarah Liu [ 14/Oct/16 ] |
|
Hello Emoly, Sorry, but I will be out of the office next week, so I cannot get back to you promptly. My best suggestion is to log on to ONYX and set up the environment there. Here is an example of what I did to set up the Ubuntu client environment; I hope it is helpful.
1. provision the Ubuntu client with the loadjenkinsbuild command
loadjenkinsbuild -p test -d ubuntu1404 -a x86_64 -n xx -r
2. install the kernel that matches the one the Lustre client was built against
apt-get install linux-image-xxx-generic # I think the kernel version should be "3.19.0-33-generic"
apt-get install linux-image-extra-xxx-generic # same as above
then reboot into the right kernel
3. download the deb files from Jenkins and install all of them
4. install pdsh and its dependencies
apt-get -f install
apt-get -f install pdsh
5. make sh point to bash instead of the default dash
rm /bin/sh
ln -s /bin/bash /bin/sh
6. create a lib64 link that points to the default lib
ln -s lib lib64
After finishing the above, you should be able to run the test scripts. |
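The results of steps 5 and 6 above can be checked with a short sketch like the following. The function names are hypothetical, and /usr/lib64/lustre/tests is taken from the trace dump in the description; whether that is the directory the lib64 link is meant to provide is an assumption.

```shell
#!/bin/bash
# Sketch: sanity checks for steps 5 and 6. Function names are hypothetical.

# Succeeds if the given link (default /bin/sh) resolves to a binary named bash.
sh_is_bash() {
    local target
    target=$(readlink -f "${1:-/bin/sh}") || return 1
    [ "$(basename "$target")" = "bash" ]
}

# Succeeds if the given path exists as a directory (directly or via a symlink).
dir_present() {
    [ -d "$1" ]
}

sh_is_bash || echo "/bin/sh does not resolve to bash"
# Path as shown in the trace dump of this ticket.
dir_present /usr/lib64/lustre/tests || echo "/usr/lib64/lustre/tests not found"
```

Neither check changes the system; each only reports when the expected state is missing.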
| Comment by Emoly Liu [ 17/Oct/16 ] |
|
Sarah, thanks, I will give it a try. |
| Comment by Emoly Liu [ 17/Oct/16 ] |
|
With Sarah's setup steps, I got my Ubuntu client working, but I can't reproduce this issue with an Ubuntu client and a CentOS server.
onyx-23vm1: Host key verification failed.
onyx-23vm1: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
onyx-23vm1: rsync error: error in rsync protocol data stream (code 12) at io.c(226) [sender=3.1.0]
onyx-23vm4: Host key verification failed.
onyx-23vm4: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
onyx-23vm4: rsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9]
onyx-28: Host key verification failed.
onyx-28: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
onyx-28: rsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9]
onyx-27: Host key verification failed.
onyx-27: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
onyx-27: rsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9]
So I suggest closing this ticket and reopening it if we hit it again. |
| Comment by Andreas Dilger [ 09/Dec/17 ] |
|
This is failing when running the command "lctl get_param ..job_stats | grep -c 'job_id.*mkdir'". At first guess, I'd think that this is caused by the remote shell expansion dropping the "*" or something, but it might relate to a problem with /proc or /sys not having the job_stats file on the Ubuntu client? |
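To illustrate the remote-shell hypothesis, here is a small local sketch of how one extra layer of shell evaluation changes what happens to a glob pattern. It does not reproduce the failure and does not touch Lustre; the file name, the pattern, and the function demo_remote_glob are hypothetical, and "sh -c" merely stands in for the remote shell.

```shell
#!/bin/bash
# Simulate one extra layer of shell evaluation, as happens when a command
# string is handed to a remote shell. All file names here are hypothetical.
demo_remote_glob() {
    local dir pattern
    dir=$(mktemp -d) || return 1
    touch "$dir/x.demo.job_stats"
    pattern='*.demo.job_stats'
    (
        cd "$dir" || exit 1
        # Unquoted inside the command string: the inner shell expands the glob.
        sh -c "echo $pattern"
        # Single-quoted inside the command string: the pattern passes through.
        sh -c "echo '$pattern'"
    )
    rm -rf "$dir"
}

demo_remote_glob
```

The first line of output is the matching file name and the second is the literal pattern, so whether lctl receives the wildcard intact depends on how the command string is quoted before it reaches the second shell.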
| Comment by Andreas Dilger [ 17/Mar/20 ] |
|
Closing old bug not seen in a long time. |