[LU-10361] Ubuntu1404 client sanity test_205: FAIL: No jobstats for id.205.mkdir.19052 found on mds Created: 07/Oct/16  Updated: 17/Mar/20  Resolved: 17/Mar/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: Emoly Liu
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

server: EE3.1 tag-2.7.18.2 build#115 RHEL7.2 ldiskfs
client: EE3.1 tag-2.7.18.2 build#115 Ubuntu14.04


Attachments: File ldev-500.tgz    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

client info

root@onyx-23vm1:/tmp/test_logs/2016-10-06/171222# uname -a
Linux onyx-23vm1.onyx.hpdd.intel.com 3.19.0-33-generic #38~14.04.1-Ubuntu SMP Fri Nov 6 18:17:28 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
root@onyx-23vm1:/tmp/test_logs/2016-10-06/171222# lctl get_param version
version=
lustre: 2.7.18.2
kernel: patchless_client
build:  jenkins-arch=x86_64,build_type=client,distro=ubuntu1404,ib_stack=inkernel-115--PRISTINE-3.19.0-33-generic
root@onyx-23vm1:/tmp/test_logs/2016-10-06/171222# 

test log

== sanity test 205: Verify job stats ============================================
====================== 18:26:39 (1475803599)
Waiting 90 secs for update
Updated after 9s: wanted 'nodelocal' got 'nodelocal'
Registered as changelog user cl5
mdt.lustre-MDT0000.job_cleanup_interval=5
jobid_name=id.205.mkdir.19052
Test: mkdir /mnt/lustre/d205.sanity
Using JobID environment variable nodelocal=id.205.mkdir.19052
onyx-27: error: get_param: *//job_stats: Found no match
onyx-27: error: get_param: *//job_stats: Found no match
 sanity test_205: @@@@@@ FAIL: No jobstats for id.205.mkdir.19052 found on mds:::
*..job_stats
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4805:error_noexit()


 Comments   
Comment by Sarah Liu [ 07/Oct/16 ]

please see attached for logs

Comment by Evan D. Chen (Inactive) [ 07/Oct/16 ]

Emoly, can you take a look at this issue? Thanks!

Comment by Emoly Liu [ 14/Oct/16 ]

This issue was caused by wrong "$(convert_facet2label $facet)" output; that's why we got "*..job_stats" there.
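
A minimal sketch of how an empty label would produce the pattern seen in the failure message. The helper build_stats_pattern is hypothetical and not taken from test-framework.sh; it only assumes the parameter name is assembled as "*.<label>.job_stats", which is consistent with the "*..job_stats" string in the log. The label lustre-MDT0000 comes from the test log above.

```shell
# Hypothetical helper (not from test-framework.sh): joins a target label
# into a job_stats parameter pattern of the form *.<label>.job_stats.
build_stats_pattern() {
	printf '%s\n' "*.${1}.job_stats"
}

# With a valid label the pattern names one target.
build_stats_pattern "lustre-MDT0000"

# With an empty label the middle component vanishes, leaving two dots.
build_stats_pattern ""
```

The second call prints "*..job_stats", the same string that appears in the FAIL line, which is why the label conversion is the first thing to trace.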

Sarah, I built an Ubuntu 14.04 VM locally with kernel 3.13.0-32-generic. I can mount a client but can't run any test scripts due to some issues. Can I access your Ubuntu environment? Or can you reproduce this issue with "sh -x"? I want to see what happened in convert_facet2label. Thanks.

Comment by Sarah Liu [ 14/Oct/16 ]

Hello Emoly,

Sorry, but I will be out of the office next week, so I cannot get back to you promptly. My best suggestion is to log on to ONYX and set up the environment there. Here is an example of what I did to set up the Ubuntu client environment; I hope it is helpful.

1. provision the Ubuntu client with the loadjenkinsbuild command
    loadjenkinsbuild -p test -d ubuntu1404 -a x86_64 -n xx -r
2. install the kernel that the Lustre client build is based on
    apt-get install linux-image-xxx-generic   # I think the kernel version should be "3.19.0-33-generic"
    apt-get install linux-image-extra-xxx-generic # same as above
    reboot into the right kernel
3. download the deb files from Jenkins and install all of them
4. install pdsh and its dependencies
    apt-get -f install
    apt-get -f install pdsh
5. change sh to bash instead of the default dash
    rm /bin/sh
    ln -s /bin/bash /bin/sh
6. make a lib64 link pointing to the default lib
    ln -s lib lib64

After finishing the above, you should be able to run the test scripts.
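
Steps 5 and 6 can be checked before starting a test run. This is a minimal sketch with hypothetical function names that are not part of the Lustre test suite. It assumes the lib64 link from step 6 lives under /usr, which is inferred from the /usr/lib64/lustre/tests/test-framework.sh path in the trace dump rather than stated in the steps.

```shell
# Hypothetical preflight checks, not part of the Lustre test suite.

# Succeeds when the given sh path resolves to a bash binary.
sh_is_bash() {
	case "$(readlink -f "$1")" in
	*/bash) return 0 ;;
	*) return 1 ;;
	esac
}

# Succeeds when the given path is a directory or a link to one.
lib64_present() {
	[ -d "$1" ]
}

# Assumed locations: /bin/sh from step 5, /usr/lib64 inferred for step 6.
sh_is_bash /bin/sh || echo "WARN: /bin/sh does not resolve to bash"
lib64_present /usr/lib64 || echo "WARN: /usr/lib64 is missing"
```

Both checks only warn, so they can be run on any node without changing it.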

Comment by Emoly Liu [ 17/Oct/16 ]

Sarah, thanks, I will give it a try.

Comment by Emoly Liu [ 17/Oct/16 ]

With Sarah's setup steps, I got my Ubuntu client working. But I can't reproduce this issue with an Ubuntu client and a CentOS server.
BTW, the logs show the following errors after the failure; I suspect a network connection issue occurred at that time.

onyx-23vm1: Host key verification failed.
onyx-23vm1: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
onyx-23vm1: rsync error: error in rsync protocol data stream (code 12) at io.c(226) [sender=3.1.0]
onyx-23vm4: Host key verification failed.
onyx-23vm4: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
onyx-23vm4: rsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9]
onyx-28: Host key verification failed.
onyx-28: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
onyx-28: rsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9]
onyx-27: Host key verification failed.
onyx-27: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
onyx-27: rsync error: unexplained error (code 255) at io.c(605) [sender=3.0.9]

So I suggest closing this ticket and reopening it if we hit it again.

Comment by Andreas Dilger [ 09/Dec/17 ]

This is failing when running the command "lctl get_param ..job_stats | grep -c 'job_id.*mkdir'".

At first guess, I'd think that this is caused by the remote shell expansion dropping the "*" or something, but it might relate to a problem with /proc or /sys not having the job_stats file on the Ubuntu client?
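
The remote-expansion guess can be illustrated locally. This is only a sketch of the general mechanism, not a reproduction of the failure: a nested "sh -c" stands in for the remote shell, and both function names are hypothetical. It shows that a pattern is re-evaluated by the second shell unless it is quoted for that shell.

```shell
# Hypothetical demo: count the words a pattern becomes after a second
# shell parses it, the way a remote shell re-parses a command line.

# Pattern is spliced into the inner command unquoted, so the inner
# shell is free to glob-expand it.
count_remote_words() {
	sh -c "set -- $1; echo \$#"
}

# Pattern is single-quoted for the inner shell, so it stays one word.
# (Assumes the pattern itself contains no single quote.)
count_remote_words_quoted() {
	sh -c "set -- '$1'; echo \$#"
}

demo_dir=$(mktemp -d)
touch "$demo_dir/a.x.job_stats" "$demo_dir/b.y.job_stats"
(cd "$demo_dir" && count_remote_words '*.job_stats')
(cd "$demo_dir" && count_remote_words_quoted '*.job_stats')
rm -rf "$demo_dir"
```

In the demo directory the unquoted call prints 2 and the quoted call prints 1, so a run under "sh -x" on both ends would show whether the pattern reaches lctl unchanged.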

Comment by Andreas Dilger [ 17/Mar/20 ]

Closing old bug not seen in a long time.
