[LU-15774] Rolling upgrade 2.12.8 -> 2.15 fails, sanity : @@@@@@ FAIL: unable to write to /mnt/lustre/d0_runas_test as UID 500 Created: 21/Apr/22 Updated: 22/Apr/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Cliff White (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Trevis test cluster. |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Upgrading from 2.12.8 build 150 to 2.15 build 4283, the OSS upgrade works fine. The MDS upgrade fails the sanity.sh run with this error:
[94020.104051] Lustre: DEBUG MARKER: trevis-86vm1.trevis.whamcloud.com: executing check_config_client /mnt/lustre
[94022.084620] Lustre: DEBUG MARKER: Using TIMEOUT=100
[94023.414760] Lustre: DEBUG MARKER: sanity : @@@@@@ FAIL: unable to write to /mnt/lustre/d0_runas_test as UID 500
Appears to be a test script issue. Looks very much like
[root@trevis-86vm1 205355]# pdsh -w trevis-86vm[1-3] "grep 500 /etc/passwd"
trevis-86vm1: sanityusr:x:500:500::/mnt/lustre:/bin/bash
trevis-86vm3: sanityusr:x:500:500::/mnt/lustre:/bin/bash |
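The pdsh output above returns a UID 500 entry for only two of the three nodes queried. A minimal sketch of how that gap could be spotted automatically, assuming the pdsh output has been captured as text; the function name `hosts_missing_uid` and the hard-coded host list are hypothetical, not part of the Lustre test framework:

```python
def hosts_missing_uid(expected_hosts, pdsh_output, uid):
    """Return the expected hosts that reported no passwd entry for uid."""
    seen = set()
    for line in pdsh_output.splitlines():
        # pdsh prefixes every output line with "<host>: "
        host, sep, entry = line.partition(": ")
        if not sep:
            continue
        # passwd format: name:password:UID:GID:gecos:home:shell
        fields = entry.split(":")
        if len(fields) >= 7 and fields[2] == str(uid):
            seen.add(host.strip())
    return [h for h in expected_hosts if h not in seen]


# Output copied from the pdsh run in this ticket.
output = """trevis-86vm1: sanityusr:x:500:500::/mnt/lustre:/bin/bash
trevis-86vm3: sanityusr:x:500:500::/mnt/lustre:/bin/bash"""
hosts = ["trevis-86vm1", "trevis-86vm2", "trevis-86vm3"]
print(hosts_missing_uid(hosts, output, 500))
```

Matching on the UID field, rather than grepping for the string 500 anywhere in the line, avoids false hits on GIDs or paths that happen to contain the same digits.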
| Comments |
| Comment by Andreas Dilger [ 22/Apr/22 ] |
|
This looks more like a test environment (ljb) issue than a Lustre or test issue? Should probably be moved to ATM? |
| Comment by Charlie Olmstead [ 22/Apr/22 ] |
|
I looked at trevis-86vm2; chef failed to complete, so the node was not ready for testing. This is an issue with how loadjenkinsbuild works: it kicks off the installation and then exits. It is then up to the user to determine whether the node is ready, which takes more than checking that the node is ssh-able. This is a reason to switch over to ljb, which takes over that responsibility. It waits for the OS installation to complete, verifies that chef has completed, and installs the Lustre packages, kernel, etc. Once ljb exits (with 0), the node is ready.

* directory[/mnt/lustre] action create
================================================================================
Error executing action `create` on resource 'directory[/mnt/lustre]'
================================================================================

Errno::EROFS
------------
Read-only file system @ apply2files - /mnt/lustre

Resource Declaration:
---------------------
# In /var/tmp/chef-client/roles/lib/config_helper.rb
50: directory d do
51: mode mode
52: owner own
53: group group || own
54: end
55: }

Compiled Resource:
------------------
# Declared in /var/tmp/chef-client/roles/lib/config_helper.rb:50:in `block in mkdir'
directory("/mnt/lustre") do
action [:create]
default_guard_interpreter :default
declared_type :directory
cookbook_name "test_node"
recipe_name "default"
mode "0777"
owner "root"
group "root"
path "/mnt/lustre"
end

System Info:
------------
chef_version=16.17.18
platform=centos
platform_version=7.9.2009
ruby=ruby 2.7.5p203 (2021-11-24 revision f69aeb8314) [x86_64-linux]
program_name=/usr/bin/chef-client
executable=/opt/chef/bin/chef-client
|
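The comment above notes that a node being ssh-able does not mean chef has finished. As a rough illustration of one such check, the sketch below scans captured chef-client output for failed resources. It assumes the error banner format shown in this ticket; the names `chef_failures` and `chef_run_clean` are hypothetical, and this is not how ljb itself is implemented.

```python
import re

# Matches the banner line chef-client printed in the output above.
FAILURE_RE = re.compile(r"Error executing action `(\w+)` on resource '([^']+)'")
# Matches a Ruby Errno class name such as Errno::EROFS.
ERRNO_RE = re.compile(r"\b(Errno::[A-Z0-9]+)\b")


def chef_failures(log_text):
    """Return (action, resource, errno_or_None) for each failed resource."""
    failures = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        match = FAILURE_RE.search(line)
        if not match:
            continue
        errno = None
        # The Errno class appears a few lines below the banner.
        for follow in lines[i + 1 : i + 6]:
            found = ERRNO_RE.search(follow)
            if found:
                errno = found.group(1)
                break
        failures.append((match.group(1), match.group(2), errno))
    return failures


def chef_run_clean(log_text):
    """True when the log contains no failed-resource banner."""
    return not chef_failures(log_text)


# Excerpt of the chef-client output from this ticket.
sample = """Error executing action `create` on resource 'directory[/mnt/lustre]'
================================================================================
Errno::EROFS
------------
Read-only file system @ apply2files - /mnt/lustre"""
print(chef_failures(sample))
```

A clean log is only one signal: an empty or truncated log would also pass `chef_run_clean`, so a real readiness check would still need to confirm that the chef run actually finished and that the Lustre packages and kernel are installed.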