Description
sanity-sec test_31 was added by the patch at https://review.whamcloud.com/#/c/32590/, which was merged to master on September 10, 2018. So far, the test either fails or crashes, and only in review-dne-zfs-part-2 test sessions.
Looking at the logs for the failure at https://testing.whamcloud.com/test_sets/c7881c1e-b5b7-11e8-8c12-52540065bddc, the test_log shows a problem for every target when tunefs.lustre is called:
CMD: trevis-5vm8 tunefs.lustre --quiet --writeconf lustre-mdt1/mdt1
trevis-5vm8:
trevis-5vm8: tunefs.lustre FATAL: Device lustre-mdt1/mdt1 has not been formatted with mkfs.lustre
trevis-5vm8: tunefs.lustre: exiting with 19 (No such device)
checking for existing Lustre data: not found
From there, we see a variety of other errors:
Started lustre-MDT0003
CMD: trevis-5vm9 lctl get_param -n mdt.lustre-MDT0003.identity_upcall
/usr/lib64/lustre/tests/test-framework.sh: line 4452: mdt.lustre-MDT0000.identity_upcall: command not found
CMD: trevis-5vm9 lctl set_param -n mdt.lustre-MDT0003.identity_upcall "NONE"
CMD: trevis-5vm9 lctl set_param -n mdt/lustre-MDT0003/identity_flush=-1
…
CMD: trevis-5vm5.trevis.whamcloud.com lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
error: get_param: param_path 'mdc/*/connect_flags': No such file or directory
jobstats not supported by server
disable quota as required
CMD: trevis-5vm8 /usr/sbin/lctl list_nids | grep tcp999
Starting client: trevis-5vm5.trevis.whamcloud.com: -o user_xattr,flock,network=tcp999 10.9.5.8@tcp999:/lustre /mnt/lustre
CMD: trevis-5vm5.trevis.whamcloud.com mkdir -p /mnt/lustre
CMD: trevis-5vm5.trevis.whamcloud.com mount -t lustre -o user_xattr,flock,network=tcp999 10.9.5.8@tcp999:/lustre /mnt/lustre
mount.lustre: mount 10.9.5.8@tcp999:/lustre at /mnt/lustre failed: Invalid argument
This may have multiple causes.
Is 'lustre' the correct filesystem name?
Are the mount options correct?
Check the syslog for more info.
unconfigure:
    - lnet:
          errno: -16
          descr: "LNet unconfigure error: Device or resource busy"
Starting client: trevis-5vm5.trevis.whamcloud.com: -o user_xattr,flock,network=tcp999 10.9.5.8@tcp999:/lustre /mnt/lustre
CMD: trevis-5vm5.trevis.whamcloud.com mkdir -p /mnt/lustre
CMD: trevis-5vm5.trevis.whamcloud.com mount -t lustre -o user_xattr,flock,network=tcp999 10.9.5.8@tcp999:/lustre /mnt/lustre
mount.lustre: mount 10.9.5.8@tcp999:/lustre at /mnt/lustre failed: No such file or directory
Is the MGS specification correct?
Is the filesystem name correct?
If upgrading, is the copied client log valid? (see upgrade docs)
sanity-sec test_31: @@@@@@ FAIL: unable to remount client
Logs for other test sessions in which this test failed are at
https://testing.whamcloud.com/test_sets/6d51eee0-b54f-11e8-b86b-52540065bddc
https://testing.whamcloud.com/test_sets/a0a5d418-b555-11e8-a7de-52540065bddc
https://testing.whamcloud.com/test_sets/6070a87e-b59f-11e8-8c12-52540065bddc
When sanity-sec test_31 crashes, we see the following in the kernel-crash log:
[ 9311.019503] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock,network=tcp999 10.2.8.122@tcp999:/lustre /mnt/lustre
[ 9311.029516] LustreError: 21790:0:(obd_mount.c:1422:lmd_parse()) LNet Dynamic Peer Discovery is enabled on this node. 'network' mount option cannot be taken into account.
[ 9311.031037] LustreError: 21790:0:(obd_mount.c:1520:lmd_parse()) Bad mount options user_xattr,flock,network=tcp999,device=10.2.8.122@tcp999:/lustre
[ 9311.032361] LustreError: 21790:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount (-22)
[ 9312.035556] LNet: Removed LNI 10.2.8.119@tcp999
[ 9312.170496] Key type lgssc unregistered
[ 9312.171026] Lustre: 21892:0:(gss_mech_switch.c:80:lgss_mech_unregister()) Unregister krb5 mechanism
[ 9314.495561] LNet: Removed LNI 10.2.8.119@tcp
[ 9314.657567] LNet: HW NUMA nodes: 1, HW CPU cores: 2, npartitions: 1
[ 9314.661048] alg: No test for adler32 (adler32-zlib)
[ 9315.459156] Lustre: Lustre: Build Version: 2.11.54_104_gd365ea2
[ 9315.529642] LNet: Added LNI 10.2.8.119@tcp [8/256/0/180]
[ 9315.530284] LNet: Accept all, port 7988
[ 9315.537592] LNet: Added LNI 10.2.8.119@tcp999 [8/256/0/180]
[ 9315.541706] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre
[ 9315.550513] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock,network=tcp999 10.2.8.122@tcp999:/lustre /mnt/lustre
[ 9315.605193] LustreError: 22006:0:(ldlm_lib.c:492:client_obd_setup()) can't add initial connection
[ 9315.606173] LustreError: 22006:0:(obd_config.c:559:class_setup()) setup lustre-MDT0000-mdc-ffff8c373b3f5000 failed (-2)
[ 9315.607252] LustreError: 22006:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.2.8.122@tcp999: cfg command failed: rc = -2
[ 9315.608409] Lustre: cmd=cf003 0:lustre-MDT0000-mdc 1:lustre-MDT0000_UUID 2:10.2.8.122@tcp
[ 9315.609546] LustreError: 108:0:(connection.c:96:ptlrpc_connection_put()) ASSERTION( atomic_read(&conn->c_refcount) > 1 ) failed:
[ 9315.609934] LustreError: 15c-8: MGC10.2.8.122@tcp999: The configuration from log 'lustre-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[ 9315.613151] LustreError: 108:0:(connection.c:96:ptlrpc_connection_put()) LBUG
[ 9315.613864] Pid: 108, comm: kworker/1:2 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018
[ 9315.614783] Call Trace:
[ 9315.615088] [<ffffffffc07847cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 9315.615779] [<ffffffffc078487c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 9315.616419] [<ffffffffc0a7aac3>] ptlrpc_connection_put+0x213/0x220 [ptlrpc]
[ 9315.617180] [<ffffffffc08b4c15>] obd_zombie_imp_cull+0x65/0x3e0 [obdclass]
[ 9315.617705] LustreError: 21994:0:(obd_config.c:610:class_cleanup()) Device 3 not setup
[ 9315.617739] Lustre: Unmounted lustre-client
[ 9315.619443] [<ffffffffbd8b35ef>] process_one_work+0x17f/0x440
[ 9315.620210] [<ffffffffbd8b4686>] worker_thread+0x126/0x3c0
[ 9315.620798] [<ffffffffbd8bb621>] kthread+0xd1/0xe0
[ 9315.621336] [<ffffffffbdf205f7>] ret_from_fork_nospec_end+0x0/0x39
[ 9315.622164] [<ffffffffffffffff>] 0xffffffffffffffff
[ 9315.622720] Kernel panic - not syncing: LBUG
[ 9315.623235] CPU: 1 PID: 108 Comm: kworker/1:2 Kdump: loaded Tainted: G OE ------------ 3.10.0-862.9.1.el7.x86_64 #1
[ 9315.624371] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 9315.624956] Workqueue: obd_zombid obd_zombie_imp_cull [obdclass]
[ 9315.625577] Call Trace:
[ 9315.625859] [<ffffffffbdf0e84e>] dump_stack+0x19/0x1b
[ 9315.626383] [<ffffffffbdf08b50>] panic+0xe8/0x21f
[ 9315.626868] [<ffffffffc07848cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[ 9315.627502] [<ffffffffc0a7aac3>] ptlrpc_connection_put+0x213/0x220 [ptlrpc]
[ 9315.628222] [<ffffffffc08b4c15>] obd_zombie_imp_cull+0x65/0x3e0 [obdclass]
[ 9315.628918] [<ffffffffbd8b35ef>] process_one_work+0x17f/0x440
[ 9315.629498] [<ffffffffbd8b4686>] worker_thread+0x126/0x3c0
[ 9315.630059] [<ffffffffbd8b4560>] ? manage_workers.isra.24+0x2a0/0x2a0
[ 9315.630732] [<ffffffffbd8bb621>] kthread+0xd1/0xe0
[ 9315.631234] [<ffffffffbd8bb550>] ? insert_kthread_work+0x40/0x40
[ 9315.631839] [<ffffffffbdf205f7>] ret_from_fork_nospec_begin+0x21/0x21
[ 9315.632490] [<ffffffffbd8bb550>] ? insert_kthread_work+0x40/0x40
Logs for sessions in which sanity-sec test_31 crashes are at
https://testing.whamcloud.com/test_sets/4ec4717a-b5b6-11e8-b86b-52540065bddc
https://testing.whamcloud.com/test_sets/fe8c7708-b569-11e8-a7de-52540065bddc
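The numeric return codes quoted in the logs (19 from tunefs.lustre, -16 from the LNet unconfigure, -22 and -2 in the kernel messages) appear to be standard Linux errno values, negated in the kernel messages. The following is only a reading aid, not part of the test framework: a minimal Python sketch that maps each code to its symbolic name. The helper name decode_rc is hypothetical, and the mapping assumes Linux errno numbering.

```python
import errno
import os


def decode_rc(rc: int) -> str:
    """Map a return code such as -22 (or 22) to 'EINVAL (<system message>)'.

    Hypothetical helper for reading the logs; assumes the code is a
    standard errno value, possibly negated as in kernel messages.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


# Return codes copied from the log excerpts in this ticket.
observed = [
    ("tunefs.lustre exit status", 19),
    ("LNet unconfigure errno", -16),
    ("lustre_fill_super() Unable to mount", -22),
    ("class_setup() / lustre-client config log", -2),
]

for source, rc in observed:
    print(f"{source}: rc={rc} -> {decode_rc(rc)}")
```

Read this way, the first mount attempt fails with EINVAL because the 'network' option is rejected, and the second fails with ENOENT while the client configuration log is being processed, which matches the "Invalid argument" and "No such file or directory" messages printed by mount.lustre.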
Issue Links
- is related to LU-11057 Client mount option "-o network=net" does not work with LNet dynamic peer discovery (Resolved)