  Lustre / LU-11363

sanity-sec test 31 fails with 'unable to remount client'


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version: None
    • Affects Version: Lustre 2.12.0
    • Environment: DNE/ZFS
    • Severity: 3

    Description

      sanity-sec test_31 was added by the patch at https://review.whamcloud.com/#/c/32590/ and merged to master on September 10, 2018. So far, the test has been failing or crashing only in review-dne-zfs-part-2 test sessions.

      Looking at the logs for the failure at https://testing.whamcloud.com/test_sets/c7881c1e-b5b7-11e8-8c12-52540065bddc, in the test_log we see, for every target, a failure when tunefs.lustre is called:

      CMD: trevis-5vm8 tunefs.lustre --quiet --writeconf lustre-mdt1/mdt1
      trevis-5vm8: 
      trevis-5vm8: tunefs.lustre FATAL: Device lustre-mdt1/mdt1 has not been formatted with mkfs.lustre
      trevis-5vm8: tunefs.lustre: exiting with 19 (No such device)
      checking for existing Lustre data: not found
      

      From there, we see a variety of other errors:

      Started lustre-MDT0003
      CMD: trevis-5vm9 lctl get_param -n mdt.lustre-MDT0003.identity_upcall
      /usr/lib64/lustre/tests/test-framework.sh: line 4452: mdt.lustre-MDT0000.identity_upcall: command not found
      CMD: trevis-5vm9 lctl set_param -n mdt.lustre-MDT0003.identity_upcall "NONE"
      CMD: trevis-5vm9 lctl set_param -n mdt/lustre-MDT0003/identity_flush=-1
      …
      CMD: trevis-5vm5.trevis.whamcloud.com lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
      error: get_param: param_path 'mdc/*/connect_flags': No such file or directory
      jobstats not supported by server
      disable quota as required
      CMD: trevis-5vm8 /usr/sbin/lctl list_nids | grep tcp999
      Starting client: trevis-5vm5.trevis.whamcloud.com:  -o user_xattr,flock,network=tcp999 10.9.5.8@tcp999:/lustre /mnt/lustre
      CMD: trevis-5vm5.trevis.whamcloud.com mkdir -p /mnt/lustre
      CMD: trevis-5vm5.trevis.whamcloud.com mount -t lustre -o user_xattr,flock,network=tcp999 10.9.5.8@tcp999:/lustre /mnt/lustre
      mount.lustre: mount 10.9.5.8@tcp999:/lustre at /mnt/lustre failed: Invalid argument
      This may have multiple causes.
      Is 'lustre' the correct filesystem name?
      Are the mount options correct?
      Check the syslog for more info.
      unconfigure:
          - lnet:
                errno: -16
                descr: "LNet unconfigure error: Device or resource busy"
      Starting client: trevis-5vm5.trevis.whamcloud.com:  -o user_xattr,flock,network=tcp999 10.9.5.8@tcp999:/lustre /mnt/lustre
      CMD: trevis-5vm5.trevis.whamcloud.com mkdir -p /mnt/lustre
      CMD: trevis-5vm5.trevis.whamcloud.com mount -t lustre -o user_xattr,flock,network=tcp999 10.9.5.8@tcp999:/lustre /mnt/lustre
      mount.lustre: mount 10.9.5.8@tcp999:/lustre at /mnt/lustre failed: No such file or directory
      Is the MGS specification correct?
      Is the filesystem name correct?
      If upgrading, is the copied client log valid? (see upgrade docs)
       sanity-sec test_31: @@@@@@ FAIL: unable to remount client 
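
      The numeric errors in the logs above are standard Linux errno values: tunefs.lustre exits with 19 (ENODEV, "No such device"), the first mount attempt fails with EINVAL (22), the second with ENOENT (2), and LNet unconfigure reports -16 (EBUSY). A quick decode, using Python only as a convenient errno table:

```python
import errno
import os

# Decode the error numbers seen in the test_log above; these are
# standard Linux errno values, not Lustre-specific codes.
for code in (errno.ENODEV, errno.EINVAL, errno.ENOENT, errno.EBUSY):
    print(code, os.strerror(code))
```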
      

      The following are links to logs from other test sessions where this test failed:
      https://testing.whamcloud.com/test_sets/6d51eee0-b54f-11e8-b86b-52540065bddc
      https://testing.whamcloud.com/test_sets/a0a5d418-b555-11e8-a7de-52540065bddc
      https://testing.whamcloud.com/test_sets/6070a87e-b59f-11e8-8c12-52540065bddc

      When sanity-sec test_31 crashes, we see the following in the kernel-crash log

      [ 9311.019503] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock,network=tcp999 10.2.8.122@tcp999:/lustre /mnt/lustre
      [ 9311.029516] LustreError: 21790:0:(obd_mount.c:1422:lmd_parse()) LNet Dynamic Peer Discovery is enabled on this node. 'network' mount option cannot be taken into account.
      [ 9311.031037] LustreError: 21790:0:(obd_mount.c:1520:lmd_parse()) Bad mount options user_xattr,flock,network=tcp999,device=10.2.8.122@tcp999:/lustre
      [ 9311.032361] LustreError: 21790:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-22)
      [ 9312.035556] LNet: Removed LNI 10.2.8.119@tcp999
      [ 9312.170496] Key type lgssc unregistered
      [ 9312.171026] Lustre: 21892:0:(gss_mech_switch.c:80:lgss_mech_unregister()) Unregister krb5 mechanism
      [ 9314.495561] LNet: Removed LNI 10.2.8.119@tcp
      [ 9314.657567] LNet: HW NUMA nodes: 1, HW CPU cores: 2, npartitions: 1
      [ 9314.661048] alg: No test for adler32 (adler32-zlib)
      [ 9315.459156] Lustre: Lustre: Build Version: 2.11.54_104_gd365ea2
      [ 9315.529642] LNet: Added LNI 10.2.8.119@tcp [8/256/0/180]
      [ 9315.530284] LNet: Accept all, port 7988
      [ 9315.537592] LNet: Added LNI 10.2.8.119@tcp999 [8/256/0/180]
      [ 9315.541706] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre
      [ 9315.550513] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock,network=tcp999 10.2.8.122@tcp999:/lustre /mnt/lustre
      [ 9315.605193] LustreError: 22006:0:(ldlm_lib.c:492:client_obd_setup()) can't add initial connection
      [ 9315.606173] LustreError: 22006:0:(obd_config.c:559:class_setup()) setup lustre-MDT0000-mdc-ffff8c373b3f5000 failed (-2)
      [ 9315.607252] LustreError: 22006:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.2.8.122@tcp999: cfg command failed: rc = -2
      [ 9315.608409] Lustre:    cmd=cf003 0:lustre-MDT0000-mdc  1:lustre-MDT0000_UUID  2:10.2.8.122@tcp  
      
      [ 9315.609546] LustreError: 108:0:(connection.c:96:ptlrpc_connection_put()) ASSERTION( atomic_read(&conn->c_refcount) > 1 ) failed: 
      [ 9315.609934] LustreError: 15c-8: MGC10.2.8.122@tcp999: The configuration from log 'lustre-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      [ 9315.613151] LustreError: 108:0:(connection.c:96:ptlrpc_connection_put()) LBUG
      [ 9315.613864] Pid: 108, comm: kworker/1:2 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018
      [ 9315.614783] Call Trace:
      [ 9315.615088]  [<ffffffffc07847cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [ 9315.615779]  [<ffffffffc078487c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [ 9315.616419]  [<ffffffffc0a7aac3>] ptlrpc_connection_put+0x213/0x220 [ptlrpc]
      [ 9315.617180]  [<ffffffffc08b4c15>] obd_zombie_imp_cull+0x65/0x3e0 [obdclass]
      [ 9315.617705] LustreError: 21994:0:(obd_config.c:610:class_cleanup()) Device 3 not setup
      [ 9315.617739] Lustre: Unmounted lustre-client
      [ 9315.619443]  [<ffffffffbd8b35ef>] process_one_work+0x17f/0x440
      [ 9315.620210]  [<ffffffffbd8b4686>] worker_thread+0x126/0x3c0
      [ 9315.620798]  [<ffffffffbd8bb621>] kthread+0xd1/0xe0
      [ 9315.621336]  [<ffffffffbdf205f7>] ret_from_fork_nospec_end+0x0/0x39
      [ 9315.622164]  [<ffffffffffffffff>] 0xffffffffffffffff
      [ 9315.622720] Kernel panic - not syncing: LBUG
      [ 9315.623235] CPU: 1 PID: 108 Comm: kworker/1:2 Kdump: loaded Tainted: G           OE  ------------   3.10.0-862.9.1.el7.x86_64 #1
      [ 9315.624371] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 9315.624956] Workqueue: obd_zombid obd_zombie_imp_cull [obdclass]
      [ 9315.625577] Call Trace:
      [ 9315.625859]  [<ffffffffbdf0e84e>] dump_stack+0x19/0x1b
      [ 9315.626383]  [<ffffffffbdf08b50>] panic+0xe8/0x21f
      [ 9315.626868]  [<ffffffffc07848cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [ 9315.627502]  [<ffffffffc0a7aac3>] ptlrpc_connection_put+0x213/0x220 [ptlrpc]
      [ 9315.628222]  [<ffffffffc08b4c15>] obd_zombie_imp_cull+0x65/0x3e0 [obdclass]
      [ 9315.628918]  [<ffffffffbd8b35ef>] process_one_work+0x17f/0x440
      [ 9315.629498]  [<ffffffffbd8b4686>] worker_thread+0x126/0x3c0
      [ 9315.630059]  [<ffffffffbd8b4560>] ? manage_workers.isra.24+0x2a0/0x2a0
      [ 9315.630732]  [<ffffffffbd8bb621>] kthread+0xd1/0xe0
      [ 9315.631234]  [<ffffffffbd8bb550>] ? insert_kthread_work+0x40/0x40
      [ 9315.631839]  [<ffffffffbdf205f7>] ret_from_fork_nospec_begin+0x21/0x21
      [ 9315.632490]  [<ffffffffbd8bb550>] ? insert_kthread_work+0x40/0x40
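
      The lmd_parse() errors above show the mount being rejected with -22 (EINVAL) because the 'network' mount option cannot be honored while LNet Dynamic Peer Discovery is enabled. A hypothetical Python sketch of that rejection logic (illustrative only, not Lustre source; the function name and structure are invented):

```python
import errno

def check_mount_opts(opts: str, peer_discovery_enabled: bool) -> int:
    """Hypothetical model of the lmd_parse() check seen in the log:
    a 'network=' mount option is refused when LNet Dynamic Peer
    Discovery is enabled, yielding -EINVAL (-22)."""
    has_network = any(o.startswith("network=")
                     for o in opts.split(","))
    if has_network and peer_discovery_enabled:
        # Mirrors: "'network' mount option cannot be taken into account."
        return -errno.EINVAL
    return 0

# The option string from the failing mount command in the log:
print(check_mount_opts("user_xattr,flock,network=tcp999", True))   # -22
print(check_mount_opts("user_xattr,flock,network=tcp999", False))  # 0
```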
      

      Logs for the sessions where sanity-sec test_31 crashed:
      https://testing.whamcloud.com/test_sets/4ec4717a-b5b6-11e8-b86b-52540065bddc
      https://testing.whamcloud.com/test_sets/fe8c7708-b569-11e8-a7de-52540065bddc

      People

        Assignee: WC Triage (wc-triage)
        Reporter: James Nunez (jamesanunez, Inactive)
