LU-1259: Test failure on test suite conf-sanity, subtest test_41a

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.3.0
    • None
    • None
    • 3
    • 4170

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/0908f616-76c9-11e1-ae2e-5254004bbbd3.

      The sub-test test_41a failed with the following error:

      test failed to respond and timed out

      03:07:11:Lustre: DEBUG MARKER: == conf-sanity test 41a: mount mds with --nosvc and --nomgs ========================================== 03:07:01 (1332670021)
      03:07:11:LDISKFS-fs (dm-0): warning: maximal mount count reached, running e2fsck is recommended
      03:07:11:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts:
      03:07:11:LDISKFS-fs (dm-0): warning: maximal mount count reached, running e2fsck is recommended
      03:07:11:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts:
      03:07:11:LustreError: 166-1: MGC10.10.4.72@tcp: Connection to service MGS via nid 0@lo was lost; in progress operations using this service will fail.
      03:07:11:Lustre: 3481:0:(ldlm_lib.c:633:target_handle_reconnect()) MGS: a1661115-9b9b-aae9-6c6f-3eb85b686ad4 reconnecting
      03:07:11:LustreError: 3481:0:(obd_class.h:521:obd_set_info_async()) obd_set_info_async: dev 0 no operation
      03:07:11:LustreError: 3484:0:(ldlm_lock.c:726:ldlm_lock_decref_internal_nolock()) ASSERTION( lock->l_writers > 0 ) failed:
      03:07:11:LustreError: 3484:0:(ldlm_lock.c:726:ldlm_lock_decref_internal_nolock()) LBUG
      03:07:12:Pid: 3484, comm: mgs_lustre_noti
      03:07:12:
      03:07:12:Call Trace:
      03:07:12: [<ffffffffa043a835>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      03:07:12: [<ffffffffa043ad67>] lbug_with_loc+0x47/0xb0 [libcfs]
      03:07:12: [<ffffffffa0687a75>] ldlm_lock_decref_internal_nolock+0x115/0x120 [ptlrpc]
      03:07:12: [<ffffffffa068b370>] ldlm_lock_decref_internal+0x60/0x700 [ptlrpc]
      03:07:12: [<ffffffffa068c21d>] ldlm_lock_decref_and_cancel+0x7d/0x120 [ptlrpc]
      03:07:12: [<ffffffffa0aad49b>] mgs_completion_ast_ir+0xfb/0x110 [mgs]
      03:07:12: [<ffffffffa06a3540>] ldlm_cli_enqueue_local+0x1f0/0x4d0 [ptlrpc]
      03:07:12: [<ffffffffa0aad3a0>] ? mgs_completion_ast_ir+0x0/0x110 [mgs]
      03:07:12: [<ffffffffa06a2670>] ? ldlm_blocking_ast+0x0/0x130 [ptlrpc]
      03:07:12: [<ffffffffa0aad2ac>] mgs_revoke_lock+0x13c/0x230 [mgs]
      03:07:12: [<ffffffffa06a2670>] ? ldlm_blocking_ast+0x0/0x130 [ptlrpc]
      03:07:12: [<ffffffffa0aad3a0>] ? mgs_completion_ast_ir+0x0/0x110 [mgs]
      03:07:12: [<ffffffffa04444f1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
      03:07:12: [<ffffffffa0ac348c>] mgs_ir_notify+0x11c/0x230 [mgs]
      03:07:12: [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
      03:07:13: [<ffffffffa0ac3370>] ? mgs_ir_notify+0x0/0x230 [mgs]
      03:07:13: [<ffffffff8100c14a>] child_rip+0xa/0x20
      03:07:13: [<ffffffffa0ac3370>] ? mgs_ir_notify+0x0/0x230 [mgs]
      03:07:13: [<ffffffffa0ac3370>] ? mgs_ir_notify+0x0/0x230 [mgs]
      03:07:13: [<ffffffff8100c140>] ? child_rip+0x0/0x20
      03:07:13:
      03:07:13:Kernel panic - not syncing: LBUG

        Activity

          jlevi Jodi Levi (Inactive) added a comment - Please reopen ticket if additional work is needed.
          di.wang Di Wang added a comment -

          https://maloo.whamcloud.com/test_logs/5ed51f1e-f934-11e1-b9a7-52540035b04c/show_text
          https://maloo.whamcloud.com/sub_tests/58136d70-f934-11e1-b9a7-52540035b04c

          07:46:44:Lustre: DEBUG MARKER: == conf-sanity test 41b: mount mds with --nosvc and --nomgs on first mount =========================== 07:46:24 (1347029184)
          07:46:45:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
          07:46:45:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
          07:46:56:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
          07:46:56:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
          07:47:08:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
          07:47:08:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
          07:47:08:Lustre: DEBUG MARKER: /usr/sbin/mkfs.lustre --mgs --mdt --fsname=lustre --mountfsoptions=errors=remount-ro,iopen_nopriv,user_xattr,acl --param sys.timeout=20 --device-size=2097152 --mkfsoptions="-E lazy_itable_init" --backfstype ldiskfs --reformat /dev/lvm-MDS/P1
          07:47:08:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts:
          07:47:30:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1
          07:47:30:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl -o nosvc -n /dev/lvm-MDS/P1 /mnt/mds1
          07:47:30:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts:
          07:47:30:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts:
          07:47:30:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lust
          07:47:30:Lustre: DEBUG MARKER: e2label /dev/lvm-MDS/P1
          07:47:30:Lustre: MGS: Regenerating lustre-OSTffff log by user request.
          07:47:30:Lustre: Skipped 1 previous similar message
          07:47:30:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1
          07:47:41:Lustre: 27767:0:(mgc_request.c:1518:mgc_process_recover_log()) Process recover log lustre-mdtir error -22
          07:47:41:Lustre: 27767:0:(mgc_request.c:1518:mgc_process_recover_log()) Skipped 7 previous similar messages
          07:47:41:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre -o user_xattr,acl -o nomgs,force /dev/lvm-MDS/P1 /mnt/mds1
          07:47:42:LustreError: 166-1: MGC10.10.4.186@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
          07:47:42:Lustre: MGS: Client 922ea2e1-d90d-e20d-a899-485cc5ff5d21 (at 0@lo) reconnecting
          07:47:42:LustreError: 27764:0:(obd_class.h:525:obd_set_info_async()) obd_set_info_async: dev 0 no operation
          07:47:42:LustreError: 27765:0:(ldlm_lock.c:833:ldlm_lock_decref_and_cancel()) ASSERTION( lock != ((void *)0) ) failed:
          07:47:42:LustreError: 27765:0:(ldlm_lock.c:833:ldlm_lock_decref_and_cancel()) LBUG
          07:47:42:Pid: 27765, comm: ll_mgs_02
          07:47:42:
          07:47:42:Call Trace:
          07:47:42: [<ffffffffa043a905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
          07:47:42: [<ffffffffa043af17>] lbug_with_loc+0x47/0xb0 [libcfs]
          07:47:42: [<ffffffffa06cb0dc>] ldlm_lock_decref_and_cancel+0x14c/0x150 [ptlrpc]
          07:47:42: [<ffffffffa0ad9685>] mgs_completion_ast_config+0x135/0x140 [mgs]
          07:47:42: [<ffffffffa06e64a6>] ldlm_cli_enqueue_local+0x1e6/0x560 [ptlrpc]
          07:47:42: [<ffffffffa0ad9550>] ? mgs_completion_ast_config+0x0/0x140 [mgs]
          07:47:42: [<ffffffffa06e5480>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
          07:47:42: [<ffffffffa0ad85bf>] mgs_revoke_lock+0x12f/0x290 [mgs]
          07:47:42: [<ffffffffa06e5480>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
          07:47:42: [<ffffffffa0ad9550>] ? mgs_completion_ast_config+0x0/0x140 [mgs]
          07:47:42: [<ffffffffa0ad8e64>] mgs_handle_target_reg+0x744/0xcc0 [mgs]
          07:47:42: [<ffffffffa0adbe16>] mgs_handle+0xa16/0x1190 [mgs]
          07:47:42: [<ffffffffa044a241>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
          07:47:42: [<ffffffffa071d782>] ptlrpc_server_handle_request+0x412/0xeb0 [ptlrpc]
          07:47:42: [<ffffffffa043b65e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
          07:47:42: [<ffffffffa044bd9f>] ? lc_watchdog_touch+0x6f/0x180 [libcfs]
          07:47:42: [<ffffffffa07165e2>] ? ptlrpc_wait_event+0xb2/0x2c0 [ptlrpc]
          07:47:42: [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
          07:47:42: [<ffffffffa071e9f7>] ptlrpc_main+0x7d7/0x1610 [ptlrpc]
          07:47:42: [<ffffffffa071e220>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
          07:47:42: [<ffffffff8100c14a>] child_rip+0xa/0x20
          07:47:42: [<ffffffffa071e220>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
          07:47:42: [<ffffffffa071e220>] ? ptlrpc_main+0x0/0x1610 [ptlrpc]
          07:47:42: [<ffffffff8100c140>] ? child_rip+0x0/0x20

          This seems like a different problem. Or is it the same one?
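
One way to compare the two crashes is to reduce each log to its failed assertion and the reliable frames of its call trace (frames the kernel prints with a leading `?` are uncertain and are skipped). The following is a minimal sketch, assuming log lines in the format shown in this ticket; the helper names `summarize` and `common_frames` are hypothetical and not part of Lustre or its test framework.

```python
import re

# Matches a stack frame such as:
#   [<ffffffffa0aad2ac>] mgs_revoke_lock+0x13c/0x230 [mgs]
# Group 1 is set when the frame is marked unreliable with "?".
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Matches an assertion line such as:
#   (ldlm_lock.c:726:ldlm_lock_decref_internal_nolock()) ASSERTION( ... ) failed
ASSERT = re.compile(
    r"\((\w+\.[ch]):(\d+):(\w+)\(\)\)\s+ASSERTION\(\s*(.*?)\s*\)\s+failed"
)


def summarize(log_text):
    """Return ((function, condition), [reliable frame names]) for one log."""
    assertion = None
    frames = []
    for line in log_text.splitlines():
        m = ASSERT.search(line)
        if m and assertion is None:
            # Keep only the first assertion seen in the log.
            assertion = (m.group(3), m.group(4))
        f = FRAME.search(line)
        if f and f.group(1) is None:
            # Skip frames flagged with "?".
            frames.append(f.group(2))
    return assertion, frames


def common_frames(a, b):
    """Frames of trace a that also appear in trace b, in a's order."""
    seen = set(b)
    return [name for name in a if name in seen]
```

Applied to the two traces in this ticket, a comparison of this kind shows that both pass through `mgs_revoke_lock`, `ldlm_cli_enqueue_local` and `ldlm_lock_decref_and_cancel`, while the failed assertions differ (`lock->l_writers > 0` versus `lock != ((void *)0)`) and so do the completion callbacks (`mgs_completion_ast_ir` versus `mgs_completion_ast_config`). That overlap alone does not settle whether the root cause is the same.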

          jay Jinshan Xiong (Inactive) added a comment - patch is at: http://review.whamcloud.com/#change,2390

          People

            jay Jinshan Xiong (Inactive)
            maloo Maloo