Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: Lustre 2.7.0, Lustre 2.5.4
    • Fix Version/s: Lustre 2.7.0
    • Severity: 3
    • Rank: 15545

    Description

      This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

      This issue relates to the following test suite runs:
      https://testing.hpdd.intel.com/test_sets/4e98188c-1fe2-11e4-8610-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/6798b742-32d9-11e4-aefc-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/a8565b52-3286-11e4-aefc-5254006e85c2

      The sub-test test_41c failed with the following error:

      test failed to respond and timed out

      Info required for matching: conf-sanity 41c

      Attachments

      Issue Links

      Activity

            [LU-5573] Test timeout conf-sanity test_41c

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12353/
            Subject: LU-5573 obdclass: strengthen against concurrent server mounts
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set:
            Commit: 684a7db576eb03ec2c74c89dabcef7991010ee11

            yong.fan nasf (Inactive) added a comment - What is the plan for http://review.whamcloud.com/12353 (b2_5)? We hit the failure on b2_5: https://testing.hpdd.intel.com/test_sets/024235d2-683d-11e4-a449-5254006e85c2
            pjones Peter Jones added a comment -

            ok thanks Bruno

            bfaccini Bruno Faccini (Inactive) added a comment - - edited

            Yes Peter, I think this ticket can be closed

            pjones Peter Jones added a comment -

            This fix has landed on master. Is that all that is needed to correct this issue for 2.7?

            yujian Jian Yu added a comment -

            Here is the back-ported patch for the Lustre b2_5 branch: http://review.whamcloud.com/12353


            jlevi Jodi Levi (Inactive) added a comment -

            Li Wei is verifying this fix.
            yujian Jian Yu added a comment - One more instance on master branch: https://testing.hpdd.intel.com/test_sets/d687f1f4-4e81-11e4-ae94-5254006e85c2

            bfaccini Bruno Faccini (Inactive) added a comment -

            The master patch, which strengthens the earlier LU-5299 change against the same issues/races during concurrent mounts/starts of server devices, is at http://review.whamcloud.com/12114.


            bfaccini Bruno Faccini (Inactive) added a comment -

            I checked conf-sanity/test_41c failures since my patch landed and found about 15 in about a month. The less frequent case is an MDS LBUG during the concurrent MDT mounts/starts in test_41c, which always looks like:

            19:58:50:Lustre: DEBUG MARKER: == conf-sanity test 41c: concurent mounts of MDT/OST should all fail but one == 01:58:15 (1409623095)
            19:58:50:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
            19:58:50:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
            19:58:50:Lustre: DEBUG MARKER: lctl set_param fail_loc=0x703
            19:58:50:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1
            19:58:50:Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
            19:58:50:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre   		                   /dev/lvm-Role_MDS/P1 /mnt/mds1
            19:58:50:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts: 
            19:58:50:LustreError: 27017:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 703 sleeping for 40000ms
            19:58:50:Lustre: DEBUG MARKER: lctl set_param fail_loc=0x0
            19:58:50:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1
            19:58:50:Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
            19:58:50:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre   		                   /dev/lvm-Role_MDS/P1 /mnt/mds1
            19:58:50:LustreError: 15d-9: The MGS service was already started from server
            19:58:50:LustreError: 27234:0:(obd_mount_server.c:865:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
            19:58:50:LustreError: 27234:0:(obd_mount_server.c:1443:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
            19:58:50:LustreError: 27234:0:(obd_mount_server.c:1473:server_put_super()) no obd lustre-MDT0000
            19:58:50:LustreError: 27234:0:(obd_mount_server.c:135:server_deregister_mount()) lustre-MDT0000 not registered
            19:58:50:Lustre: MGS: Not available for connect from 0@lo (stopping)
            19:58:50:LustreError: 26988:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff88006abf3400 x1478095423936308/t0(0) o253->MGC10.1.6.34@tcp@0@lo:26/25 lens 4768/4768 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
            19:58:50:LustreError: 26988:0:(obd_mount_server.c:1140:server_register_target()) lustre-MDT0000: error registering with the MGS: rc = -5 (not fatal)
            19:58:50:LustreError: 26988:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880069a1f400 x1478095423936316/t0(0) o101->MGC10.1.6.34@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
            19:58:50:LustreError: 26988:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880069a1f800 x1478095423936324/t0(0) o101->MGC10.1.6.34@tcp@0@lo:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
            19:58:50:LustreError: 15c-8: MGC10.1.6.34@tcp: The configuration from log 'lustre-MDT0000' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
            19:58:50:LustreError: 26988:0:(obd_mount_server.c:1274:server_start_targets()) failed to start server lustre-MDT0000: -5
            19:58:50:LustreError: 26988:0:(obd_mount_server.c:1716:server_fill_super()) Unable to start targets: -5
            19:58:50:LustreError: 26988:0:(obd_mount_server.c:865:lustre_disconnect_lwp()) lustre-MDT0000-lwp-MDT0000: Can't end config log lustre-client.
            19:58:50:LustreError: 26988:0:(obd_mount_server.c:1443:server_put_super()) lustre-MDT0000: failed to disconnect lwp. (rc=-2)
            19:59:12:LustreError: 26988:0:(obd_mount_server.c:1473:server_put_super()) no obd lustre-MDT0000
            19:59:12:LustreError: 26988:0:(obd_config.c:626:class_cleanup()) OBD 1 already stopping
            19:59:12:LustreError: 26988:0:(obd_config.c:585:class_detach()) OBD device 1 still set up
            19:59:12:LustreError: 26988:0:(obd_mount.c:1289:lustre_fill_super()) Unable to mount /dev/mapper/lvm--Role_MDS-P1 (-5)
            19:59:12:Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
            19:59:12:LustreError: 27017:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 703 awake
            19:59:12:LustreError: 27017:0:(obd_class.h:1008:obd_connect()) Device 1 not setup
            19:59:12:Lustre: 27017:0:(service.c:2031:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20:20s); client may timeout.  req@ffff880058849050 x1478095423936304/t0(0) o250-><?>@<?>:0/0 lens 400/264 e 0 to 0 dl 1409623123 ref 1 fl Complete:/0/0 rc -19/-19
            19:59:12:LNet: Service thread pid 27017 stopped after 40.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
            19:59:12:LustreError: 27234:0:(genops.c:1570:obd_exports_barrier()) ASSERTION( list_empty(&obd->obd_exports) ) failed: 
            19:59:12:LustreError: 27234:0:(genops.c:1570:obd_exports_barrier()) LBUG
            19:59:12:Pid: 27234, comm: mount.lustre
            19:59:12:
            19:59:12:Call Trace:
            19:59:12: [<ffffffffa0b29895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            19:59:12: [<ffffffffa0b29e97>] lbug_with_loc+0x47/0xb0 [libcfs]
            19:59:12: [<ffffffffa0c5794a>] obd_exports_barrier+0x16a/0x170 [obdclass]
            19:59:12: [<ffffffffa0474a06>] mgs_device_fini+0xf6/0x5a0 [mgs]
            19:59:12: [<ffffffff81510ee2>] ? _spin_lock+0x12/0x30
            19:59:12: [<ffffffffa0c84c07>] class_cleanup+0x577/0xda0 [obdclass]
            19:59:12: [<ffffffffa0c59b36>] ? class_name2dev+0x56/0xe0 [obdclass]
            19:59:12: [<ffffffffa0c864ec>] class_process_config+0x10bc/0x1c80 [obdclass]
            19:59:12: [<ffffffffa0b34d98>] ? libcfs_log_return+0x28/0x40 [libcfs]
            19:59:12: [<ffffffffa0c7fd51>] ? lustre_cfg_new+0x391/0x7e0 [obdclass]
            19:59:12: [<ffffffffa0c87229>] class_manual_cleanup+0x179/0x6f0 [obdclass]
            19:59:12: [<ffffffffa0c59b36>] ? class_name2dev+0x56/0xe0 [obdclass]
            19:59:12: [<ffffffffa0cbc0cd>] server_put_super+0x46d/0xf00 [obdclass]
            19:59:12: [<ffffffffa0cc0a88>] server_fill_super+0x668/0x1580 [obdclass]
            19:59:12: [<ffffffffa0c91958>] lustre_fill_super+0x1d8/0x530 [obdclass]
            19:59:12: [<ffffffffa0c91780>] ? lustre_fill_super+0x0/0x530 [obdclass]
            19:59:12: [<ffffffff811845df>] get_sb_nodev+0x5f/0xa0
            19:59:12: [<ffffffffa0c89135>] lustre_get_sb+0x25/0x30 [obdclass]
            19:59:12: [<ffffffff81183c1b>] vfs_kern_mount+0x7b/0x1b0
            19:59:12: [<ffffffff81183dc2>] do_kern_mount+0x52/0x130
            19:59:12: [<ffffffff811a3f82>] do_mount+0x2d2/0x8d0
            19:59:12: [<ffffffff811a4610>] sys_mount+0x90/0xe0
            19:59:12: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            

            The more frequent case is an OSS LBUG during the concurrent OST mounts/starts in test_41c, which always looks like:

            04:19:32:Lustre: DEBUG MARKER: == conf-sanity test 41c: concurent mounts of MDT/OST should all fail but one == 10:19:01 (1409739541)
            04:19:32:Lustre: DEBUG MARKER: grep -c /mnt/ost1' ' /proc/mounts
            04:19:32:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
            04:19:32:Lustre: DEBUG MARKER: ! zpool list -H lustre-ost1 >/dev/null 2>&1 ||
            04:19:32:			grep -q ^lustre-ost1/ /proc/mounts ||
            04:19:32:			zpool export  lustre-ost1
            04:19:32:Lustre: DEBUG MARKER: lctl set_param fail_loc=0x703
            04:19:32:Lustre: DEBUG MARKER: mkdir -p /mnt/ost1
            04:19:32:Lustre: DEBUG MARKER: zpool list -H lustre-ost1 >/dev/null 2>&1 ||
            04:19:32:			zpool import -f -o cachefile=none -d /dev/lvm-Role_OSS lustre-ost1
            04:19:32:Lustre: DEBUG MARKER: lctl set_param fail_loc=0x0
            04:19:32:Lustre: DEBUG MARKER: mkdir -p /mnt/ost1
            04:19:32:Lustre: DEBUG MARKER: zpool list -H lustre-ost1 >/dev/null 2>&1 ||
            04:19:32:			zpool import -f -o cachefile=none -d /dev/lvm-Role_OSS lustre-ost1
            04:19:32:Lustre: DEBUG MARKER: mkdir -p /mnt/ost1; mount -t lustre   		                   lustre-ost1/ost1 /mnt/ost1
            04:19:32:Lustre: DEBUG MARKER: mkdir -p /mnt/ost1; mount -t lustre   		                   lustre-ost1/ost1 /mnt/ost1
            04:19:32:LustreError: 31255:0:(obd_mount_server.c:1753:server_fill_super()) Unable to start osd on lustre-ost1/ost1: -114
            04:19:32:LustreError: 31255:0:(obd_mount.c:1340:lustre_fill_super()) Unable to mount  (-114)
            04:19:32:LustreError: 31163:0:(llog_osd.c:918:llog_osd_open()) ASSERTION( dt ) failed: 
            04:19:32:LustreError: 31163:0:(llog_osd.c:918:llog_osd_open()) LBUG
            04:19:32:Pid: 31163, comm: mount.lustre
            04:19:32:
            04:19:32:Call Trace:
            04:19:32: [<ffffffffa05ef895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            04:19:32: [<ffffffffa05efe97>] lbug_with_loc+0x47/0xb0 [libcfs]
            04:19:32: [<ffffffffa071c574>] llog_osd_open+0x844/0xb30 [obdclass]
            04:19:32: [<ffffffffa0600181>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            04:19:32: [<ffffffffa070a1da>] llog_open+0xba/0x2c0 [obdclass]
            04:19:32: [<ffffffffa070dd01>] llog_backup+0x61/0x500 [obdclass]
            04:19:32: [<ffffffff8128daa0>] ? sprintf+0x40/0x50
            04:19:32: [<ffffffffa0ee581d>] mgc_process_log+0x12fd/0x1970 [mgc]
            04:19:32: [<ffffffffa0edf260>] ? mgc_blocking_ast+0x0/0x810 [mgc]
            04:19:32: [<ffffffffa096d710>] ? ldlm_completion_ast+0x0/0x930 [ptlrpc]
            04:19:32: [<ffffffffa0ee6da8>] mgc_process_config+0x658/0x1210 [mgc]
            04:19:32: [<ffffffffa0750a1f>] lustre_process_log+0x20f/0xad0 [obdclass]
            04:19:32: [<ffffffffa0600181>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            04:19:32: [<ffffffffa05fa3a8>] ? libcfs_log_return+0x28/0x40 [libcfs]
            04:19:32: [<ffffffffa07857e7>] server_start_targets+0x767/0x1af0 [obdclass]
            04:19:32: [<ffffffffa05fa3a8>] ? libcfs_log_return+0x28/0x40 [libcfs]
            04:19:32: [<ffffffffa0754246>] ? lustre_start_mgc+0x4b6/0x1e00 [obdclass]
            04:19:32: [<ffffffffa0600181>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
            04:19:32: [<ffffffffa074bfa0>] ? class_config_llog_handler+0x0/0x18c0 [obdclass]
            04:19:32: [<ffffffffa078aa35>] server_fill_super+0xc95/0x1740 [obdclass]
            04:19:32: [<ffffffffa05fa3a8>] ? libcfs_log_return+0x28/0x40 [libcfs]
            04:19:32: [<ffffffffa0755d68>] lustre_fill_super+0x1d8/0x550 [obdclass]
            04:19:32: [<ffffffffa0755b90>] ? lustre_fill_super+0x0/0x550 [obdclass]
            04:19:32: [<ffffffff8118c5df>] get_sb_nodev+0x5f/0xa0
            04:19:32: [<ffffffffa074d965>] lustre_get_sb+0x25/0x30 [obdclass]
            04:19:32: [<ffffffff8118bc3b>] vfs_kern_mount+0x7b/0x1b0
            04:19:32: [<ffffffff8118bde2>] do_kern_mount+0x52/0x130
            04:19:32: [<ffffffff8119e9e2>] ? vfs_ioctl+0x22/0xa0
            04:19:32: [<ffffffff811ad7bb>] do_mount+0x2fb/0x930
            04:19:32: [<ffffffff811ade80>] sys_mount+0x90/0xe0
            04:19:32: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
            04:19:32:
            

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 9
