Lustre / LU-2888

After downgrade from 2.4 to 2.1.4, hit (osd_handler.c:2343:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.4.0, Lustre 2.1.6
    • Affects Version/s: Lustre 2.4.0, Lustre 2.1.4
    • Labels: None
    • Environment: before upgrade, server and client: 2.1.4 RHEL6;
      after upgrade, server and client: lustre-master build# 1270 RHEL6
    • Severity: 3
    • 6970

    Description

      Here is what I did (a rough command-level sketch follows the list):
      1. Format the system as 2.1.4 and then upgrade to 2.4; this succeeded.
      2. Shut down the filesystem and disable quota.
      3. Downgrade the system to 2.1.4 again; when mounting the MDS, hit the following errors.
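      The sketch below is only illustrative: it assumes /dev/sdb1 as the combined MGS/MDT device (as in the console log) and "lustre" as the fsname, and leaves the RPM upgrade/downgrade and quota-disable steps as placeholders rather than the exact commands used.

      # 1. format as 2.1.4 and mount the MDS
      mkfs.lustre --mgs --mdt --fsname=lustre /dev/sdb1
      mount -t lustre /dev/sdb1 /mnt/mds1

      # upgrade: stop, install the 2.4 (lustre-master build #1270) server packages, remount
      umount /mnt/mds1
      # ... install 2.4 server RPMs ...
      mount -t lustre /dev/sdb1 /mnt/mds1

      # 2.-3. stop, disable quota, install the 2.1.4 server packages again, remount
      umount /mnt/mds1
      # ... disable quota and install 2.1.4 server RPMs ...
      mount -t lustre /dev/sdb1 /mnt/mds1   # hits the osd_index_try() LBUG below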

      Here is the console output of the MDS:

      Lustre: DEBUG MARKER: == upgrade-downgrade End == 18:53:45 (1362020025)
      LDISKFS-fs warning (device sdb1): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
      LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts: 
      LDISKFS-fs warning (device sdb1): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
      LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts: 
      LDISKFS-fs warning (device sdb1): ldiskfs_fill_super: extents feature not enabled on this filesystem, use tune2fs.
      LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts: 
      Lustre: MGS MGS started
      Lustre: 7888:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from 7306ea48-8511-52b2-40cf-6424fc417e41@0@lo t0 exp (null) cur 1362020029 last 0
      Lustre: MGC10.10.4.132@tcp: Reactivating import
      Lustre: MGS: Logs for fs lustre were removed by user request.  All servers must be restarted in order to regenerate the logs.
      Lustre: Setting parameter lustre-MDT0000-mdtlov.lov.stripesize in log lustre-MDT0000
      Lustre: Setting parameter lustre-clilov.lov.stripesize in log lustre-client
      Lustre: Enabling ACL
      Lustre: Enabling user_xattr
      LustreError: 7901:0:(osd_handler.c:2343:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed: 
      LustreError: 7901:0:(osd_handler.c:2343:osd_index_try()) LBUG
      Pid: 7901, comm: llog_process_th
      
      Call Trace:
       [<ffffffffa03797f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa0379e07>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa0d6bd74>] osd_index_try+0x84/0x540 [osd_ldiskfs]
       [<ffffffffa04c1dfe>] dt_try_as_dir+0x3e/0x60 [obdclass]
       [<ffffffffa0c5eb3a>] orph_index_init+0x6a/0x1e0 [mdd]
       [<ffffffffa0c6ec45>] mdd_prepare+0x1d5/0x640 [mdd]
       [<ffffffffa0ccd23c>] ? mdt_process_config+0x6c/0x1030 [mdt]
       [<ffffffffa0da0499>] cmm_prepare+0x39/0xe0 [cmm]
       [<ffffffffa0ccfd7d>] mdt_device_alloc+0xe0d/0x2190 [mdt]
       [<ffffffffa04bdeff>] ? keys_fill+0x6f/0x1a0 [obdclass]
       [<ffffffffa04a2c87>] obd_setup+0x1d7/0x2f0 [obdclass]
       [<ffffffffa048ef3b>] ? class_new_export+0x72b/0x960 [obdclass]
       [<ffffffffa04a2fa8>] class_setup+0x208/0x890 [obdclass]
       [<ffffffffa04aac6c>] class_process_config+0xc3c/0x1c30 [obdclass]
       [<ffffffffa037a993>] ? cfs_alloc+0x63/0x90 [libcfs]
       [<ffffffffa04a5813>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
       [<ffffffffa04acd0b>] class_config_llog_handler+0x9bb/0x1610 [obdclass]
       [<ffffffffa0637e3b>] ? llog_client_next_block+0x1db/0x4b0 [ptlrpc]
       [<ffffffffa0478098>] llog_process_thread+0x888/0xd00 [obdclass]
       [<ffffffffa0477810>] ? llog_process_thread+0x0/0xd00 [obdclass]
       [<ffffffff8100c14a>] child_rip+0xa/0x20
       [<ffffffffa0477810>] ? llog_process_thread+0x0/0xd00 [obdclass]
       [<ffffffffa0477810>] ? llog_process_thread+0x0/0xd00 [obdclass]
       [<ffffffff8100c140>] ? child_rip+0x0/0x20

      Kernel panic - not syncing: LBUG
      Pid: 7901, comm: llog_process_th Not tainted 2.6.32-279.14.1.el6_lustre.x86_64 #1
      Call Trace:
       [<ffffffff814fdcba>] ? panic+0xa0/0x168
       [<ffffffffa0379e5b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
       [<ffffffffa0d6bd74>] ? osd_index_try+0x84/0x540 [osd_ldiskfs]
       [<ffffffffa04c1dfe>] ? dt_try_as_dir+0x3e/0x60 [obdclass]
       [<ffffffffa0c5eb3a>] ? orph_index_init+0x6a/0x1e0 [mdd]
       [<ffffffffa0c6ec45>] ? mdd_prepare+0x1d5/0x640 [mdd]
       [<ffffffffa0ccd23c>] ? mdt_process_config+0x6c/0x1030 [mdt]
       [<ffffffffa0da0499>] ? cmm_prepare+0x39/0xe0 [cmm]
       [<ffffffffa0ccfd7d>] ? mdt_device_alloc+0xe0d/0x2190 [mdt]
       [<ffffffffa04bdeff>] ? keys_fill+0x6f/0x1a0 [obdclass]
       [<ffffffffa04a2c87>] ? obd_setup+0x1d7/0x2f0 [obdclass]
       [<ffffffffa048ef3b>] ? class_new_export+0x72b/0x960 [obdclass]
       [<ffffffffa04a2fa8>] ? class_setup+0x208/0x890 [obdclass]
       [<ffffffffa04aac6c>] ? class_process_config+0xc3c/0x1c30 [obdclass]
       [<ffffffffa037a993>] ? cfs_alloc+0x63/0x90 [libcfs]
       [<ffffffffa04a5813>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
       [<ffffffffa04acd0b>] ? class_config_llog_handler+0x9bb/0x1610 [obdclass]
       [<ffffffffa0637e3b>] ? llog_client_next_block+0x1db/0x4b0 [ptlrpc]
       [<ffffffffa0478098>] ? llog_process_thread+0x888/0xd00 [obdclass]
       [<ffffffffa0477810>] ? llog_process_thread+0x0/0xd00 [obdclass]
       [<ffffffff8100c14a>] ? child_rip+0xa/0x20
       [<ffffffffa0477810>] ? llog_process_thread+0x0/0xd00 [obdclass]
       [<ffffffffa0477810>] ? llog_process_thread+0x0/0xd00 [obdclass]
       [<ffffffff8100c140>] ? child_rip+0x0/0x20
      Initializing cgroup subsys cpuset
      Initializing cgroup subsys cpu
      

          Activity

            [LU-2888] After downgrade from 2.4 to 2.1.4, hit (osd_handler.c:2343:osd_index_try()) ASSERTION( dt_object_exists(dt) ) failed

            adilger Andreas Dilger added a comment -

            Bobijam, that sounds like a very critical problem. Does conf-sanity test 32 not detect this problem during the 2.1 to 2.4 upgrade? Is that problem repeatable?

            Please file a separate bug for that problem and make it a 2.4 blocker until it is better understood and fixed.
            bobijam Zhenyu Xu added a comment -

            Actually, I found that when a 2.1-formatted disk is upgraded to 2.4, all files' sizes become 0 and their content is lost.

            bobijam Zhenyu Xu added a comment -

            This log shows that the 2.4 MDT start cannot find the old llog objects and creates new ones in the 2.4 way (using llog_osd_ops), which the 2.1 code (using llog_lvfs_ops) cannot recognise.

            yujian Jian Yu added a comment -

            So is my reading right that the test actually failed because all OSCs are in an inactive state, so it won't be possible to create any new files on such a filesystem?

            Right, touching a new file on a Lustre client failed as follows:

            # touch /mnt/lustre/file
            touch: cannot touch `/mnt/lustre/file': Input/output error
            

            I do not see any attempts by the test to actually create anything post-downgrade, so perhaps it's a case we are missing?

            The upgrade/downgrade testing needs to be performed as per the wiki page https://wiki.hpdd.intel.com/display/ENG/Upgrade+and+Downgrade+Testing. As we can see, the data creation/verification steps are included in the post-downgrade phases. However, these test cases are not currently covered by a single test script. The script that detected the issue in this ticket was upgrade-downgrade.sh, which covered extra quota and OST pools testing along the upgrade/downgrade path.

            The issue is that while running this script for downgrade testing, the extra quota testing was disabled (due to the new way of setting quotas on the master branch), so only OST pools testing was covered along the downgrade path, which only verified that the existing files/directories could be accessed and that the striping info was correct, but did not create new files.

            So, in order to cover all of the test cases on the wiki page, we need to improve upgrade-downgrade.sh to make the quota code work on the master branch, and we also need to run the other two test scripts, {clean,rolling}-upgrade-downgrade.sh, until the test cases they cover are added into upgrade-downgrade.sh. A sketch of the kind of post-downgrade check that is currently missing follows.
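            For illustration only, such a post-downgrade check might look like the sketch below; the file name and the error helper are placeholders in the style of the Lustre test framework, not code taken from upgrade-downgrade.sh.

            # hypothetical post-downgrade step: create and read back a new file,
            # rather than only re-checking existing files and their striping
            echo "post-downgrade data" > /mnt/lustre/new_file ||
                error "creating a new file failed after downgrade"
            grep -q "post-downgrade data" /mnt/lustre/new_file ||
                error "reading the new file back failed after downgrade"
            lfs getstripe /mnt/lustre/new_file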

            bobijam Zhenyu Xu added a comment -

            Without porting the on-disk changes from those patches backward, the upgrade-then-downgrade test will be a headache.

            bobijam Zhenyu Xu added a comment -

            I don't know about the test case, but the latest error has to do with the CATALOGS file changing in 2.4.

            The CATALOGS file written by 2.1 is as follows (each logid is i_ino + __u64 0x0 + i_generation; see the decode sketch after the dumps):

            # od -x /mnt/mds1/CATALOGS
              0000000 0021 0000 0000 0000 0000 0000 0000 0000
              0000020 2a9d 1d4f 0000 0000 0000 0000 0000 0000
              0000040 0022 0000 0000 0000 0000 0000 0000 0000
              0000060 2a9e 1d4f 0000 0000 0000 0000 0000 0000
              0000100 0023 0000 0000 0000 0000 0000 0000 0000
              0000120 2a9f 1d4f 0000 0000 0000 0000 0000 0000
              0000140

            After 2.4 has mounted it, the CATALOGS logid array changes to:

            # od -x /mnt/mds1/CATALOGS
              0000000 0002 0000 0000 0000 0001 0000 0000 0000
              0000020 0000 0000 0000 0000 0000 0000 0000 0000
              0000040 0004 0000 0000 0000 0001 0000 0000 0000
              0000060 0000 0000 0000 0000 0000 0000 0000 0000
              0000100 0006 0000 0000 0000 0001 0000 0000 0000
              0000120 0000 0000 0000 0000 0000 0000 0000 0000
              0000140
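
            As a reading aid for the dumps above, here is a hypothetical decode of slot 0, assuming each CATALOGS slot is 32 bytes laid out little-endian as <__u64 oid (i_ino)> <__u64 seq (0)> <__u32 gen (i_generation)> plus padding, per the note above; the layout is an assumption for illustration, not lifted from the source.

            # decode slot 0 of CATALOGS under the assumed <u64 oid><u64 seq><u32 gen> layout
            od -j 0  -N 16 -t x8 /mnt/mds1/CATALOGS   # oid and seq as 64-bit words
            od -j 16 -N 4  -t x4 /mnt/mds1/CATALOGS   # generation field

            On the 2.1 dump this reads back as oid 0x21, seq 0x0, gen 0x1d4f2a9d (an inode number plus its generation), while on the file rewritten by 2.4 the same offsets read as 0x2, 0x1 and 0x0, which the 2.1 llog_lvfs code can no longer map back to an inode.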
            green Oleg Drokin added a comment -

            So is my reading right that the test actually failed because all OSCs are in an inactive state, so it won't be possible to create any new files on such a filesystem? I do not see any attempts by the test to actually create anything post-downgrade, so perhaps it's a case we are missing?

            mdiep Minh Diep added a comment - Here is the test report: https://maloo.whamcloud.com/test_sessions/f77f5032-9cd5-11e2-802d-52540035b04c
            yujian Jian Yu added a comment - edited

            Just posted a 2.1 patch to port only the necessary ldiskfs-based OI implementation at http://review.whamcloud.com/5731

            Here are the test configuration and results:

            Lustre b2_1 build: http://build.whamcloud.com/job/lustre-reviews/14375/
            Lustre master build: http://build.whamcloud.com/job/lustre-master/1369/
            Distro/Arch: RHEL6.3/x86_64

            Clean upgrade and downgrade path: b2_1->master->b2_1

            After downgrading from master to b2_1, mounting the server targets and clients succeeded. However, on the MDS node, "lctl dl" showed:

            [root@wtm-83 ~]# lctl dl
              0 UP mgs MGS MGS 15
              1 UP mgc MGC10.10.19.8@tcp ac763461-679d-82b7-e00e-7e3d7f5e6234 5
              2 UP lov lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 4
              3 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 9
              4 UP mds mdd_obd-lustre-MDT0000 mdd_obd_uuid-lustre-MDT0000 3
              5 IN osc lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
              6 IN osc lustre-OST0001-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
            

            Dmesg on the MDS node showed:

            Lustre: MGS MGS started
            Lustre: 10437:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from ac763461-679d-82b7-e00e-7e3d7f5e6234@0@lo t0 exp (null) cur 1365015074 last 0
            Lustre: MGC10.10.19.8@tcp: Reactivating import
            Lustre: Enabling ACL
            Lustre: Enabling user_xattr
            Lustre: lustre-MDT0000: used disk, loading
            Lustre: 10441:0:(mdt_lproc.c:416:lprocfs_wr_identity_upcall()) lustre-MDT0000: identity upcall set to /usr/sbin/l_getidentity
            LustreError: 10441:0:(llog_lvfs.c:199:llog_lvfs_read_header()) bad log / header magic: 0x2e (expected 0x10645539)
            LustreError: 10441:0:(llog_obd.c:220:llog_setup_named()) obd lustre-OST0000-osc-MDT0000 ctxt 2 lop_setup=ffffffffa0562ca0 failed -5
            LustreError: 10441:0:(osc_request.c:4231:__osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT
            LustreError: 10441:0:(osc_request.c:4248:__osc_llog_init()) osc 'lustre-OST0000-osc-MDT0000' tgt 'mdd_obd-lustre-MDT0000' catid ffff880c24f6b8b0 rc=-5
            LustreError: 10441:0:(osc_request.c:4250:__osc_llog_init()) logid 0x2:0x0
            LustreError: 10441:0:(osc_request.c:4278:osc_llog_init()) rc: -5
            LustreError: 10441:0:(lov_log.c:248:lov_llog_init()) error osc_llog_init idx 0 osc 'lustre-OST0000-osc-MDT0000' tgt 'mdd_obd-lustre-MDT0000' (rc=-5)
            LustreError: 10441:0:(llog_lvfs.c:616:llog_lvfs_create()) error looking up logfile 0x4:0x0: rc -116
            LustreError: 10441:0:(llog_obd.c:220:llog_setup_named()) obd lustre-OST0001-osc-MDT0000 ctxt 2 lop_setup=ffffffffa0562ca0 failed -116
            LustreError: 10441:0:(osc_request.c:4231:__osc_llog_init()) failed LLOG_MDS_OST_ORIG_CTXT
            LustreError: 10441:0:(osc_request.c:4248:__osc_llog_init()) osc 'lustre-OST0001-osc-MDT0000' tgt 'mdd_obd-lustre-MDT0000' catid ffff880c24f6b8b0 rc=-116
            LustreError: 10441:0:(osc_request.c:4250:__osc_llog_init()) logid 0x4:0x0
            LustreError: 10441:0:(osc_request.c:4278:osc_llog_init()) rc: -116
            LustreError: 10441:0:(lov_log.c:248:lov_llog_init()) error osc_llog_init idx 1 osc 'lustre-OST0001-osc-MDT0000' tgt 'mdd_obd-lustre-MDT0000' (rc=-116)
            Lustre: 10566:0:(debug.c:326:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
            Lustre: 10567:0:(debug.c:326:libcfs_debug_str2mask()) You are trying to use a numerical value for the mask - this will be deprecated in a future release.
            Lustre: 10437:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from 4447b0f0-2f2f-51f4-cd9f-aa011ef3eb77@10.10.19.17@tcp t0 exp (null) cur 1365015077 last 0
            Lustre: 10209:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1365015074/real 1365015074]  req@ffff8805fb180800 x1431322036797450/t0(0) o8->lustre-OST0000-osc-MDT0000@10.10.19.17@tcp:28/4 lens 368/512 e 0 to 1 dl 1365015079 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            Lustre: 10437:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from a8451e2f-8501-14bd-8ab3-88e0df1b7640@10.10.19.26@tcp t0 exp (null) cur 1365015079 last 0
            Lustre: 10209:0:(client.c:1817:ptlrpc_expire_one_request()) @@@ Request  sent has timed out for slow reply: [sent 1365015075/real 1365015075]  req@ffff880626de8800 x1431322036797451/t0(0) o8->lustre-OST0001-osc-MDT0000@10.10.19.26@tcp:28/4 lens 368/512 e 0 to 1 dl 1365015080 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
            Lustre: 10437:0:(ldlm_lib.c:952:target_handle_connect()) MGS: connection from a2dec252-dec3-5b33-4395-9b26a06a9dac@10.10.18.253@tcp t0 exp (null) cur 1365015081 last 0
            Lustre: lustre-MDT0000: Temporarily refusing client connection from 10.10.18.253@tcp
            Lustre: lustre-MDT0000: Temporarily refusing client connection from 10.10.18.253@tcp
            Lustre: lustre-MDT0000: Temporarily refusing client connection from 10.10.18.253@tcp
            Lustre: lustre-MDT0000: Temporarily refusing client connection from 10.10.18.253@tcp
            LustreError: 10599:0:(lov_log.c:160:lov_llog_origin_connect()) error osc_llog_connect tgt 1 (-107)
            LustreError: 10598:0:(mds_lov.c:832:__mds_lov_synchronize()) lustre-OST0000_UUID failed at llog_origin_connect: -107
            LustreError: 10598:0:(mds_lov.c:861:__mds_lov_synchronize()) sync lustre-OST0000_UUID failed -107
            LustreError: 10598:0:(mds_lov.c:865:__mds_lov_synchronize()) deactivating lustre-OST0000_UUID
            LustreError: 10599:0:(lov_log.c:160:lov_llog_origin_connect()) Skipped 1 previous similar message
            LustreError: 10599:0:(mds_lov.c:832:__mds_lov_synchronize()) lustre-OST0001_UUID failed at llog_origin_connect: -107
            LustreError: 10599:0:(mds_lov.c:861:__mds_lov_synchronize()) sync lustre-OST0001_UUID failed -107
            LustreError: 10599:0:(mds_lov.c:865:__mds_lov_synchronize()) deactivating lustre-OST0001_UUID
            Lustre: 10454:0:(ldlm_lib.c:952:target_handle_connect()) lustre-MDT0000: connection from d03c55d9-e816-05ed-ff50-0ba89f2504bb@10.10.18.253@tcp t0 exp (null) cur 1365015101 last 0
            Lustre: DEBUG MARKER: Using TIMEOUT=20
            Lustre: DEBUG MARKER: 2 OST are inactive after 20 seconds, give up
            

            The test report is still in the Maloo import queue.

            bobijam Zhenyu Xu added a comment -

            ok, and done

            green Oleg Drokin added a comment -

            Can you please rebase this on top of the current b2_1?


            People

              Assignee: bobijam Zhenyu Xu
              Reporter: sarah Sarah Liu
              Votes: 0
              Watchers: 14

              Dates

                Created:
                Updated:
                Resolved: