Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3142

recovery-mds-scale test_failover_mds: dd: writing `/mnt/lustre/d0.dd-client-32vm5.lab.whamcloud.com/dd-file': Bad file descriptor

Details

    • 3
    • 7628

    Description

      While running recovery-mds-scale test_failover_mds, dd operation failed on one of the client nodes as follows:

      2013-04-08 22:25:26: dd run starting
      + mkdir -p /mnt/lustre/d0.dd-client-32vm5.lab.whamcloud.com
      + /usr/bin/lfs setstripe -c -1 /mnt/lustre/d0.dd-client-32vm5.lab.whamcloud.com
      + cd /mnt/lustre/d0.dd-client-32vm5.lab.whamcloud.com
      ++ /usr/bin/lfs df /mnt/lustre/d0.dd-client-32vm5.lab.whamcloud.com
      + FREE_SPACE=12963076
      + BLKS=2916692
      + echo 'Free disk space is 12963076, 4k blocks to dd is 2916692'
      + load_pid=8739
      + wait 8739
      + dd bs=4k count=2916692 status=noxfer if=/dev/zero of=/mnt/lustre/d0.dd-client-32vm5.lab.whamcloud.com/dd-file
      dd: writing `/mnt/lustre/d0.dd-client-32vm5.lab.whamcloud.com/dd-file': Bad file descriptor
      295176+0 records in
      295175+0 records out
      + '[' 1 -eq 0 ']'
      ++ date '+%F %H:%M:%S'
      + echoerr '2013-04-08 22:27:28: dd failed'
      + echo '2013-04-08 22:27:28: dd failed'
      2013-04-08 22:27:28: dd failed
      

      Maloo report: https://maloo.whamcloud.com/test_sets/68bce4aa-a1bb-11e2-bdac-52540035b04c

      Attachments

        Activity

          [LU-3142] recovery-mds-scale test_failover_mds: dd: writing `/mnt/lustre/d0.dd-client-32vm5.lab.whamcloud.com/dd-file': Bad file descriptor
          pjones Peter Jones added a comment -

          James please could you open a new ticket to track this bug? It sounds like what you are describing is a regression introduced into this work (which was included in 2.4.0) by the work from LU-2193 (which happened since 2.4.0) so it would be easier to create a new ticket and link it to the other two related tickets. It can get really confusing trying to work out the situation for a given bug if there are commits to it spanning release boundaries.

          pjones Peter Jones added a comment - James please could you open a new ticket to track this bug? It sounds like what you are describing is a regression introduced into this work (which was included in 2.4.0) by the work from LU-2193 (which happened since 2.4.0) so it would be easier to create a new ticket and link it to the other two related tickets. It can get really confusing trying to work out the situation for a given bug if there are commits to it spanning release boundaries.

          Created a patch to fix this issue.

          http://review.whamcloud.com/#change,6566

          simmonsja James A Simmons added a comment - Created a patch to fix this issue. http://review.whamcloud.com/#change,6566
          simmonsja James A Simmons added a comment - - edited

          I found it. Patch http://review.whamcloud.com/#change,4501 removed the fid_build_from_res_name function. Creating patch and testing...

          simmonsja James A Simmons added a comment - - edited I found it. Patch http://review.whamcloud.com/#change,4501 removed the fid_build_from_res_name function. Creating patch and testing...

          I don't know how it got passed you build system but patch http://review.whamcloud.com/#change,6102 is missing the
          function fid_build_from_res_name.

          lustre-2.4.0/lustre/mdt/mdt_handler.c: In function ‘mdt_intent_layout’:
          lustre-2.4.0/lustre/mdt/mdt_handler.c:3754: error: implicit declaration of function ‘fid_build_from_res_name’

          simmonsja James A Simmons added a comment - I don't know how it got passed you build system but patch http://review.whamcloud.com/#change,6102 is missing the function fid_build_from_res_name. lustre-2.4.0/lustre/mdt/mdt_handler.c: In function ‘mdt_intent_layout’: lustre-2.4.0/lustre/mdt/mdt_handler.c:3754: error: implicit declaration of function ‘fid_build_from_res_name’
          pjones Peter Jones added a comment -

          Efficiency of solution will be improved under LU-2818. Current solution sufficient for 2.4

          pjones Peter Jones added a comment - Efficiency of solution will be improved under LU-2818 . Current solution sufficient for 2.4

          Not closing this bug until the redundant getxattr call has been cleaned up per inspection comments.

          adilger Andreas Dilger added a comment - Not closing this bug until the redundant getxattr call has been cleaned up per inspection comments.
          yujian Jian Yu added a comment -

          A patch for master branch to gather the logs on passive server nodes in failure configuration: http://review.whamcloud.com/6112.

          yujian Jian Yu added a comment - A patch for master branch to gather the logs on passive server nodes in failure configuration: http://review.whamcloud.com/6112 .

          this bug should be caused by the wrong size of "MD" in MDT,

          00000004:00000001:21.0:1366291257.871077:0:4881:0:(mdd_object.c:268:mdd_xattr_get()) Process entered
          00000004:00000001:21.0:1366291257.871080:0:4881:0:(lod_object.c:373:lod_xattr_get()) Process entered
          00000004:00000001:21.0:1366291257.871085:0:4881:0:(lod_object.c:377:lod_xattr_get()) Process leaving (rc=176 : 176 : b0)
          00000004:00000001:21.0:1366291257.871087:0:4881:0:(mdd_object.c:281:mdd_xattr_get()) Process leaving (rc=176 : 176 : b0)
          00000004:00020000:21.0:1366291257.871089:0:4881:0:(mdt_lvb.c:158:mdt_lvbo_fill()) lustre-MDT0000: expected 176 actual 128.
          00000004:00000001:21.0:1366291257.880256:0:4881:0:(mdt_lvb.c:159:mdt_lvbo_fill()) Process leaving via out (rc=18446744073709551582 : -34 : 0xffffffffffffffde)

          1, the default value of mdt_device->mdt_max_mdsize is 128bytes,
          #define MAX_MD_SIZE (sizeof(struct lov_mds_md) + 4 * sizeof(struct lov_ost_data)

          2, before failover, the MD size is changed to 176bytes = (sizeof(struct lov_mds_md) + 6 * sizeof(struct lov_ost_data)
          mdt_device->mdt_max_mdsize is updated accordingly (see mdt_attr_get_lov, will update "mdt_max_mdsize" in "getattr" request)

          3, after failover, the new MDT doesn't know the actual mdt_max_mdsize, and still use the default value, then client calls ll_layout_refresh
          to get the MD and the MDT will failed with ERANGE for there is still no "getattr" request to update mdt_device>mdt_max_mdsize!

          the patch is tracked at http://review.whamcloud.com/#change,6102

          hongchao.zhang Hongchao Zhang added a comment - this bug should be caused by the wrong size of "MD" in MDT, 00000004:00000001:21.0:1366291257.871077:0:4881:0:(mdd_object.c:268:mdd_xattr_get()) Process entered 00000004:00000001:21.0:1366291257.871080:0:4881:0:(lod_object.c:373:lod_xattr_get()) Process entered 00000004:00000001:21.0:1366291257.871085:0:4881:0:(lod_object.c:377:lod_xattr_get()) Process leaving (rc=176 : 176 : b0) 00000004:00000001:21.0:1366291257.871087:0:4881:0:(mdd_object.c:281:mdd_xattr_get()) Process leaving (rc=176 : 176 : b0) 00000004:00020000:21.0:1366291257.871089:0:4881:0:(mdt_lvb.c:158:mdt_lvbo_fill()) lustre-MDT0000: expected 176 actual 128. 00000004:00000001:21.0:1366291257.880256:0:4881:0:(mdt_lvb.c:159:mdt_lvbo_fill()) Process leaving via out (rc=18446744073709551582 : -34 : 0xffffffffffffffde) 1, the default value of mdt_device->mdt_max_mdsize is 128bytes, #define MAX_MD_SIZE (sizeof(struct lov_mds_md) + 4 * sizeof(struct lov_ost_data) 2, before failover, the MD size is changed to 176bytes = (sizeof(struct lov_mds_md) + 6 * sizeof(struct lov_ost_data) mdt_device->mdt_max_mdsize is updated accordingly (see mdt_attr_get_lov, will update "mdt_max_mdsize" in "getattr" request) 3, after failover, the new MDT doesn't know the actual mdt_max_mdsize, and still use the default value, then client calls ll_layout_refresh to get the MD and the MDT will failed with ERANGE for there is still no "getattr" request to update mdt_device >mdt_max_mdsize! the patch is tracked at http://review.whamcloud.com/#change,6102
          yujian Jian Yu added a comment -

          Lustre Branch: master
          Lustre Build: http://build.whamcloud.com/job/lustre-master/1406/
          Distro/Arch: RHEL6.3/x86_64
          Test Group: failover
          FAILURE_MODE=HARD

          Please look into this report: https://maloo.whamcloud.com/test_sets/3d4c2120-a849-11e2-ba78-52540035b04c

          yujian Jian Yu added a comment - Lustre Branch: master Lustre Build: http://build.whamcloud.com/job/lustre-master/1406/ Distro/Arch: RHEL6.3/x86_64 Test Group: failover FAILURE_MODE=HARD Please look into this report: https://maloo.whamcloud.com/test_sets/3d4c2120-a849-11e2-ba78-52540035b04c

          in the logs of http://review.whamcloud.com/#change,6013

          the stripe data of the file used by "dd" is missed after failover, then it set the layout type of the file to Empty,
          which causes this error (-9, Bad file descriptor)

          ...
          00020000:00000001:0.0:1366159710.913287:0:8913:0:(lov_object.c:359:lov_fini_raid0()) Process entered
          00020000:00000010:0.0:1366159710.913288:0:8913:0:(lov_object.c:362:lov_fini_raid0()) kfreed 'r0->lo_sub': 56 at ffff88007d4e6940.
          00020000:00000010:0.0:1366159710.913290:0:8913:0:(lov_ea.c:131:lsm_free_plain()) slab-freed 'lsm->lsm_oinfo[i]': 112 at ffff8800373d1840.
          00020000:00000010:0.0:1366159710.913292:0:8913:0:(lov_ea.c:131:lsm_free_plain()) slab-freed 'lsm->lsm_oinfo[i]': 112 at ffff8800373d17c0.
          00020000:00000010:0.0:1366159710.913293:0:8913:0:(lov_ea.c:131:lsm_free_plain()) slab-freed 'lsm->lsm_oinfo[i]': 112 at ffff8800373d1740.
          00020000:00000010:0.0:1366159710.913293:0:8913:0:(lov_ea.c:131:lsm_free_plain()) slab-freed 'lsm->lsm_oinfo[i]': 112 at ffff8800373d16c0.
          00020000:00000010:0.0:1366159710.913294:0:8913:0:(lov_ea.c:131:lsm_free_plain()) slab-freed 'lsm->lsm_oinfo[i]': 112 at ffff8800373d1640.
          00020000:00000010:0.0:1366159710.913294:0:8913:0:(lov_ea.c:131:lsm_free_plain()) slab-freed 'lsm->lsm_oinfo[i]': 112 at ffff8800373d15c0.
          00020000:00000010:0.0:1366159710.913295:0:8913:0:(lov_ea.c:131:lsm_free_plain()) slab-freed 'lsm->lsm_oinfo[i]': 112 at ffff8800373d1540.
          00020000:00000010:0.0:1366159710.913296:0:8913:0:(lov_ea.c:133:lsm_free_plain()) kfreed 'lsm': 128 at ffff88007b7cbc40.
          00020000:00000001:0.0:1366159710.913297:0:8913:0:(lov_object.c:369:lov_fini_raid0()) Process leaving
          00000020:00001000:0.0:1366159710.913297:0:8913:0:(cl_object.c:896:cl_env_put()) 1@ffff88007a373138
          00000020:00000010:0.0:1366159710.913298:0:8913:0:(lu_object.c:421:lu_global_key_fini()) kfreed 'info': 512 at ffff8800175e4200.
          00000020:00000010:0.0:1366159710.913299:0:8913:0:(cl_object.c:1090:cl0_key_fini()) kfreed 'info': 600 at ffff8800435e3000.
          00000008:00000010:0.0:1366159710.913300:0:8913:0:(osc_dev.c:135:osc_key_fini()) slab-freed '(info)': 984 at ffff88002ad267f0.
          00020000:00000010:0.0:1366159710.913301:0:8913:0:(lov_dev.c:159:lov_key_fini()) slab-freed '(info)': 288 at ffff88007db76be8.
          00000080:00000010:0.0:1366159710.913302:0:8913:0:(lcommon_cl.c:146:ccc_key_fini()) slab-freed '(info)': 352 at ffff88007db048e0.
          00000080:00000010:0.0:1366159710.913303:0:8913:0:(vvp_dev.c:96:vvp_key_fini()) slab-freed '(info)': 488 at ffff88002ad28420.
          00000020:00000010:0.0:1366159710.913304:0:8913:0:(lu_object.c:1531:keys_fini()) kfreed 'ctx->lc_value': 320 at ffff8800175e4400.
          00000008:00000010:0.0:1366159710.913305:0:8913:0:(osc_dev.c:159:osc_session_fini()) slab-freed '(info)': 424 at ffff88007db05590.
          00020000:00000010:0.0:1366159710.913305:0:8913:0:(lov_dev.c:183:lov_session_key_fini()) slab-freed '(info)': 400 at ffff88007db74508.
          00000080:00000010:0.0:1366159710.913306:0:8913:0:(lcommon_cl.c:164:ccc_session_key_fini()) slab-freed '(session)': 184 at ffff88007a36b1f8.
          00000080:00000010:0.0:1366159710.913307:0:8913:0:(vvp_dev.c:114:vvp_session_key_fini()) slab-freed '(session)': 104 at ffff88007a36a7b0.
          00000020:00000010:0.0:1366159710.913308:0:8913:0:(lu_object.c:1531:keys_fini()) kfreed 'ctx->lc_value': 320 at ffff880030d75800.
          00000020:00000010:0.0:1366159710.913309:0:8913:0:(cl_object.c:787:cl_env_fini()) slab-freed '(cle)': 168 at ffff88007a373138.
          00020000:00000001:0.0:1366159710.913310:0:8913:0:(lov_object.c:653:lov_layout_change()) Process leaving (rc=0 : 0 : 0)
          00020000:00000001:0.0:1366159710.913311:0:8913:0:(lov_object.c:731:lov_conf_set()) Process leaving
          00020000:00000001:0.0:1366159710.913312:0:8913:0:(lov_object.c:735:lov_conf_set()) Process leaving (rc=0 : 0 : 0)
          00000020:00000001:0.0:1366159710.913313:0:8913:0:(cl_object.c:329:cl_conf_set()) Process leaving (rc=0 : 0 : 0)
          00000020:00001000:0.0:1366159710.913314:0:8913:0:(cl_object.c:896:cl_env_put()) 1@ffff88007a373e58
          00000020:00000010:0.0:1366159710.913315:0:8913:0:(lu_object.c:421:lu_global_key_fini()) kfreed 'info': 512 at ffff88001fc15c00.
          00000020:00000010:0.0:1366159710.913316:0:8913:0:(cl_object.c:1090:cl0_key_fini()) kfreed 'info': 600 at ffff880049dc7c00.
          00000008:00000010:0.0:1366159710.913317:0:8913:0:(osc_dev.c:135:osc_key_fini()) slab-freed '(info)': 984 at ffff88007b751418.
          00020000:00000010:0.0:1366159710.913318:0:8913:0:(lov_dev.c:159:lov_key_fini()) slab-freed '(info)': 288 at ffff88007a370728.
          00000080:00000010:0.0:1366159710.913319:0:8913:0:(lcommon_cl.c:146:ccc_key_fini()) slab-freed '(info)': 352 at ffff88007a36f5e0.
          00000080:00000010:0.0:1366159710.913320:0:8913:0:(vvp_dev.c:96:vvp_key_fini()) slab-freed '(info)': 488 at ffff88007a36e238.
          00000020:00000010:0.0:1366159710.913320:0:8913:0:(lu_object.c:1531:keys_fini()) kfreed 'ctx->lc_value': 320 at ffff88001765f400.
          00000008:00000010:0.0:1366159710.913321:0:8913:0:(osc_dev.c:159:osc_session_fini()) slab-freed '(info)': 424 at ffff88002ac25e18.
          00020000:00000010:0.0:1366159710.913322:0:8913:0:(lov_dev.c:183:lov_session_key_fini()) slab-freed '(info)': 400 at ffff88007db74828.
          00000080:00000010:0.0:1366159710.913323:0:8913:0:(lcommon_cl.c:164:ccc_session_key_fini()) slab-freed '(session)': 184 at ffff88007a36b870.
          00000080:00000010:0.0:1366159710.913323:0:8913:0:(vvp_dev.c:114:vvp_session_key_fini()) slab-freed '(session)': 104 at ffff88007a36a0c8.
          00000020:00000010:0.0:1366159710.913324:0:8913:0:(lu_object.c:1531:keys_fini()) kfreed 'ctx->lc_value': 320 at ffff880017664000.
          00000020:00000010:0.0:1366159710.913325:0:8913:0:(cl_object.c:787:cl_env_fini()) slab-freed '(cle)': 168 at ffff88007a373e58.
          00000080:00000001:0.0:1366159710.913326:0:8913:0:(file.c:3236:ll_layout_conf()) Process leaving (rc=0 : 0 : 0)
          00010000:00000001:0.0:1366159710.913327:0:8913:0:(ldlm_lock.c:211:ldlm_lock_put()) Process entered
          00010000:00000001:0.0:1366159710.913328:0:8913:0:(ldlm_lock.c:244:ldlm_lock_put()) Process leaving
          00010000:00000001:0.0:1366159710.913329:0:8913:0:(ldlm_lock.c:609:__ldlm_handle2lock()) Process entered
          00000020:00000001:0.0:1366159710.913329:0:8913:0:(lustre_handles.c:172:class_handle2object()) Process entered
          00000020:00000001:0.0:1366159710.913330:0:8913:0:(lustre_handles.c:195:class_handle2object()) Process leaving (rc=18446612134411306496 : -131939298245120 : ffff88007cff0600)
          00010000:00000001:0.0:1366159710.913331:0:8913:0:(ldlm_lock.c:621:__ldlm_handle2lock()) Process leaving (rc=18446612134411306496 : -131939298245120 : ffff88007cff0600)
          00010000:00000001:0.0:1366159710.913332:0:8913:0:(ldlm_lock.c:876:ldlm_lock_decref_internal()) Process entered
          00010000:00010000:0.0:1366159710.913333:0:8913:0:(ldlm_lock.c:849:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(CR) ns: lustre-MDT0000-mdc-ffff880037d22800 lock: ffff88007cff0600/0x6e4fd575de0e4a90 lrc: 3/1,0 mode: CR/CR res: 8589935616/3 bits 0x8 rrc: 1 type: IBT flags: 0x20000000000 nid: local remote: 0xb524751f9485880b expref: -99 pid: 8913 timeout: 0 lvb_type: 3
          00010000:00000001:0.0:1366159710.913335:0:8913:0:(ldlm_lock.c:211:ldlm_lock_put()) Process entered
          00010000:00000001:0.0:1366159710.913335:0:8913:0:(ldlm_lock.c:244:ldlm_lock_put()) Process leaving
          00010000:00010000:0.0:1366159710.913337:0:8913:0:(ldlm_lock.c:931:ldlm_lock_decref_internal()) ### add lock into lru list ns: lustre-MDT0000-mdc-ffff880037d22800 lock: ffff88007cff0600/0x6e4fd575de0e4a90 lrc: 2/0,0 mode: CR/CR res: 8589935616/3 bits 0x8 rrc: 1 type: IBT flags: 0x20000000000 nid: local remote: 0xb524751f9485880b expref: -99 pid: 8913 timeout: 0 lvb_type: 3
          00010000:00000001:0.0:1366159710.913341:0:8913:0:(ldlm_lock.c:312:ldlm_lock_add_to_lru()) Process entered
          00010000:00000001:0.0:1366159710.913342:0:8913:0:(ldlm_lock.c:316:ldlm_lock_add_to_lru()) Process leaving
          00010000:00000001:0.0:1366159710.913342:0:8913:0:(ldlm_lock.c:952:ldlm_lock_decref_internal()) Process leaving
          00010000:00000001:0.0:1366159710.913343:0:8913:0:(ldlm_lock.c:211:ldlm_lock_put()) Process entered
          00010000:00000001:0.0:1366159710.913343:0:8913:0:(ldlm_lock.c:244:ldlm_lock_put()) Process leaving
          00000080:00000001:0.0:1366159710.913344:0:8913:0:(file.c:3339:ll_layout_lock_set()) Process leaving (rc=0 : 0 : 0)
          00000080:00000001:0.0:1366159710.913345:0:8913:0:(file.c:3445:ll_layout_refresh()) Process leaving (rc=0 : 0 : 0)
          00000080:00000001:0.0:1366159710.913346:0:8913:0:(vvp_io.c:1220:vvp_io_init()) Process leaving (rc=0 : 0 : 0)
          00020000:00000001:0.0:1366159710.913347:0:8913:0:(lov_io.c:939:lov_io_init_empty()) Process entered
          00020000:00000001:0.0:1366159710.913347:0:8913:0:(lov_io.c:968:lov_io_init_empty()) Process leaving (rc=1 : 1 : 1)
          00000020:00000001:0.0:1366159710.913348:0:8913:0:(cl_io.c:178:cl_io_init0()) Process leaving (rc=1 : 1 : 1)
          00000020:00000001:0.0:1366159710.913349:0:8913:0:(cl_io.c:239:cl_io_rw_init()) Process leaving (rc=1 : 1 : 1)
          00000080:00000001:0.0:1366159710.913350:0:8913:0:(file.c:920:ll_file_io_generic()) Process leaving via out (rc=18446744073709551607 : -9 : 0xfffffffffffffff7)
          ...

          but the log of MDS is unavailable (it's at another node "wtm-15vm7"), and Yujian will help to collect it manually and will check it once the log is ready.

          hongchao.zhang Hongchao Zhang added a comment - in the logs of http://review.whamcloud.com/#change,6013 the stripe data of the file used by "dd" is missed after failover, then it set the layout type of the file to Empty, which causes this error (-9, Bad file descriptor) ... 00020000:00000001:0.0:1366159710.913287:0:8913:0:(lov_object.c:359:lov_fini_raid0()) Process entered 00020000:00000010:0.0:1366159710.913288:0:8913:0:(lov_object.c:362:lov_fini_raid0()) kfreed 'r0->lo_sub': 56 at ffff88007d4e6940. 00020000:00000010:0.0:1366159710.913290:0:8913:0:(lov_ea.c:131:lsm_free_plain()) slab-freed 'lsm->lsm_oinfo [i] ': 112 at ffff8800373d1840. 00020000:00000010:0.0:1366159710.913292:0:8913:0:(lov_ea.c:131:lsm_free_plain()) slab-freed 'lsm->lsm_oinfo [i] ': 112 at ffff8800373d17c0. 00020000:00000010:0.0:1366159710.913293:0:8913:0:(lov_ea.c:131:lsm_free_plain()) slab-freed 'lsm->lsm_oinfo [i] ': 112 at ffff8800373d1740. 00020000:00000010:0.0:1366159710.913293:0:8913:0:(lov_ea.c:131:lsm_free_plain()) slab-freed 'lsm->lsm_oinfo [i] ': 112 at ffff8800373d16c0. 00020000:00000010:0.0:1366159710.913294:0:8913:0:(lov_ea.c:131:lsm_free_plain()) slab-freed 'lsm->lsm_oinfo [i] ': 112 at ffff8800373d1640. 00020000:00000010:0.0:1366159710.913294:0:8913:0:(lov_ea.c:131:lsm_free_plain()) slab-freed 'lsm->lsm_oinfo [i] ': 112 at ffff8800373d15c0. 00020000:00000010:0.0:1366159710.913295:0:8913:0:(lov_ea.c:131:lsm_free_plain()) slab-freed 'lsm->lsm_oinfo [i] ': 112 at ffff8800373d1540. 00020000:00000010:0.0:1366159710.913296:0:8913:0:(lov_ea.c:133:lsm_free_plain()) kfreed 'lsm': 128 at ffff88007b7cbc40. 00020000:00000001:0.0:1366159710.913297:0:8913:0:(lov_object.c:369:lov_fini_raid0()) Process leaving 00000020:00001000:0.0:1366159710.913297:0:8913:0:(cl_object.c:896:cl_env_put()) 1@ffff88007a373138 00000020:00000010:0.0:1366159710.913298:0:8913:0:(lu_object.c:421:lu_global_key_fini()) kfreed 'info': 512 at ffff8800175e4200. 00000020:00000010:0.0:1366159710.913299:0:8913:0:(cl_object.c:1090:cl0_key_fini()) kfreed 'info': 600 at ffff8800435e3000. 00000008:00000010:0.0:1366159710.913300:0:8913:0:(osc_dev.c:135:osc_key_fini()) slab-freed '(info)': 984 at ffff88002ad267f0. 00020000:00000010:0.0:1366159710.913301:0:8913:0:(lov_dev.c:159:lov_key_fini()) slab-freed '(info)': 288 at ffff88007db76be8. 00000080:00000010:0.0:1366159710.913302:0:8913:0:(lcommon_cl.c:146:ccc_key_fini()) slab-freed '(info)': 352 at ffff88007db048e0. 00000080:00000010:0.0:1366159710.913303:0:8913:0:(vvp_dev.c:96:vvp_key_fini()) slab-freed '(info)': 488 at ffff88002ad28420. 00000020:00000010:0.0:1366159710.913304:0:8913:0:(lu_object.c:1531:keys_fini()) kfreed 'ctx->lc_value': 320 at ffff8800175e4400. 00000008:00000010:0.0:1366159710.913305:0:8913:0:(osc_dev.c:159:osc_session_fini()) slab-freed '(info)': 424 at ffff88007db05590. 00020000:00000010:0.0:1366159710.913305:0:8913:0:(lov_dev.c:183:lov_session_key_fini()) slab-freed '(info)': 400 at ffff88007db74508. 00000080:00000010:0.0:1366159710.913306:0:8913:0:(lcommon_cl.c:164:ccc_session_key_fini()) slab-freed '(session)': 184 at ffff88007a36b1f8. 00000080:00000010:0.0:1366159710.913307:0:8913:0:(vvp_dev.c:114:vvp_session_key_fini()) slab-freed '(session)': 104 at ffff88007a36a7b0. 00000020:00000010:0.0:1366159710.913308:0:8913:0:(lu_object.c:1531:keys_fini()) kfreed 'ctx->lc_value': 320 at ffff880030d75800. 00000020:00000010:0.0:1366159710.913309:0:8913:0:(cl_object.c:787:cl_env_fini()) slab-freed '(cle)': 168 at ffff88007a373138. 00020000:00000001:0.0:1366159710.913310:0:8913:0:(lov_object.c:653:lov_layout_change()) Process leaving (rc=0 : 0 : 0) 00020000:00000001:0.0:1366159710.913311:0:8913:0:(lov_object.c:731:lov_conf_set()) Process leaving 00020000:00000001:0.0:1366159710.913312:0:8913:0:(lov_object.c:735:lov_conf_set()) Process leaving (rc=0 : 0 : 0) 00000020:00000001:0.0:1366159710.913313:0:8913:0:(cl_object.c:329:cl_conf_set()) Process leaving (rc=0 : 0 : 0) 00000020:00001000:0.0:1366159710.913314:0:8913:0:(cl_object.c:896:cl_env_put()) 1@ffff88007a373e58 00000020:00000010:0.0:1366159710.913315:0:8913:0:(lu_object.c:421:lu_global_key_fini()) kfreed 'info': 512 at ffff88001fc15c00. 00000020:00000010:0.0:1366159710.913316:0:8913:0:(cl_object.c:1090:cl0_key_fini()) kfreed 'info': 600 at ffff880049dc7c00. 00000008:00000010:0.0:1366159710.913317:0:8913:0:(osc_dev.c:135:osc_key_fini()) slab-freed '(info)': 984 at ffff88007b751418. 00020000:00000010:0.0:1366159710.913318:0:8913:0:(lov_dev.c:159:lov_key_fini()) slab-freed '(info)': 288 at ffff88007a370728. 00000080:00000010:0.0:1366159710.913319:0:8913:0:(lcommon_cl.c:146:ccc_key_fini()) slab-freed '(info)': 352 at ffff88007a36f5e0. 00000080:00000010:0.0:1366159710.913320:0:8913:0:(vvp_dev.c:96:vvp_key_fini()) slab-freed '(info)': 488 at ffff88007a36e238. 00000020:00000010:0.0:1366159710.913320:0:8913:0:(lu_object.c:1531:keys_fini()) kfreed 'ctx->lc_value': 320 at ffff88001765f400. 00000008:00000010:0.0:1366159710.913321:0:8913:0:(osc_dev.c:159:osc_session_fini()) slab-freed '(info)': 424 at ffff88002ac25e18. 00020000:00000010:0.0:1366159710.913322:0:8913:0:(lov_dev.c:183:lov_session_key_fini()) slab-freed '(info)': 400 at ffff88007db74828. 00000080:00000010:0.0:1366159710.913323:0:8913:0:(lcommon_cl.c:164:ccc_session_key_fini()) slab-freed '(session)': 184 at ffff88007a36b870. 00000080:00000010:0.0:1366159710.913323:0:8913:0:(vvp_dev.c:114:vvp_session_key_fini()) slab-freed '(session)': 104 at ffff88007a36a0c8. 00000020:00000010:0.0:1366159710.913324:0:8913:0:(lu_object.c:1531:keys_fini()) kfreed 'ctx->lc_value': 320 at ffff880017664000. 00000020:00000010:0.0:1366159710.913325:0:8913:0:(cl_object.c:787:cl_env_fini()) slab-freed '(cle)': 168 at ffff88007a373e58. 00000080:00000001:0.0:1366159710.913326:0:8913:0:(file.c:3236:ll_layout_conf()) Process leaving (rc=0 : 0 : 0) 00010000:00000001:0.0:1366159710.913327:0:8913:0:(ldlm_lock.c:211:ldlm_lock_put()) Process entered 00010000:00000001:0.0:1366159710.913328:0:8913:0:(ldlm_lock.c:244:ldlm_lock_put()) Process leaving 00010000:00000001:0.0:1366159710.913329:0:8913:0:(ldlm_lock.c:609:__ldlm_handle2lock()) Process entered 00000020:00000001:0.0:1366159710.913329:0:8913:0:(lustre_handles.c:172:class_handle2object()) Process entered 00000020:00000001:0.0:1366159710.913330:0:8913:0:(lustre_handles.c:195:class_handle2object()) Process leaving (rc=18446612134411306496 : -131939298245120 : ffff88007cff0600) 00010000:00000001:0.0:1366159710.913331:0:8913:0:(ldlm_lock.c:621:__ldlm_handle2lock()) Process leaving (rc=18446612134411306496 : -131939298245120 : ffff88007cff0600) 00010000:00000001:0.0:1366159710.913332:0:8913:0:(ldlm_lock.c:876:ldlm_lock_decref_internal()) Process entered 00010000:00010000:0.0:1366159710.913333:0:8913:0:(ldlm_lock.c:849:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(CR) ns: lustre-MDT0000-mdc-ffff880037d22800 lock: ffff88007cff0600/0x6e4fd575de0e4a90 lrc: 3/1,0 mode: CR/CR res: 8589935616/3 bits 0x8 rrc: 1 type: IBT flags: 0x20000000000 nid: local remote: 0xb524751f9485880b expref: -99 pid: 8913 timeout: 0 lvb_type: 3 00010000:00000001:0.0:1366159710.913335:0:8913:0:(ldlm_lock.c:211:ldlm_lock_put()) Process entered 00010000:00000001:0.0:1366159710.913335:0:8913:0:(ldlm_lock.c:244:ldlm_lock_put()) Process leaving 00010000:00010000:0.0:1366159710.913337:0:8913:0:(ldlm_lock.c:931:ldlm_lock_decref_internal()) ### add lock into lru list ns: lustre-MDT0000-mdc-ffff880037d22800 lock: ffff88007cff0600/0x6e4fd575de0e4a90 lrc: 2/0,0 mode: CR/CR res: 8589935616/3 bits 0x8 rrc: 1 type: IBT flags: 0x20000000000 nid: local remote: 0xb524751f9485880b expref: -99 pid: 8913 timeout: 0 lvb_type: 3 00010000:00000001:0.0:1366159710.913341:0:8913:0:(ldlm_lock.c:312:ldlm_lock_add_to_lru()) Process entered 00010000:00000001:0.0:1366159710.913342:0:8913:0:(ldlm_lock.c:316:ldlm_lock_add_to_lru()) Process leaving 00010000:00000001:0.0:1366159710.913342:0:8913:0:(ldlm_lock.c:952:ldlm_lock_decref_internal()) Process leaving 00010000:00000001:0.0:1366159710.913343:0:8913:0:(ldlm_lock.c:211:ldlm_lock_put()) Process entered 00010000:00000001:0.0:1366159710.913343:0:8913:0:(ldlm_lock.c:244:ldlm_lock_put()) Process leaving 00000080:00000001:0.0:1366159710.913344:0:8913:0:(file.c:3339:ll_layout_lock_set()) Process leaving (rc=0 : 0 : 0) 00000080:00000001:0.0:1366159710.913345:0:8913:0:(file.c:3445:ll_layout_refresh()) Process leaving (rc=0 : 0 : 0) 00000080:00000001:0.0:1366159710.913346:0:8913:0:(vvp_io.c:1220:vvp_io_init()) Process leaving (rc=0 : 0 : 0) 00020000:00000001:0.0:1366159710.913347:0:8913:0:(lov_io.c:939:lov_io_init_empty()) Process entered 00020000:00000001:0.0:1366159710.913347:0:8913:0:(lov_io.c:968:lov_io_init_empty()) Process leaving (rc=1 : 1 : 1) 00000020:00000001:0.0:1366159710.913348:0:8913:0:(cl_io.c:178:cl_io_init0()) Process leaving (rc=1 : 1 : 1) 00000020:00000001:0.0:1366159710.913349:0:8913:0:(cl_io.c:239:cl_io_rw_init()) Process leaving (rc=1 : 1 : 1) 00000080:00000001:0.0:1366159710.913350:0:8913:0:(file.c:920:ll_file_io_generic()) Process leaving via out (rc=18446744073709551607 : -9 : 0xfffffffffffffff7) ... but the log of MDS is unavailable (it's at another node "wtm-15vm7"), and Yujian will help to collect it manually and will check it once the log is ready.

          People

            hongchao.zhang Hongchao Zhang
            yujian Jian Yu
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: