Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3716

mds-survey ASSERTION( lu_device_is_mdt(o->lo_dev) )

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.7.0
    • Lustre 2.4.0, Lustre 2.5.1
    • 3
    • 9570

    Description

      Observed fairly repeatably when running mds-survey.sh after the first few million creates. It appears unlikely to occur when creating less than 100,00 objects (the default).

      2013-08-06 15:05:51 LustreError: 4379:0:(mdt_handler.c:2352:mdt_obj()) ASSERTION( lu_device_is_mdt(o->lo_dev) ) failed:
      2013-08-06 15:05:51 LustreError: 4379:0:(mdt_handler.c:2352:mdt_obj()) LBUG
      

      The failures where observed with code post 2.4.0 which includes the LU-2758 fix.

      Attachments

        Activity

          [LU-3716] mds-survey ASSERTION( lu_device_is_mdt(o->lo_dev) )
          yujian Jian Yu added a comment -

          Can this patch be rebased?

          Sure, will do.

          yujian Jian Yu added a comment - Can this patch be rebased? Sure, will do.

          Can this patch be rebased?

          doug Doug Oucharek (Inactive) added a comment - Can this patch be rebased?
          yujian Jian Yu added a comment -

          Patch for master branch: http://review.whamcloud.com/10130

          yujian Jian Yu added a comment - Patch for master branch: http://review.whamcloud.com/10130
          di.wang Di Wang added a comment -

          IMHO, I do not think you need(should) change MDD. I think you create create a separate root object for echo access, so in echo_resolve_path, it will always start from that root object, and form its own tree, instead of using file system ROOT. And normal client will never see this tree under this special root.

          You can see how MDD create the root object in mdd_prepare.

          di.wang Di Wang added a comment - IMHO, I do not think you need(should) change MDD. I think you create create a separate root object for echo access, so in echo_resolve_path, it will always start from that root object, and form its own tree, instead of using file system ROOT. And normal client will never see this tree under this special root. You can see how MDD create the root object in mdd_prepare.
          yujian Jian Yu added a comment -

          Currently, in mdd_create(), both echo object and normal object are created via:

          mdd_object_create_internal()->...->osd_object_ea_create()
          

          and inserted into normal namespace via:

          __mdd_index_insert()->...->osd_index_ea_insert()->...->__osd_ea_add_rec()->osd_ldiskfs_add_entry()
          

          The objects are looked up via:

          mdd_lookup()->...->osd_index_ea_lookup()->osd_ea_lookup_rec()->osd_ldiskfs_find_entry()
          

          To separate echo objects from normal namespace, it seems we can check "FID_SEQ_ECHO" sequence number after mdd_object_create() and do not run __mdd_index_insert() for echo objects. However, this will make the echo objects unable to be looked up.

          A possible way is to make sure the "parent" object for echo object is always different from the one for normal object, as Di mentioned in previous comment that "Probably echo test should be done in a special directory".

          Hi Di, is this a proper way to resolve the issue? If yes, could you please suggest what special directory I should create in jt_obd_md_common()? Thanks.

          yujian Jian Yu added a comment - Currently, in mdd_create(), both echo object and normal object are created via: mdd_object_create_internal()->...->osd_object_ea_create() and inserted into normal namespace via: __mdd_index_insert()->...->osd_index_ea_insert()->...->__osd_ea_add_rec()->osd_ldiskfs_add_entry() The objects are looked up via: mdd_lookup()->...->osd_index_ea_lookup()->osd_ea_lookup_rec()->osd_ldiskfs_find_entry() To separate echo objects from normal namespace, it seems we can check "FID_SEQ_ECHO" sequence number after mdd_object_create() and do not run __mdd_index_insert() for echo objects. However, this will make the echo objects unable to be looked up. A possible way is to make sure the "parent" object for echo object is always different from the one for normal object, as Di mentioned in previous comment that "Probably echo test should be done in a special directory". Hi Di, is this a proper way to resolve the issue? If yes, could you please suggest what special directory I should create in jt_obd_md_common()? Thanks.
          di.wang Di Wang added a comment - - edited

          Yes, that is what I mean, i.e. we should separate echo objects from the normal namespace, so normal clients will not access these echo objects.

          di.wang Di Wang added a comment - - edited Yes, that is what I mean, i.e. we should separate echo objects from the normal namespace, so normal clients will not access these echo objects.
          yujian Jian Yu added a comment -

          For echo client, the echo md objects are created by echo_md_create_internal()->mdd_create():

          00008000:00100000:1.0:1397818339.647451:0:3611:0:(echo_client.c:1602:echo_md_create_internal()) Start creating object [0x200000400:0x1:0x0] 3373 ffff88031cd5a5a0
          00000004:00000001:1.0:1397818339.647453:0:3611:0:(mdd_dir.c:2164:mdd_create()) Process entered
          ......
          00000004:00000001:1.0:1397818339.647535:0:3611:0:(mdd_object.c:375:mdd_object_create_internal()) Process entered
          ......
          00000004:00000001:1.0:1397818339.647564:0:3611:0:(mdd_object.c:383:mdd_object_create_internal()) Process leaving (rc=0 : 0 : 0)
          ......
          00000004:00000001:1.0:1397818339.647606:0:3611:0:(mdd_dir.c:2272:mdd_create()) Process leaving
          00008000:00100000:1.0:1397818339.647614:0:3611:0:(echo_client.c:1614:echo_md_create_internal()) End creating object [0x200000400:0x1:0x0] 3373 ffff88031cd5a5a0 rc  = 0
          

          And looked up by echo_lookup_object()->mdd_lookup():

          00008000:00100000:1.0:1397818505.932739:0:3626:0:(echo_client.c:1897:echo_lookup_object()) Start lookup object [0x200000400:0x1:0x0] 3373 ffff88031cd5a5a0
          00000004:00000001:1.0:1397818505.932741:0:3626:0:(mdd_dir.c:115:mdd_lookup()) Process entered
          00000004:00000001:1.0:1397818505.932741:0:3626:0:(mdd_dir.c:72:__mdd_lookup()) Process entered
          ......
          00000004:00000001:1.0:1397818505.932760:0:3626:0:(mdd_dir.c:106:__mdd_lookup()) Process leaving (rc=0 : 0 : 0)
          00000004:00000001:1.0:1397818505.932761:0:3626:0:(mdd_dir.c:122:mdd_lookup()) Process leaving (rc=0 : 0 : 0)
          00008000:00100000:1.0:1397818505.932762:0:3626:0:(echo_client.c:1905:echo_lookup_object()) End lookup object [0x200000400:0x1:0x0] 3373 ffff88031cd5a5a0
          

          And destroyed by echo_md_destroy_internal()->mdd_unlink():

          00008000:00100000:9.0:1397818817.663055:0:3650:0:(echo_client.c:1944:echo_md_destroy_internal()) Start destroy object [0x200000400:0x1:0x0] 3373 ffff88031cd5a5a0
          00000004:00000001:9.0:1397818817.663057:0:3650:0:(mdd_dir.c:1499:mdd_unlink()) Process entered
          ......
          00000004:00000001:9.0:1397818817.663166:0:3650:0:(mdd_dir.c:1603:mdd_unlink()) Process leaving
          00008000:00100000:9.0:1397818817.663182:0:3650:0:(echo_client.c:1953:echo_md_destroy_internal()) End destroy object [0x200000400:0x1:0x0] 3373 ffff88031cd5a5a0
          

          Hi Di,

          If we allow echo client and normal client attached and operate at the same time, we should distinguish echo md objects from normal md objects, do I understand correctly?

          yujian Jian Yu added a comment - For echo client, the echo md objects are created by echo_md_create_internal()->mdd_create(): 00008000:00100000:1.0:1397818339.647451:0:3611:0:(echo_client.c:1602:echo_md_create_internal()) Start creating object [0x200000400:0x1:0x0] 3373 ffff88031cd5a5a0 00000004:00000001:1.0:1397818339.647453:0:3611:0:(mdd_dir.c:2164:mdd_create()) Process entered ...... 00000004:00000001:1.0:1397818339.647535:0:3611:0:(mdd_object.c:375:mdd_object_create_internal()) Process entered ...... 00000004:00000001:1.0:1397818339.647564:0:3611:0:(mdd_object.c:383:mdd_object_create_internal()) Process leaving (rc=0 : 0 : 0) ...... 00000004:00000001:1.0:1397818339.647606:0:3611:0:(mdd_dir.c:2272:mdd_create()) Process leaving 00008000:00100000:1.0:1397818339.647614:0:3611:0:(echo_client.c:1614:echo_md_create_internal()) End creating object [0x200000400:0x1:0x0] 3373 ffff88031cd5a5a0 rc = 0 And looked up by echo_lookup_object()->mdd_lookup(): 00008000:00100000:1.0:1397818505.932739:0:3626:0:(echo_client.c:1897:echo_lookup_object()) Start lookup object [0x200000400:0x1:0x0] 3373 ffff88031cd5a5a0 00000004:00000001:1.0:1397818505.932741:0:3626:0:(mdd_dir.c:115:mdd_lookup()) Process entered 00000004:00000001:1.0:1397818505.932741:0:3626:0:(mdd_dir.c:72:__mdd_lookup()) Process entered ...... 00000004:00000001:1.0:1397818505.932760:0:3626:0:(mdd_dir.c:106:__mdd_lookup()) Process leaving (rc=0 : 0 : 0) 00000004:00000001:1.0:1397818505.932761:0:3626:0:(mdd_dir.c:122:mdd_lookup()) Process leaving (rc=0 : 0 : 0) 00008000:00100000:1.0:1397818505.932762:0:3626:0:(echo_client.c:1905:echo_lookup_object()) End lookup object [0x200000400:0x1:0x0] 3373 ffff88031cd5a5a0 And destroyed by echo_md_destroy_internal()->mdd_unlink(): 00008000:00100000:9.0:1397818817.663055:0:3650:0:(echo_client.c:1944:echo_md_destroy_internal()) Start destroy object [0x200000400:0x1:0x0] 3373 ffff88031cd5a5a0 00000004:00000001:9.0:1397818817.663057:0:3650:0:(mdd_dir.c:1499:mdd_unlink()) Process entered ...... 00000004:00000001:9.0:1397818817.663166:0:3650:0:(mdd_dir.c:1603:mdd_unlink()) Process leaving 00008000:00100000:9.0:1397818817.663182:0:3650:0:(echo_client.c:1953:echo_md_destroy_internal()) End destroy object [0x200000400:0x1:0x0] 3373 ffff88031cd5a5a0 Hi Di, If we allow echo client and normal client attached and operate at the same time, we should distinguish echo md objects from normal md objects, do I understand correctly?
          di.wang Di Wang added a comment -

          Hmm, the right fix might be disconnect echo objects to the name space, i.e. these objects can only be accessed by echo client, then echo client and normal client can be attached at the same time. Probably echo test should be done in a special directory.

          di.wang Di Wang added a comment - Hmm, the right fix might be disconnect echo objects to the name space, i.e. these objects can only be accessed by echo client, then echo client and normal client can be attached at the same time. Probably echo test should be done in a special directory.
          yujian Jian Yu added a comment -

          Both sanity test 225a and 225b run mds-survey.

          I saw the following error in the test output of sanity test 225a in https://maloo.whamcloud.com/test_sets/f9ad31f6-aac8-11e3-a41c-52540035b04c :

          ERROR: Module obdecho is in use
          

          So, this is the same situation that Wang Di described above:

          Usually this LBUG means mds-survey and other clients(probably normal lustre client) access the same object at the same time, or the object has been accessed by normal clients previously, and object stack is still in the server cache.

          Wang Di, should I improve mds-survey to check whether obdecho is in use or not so as to avoid the LBUG?

          yujian Jian Yu added a comment - Both sanity test 225a and 225b run mds-survey. I saw the following error in the test output of sanity test 225a in https://maloo.whamcloud.com/test_sets/f9ad31f6-aac8-11e3-a41c-52540035b04c : ERROR: Module obdecho is in use So, this is the same situation that Wang Di described above: Usually this LBUG means mds-survey and other clients(probably normal lustre client) access the same object at the same time, or the object has been accessed by normal clients previously, and object stack is still in the server cache. Wang Di, should I improve mds-survey to check whether obdecho is in use or not so as to avoid the LBUG?

          Running 2.5.1-RC4, using ldiskfs, I ran into this assertion when running sanity test_225a. The test results are here https://maloo.whamcloud.com/test_sets/13e6472a-aba9-11e3-a696-52540035b04c, but there are not MDS logs. I was able to capture some information from the crash dump.

          From the MDSs console

          Message from syslogd@c16 at Mar 13 16:10:16 ...
           kernel:LustreError: 32557:0:(mdt_handler.c:2414:mdt_obj()) ASSERTION( lu_device_is_mdt(o->lo_dev) ) failed: 
          
          Message from syslogd@c16 at Mar 13 16:10:16 ...
           kernel:LustreError: 32557:0:(mdt_handler.c:2414:mdt_obj()) LBUG
          

          From dmesg save by crash:

          <4>Lustre: DEBUG MARKER: == sanity test 225a: Metadata survey sanity with zero-stripe == 16:10:12 (1394752212)
          <6>Lustre: Echo OBD driver; http://www.lustre.org/
          <3>LustreError: 2019:0:(echo_client.c:1743:echo_md_lookup()) lookup MDT0000-tests: rc = -2
          <3>LustreError: 2019:0:(echo_client.c:1942:echo_md_destroy_internal()) Can't find child MDT0000-tests: rc = -2
          <3>LustreError: 2041:0:(echo_client.c:1743:echo_md_lookup()) lookup MDT0000-tests1: rc = -2
          <3>LustreError: 2041:0:(echo_client.c:1942:echo_md_destroy_internal()) Can't find child MDT0000-tests1: rc = -2
          <0>LustreError: 32557:0:(mdt_handler.c:2414:mdt_obj()) ASSERTION( lu_device_is_mdt(o->lo_dev) ) failed: 
          <0>LustreError: 32557:0:(mdt_handler.c:2414:mdt_obj()) LBUG
          <4>Pid: 32557, comm: mdt00_004
          <4>
          <4>Call Trace:
          <4> [<ffffffffa052e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
          <4> [<ffffffffa052ee97>] lbug_with_loc+0x47/0xb0 [libcfs]
          <4> [<ffffffffa0e37365>] mdt_obj+0x55/0x80 [mdt]
          <4> [<ffffffffa0e3bc46>] mdt_object_find+0x66/0x170 [mdt]
          <4> [<ffffffffa0e4ef84>] mdt_getattr_name_lock+0x7f4/0x1aa0 [mdt]
          <4> [<ffffffffa0821ba5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
          <4> [<ffffffffa0848bb6>] ? __req_capsule_get+0x166/0x710 [ptlrpc]
          <4> [<ffffffffa0823e34>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
          <4> [<ffffffffa0e504c9>] mdt_intent_getattr+0x299/0x480 [mdt]
          <4> [<ffffffffa0e3ea0e>] mdt_intent_policy+0x3ae/0x770 [mdt]
          <4> [<ffffffffa07d9511>] ldlm_lock_enqueue+0x361/0x8c0 [ptlrpc]
          <4> [<ffffffffa0802abf>] ldlm_handle_enqueue0+0x4ef/0x10a0 [ptlrpc]
          <4> [<ffffffffa0e3eed6>] mdt_enqueue+0x46/0xe0 [mdt]
          <4> [<ffffffffa0e45bca>] mdt_handle_common+0x52a/0x1470 [mdt]
          <4> [<ffffffffa0e80545>] mds_regular_handle+0x15/0x20 [mdt]
          <4> [<ffffffffa0832a45>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
          <4> [<ffffffffa05403df>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
          <4> [<ffffffffa082a0e9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
          <4> [<ffffffffa0833dad>] ptlrpc_main+0xaed/0x1740 [ptlrpc]
          <4> [<ffffffffa08332c0>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
          <4> [<ffffffff8109aee6>] kthread+0x96/0xa0
          <4> [<ffffffff8100c20a>] child_rip+0xa/0x20
          <4> [<ffffffff8109ae50>] ? kthread+0x0/0xa0
          <3>LustreError: 2242:0:(echo_client.c:1016:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second
          <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
          <4>
          <0>Kernel panic - not syncing: LBUG
          <4>Pid: 32557, comm: mdt00_004 Not tainted 2.6.32-431.5.1.el6_lustre.x86_64 #1
          <4>Call Trace:
          <4> [<ffffffff81527983>] ? panic+0xa7/0x16f
          <4> [<ffffffffa052eeeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
          <4> [<ffffffffa0e37365>] ? mdt_obj+0x55/0x80 [mdt]
          <4> [<ffffffffa0e3bc46>] ? mdt_object_find+0x66/0x170 [mdt]
          <4> [<ffffffffa0e4ef84>] ? mdt_getattr_name_lock+0x7f4/0x1aa0 [mdt]
          <4> [<ffffffffa0821ba5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
          <4> [<ffffffffa0848bb6>] ? __req_capsule_get+0x166/0x710 [ptlrpc]
          <4> [<ffffffffa0823e34>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
          <4> [<ffffffffa0e504c9>] ? mdt_intent_getattr+0x299/0x480 [mdt]
          <4> [<ffffffffa0e3ea0e>] ? mdt_intent_policy+0x3ae/0x770 [mdt]
          <4> [<ffffffffa07d9511>] ? ldlm_lock_enqueue+0x361/0x8c0 [ptlrpc]
          <3>LustreError: 2242:0:(echo_client.c:1016:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second
          <4> [<ffffffffa0802abf>] ? ldlm_handle_enqueue0+0x4ef/0x10a0 [ptlrpc]
          <4> [<ffffffffa0e3eed6>] ? mdt_enqueue+0x46/0xe0 [mdt]
          <4> [<ffffffffa0e45bca>] ? mdt_handle_common+0x52a/0x1470 [mdt]
          <4> [<ffffffffa0e80545>] ? mds_regular_handle+0x15/0x20 [mdt]
          <4> [<ffffffffa0832a45>] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
          <4> [<ffffffffa05403df>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
          <4> [<ffffffffa082a0e9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
          <4> [<ffffffffa0833dad>] ? ptlrpc_main+0xaed/0x1740 [ptlrpc]
          <4> [<ffffffffa08332c0>] ? ptlrpc_main+0x0/0x1740 [ptlrpc]
          <4>Lustre: DEBUG MARKER: sanity test_225a: @@@@@@ FAIL: LBUG/LASSERT detected
          <4> [<ffffffff8109aee6>] ? kthread+0x96/0xa0
          <4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20
          <4> [<ffffffff8109ae50>] ? kthread+0x0/0xa0
          <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
          

          I experienced this crash both times I ran sanity test 225a.

          jamesanunez James Nunez (Inactive) added a comment - Running 2.5.1-RC4, using ldiskfs, I ran into this assertion when running sanity test_225a. The test results are here https://maloo.whamcloud.com/test_sets/13e6472a-aba9-11e3-a696-52540035b04c , but there are not MDS logs. I was able to capture some information from the crash dump. From the MDSs console Message from syslogd@c16 at Mar 13 16:10:16 ... kernel:LustreError: 32557:0:(mdt_handler.c:2414:mdt_obj()) ASSERTION( lu_device_is_mdt(o->lo_dev) ) failed: Message from syslogd@c16 at Mar 13 16:10:16 ... kernel:LustreError: 32557:0:(mdt_handler.c:2414:mdt_obj()) LBUG From dmesg save by crash: <4>Lustre: DEBUG MARKER: == sanity test 225a: Metadata survey sanity with zero-stripe == 16:10:12 (1394752212) <6>Lustre: Echo OBD driver; http://www.lustre.org/ <3>LustreError: 2019:0:(echo_client.c:1743:echo_md_lookup()) lookup MDT0000-tests: rc = -2 <3>LustreError: 2019:0:(echo_client.c:1942:echo_md_destroy_internal()) Can't find child MDT0000-tests: rc = -2 <3>LustreError: 2041:0:(echo_client.c:1743:echo_md_lookup()) lookup MDT0000-tests1: rc = -2 <3>LustreError: 2041:0:(echo_client.c:1942:echo_md_destroy_internal()) Can't find child MDT0000-tests1: rc = -2 <0>LustreError: 32557:0:(mdt_handler.c:2414:mdt_obj()) ASSERTION( lu_device_is_mdt(o->lo_dev) ) failed: <0>LustreError: 32557:0:(mdt_handler.c:2414:mdt_obj()) LBUG <4>Pid: 32557, comm: mdt00_004 <4> <4>Call Trace: <4> [<ffffffffa052e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] <4> [<ffffffffa052ee97>] lbug_with_loc+0x47/0xb0 [libcfs] <4> [<ffffffffa0e37365>] mdt_obj+0x55/0x80 [mdt] <4> [<ffffffffa0e3bc46>] mdt_object_find+0x66/0x170 [mdt] <4> [<ffffffffa0e4ef84>] mdt_getattr_name_lock+0x7f4/0x1aa0 [mdt] <4> [<ffffffffa0821ba5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc] <4> [<ffffffffa0848bb6>] ? __req_capsule_get+0x166/0x710 [ptlrpc] <4> [<ffffffffa0823e34>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc] <4> [<ffffffffa0e504c9>] mdt_intent_getattr+0x299/0x480 [mdt] <4> [<ffffffffa0e3ea0e>] mdt_intent_policy+0x3ae/0x770 [mdt] <4> [<ffffffffa07d9511>] ldlm_lock_enqueue+0x361/0x8c0 [ptlrpc] <4> [<ffffffffa0802abf>] ldlm_handle_enqueue0+0x4ef/0x10a0 [ptlrpc] <4> [<ffffffffa0e3eed6>] mdt_enqueue+0x46/0xe0 [mdt] <4> [<ffffffffa0e45bca>] mdt_handle_common+0x52a/0x1470 [mdt] <4> [<ffffffffa0e80545>] mds_regular_handle+0x15/0x20 [mdt] <4> [<ffffffffa0832a45>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc] <4> [<ffffffffa05403df>] ? lc_watchdog_touch+0x6f/0x170 [libcfs] <4> [<ffffffffa082a0e9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc] <4> [<ffffffffa0833dad>] ptlrpc_main+0xaed/0x1740 [ptlrpc] <4> [<ffffffffa08332c0>] ? ptlrpc_main+0x0/0x1740 [ptlrpc] <4> [<ffffffff8109aee6>] kthread+0x96/0xa0 <4> [<ffffffff8100c20a>] child_rip+0xa/0x20 <4> [<ffffffff8109ae50>] ? kthread+0x0/0xa0 <3>LustreError: 2242:0:(echo_client.c:1016:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20 <4> <0>Kernel panic - not syncing: LBUG <4>Pid: 32557, comm: mdt00_004 Not tainted 2.6.32-431.5.1.el6_lustre.x86_64 #1 <4>Call Trace: <4> [<ffffffff81527983>] ? panic+0xa7/0x16f <4> [<ffffffffa052eeeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs] <4> [<ffffffffa0e37365>] ? mdt_obj+0x55/0x80 [mdt] <4> [<ffffffffa0e3bc46>] ? mdt_object_find+0x66/0x170 [mdt] <4> [<ffffffffa0e4ef84>] ? mdt_getattr_name_lock+0x7f4/0x1aa0 [mdt] <4> [<ffffffffa0821ba5>] ? lustre_msg_buf+0x55/0x60 [ptlrpc] <4> [<ffffffffa0848bb6>] ? __req_capsule_get+0x166/0x710 [ptlrpc] <4> [<ffffffffa0823e34>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc] <4> [<ffffffffa0e504c9>] ? mdt_intent_getattr+0x299/0x480 [mdt] <4> [<ffffffffa0e3ea0e>] ? mdt_intent_policy+0x3ae/0x770 [mdt] <4> [<ffffffffa07d9511>] ? ldlm_lock_enqueue+0x361/0x8c0 [ptlrpc] <3>LustreError: 2242:0:(echo_client.c:1016:echo_device_free()) echo_client still has objects at cleanup time, wait for 1 second <4> [<ffffffffa0802abf>] ? ldlm_handle_enqueue0+0x4ef/0x10a0 [ptlrpc] <4> [<ffffffffa0e3eed6>] ? mdt_enqueue+0x46/0xe0 [mdt] <4> [<ffffffffa0e45bca>] ? mdt_handle_common+0x52a/0x1470 [mdt] <4> [<ffffffffa0e80545>] ? mds_regular_handle+0x15/0x20 [mdt] <4> [<ffffffffa0832a45>] ? ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc] <4> [<ffffffffa05403df>] ? lc_watchdog_touch+0x6f/0x170 [libcfs] <4> [<ffffffffa082a0e9>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc] <4> [<ffffffffa0833dad>] ? ptlrpc_main+0xaed/0x1740 [ptlrpc] <4> [<ffffffffa08332c0>] ? ptlrpc_main+0x0/0x1740 [ptlrpc] <4>Lustre: DEBUG MARKER: sanity test_225a: @@@@@@ FAIL: LBUG/LASSERT detected <4> [<ffffffff8109aee6>] ? kthread+0x96/0xa0 <4> [<ffffffff8100c20a>] ? child_rip+0xa/0x20 <4> [<ffffffff8109ae50>] ? kthread+0x0/0xa0 <4> [<ffffffff8100c200>] ? child_rip+0x0/0x20 I experienced this crash both times I ran sanity test 225a.

          I ran into this assertion when running sanity test 225a https://maloo.whamcloud.com/test_sessions/f6c5b8e6-aac8-11e3-a41c-52540035b04c .
          Unfortunately, there are no logs on the MDS. I'll try and run again to see what I can collect.

          jamesanunez James Nunez (Inactive) added a comment - I ran into this assertion when running sanity test 225a https://maloo.whamcloud.com/test_sessions/f6c5b8e6-aac8-11e3-a41c-52540035b04c . Unfortunately, there are no logs on the MDS. I'll try and run again to see what I can collect.

          People

            yujian Jian Yu
            behlendorf Brian Behlendorf
            Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: