Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.4.1
    • None
    • 3
    • 12649

    Description

      Creating a file with a stripe count greater than 167 fails. If a second setstripe is attempted on the same file, the MDT hits an LBUG.

      mhanafi@pfe20:/nobackupp9/mhanafi/teststripe> cat /proc/fs/lustre/version 
      lustre: 2.4.1
      kernel: ../lustre/scripts
      build:  3nasC_ofed154
      mhanafi@pfe20:/nobackupp9/mhanafi/teststripe> lfs setstripe -c 166 test169
      mhanafi@pfe20:/nobackupp9/mhanafi/teststripe> lfs setstripe -c 167 test167
      mhanafi@pfe20:/nobackupp9/mhanafi/teststripe> lfs setstripe -c 168 test168
      error on ioctl 0x4008669a for 'test168' (3): No space left on device
      error: setstripe: create stripe file 'test168' failed
      mhanafi@pfe20:/nobackupp9/mhanafi/teststripe> lfs getstripe test168
      test168 has no stripe info
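
      The manual steps above could be scripted roughly as follows. This is only a sketch: the function name and the LFS override variable are made up for illustration, and the only real command used is the `lfs setstripe -c N FILE` call shown in the transcript. Each stripe count gets its own file name, so the script itself never issues a second setstripe on the same file.

```shell
#!/bin/sh
# Hypothetical helper, not part of Lustre: probe a range of stripe counts
# and report the first one for which `lfs setstripe -c N` fails.
# LFS is an assumption of this sketch; override it to point at a stub for dry runs.
LFS="${LFS:-lfs}"

# first_failing_stripe_count DIR START END
# Prints the first count in [START, END] whose setstripe call fails,
# or prints nothing if every count in the range succeeds.
first_failing_stripe_count() {
    ffsc_dir="$1"
    ffsc_n="$2"
    ffsc_end="$3"
    while [ "$ffsc_n" -le "$ffsc_end" ]; do
        # One fresh file per count (test166, test167, ...), as in the report.
        if ! "$LFS" setstripe -c "$ffsc_n" "$ffsc_dir/test$ffsc_n" 2>/dev/null; then
            echo "$ffsc_n"
            return 0
        fi
        ffsc_n=$((ffsc_n + 1))
    done
    return 0
}
```

      It would be called as, for example, `first_failing_stripe_count /path/to/empty/scratch/dir 160 175`. The directory should be empty: a leftover file with the same name would turn the call into a second setstripe, which is the case reported to trigger the LBUG.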
      

      LBUG OUTPUT

      LNet: 1919:0:(o2iblnd_cb.c:2348:kiblnd_passive_connect()) Skipped 11 previous similar messages
      LustreError: 4699:0:(lod_object.c:704:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed: 
      LustreError: 4699:0:(lod_object.c:704:lod_ah_init()) LBUG
      Pid: 4699, comm: mdt03_002

      Call Trace:
       [<ffffffffa050c895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa050ce97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa0faa78f>] lod_ah_init+0x57f/0x5c0 [lod]
       [<ffffffffa0c62a83>] mdd_object_make_hint+0x83/0xa0 [mdd]
       [<ffffffffa0c6eeb2>] mdd_create_data+0x332/0x7d0 [mdd]
       [<ffffffffa0f08d8c>] mdt_finish_open+0x125c/0x1950 [mdt]
       [<ffffffffa0f04658>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
      23 out of 24 cpus in kdb, waiting for the rest, timeout in 10 second(s)
       [<ffffffffa0f0af26>] mdt_reint_open+0xfe6/0x20e0 [mdt]
      .All cpus are now in kdb
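
      To check whether two console logs hit the same assertion, the ASSERTION lines can be reduced to a short signature. The sketch below assumes the `LustreError: PID:N:(file:line:function()) ASSERTION( expr ) failed:` layout seen in this log; the function name is hypothetical, and other Lustre versions may format the message differently.

```shell
#!/bin/sh
# Hypothetical helper: read a console log on stdin and print one
# "file:line function: expression" line per failed Lustre assertion.
# Lines that do not match the assumed layout (including the LBUG line) are dropped.
extract_assertions() {
    sed -n 's/.*LustreError: [0-9]*:[0-9]*:(\([^:]*\):\([0-9]*\):\([A-Za-z0-9_]*\)()) ASSERTION( \(.*\) ) failed:.*/\1:\2 \3: \4/p'
}
```

      For example, `extract_assertions < console.log | sort | uniq -c` (with console.log standing in for a saved console capture) would count how often each distinct assertion fired.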
      
      

      MDT VERSION

      nbp9-mds ~ # cat /proc/fs/lustre/version 
      lustre: 2.4.1
      kernel: 2.6.32-358.23.2.el6.20140115.x86_64.lustre241
      build:  5.2nasS_ofed154
      

          Activity

            [LU-4623] creating file stripe > 167 fails

            jaylan Jay Lan (Inactive) added a comment -

            I think LU-4791 supersedes LU-4260.
            I want to document here since we filed this one.

            mhanafi Mahmoud Hanafi added a comment - - edited

            We still hit this, even with the LU-4260 patch applied.

            We are running the lustre-2.4.1-6nas source from https://github.com/jlan/lustre-nas

            <6>Lustre: nbp9-MDT0000: Recovery over after 2:04, of 11086 clients 11086 recovered and 0 were evicted.
            <0>LustreError: 5367:0:(lod_object.c:704:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed:
            <0>LustreError: 5367:0:(lod_object.c:704:lod_ah_init()) LBUG
            <4>Pid: 5367, comm: mdt00_009
            <4>
            <4>Call Trace:
            <4> [<ffffffffa0511895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
            <4> [<ffffffffa0511e97>] lbug_with_loc+0x47/0xb0 [libcfs]
            <4> [<ffffffffa0faa77f>] lod_ah_init+0x57f/0x5c0 [lod]
            <4> [<ffffffffa0c67a83>] mdd_object_make_hint+0x83/0xa0 [mdd]
            <4> [<ffffffffa0c73ec2>] mdd_create_data+0x332/0x7d0 [mdd]
            <4> [<ffffffffa0f08d8c>] mdt_finish_open+0x125c/0x1950 [mdt]
            <4> [<ffffffffa0f04658>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
            <4> [<ffffffffa0f0af26>] mdt_reint_open+0xfe6/0x20e0 [mdt]
            <4> [<ffffffffa052e85e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
            <4> [<ffffffffa07f7ddc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
            <4> [<ffffffffa0ef5981>] mdt_reint_rec+0x41/0xe0 [mdt]
            <4> [<ffffffffa0edab03>] mdt_reint_internal+0x4c3/0x780 [mdt]
            <4> [<ffffffffa0edb090>] mdt_intent_reint+0x1f0/0x530 [mdt]
            <4> [<ffffffffa0ed8f3e>] mdt_intent_policy+0x39e/0x720 [mdt]
            <4> [<ffffffffa07af831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
            <4> [<ffffffffa07d61ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
            <4> [<ffffffffa0ed93c6>] mdt_enqueue+0x46/0xe0 [mdt]
            <4> [<ffffffffa0edfad7>] mdt_handle_common+0x647/0x16d0 [mdt]
            <4> [<ffffffffa0f19615>] mds_regular_handle+0x15/0x20 [mdt]
            <4> [<ffffffffa08083d8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
            <4> [<ffffffffa05125de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
            <4> [<ffffffffa0523d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
            <4> [<ffffffffa07ff739>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
            <4> [<ffffffff81055813>] ? __wake_up+0x53/0x70
            <4> [<ffffffffa080976e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
            <4> [<ffffffffa0808ca0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
            <4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
            <4> [<ffffffffa0808ca0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
            <4> [<ffffffffa0808ca0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
            <4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
            
            pjones Peter Jones added a comment -

            OK, I have fixed this, Jay.


            jaylan Jay Lan (Inactive) added a comment -

            I think it was a mistake that this bug was marked as duplicate of LU-4620 and closed. It should be LU-4260 instead. Please fix it.

            green Oleg Drokin added a comment -

            I think this is a duplicate of LU-4260

            emoly.liu Emoly Liu added a comment -

            I can reproduce it and will investigate it.

            pjones Peter Jones added a comment -

            Emoly

            Could you please look into this one?

            Thanks

            Peter


            People

              emoly.liu Emoly Liu
              mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 8
