[LU-4623] creating file stripe > 167 fails Created: 12/Feb/14 Updated: 18/Apr/14 Resolved: 13/Feb/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Mahmoud Hanafi | Assignee: | Emoly Liu |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 12649 |
| Description |
|
Creating a file with a stripe count > 167 fails. If a second setstripe is attempted on the same file, the MDT LBUGs.

mhanafi@pfe20:/nobackupp9/mhanafi/teststripe> cat /proc/fs/lustre/version
lustre: 2.4.1
kernel: ../lustre/scripts
build: 3nasC_ofed154
mhanafi@pfe20:/nobackupp9/mhanafi/teststripe> lfs setstripe -c 166 test169
mhanafi@pfe20:/nobackupp9/mhanafi/teststripe> lfs setstripe -c 167 test167
mhanafi@pfe20:/nobackupp9/mhanafi/teststripe> lfs setstripe -c 168 test168
error on ioctl 0x4008669a for 'test168' (3): No space left on device
error: setstripe: create stripe file 'test168' failed
mhanafi@pfe20:/nobackupp9/mhanafi/teststripe> lfs getstripe test168
test168 has no stripe info

LBUG OUTPUT

LNet: 1919:0:(o2iblnd_cb.c:2348:kiblnd_passive_connect()) Skipped 11 previous similar messages
LustreError: 4699:0:(lod_object.c:704:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed:
LustreError: 4699:0:(lod_object.c:704:lod_ah_init()) LBUG
Pid: 4699, comm: mdt03_002

Call Trace:
[<ffffffffa050c895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa050ce97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0faa78f>] lod_ah_init+0x57f/0x5c0 [lod]
[<ffffffffa0c62a83>] mdd_object_make_hint+0x83/0xa0 [mdd]
[<ffffffffa0c6eeb2>] mdd_create_data+0x332/0x7d0 [mdd]
[<ffffffffa0f08d8c>] mdt_finish_open+0x125c/0x1950 [mdt]
[<ffffffffa0f04658>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
23 of 24 cpus in kdb, waiting for the rest, timeout in 10 second(s)
[<ffffffffa0f0af26>] mdt_reint_open+0xfe6/0x20e0 [mdt]
All cpus are now in kdb
MDT VERSION

nbp9-mds ~ # cat /proc/fs/lustre/version
lustre: 2.4.1
kernel: 2.6.32-358.23.2.el6.20140115.x86_64.lustre241
build: 5.2nasS_ofed154 |
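The 167/168 boundary is consistent with the LOV layout EA outgrowing the space available in a single ldiskfs xattr block. A minimal sketch of the arithmetic follows; the struct sizes come from the Lustre wire format (`lov_mds_md_v1`, `lov_ost_data_v1`), while the 4040-byte usable-xattr figure is an assumption (a 4 KiB block minus ldiskfs header/entry overhead) chosen because it matches the observed boundary, not a value taken from this ticket.

```python
# Sketch: why `lfs setstripe -c 168` returns ENOSPC while -c 167 succeeds.
# Struct sizes are from the Lustre wire protocol; USABLE_XATTR_BYTES is an
# ASSUMPTION for a 4 KiB ldiskfs xattr block after metadata overhead.

LOV_MDS_MD_V1_SIZE = 32    # struct lov_mds_md_v1 header (plain v1 layout)
LOV_OST_DATA_SIZE = 24     # struct lov_ost_data_v1, one entry per stripe
USABLE_XATTR_BYTES = 4040  # assumed usable payload of a 4096-byte xattr block

def lov_ea_size(stripe_count: int) -> int:
    """Size in bytes of the LOV EA for a plain (v1, no pool) layout."""
    return LOV_MDS_MD_V1_SIZE + stripe_count * LOV_OST_DATA_SIZE

def max_stripe_count(limit: int = USABLE_XATTR_BYTES) -> int:
    """Largest stripe count whose LOV EA still fits within `limit` bytes."""
    return (limit - LOV_MDS_MD_V1_SIZE) // LOV_OST_DATA_SIZE

if __name__ == "__main__":
    # 167 stripes -> 4040 bytes (fits); 168 stripes -> 4064 bytes (ENOSPC)
    print(lov_ea_size(167), lov_ea_size(168), max_stripe_count())
```

Under these assumptions the EA for 167 stripes is exactly 4040 bytes and 168 stripes needs 4064, which would explain the ENOSPC from the setstripe ioctl before the LBUG on the retry.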
| Comments |
| Comment by Peter Jones [ 12/Feb/14 ] |
|
Emoly, could you please look into this one? Thanks, Peter |
| Comment by Emoly Liu [ 13/Feb/14 ] |
|
I can reproduce it and will investigate it. |
| Comment by Oleg Drokin [ 13/Feb/14 ] |
|
I think this is a duplicate of |
| Comment by Jay Lan (Inactive) [ 13/Feb/14 ] |
|
I think it was a mistake that this bug was marked as duplicate of |
| Comment by Peter Jones [ 13/Feb/14 ] |
|
OK, I have fixed this, Jay. |
| Comment by Mahmoud Hanafi [ 24/Mar/14 ] |
|
We still hit this, but with the patch. Running lustre-2.4.1-6nas source at https://github.com/jlan/lustre-nas

<6>Lustre: nbp9-MDT0000: Recovery over after 2:04, of 11086 clients 11086 recovered and 0 were evicted.
<0>LustreError: 5367:0:(lod_object.c:704:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed:
<0>LustreError: 5367:0:(lod_object.c:704:lod_ah_init()) LBUG
<4>Pid: 5367, comm: mdt00_009
<4>
<4>Call Trace:
<4> [<ffffffffa0511895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa0511e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa0faa77f>] lod_ah_init+0x57f/0x5c0 [lod]
<4> [<ffffffffa0c67a83>] mdd_object_make_hint+0x83/0xa0 [mdd]
<4> [<ffffffffa0c73ec2>] mdd_create_data+0x332/0x7d0 [mdd]
<4> [<ffffffffa0f08d8c>] mdt_finish_open+0x125c/0x1950 [mdt]
<4> [<ffffffffa0f04658>] ? mdt_object_open_lock+0x1c8/0x510 [mdt]
<4> [<ffffffffa0f0af26>] mdt_reint_open+0xfe6/0x20e0 [mdt]
<4> [<ffffffffa052e85e>] ? upcall_cache_get_entry+0x28e/0x860 [libcfs]
<4> [<ffffffffa07f7ddc>] ? lustre_msg_add_version+0x6c/0xc0 [ptlrpc]
<4> [<ffffffffa0ef5981>] mdt_reint_rec+0x41/0xe0 [mdt]
<4> [<ffffffffa0edab03>] mdt_reint_internal+0x4c3/0x780 [mdt]
<4> [<ffffffffa0edb090>] mdt_intent_reint+0x1f0/0x530 [mdt]
<4> [<ffffffffa0ed8f3e>] mdt_intent_policy+0x39e/0x720 [mdt]
<4> [<ffffffffa07af831>] ldlm_lock_enqueue+0x361/0x8d0 [ptlrpc]
<4> [<ffffffffa07d61ef>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
<4> [<ffffffffa0ed93c6>] mdt_enqueue+0x46/0xe0 [mdt]
<4> [<ffffffffa0edfad7>] mdt_handle_common+0x647/0x16d0 [mdt]
<4> [<ffffffffa0f19615>] mds_regular_handle+0x15/0x20 [mdt]
<4> [<ffffffffa08083d8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
<4> [<ffffffffa05125de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
<4> [<ffffffffa0523d9f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
<4> [<ffffffffa07ff739>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
<4> [<ffffffff81055813>] ? __wake_up+0x53/0x70
<4> [<ffffffffa080976e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
<4> [<ffffffffa0808ca0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
<4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
<4> [<ffffffffa0808ca0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
<4> [<ffffffffa0808ca0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
<4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20 |
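Since the assertion reappeared on the patched build, a site running these MDS nodes might want to watch console logs for this exact signature. The following is a hypothetical monitoring helper (not part of Lustre or this ticket's patch); the regex is shaped around the `LustreError: pid:0:(file:line:func()) ASSERTION( cond ) failed` lines quoted above.

```python
import re

# Hypothetical helper: scan console/dmesg text for Lustre LBUG assertion
# lines like the lod_ah_init() one in this ticket, so recurrences on a
# patched build can be flagged automatically.
LBUG_RE = re.compile(
    r"LustreError: \d+:0:\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"ASSERTION\( (?P<cond>.+?) \) failed"
)

def find_lbugs(log_text: str):
    """Return (file, line, function, condition) for each assertion found."""
    return [m.group("file", "line", "func", "cond")
            for m in LBUG_RE.finditer(log_text)]

# Sample line taken verbatim (minus the <0> priority prefix) from this comment.
sample = ("<0>LustreError: 5367:0:(lod_object.c:704:lod_ah_init()) "
          "ASSERTION( lc->ldo_stripenr == 0 ) failed:")

if __name__ == "__main__":
    print(find_lbugs(sample))
```

Running it against the quoted log line yields one hit identifying `lod_object.c:704` in `lod_ah_init` with the condition `lc->ldo_stripenr == 0`, which is enough to correlate new crashes with this issue.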
| Comment by Jay Lan (Inactive) [ 18/Apr/14 ] |
|
I think |