[LU-10551] sanity-pfl: lfs mkdir: error on LL_IOC_LMV_SETSTRIPE '/mnt/lustre/d12.sanity-pfl' Created: 23/Jan/18 Updated: 06/Mar/18 Resolved: 06/Mar/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Cliff White (Inactive) | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | dne, pfl, zfs | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
This issue was created by maloo for Cliff White <cliff.white@intel.com> This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/ecf4cec6-fd83-11e7-a7cd-52540065bddc We are seeing this failure on multiple sanity-pfl runs. Previous tests in the script are creating 4 MDTs, the devices should exist. |
| Comments |
| Comment by Peter Jones [ 08/Feb/18 ] |
|
Bobijam Could you please advise on this one Thanks Peter |
| Comment by Zhenyu Xu [ 10/Feb/18 ] |
|
client's calling ll_dir_setdirstripe() 00000080:00200000:0.0:1516384792.861903:0:3934:0:(dir.c:464:ll_dir_setdirstripe()) VFS Op:inode=[0x200000007:0x1:0x0](ffff880060fac090) name d12.sanity-pfl stripe_offset 0, stripe_count: 2 .. 00000100:00100000:0.0:1516384792.861942:0:3934:0:(client.c:1602:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc lfs:a576c287-2c5a-9504-7beb-9aa1ec9e5dce:3934:1590020423487728:10.2.8.5@tcp:36 MDS1 got the request 00000100:00100000:1.0:1516384792.057779:0:21059:0:(service.c:1949:ptlrpc_server_handle_req_in()) got req x1590020423487728 ... 00000100:00100000:1.0:1516384792.200635:0:21059:0:(client.c:1602:ptlrpc_send_new_req()) Sending RPC pname:cluuid:pid:xid:nid:opc mdt00_002:lustre-MDT0000-mdtlov_UUID:21059:1590020472076672:10.2.8.6@tcp:700 00000100:00100000:1.0:1516384792.200660:0:21059:0:(client.c:2321:ptlrpc_set_wait()) set ffff8800588d0c00 going to sleep for 11 seconds 00000100:00100000:1.0:1516384792.201016:0:21059:0:(client.c:2027:ptlrpc_check_set()) Completed RPC pname:cluuid:pid:xid:nid:opc mdt00_002:lustre-MDT0000-mdtlov_UUID:21059:1590020472076672:10.2.8.6@tcp:700 00080000:00020000:1.0:1516384792.201272:0:21059:0:(osd_oi.c:978:osd_idc_find_or_init()) lustre-MDT0000: can't lookup: rc = -2 Lai/Nasf, is it that the dir entry object created does not shown in the OI? |
| Comment by Gerrit Updater [ 12/Feb/18 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/31268 |
| Comment by Zhenyu Xu [ 01/Mar/18 ] |
|
from https://testing.hpdd.intel.com/test_logs/e5ff0994-1d16-11e8-bd00-52540065bddc/show_text in lod_declare_create(), the to be created local object fid was add into id map cache by calling lod_sub_declare_create() first 00080000:00000001:1.0:1519884308.953048:0:19746:0:(osd_object.c:1448:osd_declare_create()) Process entered 00080000:00000001:1.0:1519884308.953056:0:19746:0:(osd_oi.c:302:fid_is_on_ost()) Process entered 80000000:00000001:1.0:1519884308.953057:0:19746:0:(fld_handler.c:207:fld_local_lookup()) Process entered 80000000:00000001:1.0:1519884308.953057:0:19746:0:(fld_cache.c:530:fld_cache_lookup()) Process entered 80000000:00000001:1.0:1519884308.953058:0:19746:0:(fld_cache.c:549:fld_cache_lookup()) Process leaving (rc=0 : 0 : 0) 80000000:00000001:1.0:1519884308.953058:0:19746:0:(fld_handler.c:224:fld_local_lookup()) Process leaving (rc=0 : 0 : 0) 00080000:00000001:1.0:1519884308.953059:0:19746:0:(osd_oi.c:331:fid_is_on_ost()) Process leaving (rc=0 : 0 : 0) 00080000:00000002:1.0:1519884308.953063:0:19746:0:(osd_oi.c:1117:osd_idc_find_and_init()) lustre-MDT0000: FID [0x200000404:0x12:0x0] not in the id map cache then it goes to lod_declare_dir_striping_create() static int osd_trans_stop(const struct lu_env *env, struct dt_device *dt, struct thandle *th) { ... /* reset OI cache for safety */ osd_oti_get(env)->oti_ins_cache_used = 0; 00080000:00000002:1.0:1519884308.953063:0:19746:0:(osd_oi.c:1117:osd_idc_find_and_init()) lustre-MDT0000: FID [0x200000404:0x12:0x0] not in the id map cache ... 00080000:00000001:1.0:1519884308.953069:0:19746:0:(osd_object.c:1502:osd_declare_create()) Process leaving (rc=0 : 0 : 0) 00000004:00000001:1.0:1519884308.953070:0:19746:0:(lod_object.c:3444:lod_dir_striping_create_internal()) Process entered 00000004:00000001:1.0:1519884308.953071:0:19746:0:(lod_object.c:2060:lod_declare_xattr_set_lmv()) Process entered ... 00000004:00000001:1.0:1519884308.953076:0:19746:0:(obd_class.h:945:obd_fid_alloc()) Process entered ... 40000000:00000001:1.0:1519884308.953079:0:19746:0:(fid_request.c:350:seq_client_alloc_fid()) Process entered 40000000:00000001:1.0:1519884308.953080:0:19746:0:(fid_request.c:218:seq_client_alloc_seq()) Process entered 40000000:00000001:1.0:1519884308.953081:0:19746:0:(fid_request.c:179:seq_client_alloc_meta()) Process entered 40000000:00000001:1.0:1519884308.953081:0:19746:0:(fid_handler.c:360:seq_server_alloc_meta()) Process entered 40000000:00000001:1.0:1519884308.953082:0:19746:0:(fid_handler.c:322:__seq_server_alloc_meta()) Process entered 40000000:00000001:1.0:1519884308.953083:0:19746:0:(fid_handler.c:277:seq_server_check_and_alloc_super()) Process entered 00080000:00000001:1.0:1519884308.953084:0:19746:0:(osd_handler.c:337:osd_trans_create()) Process entered ... 00080000:00000001:1.0:1519884308.953118:0:19746:0:(osd_handler.c:199:osd_trans_start()) Process entered 00080000:00000001:1.0:1519884308.953122:0:19746:0:(osd_handler.c:232:osd_trans_start()) Process leaving (rc=0 : 0 : 0) 00080000:00000001:1.0:1519884308.953123:0:19746:0:(osd_io.c:198:osd_write()) Process entered 00080000:00000001:1.0:1519884308.953142:0:19746:0:(osd_io.c:227:osd_write()) Process leaving (rc=24 : 24 : 18) 00080000:00000001:1.0:1519884308.953143:0:19746:0:(osd_handler.c:282:osd_trans_stop()) Process entered ... then the following lod_dir_declare_create_stripes()-> lod_sub_declare_insert() cannot find the local fid in the id map cache, also cannot find in the osd_fid_lookup() since it's in the declare phase, and the striped dir create fails. 00000004:00000001:1.0:1519884309.060924:0:19746:0:(lod_object.c:1747:lod_dir_declare_create_stripes()) Process entered ... 00080000:00000001:1.0:1519884309.060933:0:19746:0:(osd_object.c:1448:osd_declare_create()) Process entered ... 00080000:00000002:1.0:1519884309.061163:0:19746:0:(osd_oi.c:1063:osd_idc_find_or_init()) lustre-MDT0000: FID [0x200000404:0x12:0x0] not in the id map cache ... 00080000:00000001:1.0:1519884309.061170:0:19746:0:(osd_oi.c:612:osd_fid_lookup()) Process entered 00080000:00000001:1.0:1519884309.061170:0:19746:0:(osd_oi.c:302:fid_is_on_ost()) Process entered 80000000:00000001:1.0:1519884309.061171:0:19746:0:(fld_handler.c:207:fld_local_lookup()) Process entered 80000000:00000001:1.0:1519884309.061171:0:19746:0:(fld_cache.c:530:fld_cache_lookup()) Process entered 80000000:00000001:1.0:1519884309.061172:0:19746:0:(fld_cache.c:549:fld_cache_lookup()) Process leaving (rc=0 : 0 : 0) 80000000:00000001:1.0:1519884309.061172:0:19746:0:(fld_handler.c:224:fld_local_lookup()) Process leaving (rc=0 : 0 : 0) 00080000:00000001:1.0:1519884309.061173:0:19746:0:(osd_oi.c:331:fid_is_on_ost()) Process leaving (rc=0 : 0 : 0) 00080000:00000001:1.0:1519884309.061179:0:19746:0:(osd_oi.c:641:osd_fid_lookup()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe) 00080000:00020000:1.0:1519884309.061180:0:19746:0:(osd_oi.c:1085:osd_idc_find_or_init()) lustre-MDT0000: can't lookup: rc = -2 |
| Comment by Alex Zhuravlev [ 01/Mar/18 ] |
|
is it possible to reorder things so that FID allocation (and nested transaction) happens at the very beginning? |
| Comment by Alex Zhuravlev [ 01/Mar/18 ] |
|
given nested transactions is a very rare case, another option is to have a counter in osd_thread_info, increment it in osd_trans_create(), decrement in osd_trans_top() and reset IDC when the counter reaches 0. |
| Comment by Gerrit Updater [ 06/Mar/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31268/ |
| Comment by Peter Jones [ 06/Mar/18 ] |
|
Landed for 2.11 |