[LU-1524] Parent doesn't exist Created: 14/Jun/12  Updated: 21/Mar/13  Resolved: 21/Mar/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.1
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Mahmoud Hanafi Assignee: Zhenyu Xu
Resolution: Not a Bug Votes: 0
Labels: None

Attachments: File service160.bz2     File service160.gz    
Severity: 3
Rank (Obsolete): 4045

 Description   

The MDS started to report "parent doesn't exist". Load on the MDS became very high and we ended up dumping the server. We have a vmcore if needed. This could be a dup of LU-1350, but with a hung service thread.
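If it helps triage, the hung-thread stacks can be pulled out of that vmcore with the crash utility, roughly as follows (the vmlinux/vmcore paths are placeholders for our setup):

# open the dump against the matching debuginfo kernel
crash /usr/lib/debug/lib/modules/2.6.32-220.4.1.el6.20120130.x86_64.lustre211/vmlinux /var/crash/<dump-dir>/vmcore
# inside crash:
log              # console/ring-buffer messages leading up to the dump
ps | grep UN     # threads stuck in uninterruptible sleep
foreach UN bt    # stack traces for all of those hung threads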

Lustre: 4504:0:(mdt_handler.c:888:mdt_getattr_name_lock()) header@ffff880c18139480[0x0, 1, [0x325698b32c:0x9:0x0] hash lru]
{
Lustre: 4504:0:(mdt_handler.c:888:mdt_getattr_name_lock()) ....mdt@ffff880c181394d8mdt-object@ffff880c18139480(ioepoch=0 flags=0x0, epochcount=0, writecount=0)
Lustre: 4504:0:(mdt_handler.c:888:mdt_getattr_name_lock()) ....cmm@ffff880b36c71d40[local]
Lustre: 4504:0:(mdt_handler.c:888:mdt_getattr_name_lock()) ....mdd@ffff8808ceb92a40mdd-object@ffff8808ceb92a40(open_count=0, valid=0, cltime=0, flags=0)
Lustre: 4504:0:(mdt_handler.c:888:mdt_getattr_name_lock()) ....osd-ldiskfs@ffff8808ceb92980osd-ldiskfs-object@ffff8808ceb92980(i:(null):0/0)[plain]
Lustre: 4504:0:(mdt_handler.c:888:mdt_getattr_name_lock()) }
header@ffff880c18139480
Lustre: 4504:0:(mdt_handler.c:888:mdt_getattr_name_lock()) Parent doesn't exist!

Lustre: 4946:0:(mdt_xattr.c:375:mdt_reint_setxattr()) client miss to set OBD_MD_FLCTIME when setxattr: [object [0x2f00600666:0x44:0x0]] [valid 68719476736]
Lustre: Service thread pid 9153 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Lustre: Skipped 2 previous similar messages
Pid: 9153, comm: mdt_mdss_153

Call Trace:
[<ffffffffa0a76785>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
[<ffffffff8108fff0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81012c69>] ? read_tsc+0x9/0x20
[<ffffffffa0a6eb4b>] jbd2_journal_stop+0x2cb/0x320 [jbd2]
[<ffffffffa0ac7048>] __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
[<ffffffffa0c498f8>] osd_trans_stop+0xb8/0x290 [osd_ldiskfs]
[<ffffffffa089fb06>] ? seq_store_write+0xc6/0x2b0 [fid]
[<ffffffffa089f867>] seq_store_trans_stop+0x57/0xe0 [fid]
[<ffffffffa089fd8c>] seq_store_update+0x9c/0x1e0 [fid]
[<ffffffffa089e99a>] seq_server_alloc_meta+0x4aa/0x720 [fid]
[<ffffffffa0630800>] ? lustre_swab_lu_seq_range+0x0/0x30 [obdclass]
[<ffffffffa089efc8>] seq_query+0x3b8/0x680 [fid]
[<ffffffffa075e954>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
[<ffffffffa0be7e85>] mdt_handle_common+0x8d5/0x1810 [mdt]
[<ffffffffa075e954>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
[<ffffffffa0be8e35>] mdt_mdss_handle+0x15/0x20 [mdt]
[<ffffffffa076f42e>] ptlrpc_main+0xb7e/0x18f0 [ptlrpc]
[<ffffffffa076e8b0>] ? ptlrpc_main+0x0/0x18f0 [ptlrpc]
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffffa076e8b0>] ? ptlrpc_main+0x0/0x18f0 [ptlrpc]
[<ffffffffa076e8b0>] ? ptlrpc_main+0x0/0x18f0 [ptlrpc]
[<ffffffff8100c140>] ? child_rip+0x0/0x20
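The trace shows the service thread waiting in jbd2_log_wait_commit while updating the FID sequence file (seq_store_update). Journal commit latency on the MDT can be sanity-checked on the MDS with something like the following (the jbd2 proc entry is named after the MDT block device, so the exact path is a guess):

# per-journal transaction statistics, including average commit time
cat /proc/fs/jbd2/*/info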



 Comments   
Comment by Peter Jones [ 14/Jun/12 ]

Bobijam

Is this a duplicate of LU-1350?

Peter

Comment by Zhenyu Xu [ 14/Jun/12 ]

Mahmoud,

In what situation does this issue happen? During MDS recovery? Or what operations made the MDS start dumping this info and hang there?

Comment by Mahmoud Hanafi [ 14/Jun/12 ]

This happened after the "Parent doesn't exist" messages. The MDS was not in recovery, but the same MDS is again seeing a high load after recovery.

Jun 14 19:26:50 nbp2-mds kernel: Lustre: nbp2-MDT0000: sending delayed replies to recovered clients
Jun 14 19:26:50 nbp2-mds kernel: Lustre: 4747:0:(mds_lov.c:1024:mds_notify()) MDS mdd_obd-nbp2-MDT0000: in recovery, not resetting orphans on nbp2-OST0000_UUID
Jun 14 19:26:50 nbp2-mds kernel: Lustre: 4747:0:(mds_lov.c:1024:mds_notify()) Skipped 14 previous similar messages
Jun 14 19:26:50 nbp2-mds kernel: Lustre: MDS mdd_obd-nbp2-MDT0000: nbp2-OST000a_UUID now active, resetting orphans
Jun 14 19:26:50 nbp2-mds kernel: Lustre: MDS mdd_obd-nbp2-MDT0000: nbp2-OST000e_UUID now active, resetting orphans
Jun 14 19:26:50 nbp2-mds kernel: Lustre: Skipped 1 previous similar message
Jun 14 19:26:50 nbp2-mds kernel: Lustre: 4747:0:(mdd_orphans.c:371:orph_key_test_and_del()) Found orphan! Delete it
Jun 14 19:27:13 nbp2-mds pcp-pmie[3585]: High 1-minute load average 332load@nbp2-mds
Jun 14 19:27:37 nbp2-mds kernel: LustreError: 4349:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x1 sub-object on OST idx 2/1: rc = -107
Jun 14 19:27:37 nbp2-mds kernel: LustreError: 4349:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x1 sub-object on OST idx 5/1: rc = -107
Jun 14 19:27:38 nbp2-mds kernel: LustreError: 4349:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x3 sub-object on OST idx 4/1: rc = -107
Jun 14 19:27:38 nbp2-mds kernel: LustreError: 4349:0:(lov_request.c:569:lov_update_create_set()) Skipped 12 previous similar messages
Jun 14 19:27:39 nbp2-mds kernel: LustreError: 4349:0:(lov_request.c:569:lov_update_create_set()) error creating fid 0x2 sub-object on OST idx 3/1: rc = -107

Comment by Zhenyu Xu [ 15/Jun/12 ]

Looks like the OSTs are busy deleting orphan objects, and the MDS is busy waiting for the FID sequence update.

Wangdi, could you take a look?

Comment by Zhenyu Xu [ 15/Jun/12 ]

Did you observe the MDS keeping a high load after recovery, or did the high load last only a short time after recovery?

Would you mind uploading the vmcore or sysrq-trigger output? We'd like to see the whole picture of this situation.

The message "(lov_request.c:569:lov_update_create_set()) error creating fid 0x1 sub-object on OST idx 2/1: rc = -107" means the first object hasn't been created on this OST yet. Is this a production system or a testing system?
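For the sysrq-trigger output, something like this on the MDS should capture the stacks of all tasks (a sketch; adjust the output file as you like):

echo 1 > /proc/sys/kernel/sysrq    # make sure sysrq is enabled
echo t > /proc/sysrq-trigger       # dump every task's stack to the console/dmesg
dmesg > /tmp/mds-task-dump.txt     # save the result for upload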

Comment by Mahmoud Hanafi [ 19/Jun/12 ]

We hit this issue again. Uploading console logs with call traces. We are not able to upload the vmcore due to security.

Comment by Di Wang [ 20/Jun/12 ]

According to the log, there is an LBUG here.

<0>LustreError: 11533:0:(lu_object.c:113:lu_object_put()) ASSERTION(cfs_list_empty(&top->loh_lru)) failed
<0>LustreError: 11533:0:(lu_object.c:113:lu_object_put()) LBUG
<4>Pid: 11533, comm: mdt_rdpg_07
<4>
<4>Call Trace:
<4> [<ffffffffa056b855>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa056be95>] lbug_with_loc+0x75/0xe0 [libcfs]
<4> [<ffffffffa0576da6>] libcfs_assertion_failed+0x66/0x70 [libcfs]
<4> [<ffffffffa066e069>] lu_object_put+0x209/0x210 [obdclass]
<4> [<ffffffffa0c0f041>] mdt_close+0x2f1/0xac0 [mdt]
<4> [<ffffffffa0be7031>] ? mdt_unpack_req_pack_rep+0x51/0x5d0 [mdt]
<4> [<ffffffffa075fd84>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
<4> [<ffffffffa07620d8>] ? lustre_msg_check_version+0xc8/0xe0 [ptlrpc]
<4> [<ffffffffa0be7e85>] mdt_handle_common+0x8d5/0x1810 [mdt]
<4> [<ffffffffa075fd84>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
<4> [<ffffffffa0be8e75>] mdt_readpage_handle+0x15/0x20 [mdt]
<4> [<ffffffffa077085e>] ptlrpc_main+0xb7e/0x18f0 [ptlrpc]
<4> [<ffffffffa076fce0>] ? ptlrpc_main+0x0/0x18f0 [ptlrpc]
<4> [<ffffffff8100c14a>] child_rip+0xa/0x20
<4> [<ffffffffa076fce0>] ? ptlrpc_main+0x0/0x18f0 [ptlrpc]
<4> [<ffffffffa076fce0>] ? ptlrpc_main+0x0/0x18f0 [ptlrpc]
<4> [<ffffffff8100c140>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 11533, comm: mdt_rdpg_07 Not tainted 2.6.32-220.4.1.el6.20120130.x86_64.lustre211 #1
<4>Call Trace:
<4> [<ffffffff81520c76>] ? panic+0x78/0x164
<4> [<ffffffffa056beeb>] ? lbug_with_loc+0xcb/0xe0 [libcfs]
<4> [<ffffffffa0576da6>] ? libcfs_assertion_failed+0x66/0x70 [libcfs]
<4> [<ffffffffa066e069>] ? lu_object_put+0x209/0x210 [obdclass]
<4> [<ffffffffa0c0f041>] ? mdt_close+0x2f1/0xac0 [mdt]
<4> [<ffffffffa0be7031>] ? mdt_unpack_req_pack_rep+0x51/0x5d0 [mdt]
<4> [<ffffffffa075fd84>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
<4> [<ffffffffa07620d8>] ? lustre_msg_check_version+0xc8/0xe0 [ptlrpc]
<4> [<ffffffffa0be7e85>] ? mdt_handle_common+0x8d5/0x1810 [mdt]
<4> [<ffffffffa075fd84>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
<4> [<ffffffffa0be8e75>] ? mdt_readpage_handle+0x15/0x20 [mdt]
<4> [<ffffffffa077085e>] ? ptlrpc_main+0xb7e/0x18f0 [ptlrpc]
<4> [<ffffffffa076fce0>] ? ptlrpc_main+0x0/0x18f0 [ptlrpc]
<4> [<ffffffff8100c14a>] ? child_rip+0xa/0x20
<4> [<ffffffffa076fce0>] ? ptlrpc_main+0x0/0x18f0 [ptlrpc]
<4> [<ffffffffa076fce0>] ? ptlrpc_main+0x0/0x18f0 [ptlrpc]
<4> [<ffffffff8100c140>] ? child_rip+0x0/0x20

Seems like 1013.

Comment by Mahmoud Hanafi [ 20/Jun/12 ]

Yes, the LBUG was after the system was rebooted, and that is a different issue.

Comment by Di Wang [ 20/Jun/12 ]

Yeah, saw that. BTW, is the MDT almost full?

Comment by Di Wang [ 20/Jun/12 ]

According to the console log, it seems the journal cannot be synced to disk in time. See:

Pid: 18717, comm: mdt_mdss_343

Call Trace:
[<ffffffffa0a76785>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
[<ffffffff8108fff0>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa0a76836>] ? __jbd2_log_start_commit+0x36/0x40 [jbd2]
[<ffffffffa0a6eb4b>] jbd2_journal_stop+0x2cb/0x320 [jbd2]
[<ffffffffa0ac7048>] __ldiskfs_journal_stop+0x68/0xa0 [ldiskfs]
[<ffffffffa0c4a8f8>] osd_trans_stop+0xb8/0x290 [osd_ldiskfs]
[<ffffffffa089fb06>] ? seq_store_write+0xc6/0x2b0 [fid]
[<ffffffffa089f867>] seq_store_trans_stop+0x57/0xe0 [fid]
[<ffffffffa089fd8c>] seq_store_update+0x9c/0x1e0 [fid]
[<ffffffffa089e99a>] seq_server_alloc_meta+0x4aa/0x720 [fid]
[<ffffffffa0631800>] ? lustre_swab_lu_seq_range+0x0/0x30 [obdclass]
[<ffffffffa089efc8>] seq_query+0x3b8/0x680 [fid]
[<ffffffffa075fce4>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
[<ffffffffa0be7e85>] mdt_handle_common+0x8d5/0x1810 [mdt]
[<ffffffffa075fce4>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
[<ffffffffa0be8e35>] mdt_mdss_handle+0x15/0x20 [mdt]
[<ffffffffa07707be>] ptlrpc_main+0xb7e/0x18f0 [ptlrpc]
[<ffffffffa076fc40>] ? ptlrpc_main+0x0/0x18f0 [ptlrpc]
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffffa076fc40>] ? ptlrpc_main+0x0/0x18f0 [ptlrpc]
[<ffffffffa076fc40>] ? ptlrpc_main+0x0/0x18f0 [ptlrpc]
[<ffffffff8100c140>] ? child_rip+0x0/0x20

LustreError: dumping log to /tmp/lustre-log.1340067239.18717

Then all the other processes just wait there to get enough journal credits to start their transactions. Hmm, I do not know why the journal cannot be synced to disk in time. Are you using an internal or an external journal for your system?
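A quick way to check is to look at the ldiskfs superblock on the MDT device, for example (the Lustre-patched e2fsprogs may be needed; <mdt-device> is your MDT block device):

dumpe2fs -h <mdt-device> | grep -i journal
# an internal journal shows "Journal inode: 8" (and has_journal in the feature list)
# an external journal shows "Journal UUID" / "Journal device" pointing at a separate device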

Actually, in LU-795 this seq code was changed so that most of the time the seq update does not need to be synchronous at all. Unfortunately, that patch has only landed on 2.2; it would definitely help here. Do you have a plan to upgrade your system to 2.2 anytime soon?

We also saw a lot of LNet error messages here:

Lustre: 2056:0:(o2iblnd_cb.c:2326:kiblnd_passive_connect()) Conn stale 10.151.59.167@o2ib [old ver: 12, new ver: 12]
Lustre: 2056:0:(o2iblnd_cb.c:2326:kiblnd_passive_connect()) Conn stale 10.151.30.1@o2ib [old ver: 12, new ver: 12]
Lustre: 2056:0:(o2iblnd_cb.c:2326:kiblnd_passive_connect()) Conn stale 10.151.59.132@o2ib [old ver: 12, new ver: 12]
Lustre: 2056:0:(o2iblnd_cb.c:2326:kiblnd_passive_connect()) Conn stale 10.151.59.130@o2ib [old ver: 12, new ver: 12]
Lustre: 2056:0:(o2iblnd_cb.c:2326:kiblnd_passive_connect()) Conn stale 10.151.29.8@o2ib [old ver: 12, new ver: 12]

which usually means a client is being rebooted. Is that true? And how many clients are in your system? Thanks.

Comment by Mahmoud Hanafi [ 20/Jun/12 ]

As for disk space, we have lots free:
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda8 20096128 130179 19965949 1% /
tmpfs 3074827 1 3074826 1% /dev/shm
/dev/sda7 36720 50 36670 1% /boot
/dev/mapper/nbp2--vg-mgs
64000 138 63862 1% /mnt/lustre/mgs
/dev/mapper/nbp2--vg-mdt2
268435456 35795386 232640070 14% /mnt/lustre/nbp2-mdt
nbp2-mds ~ # df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda8 484397076 11907584 448142440 3% /
tmpfs 12299308 0 12299308 0% /dev/shm
/dev/sda7 141845 33932 100589 26% /boot
/dev/mapper/nbp2--vg-mgs
1007896 19888 988008 2% /mnt/lustre/mgs
/dev/mapper/nbp2--vg-mdt2
939453620 23505792 915947828 3% /mnt/lustre/nbp2-mdt

We also noticed the slowness in the journal. We have checked the RAID subsystem; no issues were found.
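The block-device latency under the MDT can be watched with, e.g.:

iostat -dx 5    # await / svctm / %util for the MDT and journal devices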

We do have a lot of clients, over 10,000.

We are not planning to upgrade to 2.2, but we are looking at 2.1.2. We may be able to pull in LU-795.

Comment by Di Wang [ 20/Jun/12 ]

Hmm, I found some error messages:

LustreError: 12820:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff88014c7a2ec0
[-- MARK -- Mon Jun 18 09:00:00 2012]
LustreError: 10403:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff88014b0b9300
LustreError: 12833:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff880240ccf140

Ah, that is because the current llog handle does not have space for the record (-28 is -ENOSPC), not because the disk is out of space. Hmm, the error message is a little misleading; I will correct it a bit.

LU-795 definitely fixes some of the journal sync problems here, so it should help in your case.

So there are more than 10k clients; do you expect they were being rebooted while you were collecting the console log? We see a lot of

Lustre: 2056:0:(o2iblnd_cb.c:2326:kiblnd_passive_connect()) Conn stale 10.151.59.167@o2ib [old ver: 12, new ver: 12]
Lustre: 2056:0:(o2iblnd_cb.c:2326:kiblnd_passive_connect()) Conn stale 10.151.30.1@o2ib [old ver: 12, new ver: 12]
Lustre: 2056:0:(o2iblnd_cb.c:2326:kiblnd_passive_connect()) Conn stale 10.151.59.132@o2ib [old ver: 12, new ver: 12]
Lustre: 2056:0:(o2iblnd_cb.c:2326:kiblnd_passive_connect()) Conn stale 10.151.59.130@o2ib [old ver: 12, new ver: 12]
Lustre: 2056:0:(o2iblnd_cb.c:2326:kiblnd_passive_connect()) Conn stale 10.151.29.8@o2ib [old ver: 12, new ver: 12]

in the messages you uploaded.

It also seems you hit LU-617 quite a lot:

LustreError: 18929:0:(mdt_recovery.c:793:mdt_last_rcvd_update()) Trying to overwrite bigger transno:on-disk: 171918688025, new: 171918688024 replay: 0. see LU-617.
...

Unfortunately, that fix has also only landed on 2.2 (the patch on 2.1 is just a workaround).

Comment by Mahmoud Hanafi [ 20/Jun/12 ]

Does kiblnd_passive_connect always mean the client is reconnecting after a reboot? Or is it just reconnecting, perhaps due to a fabric issue?

Comment by Di Wang [ 20/Jun/12 ]

That is what I learned, but I am not an LNet expert. Liang, could you please comment here?

Comment by Peter Jones [ 29/Jun/12 ]

Added Liang as a watcher so that he can see Wang Di's question

Comment by Liang Zhen (Inactive) [ 29/Jun/12 ]

So there are more than 10k clients; do you expect they were being rebooted while you were collecting the console log? We see a lot of

Lustre: 2056:0:(o2iblnd_cb.c:2326:kiblnd_passive_connect()) Conn stale 10.151.59.167@o2ib [old ver: 12, new ver: 12]
Lustre: 2056:0:(o2iblnd_cb.c:2326:kiblnd_passive_connect()) Conn stale 10.151.30.1@o2ib [old ver: 12, new ver: 12]
Lustre: 2056:0:(o2iblnd_cb.c:2326:kiblnd_passive_connect()) Conn stale 10.151.59.132@o2ib [old ver: 12, new ver: 12]
Lustre: 2056:0:(o2iblnd_cb.c:2326:kiblnd_passive_connect()) Conn stale 10.151.59.130@o2ib [old ver: 12, new ver: 12]
Lustre: 2056:0:(o2iblnd_cb.c:2326:kiblnd_passive_connect()) Conn stale 10.151.29.8@o2ib [old ver: 12, new ver: 12]

I think we generate these messages only for rebooted remote peers.

Comment by Peter Jones [ 21/Mar/13 ]

As per NASA, it is OK to close this ticket.
