[LU-5294] mdd_unlink() returning -7 Created: 03/Jul/14  Updated: 14/Jul/14  Resolved: 14/Jul/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.3
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Blake Caldwell Assignee: Oleg Drokin
Resolution: Fixed Votes: 1
Labels: None
Environment:

RHEL 6.4, kernel 2.6.32_358.23.2.el6,


Attachments: File dumpe2fs.atlas-linkfarm.mdt0     File lustre.log.atlas-linkfarm     File lustre.log.atlas-linkfarm.all     File test_file.debugfs_stat    
Severity: 3
Rank (Obsolete): 14769

 Description   

Any client on this filesystem will get back -7 from an unlink or rm. Creates, reads, and writes work fine. We tried multiple users and found no difference. The identity upcall is successful:

  /usr/sbin/l_getidentity linkfarm-MDT0000 0
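
For reference, errno 7 on Linux is E2BIG ("Argument list too long"), so from a client the failure looks like the sketch below. This is a minimal illustrative reproducer only (any path on the affected filesystem should do), not something taken from the actual debug capture.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* unlink any file on the affected filesystem and report errno;
 * on this filesystem the call fails with errno 7 (E2BIG) */
int main(int argc, char **argv)
{
        if (argc < 2) {
                fprintf(stderr, "usage: %s <file on lustre>\n", argv[0]);
                return 1;
        }
        if (unlink(argv[1]) != 0) {
                fprintf(stderr, "unlink(%s): errno=%d (%s)\n",
                        argv[1], errno, strerror(errno));
                return 1;
        }
        printf("unlink(%s) succeeded\n", argv[1]);
        return 0;
}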

+trace and +rpctrace debug logs were captured on the MDT while performing the unlink from client 10.36.226.85@o2ib.

I can upload a full debug log. This is easy to recreate and capture, so just let me know which debug flags would be useful.

00000040:00000001:19.0:1404405371.756302:0:13436:0:(llog_osd.c:317:llog_osd_declare_write_rec()) Process leaving (rc=0 : 0 : 0)
00000040:00000001:19.0:1404405371.756303:0:13436:0:(llog.c:714:llog_declare_write_rec()) Process leaving (rc=0 : 0 : 0)
00000040:00000001:19.0:1404405371.756303:0:13436:0:(llog_cat.c:443:llog_cat_declare_add_rec()) Process leaving (rc=0 : 0 : 0)
00000040:00000001:19.0:1404405371.756304:0:13436:0:(llog.c:790:llog_declare_add()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:19.0:1404405371.756305:0:13436:0:(osp_sync.c:206:osp_sync_declare_add()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:19.0:1404405371.756306:0:13436:0:(osp_object.c:322:osp_declare_object_destroy()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:19.0:1404405371.756307:0:13436:0:(lod_object.c:1044:lod_declare_object_destroy()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:19.0:1404405371.756309:0:13436:0:(lod_object.c:434:lod_declare_xattr_set()) Process entered
00000004:00000001:19.0:1404405371.756310:0:13436:0:(lod_object.c:464:lod_declare_xattr_set()) Process leaving (rc=18446744073709551609 : -7 : fffffffffffffff9)
00000004:00000001:19.0:1404405371.756311:0:13436:0:(mdd_dir.c:1422:mdd_unlink()) Process leaving via stop (rc=18446744073709551609 : -7 : 0xfffffffffffffff9)
00000004:00000001:19.0:1404405371.756313:0:13436:0:(osd_handler.c:915:osd_trans_stop()) Process entered
00040000:00000001:19.0:1404405371.756314:0:13436:0:(qsd_handler.c:1074:qsd_op_end()) Process entered
00040000:00000001:19.0:1404405371.756315:0:13436:0:(qsd_handler.c:1102:qsd_op_end()) Process leaving
00000004:00000001:19.0:1404405371.756316:0:13436:0:(osd_handler.c:968:osd_trans_stop()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:19.0:1404405371.756317:0:13436:0:(mdt_reint.c:868:mdt_reint_unlink()) Process leaving
00000004:00000001:19.0:1404405371.756317:0:13436:0:(mdt_handler.c:2791:mdt_object_unlock()) Process entered
00000004:00000001:19.0:1404405371.756318:0:13436:0:(mdt_handler.c:2739:mdt_save_lock()) Process entered

...
00000004:00000001:19.0:1404405371.756396:0:13436:0:(mdt_internal.h:584:mdt_object_put()) Process entered
00000020:00000001:19.0:1404405371.756397:0:13436:0:(lustre_fid.h:715:fid_flatten32()) Process leaving (rc=4196255 : 4196255 : 40079f)
00000004:00000001:19.0:1404405371.756398:0:13436:0:(mdt_internal.h:586:mdt_object_put()) Process leaving
00000004:00000001:19.0:1404405371.756398:0:13436:0:(mdt_reint.c:1375:mdt_reint_rec()) Process leaving (rc=18446744073709551609 : -7 : fffffffffffffff9)
00000004:00000001:19.0:1404405371.756399:0:13436:0:(mdt_handler.c:1832:mdt_reint_internal()) Process leaving
02000000:00000001:19.0:1404405371.756400:0:13436:0:(upcall_cache.c:276:upcall_cache_put_entry()) Process entered
02000000:00000001:19.0:1404405371.756400:0:13436:0:(upcall_cache.c:287:upcall_cache_put_entry()) Process leaving
00000004:00000001:19.0:1404405371.756401:0:13436:0:(mdt_handler.c:429:mdt_client_compatibility()) Process entered
00000004:00000001:19.0:1404405371.756401:0:13436:0:(mdt_handler.c:433:mdt_client_compatibility()) Process leaving
00000004:00000001:19.0:1404405371.756402:0:13436:0:(mdt_lib.c:572:mdt_fix_reply()) Process entered
00000004:00000001:19.0:1404405371.756403:0:13436:0:(mdt_lib.c:671:mdt_fix_reply()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:19.0:1404405371.756404:0:13436:0:(mdt_handler.c:1898:mdt_reint()) Process leaving (rc=18446744073709551609 : -7 : fffffffffffffff9)
00010000:00000001:19.0:1404405371.756405:0:13436:0:(ldlm_lib.c:2440:target_send_reply()) Process entered
00010000:00000001:19.0:1404405371.756406:0:13436:0:(ldlm_lib.c:2393:target_pack_pool_reply()) Process entered
00010000:00000001:19.0:1404405371.756407:0:13436:0:(ldlm_lib.c:2412:target_pack_pool_reply()) Process leaving (rc=0 : 0 : 0)

...
00010000:00000001:19.0:1404405371.756416:0:13436:0:(ldlm_lib.c:2452:target_send_reply()) Process leaving
00000004:00000001:19.0:1404405371.756417:0:13436:0:(mdt_handler.c:3103:mdt_req_handle()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:19.0:1404405371.756417:0:13436:0:(mdt_handler.c:3429:mdt_handle0()) Process leaving (rc=0 : 0 : 0)
00000004:00000001:19.0:1404405371.756418:0:13436:0:(mdt_handler.c:3463:mdt_handle_common()) Process leaving (rc=0 : 0 : 0)
00000100:00100000:19.0:1404405371.756421:0:13436:0:(service.c:2055:ptlrpc_server_handle_request()) Handled RPC pname:cluuid+ref:pid:xid:nid:opc mdt03_035:12cfc1a2-8c1a-ebe6-ebdb-eb076741d9d3+16:27222:x1472624384278960:12345-10.36.226.85@o2ib:36 Request procesed in 395us (422us total) trans 0 rc -7/-7
00000100:00100000:19.0:1404405371.756424:0:13436:0:(nrs_fifo.c:244:nrs_fifo_req_stop()) NRS stop fifo request from 12345-10.36.226.85@o2ib, seq: 2243836
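
As a side note, the three rc forms printed by the trace are the same value: 18446744073709551609 is the unsigned 64-bit representation of -7 (0xfffffffffffffff9), and errno 7 is E2BIG. A minimal sketch, for illustration only:

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint64_t raw = 18446744073709551609ULL;  /* rc as printed in the log */
        int64_t rc = (int64_t)raw;               /* two's-complement view */

        /* prints: rc = -7 (0xfffffffffffffff9), E2BIG = 7 */
        printf("rc = %lld (0x%llx), E2BIG = %d\n",
               (long long)rc, (unsigned long long)raw, E2BIG);
        return 0;
}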



 Comments   
Comment by James Nunez (Inactive) [ 03/Jul/14 ]

Mike,

Would you please take a look at this issue and comment?

Thank you,
James

Comment by Oleg Drokin [ 03/Jul/14 ]

Is this vanilla 2.4.3 (servers)?
Is this for any file or a specific set of files? Is wide striping in use?
Are these 1.8 clients, or clients of any version?

Comment by Blake Caldwell [ 03/Jul/14 ]

This is the same 2.4.3 as the rest of our production servers, with the several patches listed below. It happens for any file on the filesystem. Wide striping is not in use on this filesystem (only 1 OST). Any client (including 1.8) can reproduce this error.

LU-4006 (landed to 2.6, not sure if we've thoroughly tested)
LU-4008 (3 older patches in our version, 2 landed to master, working toward landing in 2.5)
LU-4019 (in 2.5)
LU-4403 (landed to 2.6)
LU-4570 (debug-only patch)
LU-4719 (landed to 2.6)
LU-4791 (marked as merged in Gerrit, but no "Fix Version" in Jira)

Comment by Oleg Drokin [ 03/Jul/14 ]

Hm, kind of weird: apparently whatever function returned the -7 is not in your trace.
It looks like one possible candidate would be osp_insert_update() called from osp_md_declare_xattr_set(), but it should print a CERROR, and it also has entry/exit markers inside it, so it should be visible.

I assume you do not get any messages in dmesg on the MDS either?

Did this appear all of a sudden after you applied some patches, or did this combination of patches work fine until it suddenly stopped working?

Comment by Blake Caldwell [ 03/Jul/14 ]

Nope, no dmesg output on the MDS. I'm attaching a complete log. This didn't correlate with any patch changes (we've been on this build for a couple of months now). We had strange things going on with DNS (an outage this week) and syslog (it was blocking on a misconfigured server), but those have been resolved. I was initially looking for a correlation with an upcall result, but that part seems fine.

Comment by Oleg Drokin [ 03/Jul/14 ]

Well, I assume you already tried rebooting the MDS and that did not clear the condition, so perhaps as a first step we should add a small patch that prints the ops address that returned the failure; then we can trace where it came from and perhaps add more debugging there if it's still unclear what's going on.
Would this be workable for you?

Comment by Blake Caldwell [ 03/Jul/14 ]

We haven't rebooted the MDS yet, but since that sounds like the next step, we will plan on doing that. Probably Tuesday if we can hobble along in this mode for a bit.

Comment by Oleg Drokin [ 03/Jul/14 ]

Is there anything else you will be able to do on the system before then?

Also, can you grab an MDS log at the -1 debug level while reproducing the error, on the off chance it might catch more messages at some other level and shine some more light on what's going on?

Comment by John Fuchs-Chesney (Inactive) [ 03/Jul/14 ]

Thank you for picking this up Oleg.
~ jfc.

Comment by Andreas Dilger [ 04/Jul/14 ]

It may be that this is caused by the file having a too-large ACL xattr? I recall a similar problem in the past, maybe the patch is not landed on b2_4?

Alternately, does this filesystem have wide striping enabled (for more than 160 OSTs)?

Comment by Matt Ezell [ 04/Jul/14 ]

It may be that this is caused by the file having a too-large ACL xattr? I recall a similar problem in the past, maybe the patch is not landed on b2_4?

We don't use and typically disable ACLs on our file systems. This seems to be happening for all files.

Alternately, does this filesystem have wide striping enabled (for more than 160 OSTs)?

This is our "link farm" file system. It houses symlinks for all users that point to one of the two large file systems. It's just 1 OST, so no wide striping.

Comment by Andreas Dilger [ 04/Jul/14 ]

Can you please post the output of dumpe2fs -h /dev/mdtdev on the MDS. In particular, I'd like to check the features that are enabled on the filesystem.

Also, is this problem specific to some files, or is this happening for all files? If it is specific to certain files, could you please run debugfs -c -R "stat /ROOT/path/to/file" /dev/mdtdev, where /path/to/file is just the part below the mountpoint (i.e. excluding the "/mnt/linkfarm" part, or whatever).

Comment by Blake Caldwell [ 07/Jul/14 ]

Some updates after trying the above:

The issue is still present after rebooting the MDS. Setting lnet.printk to -1 overwhelmed rsyslog and necessitated a reboot. We need to fix rsyslog to drop messages.

Attached is a debug log with all options turned on in /proc/sys/lnet/debug

Also, as requested, I'm attaching the output of debugfs "stat $FILE" /dev/mdt and dumpe2fs -h /dev/mdtdev. The version of e2fsprogs is a little out of date (e2fsprogs-1.42.9.wc1-7.el6), and I noticed that no "parent" FID was given. Ideally it'd be best to wait until tomorrow if we need to update that version on the running nodes.

Comment by Oleg Drokin [ 09/Jul/14 ]

hm, still nothing additionally useful in the full debug log.

I inspected the code again. lod_object.c:464 looks like this in my 2.4.3 tree:

        rc = dt_declare_xattr_set(env, next, buf, name, fl, th);

        RETURN(rc);

So the error must have come from dt_declare_xattr_set(), which is declared as:

static inline int dt_declare_xattr_set(const struct lu_env *env,
                                      struct dt_object *dt,
                                      const struct lu_buf *buf,
                                      const char *name, int fl,
                                      struct thandle *th)
{
        LASSERT(dt);
        LASSERT(dt->do_ops);
        LASSERT(dt->do_ops->do_declare_xattr_set);
        return dt->do_ops->do_declare_xattr_set(env, dt, buf, name, fl, th);
}

Now, I checked the code for all definitions of do_declare_xattr_set and there are only four:
lod/lod_object.c: .do_declare_xattr_set = lod_declare_xattr_set, – This is the one we were called from already; it also has entry/exit, so if there were recursion we'd see it in the log.
osd-zfs/osd_object.c: .do_declare_xattr_set = osd_declare_xattr_set, – This is ZFS-only and you don't use ZFS; additionally there's an ENTRY at the start of that function that we would see, but don't.
osp/osp_md_object.c: .do_declare_xattr_set = osp_md_declare_xattr_set, – This should only happen in the DNE case, which you don't seem to be using. While this function does not have ENTRY/EXIT, it calls into functions that do, and there's also a CERROR in its error path.
osd-ldiskfs/osd_handler.c: .do_declare_xattr_set = osd_declare_xattr_set, – This would be my guess for the one that failed, except it always returns 0:

static int osd_declare_xattr_set(const struct lu_env *env,
                                 struct dt_object *dt,
                                 const struct lu_buf *buf, const char *name,
                                 int fl, struct thandle *handle)
{
        struct osd_thandle *oh;

        LASSERT(handle != NULL);

        oh = container_of0(handle, struct osd_thandle, ot_super);
        LASSERT(oh->ot_handle == NULL);

        osd_trans_declare_op(env, oh, OSD_OT_XATTR_SET,
                             strcmp(name, XATTR_NAME_VERSION) == 0 ?
                             osd_dto_credits_noquota[DTO_ATTR_SET_BASE] :
                             osd_dto_credits_noquota[DTO_XATTR_SET]);

        return 0;
}

The master version of osd_declare_xattr_set is defined a bit differently, but it still always returns 0.
Does your version of osd_declare_xattr_set in osd-ldiskfs/ look the same?
If so, I am totally out of ideas about how ANY error could emerge from lod_declare_xattr_set.
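
If it does look the same, a sketch of the kind of instrumentation suggested earlier (print the ops address that returned the failure) could be applied to the wrapper quoted above. This is illustrative only; the exact message and fields are not from any landed patch.

static inline int dt_declare_xattr_set(const struct lu_env *env,
                                       struct dt_object *dt,
                                       const struct lu_buf *buf,
                                       const char *name, int fl,
                                       struct thandle *th)
{
        int rc;

        LASSERT(dt);
        LASSERT(dt->do_ops);
        LASSERT(dt->do_ops->do_declare_xattr_set);

        rc = dt->do_ops->do_declare_xattr_set(env, dt, buf, name, fl, th);
        if (rc < 0)
                /* the printed address can be matched against module symbols
                 * to see which backend's declare method actually failed */
                CERROR("declare_xattr_set op %p (xattr '%s', len %lld) failed: rc = %d\n",
                       dt->do_ops->do_declare_xattr_set, name,
                       buf != NULL ? (long long)buf->lb_len : -1LL, rc);
        return rc;
}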

Comment by Blake Caldwell [ 10/Jul/14 ]

Thanks Oleg. From that I found that we have patch set 2 of LU-4791 rather than patch set 3, which was merged:
http://review.whamcloud.com/#/c/9837/2..3/lustre/osd-ldiskfs/osd_handler.c,cm

The conditional that returns -E2BIG was marked "unlikely". Any idea what's causing the comparison to fail now?
http://review.whamcloud.com/#/c/9837/1/lustre/osd-ldiskfs/osd_handler.c,cm

The relevant pieces of our LU-4791.patch:
@@ -2713,10 +2722,30 @@ static int osd_declare_xattr_set(const struct lu_env *env,
                                  const struct lu_buf *buf, const char *name,
                                  int fl, struct thandle *handle)
 {
+        struct osd_device *osd = osd_dev(dt->do_lu.lo_dev);
+        struct super_block *sb = osd_sb(osd);
         struct osd_thandle *oh;
+        int max_ea_size;
+        int ea_overhead;
+
 
         LASSERT(handle != NULL);
 
+#if defined(LDISKFS_FEATURE_INCOMPAT_EA_INODE)
+        if (LDISKFS_HAS_INCOMPAT_FEATURE(sb, LDISKFS_FEATURE_INCOMPAT_EA_INODE))
+                max_ea_size = LDISKFS_XATTR_MAX_LARGE_EA_SIZE;
+        else
+#endif
+                max_ea_size = sb->s_blocksize;
+
+        /* we also need take account in the overhead,
+         * xattr_header + magic + xattr_entry_head */
+        ea_overhead = sizeof(struct ldiskfs_xattr_header) + sizeof(__u32) +
+                      LDISKFS_XATTR_LEN(XATTR_NAME_MAX_LEN);
+
+        if (buf->lb_len > max_ea_size - ea_overhead)
+                return -E2BIG;
+
         oh = container_of0(handle, struct osd_thandle, ot_super);
         LASSERT(oh->ot_handle == NULL);

Comment by Oleg Drokin [ 10/Jul/14 ]

So, according to the comments in LU-4791, Bull hit the exact same problem you did with patch set 2, and Wangdi explains that he failed to take the xattr overhead into account; that's why patch set 3 was created.

In the end I think you just need to drop patch set 2 from your tree and add patch set 3.

Comment by James A Simmons [ 10/Jul/14 ]

I merged the latest version of the LU-4791 patch and also included the LU-4260 fix. I have tested it at small scale with no problems. Once it is in production and we have no problems, that will be posted here.

Comment by James A Simmons [ 10/Jul/14 ]

One more note: Lustre 2.5 already has all of the needed fixes.

Comment by Blake Caldwell [ 14/Jul/14 ]

Our new build with LU-4791 (patch set 3) and LU-4260 resolved this issue; the file in question could now be deleted. Thanks Oleg/James!

Comment by James A Simmons [ 14/Jul/14 ]

This ticket can be closed.

Comment by John Fuchs-Chesney (Inactive) [ 14/Jul/14 ]

Thanks James!
~ jfc
