[LU-8858] o_lcookie is not swabbed properly Created: 22/Nov/16  Updated: 22/Nov/16  Resolved: 22/Nov/16

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.x (1.8.0 - 1.8.5), Lustre 2.1.6
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Minor
Reporter: Wang Shilong (Inactive) Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We hit a problem when using a SPARC client against an x86_64 Lustre server (the server is 2.1.6):

Call Trace:
[<ffffffff886ed601>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
[<ffffffff886edb08>] lbug_with_loc+0x48/0x90 [libcfs]
[<ffffffff89032f87>] filter_cancel_cookies_cb+0x1e7/0x5a0 [obdfilter]
[<ffffffff88e2b7c7>] fsfilt_ldiskfs_cb_func+0x17/0x160 [fsfilt_ldiskfs]
[<ffffffff88d46b00>] jbd2_journal_commit_transaction+0xbb8/0x1120 [jbd2]
[<ffffffff8003dddd>] lock_timer_base+0x1b/0x3c
[<ffffffff88d4a2c3>] kjournald2+0x9a/0x1ec [jbd2]
[<ffffffff800a3cdf>] autoremove_wake_function+0x0/0x2e
[<ffffffff88d4a229>] kjournald2+0x0/0x1ec [jbd2]
[<ffffffff800a3ac7>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032c4f>] kthread+0xfe/0x132
[<ffffffff8005dfc1>] child_rip+0xa/0x11
[<ffffffff800a3ac7>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032b51>] kthread+0x0/0x132
[<ffffffff8005dfb7>] child_rip+0x0/0x11

Kernel panic - not syncing: LBUG

However, an x86_64 client running the same Lustre version could mount. The problem appears to be
that o_lcookie is not swabbed properly. After we applied a fix, the SPARC client could mount the
server without problems.
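
For illustration, here is a minimal sketch of what swabbing a 32-byte cookie involves. It is not the Lustre code: struct demo_cookie, its field names, and the demo_* helpers are hypothetical stand-ins, chosen only so that the struct is 32 bytes like o_lcookie.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-byte cookie; field names are illustrative, not Lustre's. */
struct demo_cookie {
        uint64_t dc_oid;
        uint64_t dc_seq;
        uint32_t dc_gen;
        uint32_t dc_subsys;
        uint32_t dc_index;
        uint32_t dc_padding;
};

/* Reverse the byte order of a 32-bit value. */
static uint32_t demo_swab32(uint32_t v)
{
        return ((v & 0x000000ffU) << 24) | ((v & 0x0000ff00U) << 8) |
               ((v & 0x00ff0000U) >> 8)  | ((v & 0xff000000U) >> 24);
}

/* Reverse the byte order of a 64-bit value by swapping its two halves. */
static uint64_t demo_swab64(uint64_t v)
{
        return ((uint64_t)demo_swab32((uint32_t)v) << 32) |
               demo_swab32((uint32_t)(v >> 32));
}

/* Swab every field in place; applying it twice restores the original. */
static void demo_cookie_swab(struct demo_cookie *c)
{
        c->dc_oid = demo_swab64(c->dc_oid);
        c->dc_seq = demo_swab64(c->dc_seq);
        c->dc_gen = demo_swab32(c->dc_gen);
        c->dc_subsys = demo_swab32(c->dc_subsys);
        c->dc_index = demo_swab32(c->dc_index);
        c->dc_padding = demo_swab32(c->dc_padding);
}
```

If a field such as dc_index is left unswabbed between hosts of opposite endianness, a small value like 5 is read as 0x05000000 on the other side, which is the kind of value a range check would reject.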

I still could not see where the master branch fixes this problem, so I think the latest master
branch also has it.



 Comments   
Comment by Gerrit Updater [ 22/Nov/16 ]

Wang Shilong (wshilong@ddn.com) uploaded a new patch: http://review.whamcloud.com/23891
Subject: LU-8858 ptlrpc: swab o_lcookie propely
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3ee629a0134d137f0364dbf8f883d53c804a009b

Comment by Andreas Dilger [ 22/Nov/16 ]

Since 2.8 the client does not pass o_lcookie from the MDS at all; it is only used internally by the OSP on the MDS. All of the code to pass the cookie from the MDS to the OSS via the client was removed in patch http://review.whamcloud.com/12922 "LU-6017 obd: remove destroy cookie handling". It was kept from 2.4 to 2.7 for compatibility with older MDSes, but it wasn't expected that 2.8 clients would be used with 2.1 servers.

Even on 2.3 and earlier clients, the cookie is opaque to the client and is only passed through the client from the MDS to the OSS, so the client shouldn't be swabbing this structure.

Do you have the actual LASSERT() that is failing? Is it:

static inline struct llog_ctxt *llog_group_get_ctxt(struct obd_llog_group *olg,
                                                    int index)
{
        struct llog_ctxt *ctxt;

        LASSERT(index >= 0 && index < LLOG_MAX_CTXTS);

This shouldn't be LASSERTing on data from the network. Do you have any idea what values were being sent for index?
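
As a rough sketch of the alternative, a lookup can return an error for an out-of-range index instead of asserting. Everything here is hypothetical: demo_group, demo_ctxt, demo_group_get_ctxt() and DEMO_MAX_CTXTS only stand in for the real llog structures and LLOG_MAX_CTXTS, whose actual value is not assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_CTXTS 16       /* hypothetical stand-in for LLOG_MAX_CTXTS */

struct demo_ctxt {
        int dc_index;
};

struct demo_group {
        struct demo_ctxt *dg_ctxts[DEMO_MAX_CTXTS];
};

/*
 * Look up a context by an index that may have come from the network.
 * A bad index is reported to the caller rather than crashing the node.
 */
static int demo_group_get_ctxt(struct demo_group *g, int index,
                               struct demo_ctxt **out)
{
        if (index < 0 || index >= DEMO_MAX_CTXTS)
                return -EINVAL;         /* out of range: reject the request */
        if (g->dg_ctxts[index] == NULL)
                return -ENOENT;         /* in range, but nothing registered */
        *out = g->dg_ctxts[index];
        return 0;
}
```

With a check like this, a garbage or unswabbed index fails the single request with -EINVAL and the server keeps running.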

I'm not against adding the swabbing on the client, but I don't think that is actually fixing the right problem. It may be that the 2.8 client is just sending garbage values in this field and swabbing won't make any difference.

Comment by Wang Shilong (Inactive) [ 22/Nov/16 ]

Hi Andreas,

Sorry, I did not make it clear enough. The problem client version is b1_8 based, and we applied this patch on the server
side; b1_8 SPARC clients could then mount without problems.

Comment by Wang Shilong (Inactive) [ 22/Nov/16 ]

I checked the code again. In the 1.8 and 2.1 series branches, I can see o_lcookie passed from clients to the MDS.

[root@localhost lustre-release]# git grep o_lcookie
lustre/include/lustre/lustre_idl.h:        struct llog_cookie      o_lcookie;      /* destroy: unlink cookie from MDS */
lustre/obdclass/obdo.c:                dst->o_lcookie = src->o_lcookie;
lustre/obdfilter/filter.c:                        *fcc = oa->o_lcookie;
lustre/obdfilter/filter.c:                        fcc = &oa->o_lcookie;
lustre/obdfilter/filter.c:                        *fcc = oa->o_lcookie;
lustre/obdfilter/filter_log.c:        oa->o_lcookie = *cookie;
lustre/obdfilter/filter_log.c:        oinfo.oi_oa->o_lcookie = *cookie;
lustre/osc/osc_request.c:                oinfo->oi_oa->o_lcookie = *oti->oti_logcookies;
lustre/osc/osc_request.c:                        *oti->oti_logcookies = oa->o_lcookie;
lustre/osc/osc_request.c:                oa->o_lcookie = *oti->oti_logcookies;
lustre/ost/ost_handler.c:                oti->oti_logcookies = &body->oa.o_lcookie;
lustre/ost/ost_handler.c:        oti->oti_logcookies = &repbody->oa.o_lcookie;
lustre/ptlrpc/wiretest.c:        LASSERTF((int)offsetof(struct obdo, o_lcookie) == 136, " found %lld\n",
lustre/ptlrpc/wiretest.c:                 (long long)(int)offsetof(struct obdo, o_lcookie));
lustre/ptlrpc/wiretest.c:        LASSERTF((int)sizeof(((struct obdo *)0)->o_lcookie) == 32, " found %lld\n",
lustre/ptlrpc/wiretest.c:                 (long long)(int)sizeof(((struct obdo *)0)->o_lcookie));
lustre/utils/wirecheck.c:        CHECK_MEMBER(obdo, o_lcookie);
lustre/utils/wiretest.c:        LASSERTF((int)offsetof(struct obdo, o_lcookie) == 136, " found %lld\n",
lustre/utils/wiretest.c:                 (long long)(int)offsetof(struct obdo, o_lcookie));
lustre/utils/wiretest.c:        LASSERTF((int)sizeof(((struct obdo *)0)->o_lcookie) == 32, " found %lld\n",
lustre/utils/wiretest.c:                 (long long)(int)sizeof(((struct obdo *)0)->o_lcookie));

But in the latest master code, clients no longer pass o_lcookie to the server.
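
The wiretest.c lines in the grep output above pin the wire layout of o_lcookie (offset 136, size 32). A compile-time version of the same kind of check might look like the sketch below; struct demo_obdo and struct demo_cookie are hypothetical, with an opaque 136-byte prefix standing in for the real fields that precede the cookie.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte cookie; the real field layout is not reproduced. */
struct demo_cookie {
        uint64_t dc_a;
        uint64_t dc_b;
        uint32_t dc_c;
        uint32_t dc_d;
        uint32_t dc_e;
        uint32_t dc_f;
};

/* Hypothetical container: 136 opaque bytes, then the cookie. */
struct demo_obdo {
        uint8_t            do_head[136];
        struct demo_cookie do_lcookie;
};

/* C11 static assertions: the build fails if the layout drifts. */
_Static_assert(offsetof(struct demo_obdo, do_lcookie) == 136,
               "cookie offset changed");
_Static_assert(sizeof(((struct demo_obdo *)0)->do_lcookie) == 32,
               "cookie size changed");
```

Checks of this kind only cover offset and size. They say nothing about byte order, so a field can pass them and still be misread if it is not swabbed.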

Comment by Wang Shilong (Inactive) [ 22/Nov/16 ]

The problem does not exist in the latest master.
