[LU-3587] CR_MAXSIZE is too small Created: 16/Jul/13  Updated: 17/Nov/13  Resolved: 08/Oct/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.5.0, Lustre 2.4.2

Type: Bug Priority: Critical
Reporter: Christopher Morrone Assignee: Bob Glossman (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9102

 Description   

One of our Lustre 2.4.0 based clients is hitting an assertion when we run robinhood.

2013-07-09 14:35:15 vrelic8 login: LustreError: 5467:0:(mdc_request.c:1500:changelog_kuc_hdr()) ASSERTION( len <= cfs_size_round(2*255 + 1 + sizeof(struct changelog_rec)) ) failed:
2013-07-09 14:41:07 LustreError: 5467:0:(mdc_request.c:1500:changelog_kuc_hdr()) LBUG

Backtrace of the crashed thread:

PID: 5467   TASK: ffff885c8668eaa0  CPU: 17  COMMAND: "mdc_clg_send_th"
 #0 [ffff885c866779e8] machine_kexec at ffffffff81035bfb
 #1 [ffff885c86677a48] crash_kexec at ffffffff810c0932
 #2 [ffff885c86677b18] panic at ffffffff8150d943
 #3 [ffff885c86677b98] lbug_with_loc at ffffffffa042bf4b [libcfs]
 #4 [ffff885c86677bb8] changelog_kuc_hdr at ffffffffa096bf3e [mdc]
 #5 [ffff885c86677bc8] changelog_kkuc_cb at ffffffffa096d28c [mdc]
 #6 [ffff885c86677bf8] llog_process_thread at ffffffffa05e110b [obdclass]
 #7 [ffff885c86677ca8] llog_process_or_fork at ffffffffa05e2e1d [obdclass]
 #8 [ffff885c86677cf8] llog_cat_process_cb at ffffffffa05e560a [obdclass]
 #9 [ffff885c86677d58] llog_process_thread at ffffffffa05e110b [obdclass]
#10 [ffff885c86677e08] llog_process_or_fork at ffffffffa05e2e1d [obdclass]
#11 [ffff885c86677e58] llog_cat_process_or_fork at ffffffffa05e3ef9 [obdclass]
#12 [ffff885c86677ee8] llog_cat_process at ffffffffa05e41d9 [obdclass]
#13 [ffff885c86677f08] mdc_changelog_send_thread at ffffffffa097157b [mdc]
#14 [ffff885c86677f48] kernel_thread at ffffffff8100c0ca

Digging into the crash dump, the values on the stack did not appear to contain either the "len" variable calculated by changelog_kkuc_cb(). But the problem was easily reproducible, so I added a debug message and tried again. This told me:

2013-07-15 11:27:32 LustreError: 5472:0:(mdc_request.c:1504:changelog_kuc_hdr()) ASSERTION( len <= CR_MAXSIZE ) failed: CR_MAXSIZE=576, len=588, rec=ffff885ce8c9a148, changelog_rec_size=96
2013-07-15 11:27:32 LustreError: 5472:0:(mdc_request.c:1504:changelog_kuc_hdr()) LBUG

With a pointer to the llog_changelog_rec function, I could print it out in crash:

crash> print *(struct llog_changelog_rec *)0xffff885ce8c9a148
$4 = {
  cr_hdr = {
    lrh_len = 608,
    lrh_index = 29516,
    lrh_type = 275120128,
    padding = 0
  },
  cr = {
    cr_namelen = 484,
    cr_flags = 8192,
    cr_type = 8,
    cr_index = 80023354,
    cr_prev = 0,
    cr_time = 1474100508289501189,
    {
      cr_tfid = {
        f_seq = 0,
        f_oid = 0,
        f_ver = 0
      },
      cr_markerflags = 0
    },
    cr_pfid = {
      f_seq = 8589961859,
      f_oid = 28956,
      f_ver = 0
    },
    cr_name = 0xffff885ce8c9a198 "\263j"
  },
  cr_tail = {
    lrt_len = 27315,
    lrt_index = 2
  }
}

So here is what I understand:

  • cr_type tells us that this is a CL_RENAME
  • cr_flags tells us that cr is a struct changelog_ext_rec instead of the normal struct changelog_rec
  • cr_namelen is large at 484

Note that cr_name looks corrupt, but is not. It is merely because crash does not know that we have an undeclared union here. If I manually add 32 bytes to the cr_namelen pointer, I am able to see a correct filename:

crash> print (((struct llog_changelog_rec *)0xffff885ce8c9a148)->cr).cr_name+32

I won't share the users's file name here, it is sufficient to say that it is 245 characters long, not including the terminating NULL character. If I then add 246 characters, I can see a second filename:

crash> print (((struct llog_changelog_rec *)0xffff885ce8c9a148)->cr).cr_name+32+246

This filename is nearly the same as the first one, but the trailing string of "-failed" was removed, resulting in a string 238 characters long, not including the terminating null character. Doing the math:

245+238+1 = 484

OK, that matches cr_len! Now I think I see why CR_MAXSIZE starts out with "2*LEN_MAX + 1". But I don't understand it. Why are we only accounting for one of the null characters??

But the root of the bug appears to be that CR_MAXSIZE only accounts for the size of the smaller "sizeof(struct changelog_rec)", when it should be using "sizeof(struct changelog_ext_rec)". Those extra 32 bytes would have been enough to store this structure, and avoid the bug.

I can change CR_MAXSIZE to do the right thing, and I will push a patch to do so soon. I have not yet fully checked to see if changing CR_MAXSIZE could break anything else.



 Comments   
Comment by Christopher Morrone [ 16/Jul/13 ]

Pushed a possible solution in change http://review.whamcloud.com/6993.

I will test it some time tomorrow.

Comment by Lai Siyao [ 16/Jul/13 ]

CR_MAXSIZE starts out with "2*NAME_MAX +1" is because a null character is used as delimiter between two names, and we've already stored the total name length of the two names plus the delimiter in rec->cr_namelen, and we always access the second name with length.

The fix looks correct, and it will be great if a sanity test is added for this.

Comment by Christopher Morrone [ 22/Jul/13 ]

FYI, change 6993 did address the problem, and we are using it on nodes where we run robinhood.

Comment by Christopher Morrone [ 20/Aug/13 ]

Note that there is an additional problem, beyond the incorrectly defined CR_MAXSIZE. The KUC code uses a size for a buffer that is basically sizeof(struct kuc_hdr) + CR_MAXSIZE, but the assertion does not take into account the struct kuc_hdr size, only CR_MAXSIZE.

I have a second patch that I will push shortly.

Comment by Christopher Morrone [ 20/Aug/13 ]

Second patch is:

http://review.whamcloud.com/7406

Comment by Christopher Morrone [ 20/Aug/13 ]

Also, as has already been mentioned, it would be very nice to have a new sanity test that tests a rename of a file name 255 characters long to another 255 character long file name, and then read the changelog that results. I won't have time to work on that test.

Comment by Peter Jones [ 24/Sep/13 ]

The patch has landed to master, but has a test been created yet?

Comment by Peter Jones [ 24/Sep/13 ]

Bob

Could you please look into creating a suitable test?

Thanks

Peter

Comment by Bob Glossman (Inactive) [ 24/Sep/13 ]

added sanity test
http://review.whamcloud.com/7751

Comment by Peter Jones [ 08/Oct/13 ]

Landed for 2.5

Comment by Bob Glossman (Inactive) [ 04/Nov/13 ]

backports for b2_4:
http://review.whamcloud.com/8170
http://review.whamcloud.com/8171

Generated at Sat Feb 10 01:35:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.