[LU-2079] Error reading changelog_users file preventing successful changelog setup during init Created: 02/Oct/12  Updated: 29/Nov/12  Resolved: 29/Nov/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Prakash Surya (Inactive) Assignee: Prakash Surya (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: topsequoia

Attachments: Text File systemtap-LU-2079.txt    
Severity: 3
Rank (Obsolete): 4335

 Description   

I'm having issues mounting our 2.3.51-2chaos-based filesystem after rebooting the clients. I see the following messages on the console:

LustreError: 5872:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff881009598400 x1414745503563778/t0(0) o101->MGC172.20.5.2@o2ib500@172.20.5.2@o2ib500:26/25 lens 328/384 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
ib0: no IPv6 routers present
LustreError: 5900:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880810225800 x1414745503563780/t0(0) o101->MGC172.20.5.2@o2ib500@172.20.5.2@o2ib500:26/25 lens 328/384 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
LustreError: Skipped 1 previous similar message
LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
LustreError: Skipped 2 previous similar messages
LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
LustreError: Skipped 5 previous similar messages
LustreError: 5954:0:(lmv_obd.c:1190:lmv_statfs()) can't stat MDS #0 (lstest-MDT0000-mdc-ffff88080ba12000), error -11
LustreError: 4827:0:(lov_obd.c:937:lov_cleanup()) lov tgt 385 not cleaned! deathrow=0, lovrc=1
LustreError: 4827:0:(lov_obd.c:937:lov_cleanup()) lov tgt 386 not cleaned! deathrow=1, lovrc=1
Lustre: Unmounted lstest-client
LustreError: 5954:0:(obd_mount.c:2332:lustre_fill_super()) Unable to mount  (-11)

I haven't had time to look into the cause, but thought it might be useful to open an issue about it.



 Comments   
Comment by Peter Jones [ 02/Oct/12 ]

Alex

Who should look into this one?

Peter

Comment by Alex Zhuravlev [ 02/Oct/12 ]

me

Comment by Prakash Surya (Inactive) [ 03/Oct/12 ]

I see many of the following messages on the MDS:

Lustre: lstest-MDT0000: Temporarily refusing client connection from 172.20.4.103@o2ib500
Lustre: Skipped 6094 previous similar messages

So it looks like it's failing this check in target_handle_connect:

 803         if (target->obd_no_conn) {
 804                 cfs_spin_unlock(&target->obd_dev_lock);
 805 
 806                 LCONSOLE_WARN("%s: Temporarily refusing client connection "
 807                               "from %s\n", target->obd_name,
 808                               libcfs_nid2str(req->rq_peer.nid));
 809                 GOTO(out, rc = -EAGAIN);
 810         }
Comment by Prakash Surya (Inactive) [ 03/Oct/12 ]

Digging through all the noise on the console, I think this might be the cause of the issue:

LustreError: 32893:0:(llog_osd.c:227:llog_osd_read_header()) lstest-MDT0000-osd: error reading log header from [0x1:0x3:0x0]: rc = -14
LustreError: 32893:0:(mdd_device.c:411:mdd_changelog_init()) lstest-MDD0000: changelog setup during init failed: rc = -14

My theory is: mdd_changelog_init fails -> mdd_prepare fails -> mdt_prepare fails -> obd_no_conn never gets cleared in mdt_prepare, so the MDT keeps refusing connections.
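
As a rough stand-alone sketch of that suspected call chain (the _sketch names and the plain-C simplification are hypothetical, not the real Lustre code):

/* Hypothetical stand-alone sketch of the suspected failure path; names and
 * types are simplified and do not match the real Lustre code. */
#include <errno.h>
#include <stdio.h>

static int obd_no_conn = 1;             /* set while the target is setting up */

static int mdd_changelog_init_sketch(void)
{
        return -EFAULT;                 /* the rc = -14 seen in the logs */
}

static int mdd_prepare_sketch(void)
{
        return mdd_changelog_init_sketch();
}

static int mdt_prepare_sketch(void)
{
        int rc = mdd_prepare_sketch();

        if (rc)
                return rc;              /* early exit on error ... */
        obd_no_conn = 0;                /* ... so this is never reached and
                                         * connects keep being refused */
        return 0;
}

int main(void)
{
        printf("mdt_prepare rc = %d, obd_no_conn = %d\n",
               mdt_prepare_sketch(), obd_no_conn);
        return 0;
}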

Comment by Alex Zhuravlev [ 03/Oct/12 ]

the theory is correct.

Comment by Alex Zhuravlev [ 03/Oct/12 ]

Prakash, would you mind resetting rc to 0 in mdd_changelog_init() and setting mdd->mdd_cl.mc_flags = 0 for a while? I'm still looking at this.

Comment by Prakash Surya (Inactive) [ 03/Oct/12 ]

So... is this what you're thinking?

diff --git i/lustre/mdd/mdd_device.c w/lustre/mdd/mdd_device.c
index af403b0..7b71958 100644
--- i/lustre/mdd/mdd_device.c
+++ w/lustre/mdd/mdd_device.c
@@ -412,6 +412,11 @@ static int mdd_changelog_init(const struct lu_env *env, struct mdd_device *mdd)
                mdd->mdd_cl.mc_flags |= CLM_ERR;
        }
 
+       if (rc == -EFAULT) { /* LU-2079 */
+               rc = 0;
+               mdd->mdd_cl.mc_flags = 0;
+       }
+
        return rc;
 }
Comment by Alex Zhuravlev [ 03/Oct/12 ]

Yes... for some reason the record llog gets from the DMU is shorter than expected. I'm trying to reproduce this locally.

Are you using changelogs?

Comment by Prakash Surya (Inactive) [ 03/Oct/12 ]

> Are you using changelogs?

No, not on this filesystem.

Comment by Prakash Surya (Inactive) [ 03/Oct/12 ]

The clients are now able to connect after applying the following patch and rebooting the MDS: http://review.whamcloud.com/4169

Comment by Alex Zhuravlev [ 03/Oct/12 ]

Thanks... I was trying to reproduce this by mounting Orion's filesystem with master's code, but it worked fine... I need to think more.
Sorry for all this.

Comment by Prakash Surya (Inactive) [ 03/Oct/12 ]

I mentioned this issue to Brian, and he said that EFAULT is often returned when a buffer that is too small is passed in. I haven't verified that this is happening, but it's something to keep in mind. Even if that is the case, it doesn't explain why it happened.

Comment by Alex Zhuravlev [ 04/Oct/12 ]

Yes, EFAULT is returned when the read is short. Can you mount with zfs directly (or find the file with zdb), dump the changelog_catalog file, and attach it here, please?

Comment by Christopher Morrone [ 05/Oct/12 ]

Alex, changelog_catalog is 0 length.

Comment by Alex Zhuravlev [ 08/Oct/12 ]

Chris, to be honest I don't know how it could have gotten corrupted (it's supposed to be at least 8K from the very beginning).
Can you remove the file manually (mount -t zfs ...)?

Comment by Christopher Morrone [ 08/Oct/12 ]

Since we don't know the cause, and it clearly can happen, I think we'll need a code change to handle this situation.

Comment by Alex Zhuravlev [ 08/Oct/12 ]

no objections

Comment by Alex Zhuravlev [ 23/Oct/12 ]

http://review.whamcloud.com/4376

This is a debug patch; we'd like to see the attributes of the object. The idea is that, for some reason (literally a bug in the past), the object was created with the wrong type (a directory), and this wrong type now leads to a wrong size calculation.

Please try to mount with the patch and attach the kernel messages from the MDS.

Comment by Prakash Surya (Inactive) [ 25/Oct/12 ]

Alex, thanks for the explanation. The patch has been applied to our branch and I'll get it installed later today.

Comment by Prakash Surya (Inactive) [ 25/Oct/12 ]

Here you go:

2012-10-25 11:08:00 LustreError: 32687:0:(llog_osd.c:227:llog_osd_read_header()) lstest-MDT0000-osd: error reading log header from [0x1:0x3:0x0]: rc = -14
2012-10-25 11:08:00 LustreError: 32687:0:(llog_osd.c:230:llog_osd_read_header()) attrs: valid 17ff, mode 100644, size 24, block 257
Comment by Alex Zhuravlev [ 06/Nov/12 ]

Thanks... that theory was wrong, sorry. Could you dump that file and attach it here, please? Anything like hexdump is good enough. I'm still scratching my head over what happened to the file. At the moment another theory is that the header was only partially written for some reason.

Comment by Christopher Morrone [ 06/Nov/12 ]

How are you mapping [0x1:0x3:0x0] to changelog_catalog?? This is about as clear as mud in the code. I think I see that the first 0x1 identifies the file as an LLOG, but I don't see the next level of mapping.

And the error message that Prakash shared says "size 24". Is that the size of the file? Because there are files that are size 24, but changelog_catalog is not one of them. As I said back on Oct 5, changelog_catalog appears to be empty when I mount the filesystem through the POSIX layer.

Comment by Alex Zhuravlev [ 06/Nov/12 ]

In the directory entry we store the dnode (to maintain compatibility with ZFS) and the fid (which seems to be [0x1:0x3:0x0]); then we look up the fid in the OI.

As for the reverse: mdd_changelog_init() uses "changelog_catalog" to name the object.

Comment by Alex Zhuravlev [ 06/Nov/12 ]

It seems a few fids in sequence 1 (llog) were re-used (due to step-by-step landing and changes during inspections), and now changelog_catalog shares its fid with seq_ctl or seq_srv (iirc, there was a problem with duplicated re-used sequences, which is a sign of a wrong seq_{ctl|srv}). seq_{srv|ctl} are 24 bytes ...

Comment by Alex Zhuravlev [ 07/Nov/12 ]

I'm developing a patch to verify the dnode/fid in the direntry against the dnode/fid in the OI.

Comment by Alex Zhuravlev [ 08/Nov/12 ]

Guys, could you try http://review.whamcloud.com/#change,4169? This patch replaces the previous one. Except for the line that ignores errors in mdd_changelog_init(), the patch should be OK to land on the master branch, I think. If the last theory is confirmed with the patch, then I'll develop a one-time fix.

Comment by Christopher Morrone [ 08/Nov/12 ]

Yes, I'll swap in that change.

Comment by Christopher Morrone [ 09/Nov/12 ]

Alex,

2012-11-09 14:37:50 LustreError: 32846:0:(llog_osd.c:227:llog_osd_read_header()) lstest-MDT0000-osd: error reading log header from [0x1:0x3:0x0]: rc = -14
2012-11-09 14:37:50 LustreError: 32846:0:(llog_osd.c:230:llog_osd_read_header()) attrs: valid 17ff, mode 100644, size 24, block 257
2012-11-09 14:37:50 LustreError: 32846:0:(mdd_device.c:410:mdd_changelog_init()) lstest-MDD0000: changelog setup during init failed: rc = -14

I don't see any of the new error messages.

Comment by Prakash Surya (Inactive) [ 13/Nov/12 ]

Alex: Chris and I have speculated that the file in question is actually the CHANGELOG_USERS file and not the CHANGELOG_CATALOG file. I've been looking into the issue some more this morning, and I have more evidence that this is the case.

Using systemtap, I can see that it's the second call to llog_cat_init_and_process from within mdd_prepare that is failing. So what I think is happening is this call:

                                                                      
 317         rc = llog_open_create(env, ctxt, &ctxt->loc_handle, NULL,          
 318                               CHANGELOG_CATALOG);                          
 319         if (rc)                                                            
 320                 GOTO(out_cleanup, rc);                                     
 321                                                                            
 322         ctxt->loc_handle->lgh_logops->lop_add = llog_cat_add_rec;          
 323         ctxt->loc_handle->lgh_logops->lop_declare_add =                    
 324                                         llog_cat_declare_add_rec;          
 325                                                                            
 326         rc = llog_cat_init_and_process(env, ctxt->loc_handle);             
 327         if (rc)                                                            
 328                 GOTO(out_close, rc);                                       

is succeeding, as can be seen by the systemtap output:

                                                                      
     0 mount.lustre(51462):->llog_open_create env=0xffff880f5d263bb8 ctxt=0xffff880f80491b00 res=0xffff880f80491b40 logid=0x0 name=0xffffffffa0ee3292
...                                                                             
   548 mount.lustre(51462):<-llog_open_create return=0x0                        
     0 mount.lustre(51462):->llog_cat_init_and_process env=0xffff880f5d263bb8 llh=0xffff880f7dc843c0
...                                                                             
   367 mount.lustre(51462):<-llog_cat_init_and_process return=0x0               

But then later, this is failing:

                                                                      
 354         rc = llog_open_create(env, uctxt, &uctxt->loc_handle, NULL,        
 355                               CHANGELOG_USERS);                            
 356         if (rc)                                                            
 357                 GOTO(out_ucleanup, rc);                                    
 358                                                                            
 359         uctxt->loc_handle->lgh_logops->lop_add = llog_cat_add_rec;         
 360         uctxt->loc_handle->lgh_logops->lop_declare_add = llog_cat_declare_add_rec;
 361                                                                            
 362         rc = llog_cat_init_and_process(env, uctxt->loc_handle);            
 363         if (rc)                                                            
 364                 GOTO(out_uclose, rc);                                      

as you can see here:

                                                                      
     0 mount.lustre(51462):->llog_open_create env=0xffff880f5d263bb8 ctxt=0xffff880facc13740 res=0xffff880facc13780 logid=0x0 name=0xffffffffa0ee32ba
...                                                                             
   498 mount.lustre(51462):<-llog_open_create return=0x0                        
     0 mount.lustre(51462):->llog_cat_init_and_process env=0xffff880f5d263bb8 llh=0xffff880e283cb900
...                                                                             
 25647 mount.lustre(51462):<-llog_cat_init_and_process return=0xfffffffffffffff2

I think the dmu_read call is successful, but since the CHANGELOG_USERS file is only 24 bytes in length, dt_read reports an error (osd_read returns 24, which is not equal to LLOG_CHUNK_SIZE).

So, why is the CHANGELOG_USERS file only 24 bytes long when dt_read expects it to be LLOG_CHUNK_SIZE bytes?
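
Below is a stand-alone sketch of that short-read behavior (assumed logic, not the verbatim llog_osd_read_header()/osd_read code):

/* Assumed behaviour: the read itself succeeds but is shorter than the
 * expected llog header, so the caller turns it into -EFAULT (-14). */
#include <errno.h>
#include <stdio.h>

#define LLOG_CHUNK_SIZE 8192            /* expected llog header size */

static int read_llog_header_sketch(long file_size)
{
        long nread = file_size < LLOG_CHUNK_SIZE ? file_size : LLOG_CHUNK_SIZE;

        if (nread < LLOG_CHUNK_SIZE)    /* short read */
                return -EFAULT;         /* reported as rc = -14 */
        return 0;
}

int main(void)
{
        printf("changelog_users (24 bytes): rc = %d\n",
               read_llog_header_sketch(24));
        printf("healthy llog (8192 bytes):  rc = %d\n",
               read_llog_header_sketch(8192));
        return 0;
}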

Here is the hexdump of the CHANGELOG_USERS file, in case it's useful:

                                                                      
# grove-mds2 /mnt/grove-mds2/mdt0 > hexdump changelog_catalog                   
# grove-mds2 /mnt/grove-mds2/mdt0 > hexdump changelog_users                     
0000000 0bd0 0000 0002 0000 ffff ffff ffff ffff                                 
0000010 0000 0000 0000 0000                                                     
0000018                                                                         
Comment by Prakash Surya (Inactive) [ 13/Nov/12 ]

Here's the full systemtap log I gathered by running these on the MDS:

stap -DSTP_NO_OVERLOAD /usr/share/doc/systemtap-1.6/examples/general/para-callgraph.stp 'module("obdclass").function("*")' 'module("mdd").function("mdd_prepare")'

And this in another shell:

/etc/init.d/lustre start
Comment by Prakash Surya (Inactive) [ 13/Nov/12 ]

And just to be completely sure, here's the string passed to llog_open_create just prior to the failed llog_cat_init_and_process call:

crash> p (char *)0xffffffffa0ee32ba
$3 = 0xffffffffa0ee32ba "changelog_users"
Comment by Prakash Surya (Inactive) [ 13/Nov/12 ]

Buried in all the console noise, I managed to find this message:

2012-11-13 14:05:03 LustreError: 53596:0:(osd_object.c:410:osd_object_init()) lstest-MDT0000: can't get LMA on [0x200000bd0:0x4f:0x0]: rc = -2
Comment by Alex Zhuravlev [ 14/Nov/12 ]

Thanks for the help, guys. Unfortunately it doesn't help much that it's changelog_users rather than changelog_catalog; either way, the file is corrupted and I can't prove the root cause, sorry.

This is what the first bytes of a llog should look like:

0000000 2000 0000 0000 0000 5539 1064 0000 0000
0000010 9f4d 50a0 0000 0000 0013 0000 0058 0000
0000020 0000 0000 0004 0000 0000 0000 6f63 666e

struct llog_rec_hdr {
__u32 lrh_len;
__u32 lrh_index;
__u32 lrh_type;
__u32 lrh_id;
};

Notice lrh_len=0x2000 (the 8K header at the start of any llog) and
lrh_type=0x10645539 (LLOG_HDR_MAGIC = LLOG_OP_MAGIC | 0x45539).
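
For reference, here is a quick stand-alone check of those first 16 bytes against the header layout above (a sketch using the values from the healthy hexdump, not a Lustre utility):

/* Sketch: decode the first 16 bytes of the healthy llog dump above and
 * check lrh_len and lrh_type against the expected constants. */
#include <stdint.h>
#include <stdio.h>

#define LLOG_HDR_MAGIC  0x10645539u     /* LLOG_OP_MAGIC | 0x45539 */
#define LLOG_HDR_LEN    0x2000u         /* 8K header */

struct llog_rec_hdr_sketch {
        uint32_t lrh_len;
        uint32_t lrh_index;
        uint32_t lrh_type;
        uint32_t lrh_id;
};

int main(void)
{
        /* little-endian words from the hexdump: 2000 0000 / 0000 0000 /
         * 5539 1064 / 0000 0000 */
        struct llog_rec_hdr_sketch hdr = { 0x2000, 0x0, 0x10645539, 0x0 };

        printf("lrh_len ok:  %d\n", hdr.lrh_len == LLOG_HDR_LEN);
        printf("lrh_type ok: %d\n", hdr.lrh_type == LLOG_HDR_MAGIC);
        return 0;
}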

The good news is that the filesystem itself seems to be consistent (given there were no bad messages from the latest patch); at least the OI is not broken, there are no duplicate fids, etc.

Given that, I'd suggest removing changelog_users manually (or I can make a patch to do so at mount time).

Comment by Prakash Surya (Inactive) [ 19/Nov/12 ]

Alex, what is considered a "bad" message? I see some of the "can't get LMA" messages; are those "bad"?

Comment by Prakash Surya (Inactive) [ 19/Nov/12 ]

Alex, a couple more questions when you have some time:

You mentioned above that there was previously a bug which could cause the changelog_* files and seq_* files to share the same FID. Is there a chance this happened with the changelog_users and seq_ctl files? I ask because those two look very similar:

                                                                         
# grovei /tftpboot/dumps/surya1/mdt0 > hexdump seq_ctl                             
0000000 0400 4000 0002 0000 ffff ffff ffff ffff                                    
0000010 0000 0000 0000 0000                                                        
0000018                                                                            
# grovei /tftpboot/dumps/surya1/mdt0 > hexdump changelog_users                     
0000000 0bd0 0000 0002 0000 ffff ffff ffff ffff                                    
0000010 0000 0000 0000 0000                                                        
0000018                                                                            

It seems like too much of a coincidence that changelog_users is exactly the size of struct lu_seq_range (which I believe seq_ctl contains) and has very similar contents.
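
A minimal sketch of that size coincidence (the lu_seq_range layout below is assumed from lustre_idl.h and written with plain C types):

/* Assumed layout of struct lu_seq_range, simplified to plain C types;
 * 24 bytes matches both seq_ctl and the corrupted changelog_users. */
#include <stdint.h>
#include <stdio.h>

struct lu_seq_range_sketch {
        uint64_t lsr_start;
        uint64_t lsr_end;
        uint32_t lsr_index;
        uint32_t lsr_flags;
};

int main(void)
{
        printf("sizeof(struct lu_seq_range) = %zu bytes\n",
               sizeof(struct lu_seq_range_sketch));
        return 0;
}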

Also, I created a new filesystem in a VM to use for testing. What should "normal" changelog_catalog and changelog_users files look like? I expected to see something like what you posted earlier, but instead the files on my test MDS are empty:

                                                                      
$ hexdump changelog_users                                                       
$ hexdump changelog_catalog                                                     
                                                                                
$ stat changelog_catalog                                                        
  File: `changelog_catalog'                                                     
  Size: 0               Blocks: 1          IO Block: 131072 regular empty file  
Device: 1fh/31d Inode: 191         Links: 2                                     
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)        
Access: 2012-11-19 13:33:36.328821000 -0800                                     
Modify: 1969-12-31 16:00:00.846817000 -0800                                     
Change: 1969-12-31 16:00:00.846817000 -0800                                     
                                                                                
$ stat changelog_users                                                          
  File: `changelog_users'                                                       
  Size: 0               Blocks: 1          IO Block: 131072 regular empty file  
Device: 1fh/31d Inode: 192         Links: 2                                     
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)        
Access: 2012-11-19 13:33:32.722166000 -0800                                     
Modify: 1969-12-31 16:00:00.846817000 -0800                                     
Change: 1969-12-31 16:00:00.846817000 -0800                                     

Is this normal?

Comment by Alex Zhuravlev [ 20/Nov/12 ]

> what is considered a "bad" message? I see some of the "can't get LMA" messages, are those "bad"?

Given that your filesystem has been in use for quite a long time and LMA was not always set in Orion, I think it's OK to see this message on some objects.
Though this also means we can't verify the OI in this case.

> You mentioned above that there was previously a bug which would cause the changelog_* files and seq_* files to share the same FID.. Is there a chance this happened with the changelog_users and seq_ctl files?

yes, I think that was possible.

> Also, I created a new file system in a VM to use for testing. What should "normal" changelog_catalog and changelog_users files look like? I expected to see something like you posted earlier, but instead the files on my test MDS are empty:

This is because the changelog was not used; both files are supposed to be empty in this case, but any record written should grow them to 8K+.

How often do you see "can't get LMA"?

Comment by Prakash Surya (Inactive) [ 20/Nov/12 ]

> How often do you see "can't get LMA"?

Looking back at the logs, it looks like we've seen it about 15 times on the test MDS, for 3 distinct FIDs:

LustreError: 33431:0:(osd_object.c:410:osd_object_init()) lstest-MDT0000: can't get LMA on [0x200000bd0:0x4f:0x0]: rc = -2
LustreError: 33231:0:(osd_object.c:410:osd_object_init()) lstest-MDT0000: can't get LMA on [0x200000bda:0x3:0x0]: rc = -2
LustreError: 33244:0:(osd_object.c:410:osd_object_init()) lstest-MDT0000: can't get LMA on [0x200000bda:0x4:0x0]: rc = -2

I see it on the production OSTs frequently:

<ConMan> Console [grove214] log at 2012-11-16 04:00:00 PST.
2012-11-16 04:18:37 LustreError: 5777:0:(osd_object.c:410:osd_object_init()) ls1-OST00d6: can't get LMA on [0x100000000:0x10ac1:0x0]: rc = -2
2012-11-16 04:19:10 LustreError: 7522:0:(osd_object.c:410:osd_object_init()) ls1-OST00d6: can't get LMA on [0x100000000:0x10ac4:0x0]: rc = -2
2012-11-16 04:20:16 LustreError: 7362:0:(osd_object.c:410:osd_object_init()) ls1-OST00d6: can't get LMA on [0x100000000:0x10aca:0x0]: rc = -2
2012-11-16 04:20:16 LustreError: 7362:0:(osd_object.c:410:osd_object_init()) Skipped 3 previous similar messages
2012-11-16 04:22:37 LustreError: 5770:0:(osd_object.c:410:osd_object_init()) ls1-OST00d6: can't get LMA on [0x100000000:0x10ad7:0x0]: rc = -2
2012-11-16 04:22:37 LustreError: 5770:0:(osd_object.c:410:osd_object_init()) Skipped 5 previous similar messages

<ConMan> Console [grove214] log at 2012-11-16 05:00:00 PST.

When was the "shared FID" bug fixed? I tried to grep through the master git logs, but nothing was immediately apparent to me. I am OK with removing the file and moving on, just as long as this issue won't come up in the future, which I'd like to verify.

Comment by Alex Zhuravlev [ 20/Nov/12 ]

>> When was the "shared FID" bug fixed? I tried to grep through the master git logs, but nothing was immediately apparent to me. I am OK with removing the file and moving on, just as long as this issue won't come up in the future, which I'd like to verify.

commit 155e4b6cf45cc0ab21f72d94e5cccbd7a0939c58
Author: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Date: Tue Oct 2 23:52:42 2012 +0400

LU-2075 fld: use predefined FIDs

and let OSD do mapping to the names internally.

So, during the landing process we went back to the scheme where LMA is set by the OSD itself (otherwise we would have to set it in many places, in contrast with just osd-zfs and osd-ldiskfs now). So now every object created with the OSD API is supposed to have an LMA (which can later be used by LFSCK, for example).

Comment by Prakash Surya (Inactive) [ 20/Nov/12 ]

OK. That landed between 2.3.51 and 2.3.52. We started seeing the message when we upgraded the test system to 2.3.51-Xchaos (from orion-2_3_49_92_1-72chaos). We haven't seen it on our production Grove FS, but we were much more conservative with its upgrade process, jumping from orion-2_3_49_54_2-68chaos to 2.3.54-6chaos.

I think I'm going to chalk this up to the FIDs being shared unless we have evidence to the contrary. I'll plan to remove or truncate the file to zero length (does it matter?), and if that goes fine, we can close this ticket as "cannot reproduce".

Also, what does LMA stand for and/or what's its purpose? Just curious.

Comment by Alex Zhuravlev [ 21/Nov/12 ]

> I'll plan to remove or truncate the file to zero length (does it matter?), and if that goes fine, we can close this ticket as "cannot reproduce"
It should be OK to just truncate it.

> Also, what does LMA stand for and/or what's its purpose? Just curious.
It stands for Lustre metadata attributes (struct lustre_mdt_attrs).
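
Conceptually (a simplified sketch only; the real struct lustre_mdt_attrs in lustre_idl.h has more fields), the EA carries at least the object's own FID, which is what allows FID/OI consistency checks such as LFSCK:

/* Simplified, hypothetical sketch of an LMA-like EA: compat/incompat flags
 * plus the object's own FID (using the FID from the log messages above). */
#include <stdint.h>
#include <stdio.h>

struct lu_fid_sketch {
        uint64_t f_seq;
        uint32_t f_oid;
        uint32_t f_ver;
};

struct lma_sketch {
        uint32_t             lma_compat;
        uint32_t             lma_incompat;
        struct lu_fid_sketch lma_self_fid;  /* the object's own FID */
};

int main(void)
{
        struct lma_sketch lma = { 0, 0, { 0x200000bd0ULL, 0x4f, 0 } };

        printf("LMA self fid: [0x%llx:0x%x:0x%x]\n",
               (unsigned long long)lma.lma_self_fid.f_seq,
               lma.lma_self_fid.f_oid, lma.lma_self_fid.f_ver);
        return 0;
}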

Comment by Prakash Surya (Inactive) [ 29/Nov/12 ]

Alex, I reverted 4376, reverted 4169, and truncated the changelog_users file to zero length. Things are back online and look healthy, so I'll go ahead and resolve this issue. Thanks for the help!

Comment by Prakash Surya (Inactive) [ 29/Nov/12 ]

I believe the issue was caused by a bug in previous versions of Lustre, as detailed in the comments above. That bug has been resolved and is considered fixed as of v2.3.52.
