Lustre / LU-2079

Error reading changelog_users file preventing successful changelog setup during init

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Blocker
    • None
    • Lustre 2.4.0
    • 3
    • 4335
    Description

      I'm having issues mounting our 2.3.51-2chaos based FS after rebooting the clients. I see the following messages on the console:

      LustreError: 5872:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff881009598400 x1414745503563778/t0(0) o101->MGC172.20.5.2@o2ib500@172.20.5.2@o2ib500:26/25 lens 328/384 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
      ib0: no IPv6 routers present
      LustreError: 5900:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880810225800 x1414745503563780/t0(0) o101->MGC172.20.5.2@o2ib500@172.20.5.2@o2ib500:26/25 lens 328/384 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: Skipped 1 previous similar message
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: Skipped 2 previous similar messages
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: Skipped 5 previous similar messages
      LustreError: 5954:0:(lmv_obd.c:1190:lmv_statfs()) can't stat MDS #0 (lstest-MDT0000-mdc-ffff88080ba12000), error -11
      LustreError: 4827:0:(lov_obd.c:937:lov_cleanup()) lov tgt 385 not cleaned! deathrow=0, lovrc=1
      LustreError: 4827:0:(lov_obd.c:937:lov_cleanup()) lov tgt 386 not cleaned! deathrow=1, lovrc=1
      Lustre: Unmounted lstest-client
      LustreError: 5954:0:(obd_mount.c:2332:lustre_fill_super()) Unable to mount  (-11)
      

      I haven't had time to look into the cause, but thought it might be useful to open an issue about it.

      Attachments

        Activity


          prakash Prakash Surya (Inactive) added a comment:

          I believe the issue was a bug in previous versions of Lustre which has been detailed in the comments of this issue. It has been resolved and deemed fixed since v2.3.52.

          prakash Prakash Surya (Inactive) added a comment (edited):

          Alex, I reverted 4376, reverted 4169, and truncated the changelog_users file to be zero length. Things are back online and look healthy, so I'll go ahead and resolve this issue. Thanks for the help!

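          The truncation step above is just a truncate-to-zero of changelog_users on the MDT's backing filesystem. A minimal C sketch of that step (equivalent to running "truncate -s 0 <file>"); the path is supplied by the caller and should point at the intended file only:

          /* Minimal sketch (not from the ticket): truncate a file to zero
           * length, as was done to changelog_users above. */
          #include <stdio.h>
          #include <unistd.h>

          int main(int argc, char **argv)
          {
                  if (argc != 2) {
                          fprintf(stderr, "usage: %s <file>\n", argv[0]);
                          return 1;
                  }
                  if (truncate(argv[1], 0) != 0) {   /* truncate(2) to 0 bytes */
                          perror("truncate");
                          return 1;
                  }
                  return 0;
          }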

          bzzz Alex Zhuravlev added a comment:

          > I'll plan to remove or truncate the file to zero length (does it matter?), and if that goes fine, we can close this ticket as "cannot reproduce"

          It should be OK to just truncate it.

          > Also, what does LMA stand for and/or what's its purpose? Just curious.

          It stands for Lustre metadata attributes (struct lustre_mdt_attrs).

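          For context, a rough sketch of the structure named above; the field list is simplified from the lustre_idl.h of that era and is illustrative rather than authoritative. The key point is that the LMA xattr records the object's own FID plus compat/incompat feature flags.

          /* Illustrative only: simplified layout of the LMA ("Lustre metadata
           * attributes") extended attribute; the field list may be incomplete
           * for any given Lustre version. */
          #include <stdio.h>
          #include <stdint.h>

          struct lu_fid {
                  uint64_t f_seq;          /* FID sequence */
                  uint32_t f_oid;          /* object id within the sequence */
                  uint32_t f_ver;          /* version, usually 0 */
          };

          struct lustre_mdt_attrs {
                  uint32_t      lma_compat;    /* compatible feature flags */
                  uint32_t      lma_incompat;  /* incompatible feature flags */
                  struct lu_fid lma_self_fid;  /* the object's own FID */
          };

          int main(void)
          {
                  /* 24 bytes with this simplified field list */
                  printf("sizeof(struct lustre_mdt_attrs) = %zu\n",
                         sizeof(struct lustre_mdt_attrs));
                  return 0;
          }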

          prakash Prakash Surya (Inactive) added a comment:

          OK. That landed between 2.3.51 and 2.3.52. We started seeing the message when we upgraded the test system to 2.3.51-Xchaos (from orion-2_3_49_92_1-72chaos). We haven't seen it on our production Grove FS, but we were much more conservative with its upgrade process, jumping from orion-2_3_49_54_2-68chaos to 2.3.54-6chaos.

          I think I'm going to chalk this up to the FIDs being shared unless we have evidence to the contrary. I'll plan to remove or truncate the file to zero length (does it matter?), and if that goes fine, we can close this ticket as "cannot reproduce".

          Also, what does LMA stand for and/or what's its purpose? Just curious.


          bzzz Alex Zhuravlev added a comment:

          > When was the "shared FID" bug fixed? I tried to grep through the master git logs, but nothing was immediately apparent to me. I am OK with removing the file and moving on, just as long as this issue won't come up in the future, which I'd like to verify.

          commit 155e4b6cf45cc0ab21f72d94e5cccbd7a0939c58
          Author: Alex Zhuravlev <alexey.zhuravlev@intel.com>
          Date: Tue Oct 2 23:52:42 2012 +0400

          LU-2075 fld: use predefined FIDs

          and let OSD do mapping to the names internally.

          So, during the landing process we went back to the scheme where the LMA is set by the OSD itself (otherwise we would have to set it in many places, in contrast with just osd-zfs and osd-ldiskfs now). So now every object created through the OSD API is supposed to have an LMA (which can later be used by LFSCK, for example).


          prakash Prakash Surya (Inactive) added a comment:

          > how often do you see "can't get LMA"?

          Looking back at the logs, it looks like we've seen it about 15 times on the test MDS, for 3 distinct FIDs:

          LustreError: 33431:0:(osd_object.c:410:osd_object_init()) lstest-MDT0000: can't get LMA on [0x200000bd0:0x4f:0x0]: rc = -2
          
          LustreError: 33231:0:(osd_object.c:410:osd_object_init()) lstest-MDT0000: can't get LMA on [0x200000bda:0x3:0x0]: rc = -2
          
          LustreError: 33244:0:(osd_object.c:410:osd_object_init()) lstest-MDT0000: can't get LMA on [0x200000bda:0x4:0x0]: rc = -2
          

          I see it on the production OSTs frequently:

          <ConMan> Console [grove214] log at 2012-11-16 04:00:00 PST.
          2012-11-16 04:18:37 LustreError: 5777:0:(osd_object.c:410:osd_object_init()) ls1-OST00d6: can't get LMA on [0x100000000:0x10ac1:0x0]: rc = -2
          2012-11-16 04:19:10 LustreError: 7522:0:(osd_object.c:410:osd_object_init()) ls1-OST00d6: can't get LMA on [0x100000000:0x10ac4:0x0]: rc = -2
          2012-11-16 04:20:16 LustreError: 7362:0:(osd_object.c:410:osd_object_init()) ls1-OST00d6: can't get LMA on [0x100000000:0x10aca:0x0]: rc = -2
          2012-11-16 04:20:16 LustreError: 7362:0:(osd_object.c:410:osd_object_init()) Skipped 3 previous similar messages
          2012-11-16 04:22:37 LustreError: 5770:0:(osd_object.c:410:osd_object_init()) ls1-OST00d6: can't get LMA on [0x100000000:0x10ad7:0x0]: rc = -2
          2012-11-16 04:22:37 LustreError: 5770:0:(osd_object.c:410:osd_object_init()) Skipped 5 previous similar messages
          
          <ConMan> Console [grove214] log at 2012-11-16 05:00:00 PST.
          

          When was the "shared FID" bug fixed? I tried to grep through the master git logs, but nothing was immediately apparent to me. I am OK with removing the file and moving on, just as long as this issue won't come up in the future, which I'd like to verify.

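          An aside on the bracketed triples in those messages: each [a:b:c] is a FID, i.e. sequence, object id and version. A minimal sketch, assuming the usual struct lu_fid layout, that reproduces the format with one of the FIDs quoted above:

          /* Sketch: print a FID in the same [seq:oid:ver] form used by the
           * console messages above; the sample value is taken from the logs. */
          #include <stdio.h>
          #include <stdint.h>

          struct lu_fid {
                  uint64_t f_seq;   /* sequence, identifies the FID space */
                  uint32_t f_oid;   /* object id within that sequence */
                  uint32_t f_ver;   /* version, normally 0 */
          };

          int main(void)
          {
                  struct lu_fid fid = { 0x200000bd0ULL, 0x4f, 0x0 };

                  printf("[0x%llx:0x%x:0x%x]\n",
                         (unsigned long long)fid.f_seq, fid.f_oid, fid.f_ver);
                  return 0;
          }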

          bzzz Alex Zhuravlev added a comment:

          > what is considered a "bad" message? I see some of the "can't get LMA" messages, are those "bad"?

          Given your filesystem has been in use for quite a long time and the LMA was not always set in Orion, I think it's OK to see this message on some objects. Though this also means we can't verify the OI in those cases.

          > You mentioned above that there was previously a bug which would cause the changelog_* files and seq_* files to share the same FID. Is there a chance this happened with the changelog_users and seq_ctl files?

          Yes, I think that was possible.

          > Also, I created a new file system in a VM to use for testing. What should "normal" changelog_catalog and changelog_users files look like? I expected to see something like you posted earlier, but instead the files on my test MDS are empty:

          This is because the changelog was not used; both files are supposed to be empty in this case. But any record written should grow them to 8K+.

          How often do you see "can't get LMA"?

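          Relatedly, one way to see whether a given backing-filesystem object carries an LMA at all is to query its extended attribute directly. A sketch, assuming the attribute name "trusted.lma" (treat that name as an assumption here); a missing attribute comes back as ENODATA from getxattr(2):

          /* Sketch (assumption: the LMA lives in the "trusted.lma" xattr of
           * the backing-filesystem object; reading trusted.* xattrs requires
           * root). A size-0 getxattr only asks whether the attribute exists
           * and how large it is. */
          #include <stdio.h>
          #include <errno.h>
          #include <string.h>
          #include <sys/xattr.h>

          int main(int argc, char **argv)
          {
                  ssize_t len;

                  if (argc != 2) {
                          fprintf(stderr, "usage: %s <backing-fs object>\n",
                                  argv[0]);
                          return 1;
                  }

                  len = getxattr(argv[1], "trusted.lma", NULL, 0);
                  if (len < 0) {
                          /* ENODATA: the object simply has no LMA set */
                          fprintf(stderr, "getxattr(trusted.lma): %s\n",
                                  strerror(errno));
                          return 1;
                  }
                  printf("LMA present, %zd bytes\n", len);
                  return 0;
          }
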
          prakash Prakash Surya (Inactive) added a comment (edited):

          Alex, a couple more questions when you have some time:

          You mentioned above that there was previously a bug which would cause the changelog_* files and seq_* files to share the same FID. Is there a chance this happened with the changelog_users and seq_ctl files? I ask because those two look very similar:

                                                                                   
          # grovei /tftpboot/dumps/surya1/mdt0 > hexdump seq_ctl                             
          0000000 0400 4000 0002 0000 ffff ffff ffff ffff                                    
          0000010 0000 0000 0000 0000                                                        
          0000018                                                                            
          # grovei /tftpboot/dumps/surya1/mdt0 > hexdump changelog_users                     
          0000000 0bd0 0000 0002 0000 ffff ffff ffff ffff                                    
          0000010 0000 0000 0000 0000                                                        
          0000018                                                                            
          

          It seems like too much of a coincidence that changelog_users is exactly the size of struct lu_seq_range (which I believe seq_ctl contains) and has very similar contents.
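
          The sizes do line up: assuming the usual definition of struct lu_seq_range from lustre_idl.h, it is 24 bytes (0x18), exactly the length of both dumps above, and read as little-endian the changelog_users dump decodes to lsr_start = 0x200000bd0. A sketch:

          /* Sketch: struct lu_seq_range as commonly defined, and the values
           * the changelog_users hexdump above decodes to if interpreted as
           * one (little-endian). */
          #include <stdio.h>
          #include <stdint.h>

          struct lu_seq_range {
                  uint64_t lsr_start;   /* first sequence in the range */
                  uint64_t lsr_end;     /* last sequence in the range */
                  uint32_t lsr_index;   /* server index */
                  uint32_t lsr_flags;
          };

          int main(void)
          {
                  struct lu_seq_range r = {
                          .lsr_start = 0x200000bd0ULL,          /* from the dump */
                          .lsr_end   = 0xffffffffffffffffULL,
                          .lsr_index = 0,
                          .lsr_flags = 0,
                  };

                  /* 24 == 0x18, matching the file size shown by hexdump */
                  printf("sizeof(struct lu_seq_range) = %zu\n", sizeof(r));
                  printf("lsr_start = 0x%llx\n",
                         (unsigned long long)r.lsr_start);
                  return 0;
          }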

          Also, I created a new file system in a VM to use for testing. What should "normal" changelog_catalog and changelog_users files look like? I expected to see something like you posted earlier, but instead the files on my test MDS are empty:

                                                                                
          $ hexdump changelog_users                                                       
          $ hexdump changelog_catalog                                                     
                                                                                          
          $ stat changelog_catalog                                                        
            File: `changelog_catalog'                                                     
            Size: 0               Blocks: 1          IO Block: 131072 regular empty file  
          Device: 1fh/31d Inode: 191         Links: 2                                     
          Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)        
          Access: 2012-11-19 13:33:36.328821000 -0800                                     
          Modify: 1969-12-31 16:00:00.846817000 -0800                                     
          Change: 1969-12-31 16:00:00.846817000 -0800                                     
                                                                                          
          $ stat changelog_users                                                          
            File: `changelog_users'                                                       
            Size: 0               Blocks: 1          IO Block: 131072 regular empty file  
          Device: 1fh/31d Inode: 192         Links: 2                                     
          Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)        
          Access: 2012-11-19 13:33:32.722166000 -0800                                     
          Modify: 1969-12-31 16:00:00.846817000 -0800                                     
          Change: 1969-12-31 16:00:00.846817000 -0800                                     
          

          Is this normal?


          prakash Prakash Surya (Inactive) added a comment:

          Alex, what is considered a "bad" message? I see some of the "can't get LMA" messages, are those "bad"?

          bzzz Alex Zhuravlev added a comment:

          Thanks for the help, guys. Unfortunately it doesn't help very much that it's not changelog but changelog_users; the file is still corrupted and I can't prove the root cause, sorry.

          This is how the first bytes of a llog should look:

          0000000 2000 0000 0000 0000 5539 1064 0000 0000
          0000010 9f4d 50a0 0000 0000 0013 0000 0058 0000
          0000020 0000 0000 0004 0000 0000 0000 6f63 666e

          struct llog_rec_hdr {
                  __u32 lrh_len;
                  __u32 lrh_index;
                  __u32 lrh_type;
                  __u32 lrh_id;
          };

          Notice lrh_len = 0x2000 (the 8K header at the start of any llog) and
          lrh_type = 0x10645539 (LLOG_HDR_MAGIC = LLOG_OP_MAGIC | 0x45539).

          The good news is that the filesystem itself seems to be consistent (given there are no bad messages from the latest patch); at least the OI is not broken, there are no duplicate FIDs, etc.

          So, given that, I'd suggest removing changelog_users manually (or I can make a patch to do so at mount time).

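          A quick way to apply that check to a file copied off the backing filesystem is to read the first record header and compare the magic. A sketch, using the magic value quoted above (0x10645539) and assuming a little-endian host with no byte swapping:

          /* Sketch: read the first record header of a candidate llog file and
           * compare the magic, per the explanation above. */
          #include <stdio.h>
          #include <stdint.h>

          struct llog_rec_hdr {
                  uint32_t lrh_len;     /* 0x2000 for the initial 8K header */
                  uint32_t lrh_index;
                  uint32_t lrh_type;    /* LLOG_HDR_MAGIC for a llog header */
                  uint32_t lrh_id;
          };

          #define LLOG_HDR_MAGIC 0x10645539u   /* value quoted in the comment */

          int main(int argc, char **argv)
          {
                  struct llog_rec_hdr hdr;
                  FILE *f;

                  if (argc != 2) {
                          fprintf(stderr, "usage: %s <llog file>\n", argv[0]);
                          return 1;
                  }
                  f = fopen(argv[1], "rb");
                  if (f == NULL || fread(&hdr, sizeof(hdr), 1, f) != 1) {
                          fprintf(stderr, "cannot read llog header\n");
                          return 1;
                  }
                  fclose(f);

                  printf("lrh_len=0x%x lrh_type=0x%x: %s\n",
                         hdr.lrh_len, hdr.lrh_type,
                         hdr.lrh_type == LLOG_HDR_MAGIC ?
                         "looks like a llog header" : "not a llog header");
                  return 0;
          }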

          prakash Prakash Surya (Inactive) added a comment:

          Buried in all the console noise, I managed to find this message:

          2012-11-13 14:05:03 LustreError: 53596:0:(osd_object.c:410:osd_object_init()) lstest-MDT0000: can't get LMA on [0x200000bd0:0x4f:0x0]: rc = -2
          

          People

            Assignee: Prakash Surya (Inactive)
            Reporter: Prakash Surya (Inactive)
            Votes: 0
            Watchers: 7
