Lustre / LU-2079

Error reading changelog_users file preventing successful changelog setup during init

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Blocker
    • None
    • Lustre 2.4.0
    • 3
    • 4335
    Description

      I'm having issues mounting our 2.3.51-2chaos based FS after rebooting the clients. I see the following messages on the console:

      LustreError: 5872:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff881009598400 x1414745503563778/t0(0) o101->MGC172.20.5.2@o2ib500@172.20.5.2@o2ib500:26/25 lens 328/384 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
      ib0: no IPv6 routers present
      LustreError: 5900:0:(client.c:1116:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880810225800 x1414745503563780/t0(0) o101->MGC172.20.5.2@o2ib500@172.20.5.2@o2ib500:26/25 lens 328/384 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: Skipped 1 previous similar message
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: Skipped 2 previous similar messages
      LustreError: 11-0: lstest-MDT0000-mdc-ffff881029e3c800: Communicating with 172.20.5.2@o2ib500, operation mds_connect failed with -11
      LustreError: Skipped 5 previous similar messages
      LustreError: 5954:0:(lmv_obd.c:1190:lmv_statfs()) can't stat MDS #0 (lstest-MDT0000-mdc-ffff88080ba12000), error -11
      LustreError: 4827:0:(lov_obd.c:937:lov_cleanup()) lov tgt 385 not cleaned! deathrow=0, lovrc=1
      LustreError: 4827:0:(lov_obd.c:937:lov_cleanup()) lov tgt 386 not cleaned! deathrow=1, lovrc=1
      Lustre: Unmounted lstest-client
      LustreError: 5954:0:(obd_mount.c:2332:lustre_fill_super()) Unable to mount  (-11)
      

      I haven't had time to look into the cause, but thought it might be useful to open an issue about it.

      Attachments

        Activity


          prakash Prakash Surya (Inactive) added a comment:

          I believe the issue was a bug in previous versions of Lustre which has been detailed in the comments of this issue. It has been resolved and deemed fixed since v2.3.52.

          prakash Prakash Surya (Inactive) added a comment (edited):

          Alex, I reverted 4376, reverted 4169, and truncated the changelog_users file to be zero length. Things are back online and look healthy, so I'll go ahead and resolve this issue. Thanks for the help!

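          The truncation step above is just a truncate-to-zero of changelog_users on the MDT's backing filesystem. A minimal C sketch of that step (equivalent to running "truncate -s 0 <file>"); the path is supplied by the caller and should point at the intended file only:

          /* Minimal sketch (not from the ticket): truncate a file to zero
           * length, as was done to changelog_users above. */
          #include <stdio.h>
          #include <unistd.h>

          int main(int argc, char **argv)
          {
                  if (argc != 2) {
                          fprintf(stderr, "usage: %s <file>\n", argv[0]);
                          return 1;
                  }
                  if (truncate(argv[1], 0) != 0) {   /* truncate(2) to 0 bytes */
                          perror("truncate");
                          return 1;
                  }
                  return 0;
          }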

          bzzz Alex Zhuravlev added a comment:

          > I'll plan to remove or truncate the file to zero length (does it matter?), and if that goes fine, we can close this ticket as "cannot reproduce"

          It should be OK to just truncate it.

          > Also, what does LMA stand for and/or what's its purpose? Just curious.

          It stands for Lustre metadata attributes (struct lustre_mdt_attrs).

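          For context, a rough sketch of the structure named above; the field list is simplified from the lustre_idl.h of that era and is illustrative rather than authoritative. The key point is that the LMA xattr records the object's own FID plus compat/incompat feature flags.

          /* Illustrative only: simplified layout of the LMA ("Lustre metadata
           * attributes") extended attribute; the field list may be incomplete
           * for any given Lustre version. */
          #include <stdio.h>
          #include <stdint.h>

          struct lu_fid {
                  uint64_t f_seq;          /* FID sequence */
                  uint32_t f_oid;          /* object id within the sequence */
                  uint32_t f_ver;          /* version, usually 0 */
          };

          struct lustre_mdt_attrs {
                  uint32_t      lma_compat;    /* compatible feature flags */
                  uint32_t      lma_incompat;  /* incompatible feature flags */
                  struct lu_fid lma_self_fid;  /* the object's own FID */
          };

          int main(void)
          {
                  /* 24 bytes with this simplified field list */
                  printf("sizeof(struct lustre_mdt_attrs) = %zu\n",
                         sizeof(struct lustre_mdt_attrs));
                  return 0;
          }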

          prakash Prakash Surya (Inactive) added a comment:

          OK. That landed between 2.3.51 and 2.3.52. We started seeing the message when we upgraded the test system to 2.3.51-Xchaos (from orion-2_3_49_92_1-72chaos). We haven't seen it on our production Grove FS, but we were much more conservative with its upgrade process, jumping from orion-2_3_49_54_2-68chaos to 2.3.54-6chaos.

          I think I'm going to chalk this up to the FIDs being shared unless we have evidence to the contrary. I'll plan to remove or truncate the file to zero length (does it matter?), and if that goes fine, we can close this ticket as "cannot reproduce".

          Also, what does LMA stand for and/or what's its purpose? Just curious.


          bzzz Alex Zhuravlev added a comment:

          > When was the "shared FID" bug fixed? I tried to grep through the master git logs, but nothing was immediately apparent to me. I am OK with removing the file and moving on, just as long as this issue won't come up in the future, which I'd like to verify.

          commit 155e4b6cf45cc0ab21f72d94e5cccbd7a0939c58
          Author: Alex Zhuravlev <alexey.zhuravlev@intel.com>
          Date: Tue Oct 2 23:52:42 2012 +0400

          LU-2075 fld: use predefined FIDs

          and let OSD do mapping to the names internally.

          So, during the landing process we went back to the scheme where the LMA is set by the OSD itself (otherwise we would have to set it in many places, in contrast with just osd-zfs and osd-ldiskfs now). So now every object created through the OSD API is supposed to have an LMA (which can later be used by LFSCK, for example).


          prakash Prakash Surya (Inactive) added a comment:

          > how often do you see "can't get LMA"?

          Looking back at the logs, it looks like we've seen it about 15 times on the test MDS, for 3 distinct FIDs:

          LustreError: 33431:0:(osd_object.c:410:osd_object_init()) lstest-MDT0000: can't get LMA on [0x200000bd0:0x4f:0x0]: rc = -2
          
          LustreError: 33231:0:(osd_object.c:410:osd_object_init()) lstest-MDT0000: can't get LMA on [0x200000bda:0x3:0x0]: rc = -2
          
          LustreError: 33244:0:(osd_object.c:410:osd_object_init()) lstest-MDT0000: can't get LMA on [0x200000bda:0x4:0x0]: rc = -2
          

          I see it on the production OSTs frequently:

          <ConMan> Console [grove214] log at 2012-11-16 04:00:00 PST.
          2012-11-16 04:18:37 LustreError: 5777:0:(osd_object.c:410:osd_object_init()) ls1-OST00d6: can't get LMA on [0x100000000:0x10ac1:0x0]: rc = -2
          2012-11-16 04:19:10 LustreError: 7522:0:(osd_object.c:410:osd_object_init()) ls1-OST00d6: can't get LMA on [0x100000000:0x10ac4:0x0]: rc = -2
          2012-11-16 04:20:16 LustreError: 7362:0:(osd_object.c:410:osd_object_init()) ls1-OST00d6: can't get LMA on [0x100000000:0x10aca:0x0]: rc = -2
          2012-11-16 04:20:16 LustreError: 7362:0:(osd_object.c:410:osd_object_init()) Skipped 3 previous similar messages
          2012-11-16 04:22:37 LustreError: 5770:0:(osd_object.c:410:osd_object_init()) ls1-OST00d6: can't get LMA on [0x100000000:0x10ad7:0x0]: rc = -2
          2012-11-16 04:22:37 LustreError: 5770:0:(osd_object.c:410:osd_object_init()) Skipped 5 previous similar messages
          
          <ConMan> Console [grove214] log at 2012-11-16 05:00:00 PST.
          

          When was the "shared FID" bug fixed? I tried to grep through the master git logs, but nothing was immediately apparent to me. I am OK with removing the file and moving on, just as long as this issue won't come up in the future, which I'd like to verify.

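          An aside on the bracketed triples in those messages: each [a:b:c] is a FID, i.e. sequence, object id and version. A minimal sketch, assuming the usual struct lu_fid layout, that reproduces the format with one of the FIDs quoted above:

          /* Sketch: print a FID in the same [seq:oid:ver] form used by the
           * console messages above; the sample value is taken from the logs. */
          #include <stdio.h>
          #include <stdint.h>

          struct lu_fid {
                  uint64_t f_seq;   /* sequence, identifies the FID space */
                  uint32_t f_oid;   /* object id within that sequence */
                  uint32_t f_ver;   /* version, normally 0 */
          };

          int main(void)
          {
                  struct lu_fid fid = { 0x200000bd0ULL, 0x4f, 0x0 };

                  printf("[0x%llx:0x%x:0x%x]\n",
                         (unsigned long long)fid.f_seq, fid.f_oid, fid.f_ver);
                  return 0;
          }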

          bzzz Alex Zhuravlev added a comment:

          > what is considered a "bad" message? I see some of the "can't get LMA" messages, are those "bad"?

          Given your filesystem has been in use for quite a long time and the LMA was not always set in Orion, I think it's OK to see this message on some objects. Though this also means we can't verify the OI in those cases.

          > You mentioned above that there was previously a bug which would cause the changelog_* files and seq_* files to share the same FID. Is there a chance this happened with the changelog_users and seq_ctl files?

          Yes, I think that was possible.

          > Also, I created a new file system in a VM to use for testing. What should "normal" changelog_catalog and changelog_users files look like? I expected to see something like you posted earlier, but instead the files on my test MDS are empty:

          This is because the changelog was not used; both files are supposed to be empty in this case. But any record written should grow them to 8K+.

          How often do you see "can't get LMA"?

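          Relatedly, one way to see whether a given backing-filesystem object carries an LMA at all is to query its extended attribute directly. A sketch, assuming the attribute name "trusted.lma" (treat that name as an assumption here); a missing attribute comes back as ENODATA from getxattr(2):

          /* Sketch (assumption: the LMA lives in the "trusted.lma" xattr of
           * the backing-filesystem object; reading trusted.* xattrs requires
           * root). A size-0 getxattr only asks whether the attribute exists
           * and how large it is. */
          #include <stdio.h>
          #include <errno.h>
          #include <string.h>
          #include <sys/xattr.h>

          int main(int argc, char **argv)
          {
                  ssize_t len;

                  if (argc != 2) {
                          fprintf(stderr, "usage: %s <backing-fs object>\n",
                                  argv[0]);
                          return 1;
                  }

                  len = getxattr(argv[1], "trusted.lma", NULL, 0);
                  if (len < 0) {
                          /* ENODATA: the object simply has no LMA set */
                          fprintf(stderr, "getxattr(trusted.lma): %s\n",
                                  strerror(errno));
                          return 1;
                  }
                  printf("LMA present, %zd bytes\n", len);
                  return 0;
          }
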
          prakash Prakash Surya (Inactive) added a comment (edited):

          Alex, a couple more questions when you have some time:

          You mentioned above that there was previously a bug which would cause the changelog_* files and seq_* files to share the same FID. Is there a chance this happened with the changelog_users and seq_ctl files? I ask because those two look very similar:

                                                                                   
          # grovei /tftpboot/dumps/surya1/mdt0 > hexdump seq_ctl                             
          0000000 0400 4000 0002 0000 ffff ffff ffff ffff                                    
          0000010 0000 0000 0000 0000                                                        
          0000018                                                                            
          # grovei /tftpboot/dumps/surya1/mdt0 > hexdump changelog_users                     
          0000000 0bd0 0000 0002 0000 ffff ffff ffff ffff                                    
          0000010 0000 0000 0000 0000                                                        
          0000018                                                                            
          

          It seems like too much of a coincidence that changelog_users is exactly the size of struct lu_seq_range (which I believe seq_ctl contains) and has very similar contents.
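
          The sizes do line up: assuming the usual definition of struct lu_seq_range from lustre_idl.h, it is 24 bytes (0x18), exactly the length of both dumps above, and read as little-endian the changelog_users dump decodes to lsr_start = 0x200000bd0. A sketch:

          /* Sketch: struct lu_seq_range as commonly defined, and the values
           * the changelog_users hexdump above decodes to if interpreted as
           * one (little-endian). */
          #include <stdio.h>
          #include <stdint.h>

          struct lu_seq_range {
                  uint64_t lsr_start;   /* first sequence in the range */
                  uint64_t lsr_end;     /* last sequence in the range */
                  uint32_t lsr_index;   /* server index */
                  uint32_t lsr_flags;
          };

          int main(void)
          {
                  struct lu_seq_range r = {
                          .lsr_start = 0x200000bd0ULL,          /* from the dump */
                          .lsr_end   = 0xffffffffffffffffULL,
                          .lsr_index = 0,
                          .lsr_flags = 0,
                  };

                  /* 24 == 0x18, matching the file size shown by hexdump */
                  printf("sizeof(struct lu_seq_range) = %zu\n", sizeof(r));
                  printf("lsr_start = 0x%llx\n",
                         (unsigned long long)r.lsr_start);
                  return 0;
          }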

          Also, I created a new file system in a VM to use for testing. What should "normal" changelog_catalog and changelog_users files look like? I expected to see something like you posted earlier, but instead the files on my test MDS are empty:

                                                                                
          $ hexdump changelog_users                                                       
          $ hexdump changelog_catalog                                                     
                                                                                          
          $ stat changelog_catalog                                                        
            File: `changelog_catalog'                                                     
            Size: 0               Blocks: 1          IO Block: 131072 regular empty file  
          Device: 1fh/31d Inode: 191         Links: 2                                     
          Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)        
          Access: 2012-11-19 13:33:36.328821000 -0800                                     
          Modify: 1969-12-31 16:00:00.846817000 -0800                                     
          Change: 1969-12-31 16:00:00.846817000 -0800                                     
                                                                                          
          $ stat changelog_users                                                          
            File: `changelog_users'                                                       
            Size: 0               Blocks: 1          IO Block: 131072 regular empty file  
          Device: 1fh/31d Inode: 192         Links: 2                                     
          Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)        
          Access: 2012-11-19 13:33:32.722166000 -0800                                     
          Modify: 1969-12-31 16:00:00.846817000 -0800                                     
          Change: 1969-12-31 16:00:00.846817000 -0800                                     
          

          Is this normal?


          prakash Prakash Surya (Inactive) added a comment:

          Alex, what is considered a "bad" message? I see some of the "can't get LMA" messages, are those "bad"?

          bzzz Alex Zhuravlev added a comment:

          Thanks for the help, guys. Unfortunately it doesn't help very much that it's not changelog but changelog_users; the file is still corrupted and I can't prove the root cause, sorry.

          This is how the first bytes of a llog should look:

          0000000 2000 0000 0000 0000 5539 1064 0000 0000
          0000010 9f4d 50a0 0000 0000 0013 0000 0058 0000
          0000020 0000 0000 0004 0000 0000 0000 6f63 666e

          struct llog_rec_hdr {
                  __u32 lrh_len;
                  __u32 lrh_index;
                  __u32 lrh_type;
                  __u32 lrh_id;
          };

          Notice lrh_len = 0x2000 (the 8K header at the start of any llog) and
          lrh_type = 0x10645539 (LLOG_HDR_MAGIC = LLOG_OP_MAGIC | 0x45539).

          The good news is that the filesystem itself seems to be consistent (given there are no bad messages from the latest patch); at least the OI is not broken, there are no duplicate FIDs, etc.

          So, given that, I'd suggest removing changelog_users manually (or I can make a patch to do so at mount time).

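          A quick way to apply that check to a file copied off the backing filesystem is to read the first record header and compare the magic. A sketch, using the magic value quoted above (0x10645539) and assuming a little-endian host with no byte swapping:

          /* Sketch: read the first record header of a candidate llog file and
           * compare the magic, per the explanation above. */
          #include <stdio.h>
          #include <stdint.h>

          struct llog_rec_hdr {
                  uint32_t lrh_len;     /* 0x2000 for the initial 8K header */
                  uint32_t lrh_index;
                  uint32_t lrh_type;    /* LLOG_HDR_MAGIC for a llog header */
                  uint32_t lrh_id;
          };

          #define LLOG_HDR_MAGIC 0x10645539u   /* value quoted in the comment */

          int main(int argc, char **argv)
          {
                  struct llog_rec_hdr hdr;
                  FILE *f;

                  if (argc != 2) {
                          fprintf(stderr, "usage: %s <llog file>\n", argv[0]);
                          return 1;
                  }
                  f = fopen(argv[1], "rb");
                  if (f == NULL || fread(&hdr, sizeof(hdr), 1, f) != 1) {
                          fprintf(stderr, "cannot read llog header\n");
                          return 1;
                  }
                  fclose(f);

                  printf("lrh_len=0x%x lrh_type=0x%x: %s\n",
                         hdr.lrh_len, hdr.lrh_type,
                         hdr.lrh_type == LLOG_HDR_MAGIC ?
                         "looks like a llog header" : "not a llog header");
                  return 0;
          }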

          prakash Prakash Surya (Inactive) added a comment:

          Buried in all the console noise, I managed to find this message:

          2012-11-13 14:05:03 LustreError: 53596:0:(osd_object.c:410:osd_object_init()) lstest-MDT0000: can't get LMA on [0x200000bd0:0x4f:0x0]: rc = -2
          

          People

            Assignee: Prakash Surya (Inactive)
            Reporter: Prakash Surya (Inactive)
            Votes: 0
            Watchers: 7
