Rolling upgrade between 2.8.x and master failed: Upon upgrading OSS, OSS restarts when mounted

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.9.0
    • None
    • None
    • Rolling Upgrade: Old version- b2_8_fe build# 25
      New version- master build# 3431
    • 3
    • 9223372036854775807

    Description

      While performing rolling upgrade testing, the OSS restarted when its target was mounted after the upgrade.
      The following steps were taken:
      1. The OSS, MDS, and 2 clients were built with b2_8_fe build# 25, and the Lustre system was set up.
      2. The OST was unmounted and the OSS was upgraded to master build# 3431.
      3. After the upgrade on the OSS was complete, the target was mounted again.

      Upon mounting, the OSS restarted abruptly.
      The following is the OSS log from when the mount command was run.

      [root@onyx-26 ~]# mount -t lustre -o acl,user_xattr /dev/sdb1 /mnt/ost0
      mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 512 to 16384
      mount.lustre: change scheduler of /sys/block/sdb/queue/scheduler from cfq to deadline
      [   79.285538] libcfs: module verification failed: signature and/or required key missing - tainting kernel
      [   79.302042] LNet: HW CPU cores: 32, npartitions: 4
      [   79.311423] alg: No test for adler32 (adler32-zlib)
      [   79.318433] alg: No test for crc32 (crc32-table)
      [   87.529705] Lustre: Lustre: Build Version: 2.8.57
      [   87.721568] LNet: Added LNI 10.2.4.56@tcp [8/256/0/180]
      [   87.728741] LNet: Accept secure, port 988
      [   88.022628] LDISKFS-fs (sdb1): file extents enabled, maximum tree depth=5
      [   88.426512] LDISKFS-fs (sdb1): recovery complete
      [   88.485928] LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts: acl,user_xattr,,errors=remount-ro,no_mbcache
      [   88.864640] LustreError: 3112:0:(mgc_request.c:257:do_config_log_add()) MGC10.2.4.47@tcp: failed processing log, type 4: rc = -22
      [   88.971376] LustreError: 3368:0:(nodemap_storage.c:368:nodemap_idx_nodemap_add_update()) cannot add nodemap config to non-existing MGS.
      [   88.988471] LustreError: 3368:0:(nodemap_storage.c:1313:nodemap_fs_init()) lustre-OST0000: error loading nodemap config file, file must be removed via ldiskfs: rc = -22
      [   89.067996] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff8800b67832c0[0x0, 1, [0x1:0x0:0x0] hash exist]{
      [   89.067996] 
      [   89.085810] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff8800b6783310
      [   89.085810] 
      [   89.101070] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880035899c00osd-ldiskfs-object@ffff880035899c00(i:ffff880410851e88:81/3977440011)[plain]
      [   89.101070] 
      [   89.125243] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff8800b67832c0
      [   89.125243] 
      [   89.139953] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff880823297380[0x0, 1, [0x200000003:0x0:0x0] hash exist]{
      [   89.139953] 
      [   89.159766] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff8808232973d0
      [   89.159766] 
      [   89.174780] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880426ff8500osd-ldiskfs-object@ffff880426ff8500(i:ffff880426368d88:80/3977439977)[plain]
      [   89.174780] 
      [   89.198510] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff880823297380
      [   89.198510] 
      [   89.213998] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff8800b6782b40[0x0, 1, [0xa:0x0:0x0] hash exist]{
      [   89.213998] 
      [   89.231128] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff8800b6782b90
      [   89.231128] 
      [   89.245899] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880035899400osd-ldiskfs-object@ffff880035899400(i:ffff88041085af88:82/3977440045)[plain]
      [   89.245899] 
      [   89.269322] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff8800b6782b40
      [   89.269322] 
      [   89.283367] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff880802cd1c80[0x0, 1, [0x200000003:0x8:0x0] hash exist]{
      [   89.283367] 
      [   89.302572] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff880802cd1cd0
      [   89.302572] 
      [   89.317098] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880823d2d900osd-ldiskfs-object@ffff880823d2d900(i:ffff8808163400c8:98/2123498910)[lfix]
      [   89.317098] 
      [   89.340058] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff880802cd1c80
      [   89.340058] 
      [   89.355308] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff8800b67829c0[0x0, 1, [0xa:0xa:0x0] hash exist]{
      [   89.355308] 
      [   89.372243] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff8800b6782a10
      [   89.372243] 
      [   89.386873] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880035899f00osd-ldiskfs-object@ffff880035899f00(i:ffff88041085b808:83/2755944006)[plain]
      [   89.386873] 
      [   89.410071] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff8800b67829c0
      [   89.410071] 
      [   89.424408] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff8800b6782c00[0x0, 1, [0x200000001:0x1017:0x0] hash exist]{
      [   89.424408] 
      [   89.443666] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff8800b6782c50
      [   89.443666] 
      [   89.458132] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880035899d00osd-ldiskfs-object@ffff880035899d00(i:ffff880035f15a08:12/2606405092)[plain]
      [   89.458132] 
      [   89.481146] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff8800b6782c00
      [   89.481146] 
      [    0.000000] Initializing cgroup subsys cpuset
      [    0.000000] Initializing cgroup subsys cpu
      [    0.000000] Initializing cgroup subsys cpuacct
      [    0.000000] Linux version 3.10.0-327.28.2.el7_lustre.x86_64 (jenkins@onyx-1-sdh1-el7-x8664.onyx.hpdd.intel.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Sep 1 10:55:39 PDT 2016
      

      Not sure whether it is related to LU-8498.
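
      Both failing calls in the log above report rc = -22. On Linux, errno 22 is EINVAL ("Invalid argument"). A minimal sketch for pulling the return codes out of a saved console log follows; the helper name decode_rc is hypothetical, and the errno table is deliberately partial.

```shell
#!/bin/sh
# Read Lustre console messages on stdin, extract every "rc = N" value,
# and print each distinct code with a count and its Linux errno name.
# Only a few errno values are mapped here; extend the case list as needed.
decode_rc() {
    grep -o 'rc = -\{0,1\}[0-9][0-9]*' | sort | uniq -c |
    while read -r count _ _ rc; do
        case "$rc" in
            -2)  name=ENOENT ;;
            -22) name=EINVAL ;;
            *)   name=unknown ;;
        esac
        echo "rc=$rc ($name) seen $count time(s)"
    done
}
```

      For example, `dmesg | decode_rc` on the OSS, or `decode_rc < oss.log` against a saved copy of the console output.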

      Attachments

        1. debug_log_mds.txt
          6 kB
        2. mgs.log
          442 kB
        3. oss.log
          648 kB

        Activity

          [LU-8608] Rolling upgrade between 2.8.x and master failed: Upon upgrading OSS, OSS restarts when mounted
          standan Saurabh Tandan (Inactive) made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Closed [ 6 ]
          standan Saurabh Tandan (Inactive) added a comment (edited):

          As the error message is expected, I am closing the ticket.

          standan Saurabh Tandan (Inactive) added a comment:

          Nope, there are no functionality problems; the OSS mounts okay.

          kit.westneat Kit Westneat (Inactive) added a comment:

          Actually, I think this is to be expected if the MGS is at 2.8. Does the OSS mount OK, or are there functionality problems?
          standan Saurabh Tandan (Inactive) made changes -
          Attachment New: debug_log_mds.txt [ 23411 ]
          standan Saurabh Tandan (Inactive) added a comment (edited):

          The MDS debug_log file is attached. I have the OSS debug_log file as well, but it is too big to attach. In case you want that too, please let me know and I will send it to you some other way.

          kit.westneat Kit Westneat (Inactive) added a comment:

          Hi Saurabh,

          Ah, these look like the dmesg logs. Do you have the Lustre debug logs? I mean the logs that are generated by the lctl debug_kernel command. I'll need the trace and info log levels enabled in order to see what's going on.

          Thanks,
          Kit
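
          A minimal sketch of how the requested debug log could be captured, assuming the Lustre modules are already loaded on the node so that lctl can reach the debug mask. The function names, the DRY_RUN switch, and the default output path are hypothetical; the lctl subcommands themselves (set_param, clear, debug_kernel) are standard.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1 (handy for review).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: enable the extra debug flags and empty the kernel debug buffer.
debug_start() {
    run lctl set_param debug=+trace   # add "trace" to the current debug mask
    run lctl set_param debug=+info    # add "info" to the current debug mask
    run lctl clear                    # discard whatever is already buffered
}

# Step 2 (after reproducing the problem): dump the buffer to a file.
debug_dump() {
    out=${1:-/tmp/lustre-debug.log}   # hypothetical default path
    run lctl debug_kernel "$out"
}
```

          The intended order is debug_start, then the failing mount, then debug_dump with a file name of your choice. Running with DRY_RUN=1 first shows exactly which lctl commands would be issued.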

          standan Saurabh Tandan (Inactive) added a comment (edited):

          Hi Kit,
          I have attached the log files for both the MGS and the OSS above. I also still have the system set up. Please let me know in case you need any more information.
          Thanks!
          standan Saurabh Tandan (Inactive) made changes -
          Attachment New: mgs.log [ 23233 ]
          Attachment New: oss.log [ 23234 ]

          kit.westneat Kit Westneat (Inactive) added a comment:

          Hi Saurabh,

          Sorry for the delay in responding. Do you have the -1 debug logs (or trace and info) from the MGS and the OSS? I'm not sure why it'd be returning an error.

          Thanks,
          Kit

          People

            kit.westneat Kit Westneat (Inactive)
            standan Saurabh Tandan (Inactive)
            Votes: 0
            Watchers: 4