
Rolling upgrade between 2.8.x and master failed: Upon upgrading OSS, OSS restarts when mounted

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.9.0
    • None
    • None
    • Rolling Upgrade: Old version: b2_8_fe build# 25
      New version: master build# 3431

    Description

      During rolling upgrade testing, the OSS restarted when its OST was mounted after the upgrade.
      The following steps were taken:
      1. The OSS, the MDS and 2 clients were built with b2_8_fe build# 25, and the Lustre file system was set up.
      2. The OST was unmounted and the OSS was upgraded to master build# 3431.
      3. After the upgrade on the OSS was complete, the target was mounted again.

      On mounting, the OSS restarted abruptly.
      The following is the OSS console log from when the mount command was run.

      [root@onyx-26 ~]# mount -t lustre -o acl,user_xattr /dev/sdb1 /mnt/ost0
      mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 512 to 16384
      mount.lustre: change scheduler of /sys/block/sdb/queue/scheduler from cfq to deadline
      [   79.285538] libcfs: module verification failed: signature and/or required key missing - tainting kernel
      [   79.302042] LNet: HW CPU cores: 32, npartitions: 4
      [   79.311423] alg: No test for adler32 (adler32-zlib)
      [   79.318433] alg: No test for crc32 (crc32-table)
      [   87.529705] Lustre: Lustre: Build Version: 2.8.57
      [   87.721568] LNet: Added LNI 10.2.4.56@tcp [8/256/0/180]
      [   87.728741] LNet: Accept secure, port 988
      [   88.022628] LDISKFS-fs (sdb1): file extents enabled, maximum tree depth=5
      [   88.426512] LDISKFS-fs (sdb1): recovery complete
      [   88.485928] LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts: acl,user_xattr,,errors=remount-ro,no_mbcache
      [   88.864640] LustreError: 3112:0:(mgc_request.c:257:do_config_log_add()) MGC10.2.4.47@tcp: failed processing log, type 4: rc = -22
      [   88.971376] LustreError: 3368:0:(nodemap_storage.c:368:nodemap_idx_nodemap_add_update()) cannot add nodemap config to non-existing MGS.
      [   88.988471] LustreError: 3368:0:(nodemap_storage.c:1313:nodemap_fs_init()) lustre-OST0000: error loading nodemap config file, file must be removed via ldiskfs: rc = -22
      [   89.067996] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff8800b67832c0[0x0, 1, [0x1:0x0:0x0] hash exist]{
      [   89.067996] 
      [   89.085810] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff8800b6783310
      [   89.085810] 
      [   89.101070] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880035899c00osd-ldiskfs-object@ffff880035899c00(i:ffff880410851e88:81/3977440011)[plain]
      [   89.101070] 
      [   89.125243] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff8800b67832c0
      [   89.125243] 
      [   89.139953] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff880823297380[0x0, 1, [0x200000003:0x0:0x0] hash exist]{
      [   89.139953] 
      [   89.159766] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff8808232973d0
      [   89.159766] 
      [   89.174780] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880426ff8500osd-ldiskfs-object@ffff880426ff8500(i:ffff880426368d88:80/3977439977)[plain]
      [   89.174780] 
      [   89.198510] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff880823297380
      [   89.198510] 
      [   89.213998] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff8800b6782b40[0x0, 1, [0xa:0x0:0x0] hash exist]{
      [   89.213998] 
      [   89.231128] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff8800b6782b90
      [   89.231128] 
      [   89.245899] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880035899400osd-ldiskfs-object@ffff880035899400(i:ffff88041085af88:82/3977440045)[plain]
      [   89.245899] 
      [   89.269322] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff8800b6782b40
      [   89.269322] 
      [   89.283367] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff880802cd1c80[0x0, 1, [0x200000003:0x8:0x0] hash exist]{
      [   89.283367] 
      [   89.302572] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff880802cd1cd0
      [   89.302572] 
      [   89.317098] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880823d2d900osd-ldiskfs-object@ffff880823d2d900(i:ffff8808163400c8:98/2123498910)[lfix]
      [   89.317098] 
      [   89.340058] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff880802cd1c80
      [   89.340058] 
      [   89.355308] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff8800b67829c0[0x0, 1, [0xa:0xa:0x0] hash exist]{
      [   89.355308] 
      [   89.372243] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff8800b6782a10
      [   89.372243] 
      [   89.386873] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880035899f00osd-ldiskfs-object@ffff880035899f00(i:ffff88041085b808:83/2755944006)[plain]
      [   89.386873] 
      [   89.410071] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff8800b67829c0
      [   89.410071] 
      [   89.424408] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff8800b6782c00[0x0, 1, [0x200000001:0x1017:0x0] hash exist]{
      [   89.424408] 
      [   89.443666] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff8800b6782c50
      [   89.443666] 
      [   89.458132] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880035899d00osd-ldiskfs-object@ffff880035899d00(i:ffff880035f15a08:12/2606405092)[plain]
      [   89.458132] 
      [   89.481146] LustreError: 3368:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff8800b6782c00
      [   89.481146] 
      [    0.000000] Initializing cgroup subsys cpuset
      [    0.000000] Initializing cgroup subsys cpu
      [    0.000000] Initializing cgroup subsys cpuacct
      [    0.000000] Linux version 3.10.0-327.28.2.el7_lustre.x86_64 (jenkins@onyx-1-sdh1-el7-x8664.onyx.hpdd.intel.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Thu Sep 1 10:55:39 PDT 2016
      

      Not sure whether it is related to LU-8498.
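The `rc = -22` values in the console log above are negated Linux errno codes; 22 is `EINVAL`. A minimal Python sketch for pulling the `LustreError` lines out of a saved console capture and decoding their return codes follows. The regular expression is an assumption based only on the line shape seen in this ticket, and `parse_lustre_errors` is a hypothetical helper, not part of any Lustre tool.

```python
import errno
import os
import re

# Line shape assumed from the console output quoted in this ticket:
# [timestamp] LustreError: pid:0:(file.c:line:func()) message
LUSTRE_ERROR = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*(?P<msg>.*)"
)
RC = re.compile(r"rc = (-?\d+)")


def parse_lustre_errors(text):
    """Return one dict per LustreError line, with rc decoded when present."""
    found = []
    for raw in text.splitlines():
        m = LUSTRE_ERROR.search(raw)
        if not m:
            continue
        entry = m.groupdict()
        rc = RC.search(entry["msg"])
        if rc:
            code = abs(int(rc.group(1)))
            # errno.errorcode maps the number to its symbolic name.
            entry["errno"] = errno.errorcode.get(code, "unknown")
            entry["meaning"] = os.strerror(code)
        found.append(entry)
    return found


# One error line and one unrelated line, both copied from the log above.
sample = (
    "[   88.864640] LustreError: 3112:0:(mgc_request.c:257:do_config_log_add()) "
    "MGC10.2.4.47@tcp: failed processing log, type 4: rc = -22\n"
    "[   87.728741] LNet: Accept secure, port 988\n"
)
errors = parse_lustre_errors(sample)
for e in errors:
    print(e["func"], e.get("errno"), e.get("meaning"))
```

Lines without an `rc = ...` suffix are still returned, just without the decoded fields, so the same helper can be used to list every error site in a capture.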

      Attachments

        1. debug_log_mds.txt
          6 kB
        2. mgs.log
          442 kB
        3. oss.log
          648 kB

        Activity

          [LU-8608] Rolling upgrade between 2.8.x and master failed: Upon upgrading OSS, OSS restarts when mounted

          Hi Saurabh,

          Ah, these look like the dmesg logs. Do you have the Lustre debug logs? I mean the logs that are generated by the lctl debug_kernel command. I'll need the trace and info log levels enabled in order to see what's going on.

          Thanks,
          Kit

          kit.westneat Kit Westneat (Inactive) added a comment
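For reference, debug logs of this kind are usually gathered by adding flags to the Lustre debug mask and then dumping the kernel debug buffer with `lctl debug_kernel`. The sketch below only builds the `lctl` command lines and does not run them; `debug_capture_commands` and the output path are hypothetical, and the exact procedure on a given system may differ.

```python
def debug_capture_commands(levels=("trace", "info"),
                           out_path="/tmp/lustre-debug.log"):
    """Build lctl invocations that add debug flags, then dump the buffer.

    Passing levels=None requests every debug flag (debug=-1).
    The default out_path is a placeholder, not a required location.
    """
    if levels is None:
        cmds = [["lctl", "set_param", "debug=-1"]]
    else:
        cmds = [["lctl", "set_param", f"debug=+{lvl}"] for lvl in levels]
    # The failing operation (here, the OST mount) would be run between the
    # set_param calls and the dump so that the buffer covers it.
    cmds.append(["lctl", "debug_kernel", out_path])
    return cmds


for cmd in debug_capture_commands():
    print(" ".join(cmd))
```

Returning argument lists rather than executing them keeps the sketch side-effect free; the lists could be passed to `subprocess.run` on a node where `lctl` is installed.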

          Hi Kit,
          I have attached the log files for both the MGS and the OSS above. I also still have the system set up. Please let me know in case you need any more information.
          Thanks!

          standan Saurabh Tandan (Inactive) added a comment - edited

          Hi Saurabh,

          Sorry for the delay in responding. Do you have the -1 debug logs (or trace and info) from the MGS and the OSS? I'm not sure why it'd be returning an error.

          Thanks,
          Kit

          kit.westneat Kit Westneat (Inactive) added a comment

          Hi Kit,
          I tried the testing with the patch mentioned above. The mount worked and the system did not restart this time. However, I still see a LustreError message in the logs while the OST is mounting. Is any extra work needed for this?

          [root@onyx-26 ~]# mount -t lustre -o acl,user_xattr /dev/sdb1 /mnt/ost0
          mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 512 to 16384
          mount.lustre: change scheduler of /sys/block/sdb/queue/scheduler from cfq to deadline
          [ 2836.318943] libcfs: module verification failed: signature and/or required key missing - tainting kernel
          [ 2836.333593] LNet: HW CPU cores: 32, npartitions: 4
          [ 2836.343150] alg: No test for adler32 (adler32-zlib)
          [ 2836.348967] alg: No test for crc32 (crc32-table)
          [ 2844.384607] Lustre: Lustre: Build Version: 2.8.57_22_g5cb1549
          [ 2844.422845] LNet: Added LNI 10.2.4.56@tcp [8/256/0/180]
          [ 2844.428845] LNet: Accept secure, port 988
          [ 2844.498034] LDISKFS-fs (sdb1): file extents enabled, maximum tree depth=5
          [ 2844.525873] LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts: acl,user_xattr,,errors=remount-ro,no_mbcache
          [ 2844.883460] LustreError: 38233:0:(mgc_request.c:253:do_config_log_add()) MGC10.2.4.47@tcp: failed processing log, type 4: rc = -22
          [ 2845.382949] Lustre: lustre-OST0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
          [root@onyx-26 ~]# [ 2852.091748] Lustre: lustre-OST0000: Will be in recovery for at least 2:30, or until 3 clients reconnect
          [ 2852.102484] Lustre: lustre-OST0000: Connection restored to b0ab0605-5282-cb64-ddd3-483f2393ac20 (at 10.2.4.36@tcp)
          [ 2853.948292] Lustre: lustre-OST0000: Connection restored to lustre-MDT0000-mdtlov_UUID (at 10.2.4.47@tcp)
          [ 2895.399155] Lustre: lustre-OST0000: Connection restored to 15ce59bd-a3c6-167b-84dd-730a88c0fe5f (at 10.2.4.37@tcp)
          [ 2895.801444] Lustre: lustre-OST0000: Recovery over after 0:44, of 3 clients 3 recovered and 0 were evicted.
          [ 2895.830113] Lustre: lustre-OST0000: deleting orphan objects from 0x0:4 to 0x0:33
          

          Thanks!

          standan Saurabh Tandan (Inactive) added a comment
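One way to see what changed with the patch is to compare which functions emit `LustreError` in the unpatched and patched console captures. The sketch below does that for short excerpts copied from this ticket; the regular expression and the `error_sites` helper are assumptions for illustration only, and the comparison says nothing about whether the remaining message is harmless.

```python
import re

# Matches the "(file.c:line:func())" site printed in the LustreError lines
# quoted in this ticket and captures the function name.
SITE = re.compile(r"LustreError: \d+:\d+:\([\w.]+:\d+:(\w+)\(\)\)")


def error_sites(log_text):
    """Return the set of function names that logged a LustreError."""
    return set(SITE.findall(log_text))


# Excerpt from the unpatched mount (description of this ticket).
before = """\
[   88.864640] LustreError: 3112:0:(mgc_request.c:257:do_config_log_add()) MGC10.2.4.47@tcp: failed processing log, type 4: rc = -22
[   88.971376] LustreError: 3368:0:(nodemap_storage.c:368:nodemap_idx_nodemap_add_update()) cannot add nodemap config to non-existing MGS.
[   88.988471] LustreError: 3368:0:(nodemap_storage.c:1313:nodemap_fs_init()) lustre-OST0000: error loading nodemap config file, file must be removed via ldiskfs: rc = -22
"""

# Excerpt from the mount with the patch applied.
after = """\
[ 2844.883460] LustreError: 38233:0:(mgc_request.c:253:do_config_log_add()) MGC10.2.4.47@tcp: failed processing log, type 4: rc = -22
[ 2845.382949] Lustre: lustre-OST0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-450
"""

fixed = error_sites(before) - error_sites(after)
remaining = error_sites(after)
print("no longer reported:", sorted(fixed))
print("still reported:", sorted(remaining))
```

Comparing by function name rather than by full line avoids false differences from PIDs, timestamps and source line numbers, which differ between builds.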

          I will try it out with this patch.

          standan Saurabh Tandan (Inactive) added a comment
          kit.westneat Kit Westneat (Inactive) added a comment -

          Hi Peter,

          This looks like a dupe of the second issue in LU-8508: https://jira.hpdd.intel.com/browse/LU-8508?focusedCommentId=162247&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-162247

          Is it possible to test with this patch? http://review.whamcloud.com/#/c/22004/

          Thanks,
          Kit
          pjones Peter Jones added a comment -

          Kit

          What do you advise here?

          Peter


          I also repeated the steps above with master build# 3437, which included LU-8498, but the issue persists. The OSS logs with that build follow.

          [root@onyx-26 ~]# mount -t lustre -o acl,user_xattr /dev/sdb1 /mnt/ost0
          mount.lustre: increased /sys/block/sdb/queue/max_sectors_kb from 512 to 16384
          mount.lustre: change scheduler of /sys/block/sdb/queue/scheduler from cfq to deadline
          [  117.134758] libcfs: module verification failed: signature and/or required key missing - tainting kernel
          [  117.150964] LNet: HW CPU cores: 32, npartitions: 4
          [  117.160981] alg: No test for adler32 (adler32-zlib)
          [  117.168217] alg: No test for crc32 (crc32-table)
          [  125.396239] Lustre: Lustre: Build Version: 2.8.57_50_g2fd1081
          [  125.431861] LNet: Added LNI 10.2.4.56@tcp [8/256/0/180]
          [  125.439130] LNet: Accept secure, port 988
          [  125.499505] LDISKFS-fs (sdb1): file extents enabled, maximum tree depth=5
          [  125.527499] LDISKFS-fs (sdb1): mounted filesystem with ordered data mode. Opts: acl,user_xattr,,errors=remount-ro,no_mbcache
          [  125.840774] LustreError: 9640:0:(mgc_request.c:253:do_config_log_add()) MGC10.2.4.47@tcp: failed processing log, type 4: rc = -22
          [  125.920049] LustreError: 11501:0:(nodemap_storage.c:368:nodemap_idx_nodemap_add_update()) cannot add nodemap config to non-existing MGS.
          [  125.936801] LustreError: 11501:0:(nodemap_storage.c:1324:nodemap_fs_init()) lustre-OST0000: error loading nodemap config file, file must be removed via ldiskfs: rc = -22
          [  126.009549] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff880825f99200[0x0, 1, [0x1:0x0:0x0] hash exist]{
          [  126.009549] 
          [  126.027440] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff880825f99250
          [  126.027440] 
          [  126.042772] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880826527800osd-ldiskfs-object@ffff880826527800(i:ffff8803fa4d55c8:81/3977440011)[plain]
          [  126.042772] 
          [  126.067001] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff880825f99200
          [  126.067001] 
          [  126.081779] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff8808232dd380[0x0, 1, [0x200000003:0x0:0x0] hash exist]{
          [  126.081779] 
          [  126.101690] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff8808232dd3d0
          [  126.101690] 
          [  126.116828] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff8800b5d71100osd-ldiskfs-object@ffff8800b5d71100(i:ffff8803fa4cc4c8:80/3977439977)[plain]
          [  126.116828] 
          [  126.140682] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff8808232dd380
          [  126.140682] 
          [  126.156197] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff880825f98840[0x0, 1, [0xa:0x0:0x0] hash exist]{
          [  126.156197] 
          [  126.173455] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff880825f98890
          [  126.173455] 
          [  126.188219] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880826526300osd-ldiskfs-object@ffff880826526300(i:ffff8803fa4de6c8:82/3977440045)[plain]
          [  126.188219] 
          [  126.211805] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff880825f98840
          [  126.211805] 
          [  126.226004] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff880424c200c0[0x0, 1, [0x200000003:0x8:0x0] hash exist]{
          [  126.226004] 
          [  126.245393] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff880424c20110
          [  126.245393] 
          [  126.260065] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880826527900osd-ldiskfs-object@ffff880826527900(i:ffff8803fa4df388:98/2123498910)[lfix]
          [  126.260065] 
          [  126.283156] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff880424c200c0
          [  126.283156] 
          [  126.298736] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff880424c20f00[0x0, 1, [0xa:0xc:0x0] hash exist]{
          [  126.298736] 
          [  126.315798] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff880424c20f50
          [  126.315798] 
          [  126.330523] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880826525000osd-ldiskfs-object@ffff880826525000(i:ffff8803fa4def48:83/2922743499)[plain]
          [  126.330523] 
          [  126.353865] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff880424c20f00
          [  126.353865] 
          [  126.367878] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) header@ffff880825f99ec0[0x0, 1, [0x200000001:0x1017:0x0] hash exist]{
          [  126.367878] 
          [  126.387236] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) ....local_storage@ffff880825f99f10
          [  126.387236] 
          [  126.401810] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) ....osd-ldiskfs@ffff880826525a00osd-ldiskfs-object@ffff880826525a00(i:ffff8803fa4c55c8:12/2606405092)[plain]
          [  126.401810] 
          [  126.424933] LustreError: 11501:0:(ofd_dev.c:248:ofd_stack_fini()) } header@ffff880825f99ec0
          [  126.424933] 
          [  126.475139] LustreError: 11501:0:(obd_config.c:578:class_setup()) setup lustre-OST0000 failed (-22)
          [  126.488291] LustreError: 11501:0:(obd_config.c:1671:class_config_llog_handler()) MGC10.2.4.47@tcp: cfg command failed: rc = -22
          [  126.507069] Lustre:    cmd=cf003 0:lustre-OST0000  1:dev  2:0  3:f  
          [  126.507069] 
          [  126.521361] LustreError: 15b-f: MGC10.2.4.47@tcp: The configuration from log 'lustre-OST0000'failed from the MGS (-22).  Make sure this client and the MGS are running compatible versions of Lustre.
          [  126.547018] LustreError: 9640:0:(obd_mount_server.c:1352:server_start_targets()) failed to start server lustre-OST0000: -22
          [  126.562626] LustreError: 9640:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
          [  126.581424] LustreError: 9640:0:(lu_object.c:1243:lu_device_fini()) LBUG
          [  126.591487] Pid: 9640, comm: mount.lustre
          [  126.598335] 
          [  126.598335] Call Trace:
          [  126.607357]  [<ffffffffa072c7d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
          [  126.617283]  [<ffffffffa072cd75>] lbug_with_loc+0x45/0xc0 [libcfs]
          [  126.626242]  [<ffffffffa0867c78>] lu_device_fini+0xb8/0xc0 [obdclass]
          [  126.635477]  [<ffffffffa084cd72>] ls_device_put+0x82/0x2a0 [obdclass]
          [  126.644565]  [<ffffffffa084d06d>] local_oid_storage_fini+0xdd/0x210 [obdclass]
          [  126.654470]  [<ffffffffa0806281>] mgc_set_info_async+0x951/0x1630 [mgc]
          [  126.663594]  [<ffffffffa08611c9>] ? lustre_process_log+0x9e9/0xc00 [obdclass]
          [  126.673310]  [<ffffffffa0737957>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
          
          [  126.682342]  [<ffffffffa088bbf4>] server_start_targets+0x794/0x2d20 [obdclass]
          [  126.692039]  [<ffffffffa0864ab6>] ? lustre_start_mgc+0x996/0x2490 [obdclass]
          [  126.701460]  [<ffffffffa085d030>] ? class_config_llog_handler+0x0/0x1b60 [obdclass]
          [  126.711580]  [<ffffffffa088f20d>] server_fill_super+0x108d/0x184c [obdclass]
          [  126.720983]  [<ffffffffa0867058>] lustre_fill_super+0x328/0x950 [obdclass]
          [  126.730180]  [<ffffffffa0866d30>] ? lustre_fill_super+0x0/0x950 [obdclass]
          [  126.739418]  [<ffffffff811e235d>] mount_nodev+0x4d/0xb0
          [  126.746774]  [<ffffffffa085ef88>] lustre_mount+0x38/0x60 [obdclass]
          [  126.755355]  [<ffffffff811e2d09>] mount_fs+0x39/0x1b0
          [  126.762525]  [<ffffffff811fe5df>] vfs_kern_mount+0x5f/0xf0
          [  126.770221]  [<ffffffff81200b2e>] do_mount+0x24e/0xa40
          [  126.777478]  [<ffffffff8116e30e>] ? __get_free_pages+0xe/0x50
          [  126.785472]  [<ffffffff812013b6>] SyS_mount+0x96/0xf0
          [  126.792693]  [<ffffffff81646d89>] system_call_fastpath+0x16/0x1b
          [  126.800984] 

          Message from syslogd@onyx-26 at Sep 12 17:52:37 ...
           kernel:LustreError: 9640:0:(lu_object.c:1243:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1

          Message from syslogd@onyx-26 at Sep 12 17:52:37 ...
           kernel:LustreError: 9640:0:(lu_object.c:1243:lu_device_fini()) LBUG

          [  126.804371] Kernel panic - not syncing: LBUG
          [  126.811718] CPU: 19 PID: 9640 Comm: mount.lustre Tainted: G          IOE  ------------   3.10.0-327.28.3.el7_lustre.x86_64 #1
          [  126.827584] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.99.99.x045.022820121209 02/28/2012
          [  126.842017]  ffffffffa0749def 000000007f8e88de ffff880824ef79e8 ffffffff8163667b
          [  126.852647]  ffff880824ef7a68 ffffffff8162ff0a ffffffff00000008 ffff880824ef7a78
          [  126.863183]  ffff880824ef7a18 000000007f8e88de ffffffffa08981d5 0000000000000000
          [  126.873670] Call Trace:
          [  126.878607]  [<ffffffff8163667b>] dump_stack+0x19/0x1b
          [  126.886490]  [<ffffffff8162ff0a>] panic+0xd8/0x1e7
          [  126.893930]  [<ffffffffa072cddb>] lbug_with_loc+0xab/0xc0 [libcfs]
          [  126.903113]  [<ffffffffa0867c78>] lu_device_fini+0xb8/0xc0 [obdclass]
          [  126.912379]  [<ffffffffa084cd72>] ls_device_put+0x82/0x2a0 [obdclass]
          [  126.921613]  [<ffffffffa084d06d>] local_oid_storage_fini+0xdd/0x210 [obdclass]
          [  126.931643]  [<ffffffffa0806281>] mgc_set_info_async+0x951/0x1630 [mgc]
          [  126.941011]  [<ffffffffa08611c9>] ? lustre_process_log+0x9e9/0xc00 [obdclass]
          [  126.950943]  [<ffffffffa0737957>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
          [  126.960275]  [<ffffffffa088bbf4>] server_start_targets+0x794/0x2d20 [obdclass]
          [  126.970247]  [<ffffffffa0864ab6>] ? lustre_start_mgc+0x996/0x2490 [obdclass]
          [  126.979979]  [<ffffffffa085d030>] ? class_config_dump_handler+0xb30/0xb30 [obdclass]
          [  126.990495]  [<ffffffffa088f20d>] server_fill_super+0x108d/0x184c [obdclass]
          [  127.000180]  [<ffffffffa0867058>] lustre_fill_super+0x328/0x950 [obdclass]
          [  127.009642]  [<ffffffffa0866d30>] ? lustre_common_put_super+0x270/0x270 [obdclass]
          [  127.019858]  [<ffffffff811e235d>] mount_nodev+0x4d/0xb0
          [  127.027464]  [<ffffffffa085ef88>] lustre_mount+0x38/0x60 [obdclass]
          [  127.036232]  [<ffffffff811e2d09>] mount_fs+0x39/0x1b0
          [  127.043618]  [<ffffffff811fe5df>] vfs_kern_mount+0x5f/0xf0
          [  127.051489]  [<ffffffff81200b2e>] do_mount+0x24e/0xa40
          [  127.058959]  [<ffffffff8116e30e>] ? __get_free_pages+0xe/0x50
          [  127.067116]  [<ffffffff812013b6>] SyS_mount+0x96/0xf0
          [  127.074491]  [<ffffffff81646d89>] system_call_fastpath+0x16/0x1b
          [    0.000000] Initializing cgroup subsys cpuset
          [    0.000000] Initializing cgroup subsys cpu
          [    0.000000] Initializing cgroup subsys cpuacct
          [    0.000000] Linux version 3.10.0-327.28.3.el7_lustre.x86_64 (jenkins@onyx-5-sdh1-el7-x8664.onyx.hpdd.intel.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Fri Sep 9 20:39:59 PDT 2016
          
          standan Saurabh Tandan (Inactive) added a comment

          People

            kit.westneat Kit Westneat (Inactive)
            standan Saurabh Tandan (Inactive)