OST fails to mount after installing 2.10.3

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.10.4
    • Soak cluster, lustre-b2_10-ib build 33
    • 3

    Description

      Downgraded the cluster from 2.11 to 2.10.3. The OSTs now refuse to mount:

      Apr  4 22:20:57 soak-2 sshd[5680]: pam_unix(sshd:session): session opened for user root by (uid=0)
      Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8804147f6780[0x0, 1, [0x1:0x0:0x0] hash exist]{
      Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8804147f67d0
      Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff880403614618osd-zfs-object@ffff880403614618
      Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8804147f6780
      Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8800aeac7b00[0x0, 1, [0x200000003:0x0:0x0] hash exist]{
      Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8800aeac7b50
      Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff8803fed72970osd-zfs-object@ffff8803fed72970
      Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8800aeac7b00
      Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8804147f7a40[0x0, 1, [0x200000003:0x2:0x0] hash exist]{
      Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8804147f7a90
      Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff880403614750osd-zfs-object@ffff880403614750
      Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8804147f7a40
      Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff88040a271ec0[0x0, 1, [0xa:0x0:0x0] hash exist]{
      Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff88040a271f10
      Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff88041434c888osd-zfs-object@ffff88041434c888
      Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff88040a271ec0
      Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff88041722fa40[0x0, 1, [0xa:0x9:0x0] hash exist]{
      Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff88041722fa90
      Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff880035d05998osd-zfs-object@ffff880035d05998
      Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff88041722fa40
      Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(obd_config.c:558:class_setup()) setup soaked-OST0000 failed (-17)
      Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(obd_config.c:1682:class_config_llog_handler()) MGC192.168.1.108@o2ib: cfg command failed: rc = -17
      Apr  4 22:20:59 soak-2 kernel: Lustre:    cmd=cf003 0:soaked-OST0000  1:dev  2:0  3:f
      Apr  4 22:20:59 soak-2 kernel: LustreError: 15c-8: MGC192.168.1.108@o2ib: The configuration from log 'soaked-OST0000' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      Apr  4 22:20:59 soak-2 kernel: LustreError: 5705:0:(obd_mount_server.c:1386:server_start_targets()) failed to start server soaked-OST0000: -17
      Apr  4 22:20:59 soak-2 kernel: LustreError: 5705:0:(obd_mount_server.c:1879:server_fill_super()) Unable to start targets: -17
      

      Will try re-formatting the file system.
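The -17 reported by class_setup() and server_fill_super() in the log above is a negated errno value; on Linux, errno 17 is EEXIST ("File exists"). A minimal Python sketch for decoding such return codes follows. The helper name describe_rc is hypothetical and is not part of any Lustre tooling.

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Translate a negative kernel-style return code into its errno name."""
    code = abs(rc)
    # errno.errorcode maps numeric values to symbolic names such as "EEXIST".
    name = errno.errorcode.get(code, "unknown")
    return f"{rc}: {name} ({os.strerror(code)})"


if __name__ == "__main__":
    # -17 is the value reported throughout the mount failure above.
    print(describe_rc(-17))
```

The same helper can be applied to any other "rc = -N" value that turns up in the Lustre logs.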

      Attachments

        Activity

          [LU-10881] OST fails to mount after installing 2.10.3

          I've also reproduced the failure on soak-3, so it's definitely a code issue. Stack-dumping and crash-dumping soak-3 now; the crash dump will be available on Spirit.

          cliffw Cliff White (Inactive) added a comment

          Console log. The mount attempt was made after reboot.

          
          

          [ 102.088947] Lustre: Lustre: Build Version: 2.10.3_132_g6910400
          [ 102.282996] LNet: Added LNI 192.168.1.102@o2ib [8/256/0/180]
          [ 124.448446] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8804046e0d80[0x0, 1, [0x1:0x0:0x0] hash exist]{
          [ 124.462670] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8804046e0dd0
          [ 124.474514] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff8800ae6a49c0osd-zfs-object@ffff8800ae6a49c0
          [ 124.488780] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8804046e0d80
          [ 124.499730] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8801797dccc0[0x0, 1, [0x200000003:0x0:0x0] hash exist]{
          [ 124.514673] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8801797dcd10
          [ 124.526505] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff8803ff7f1728osd-zfs-object@ffff8803ff7f1728
          [ 124.540748] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8801797dccc0
          [ 124.551682] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8804046e0540[0x0, 1, [0x200000003:0x2:0x0] hash exist]{
          [ 124.566619] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8804046e0590
          [ 124.578435] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff8800ae6a4af8osd-zfs-object@ffff8800ae6a4af8
          [ 124.592668] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8804046e0540
          [ 124.603600] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8804046e0840[0x0, 1, [0xa:0x0:0x0] hash exist]{
          [ 124.617728] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8804046e0890
          [ 124.629531] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff8800ae6a55f0osd-zfs-object@ffff8800ae6a55f0
          [ 124.643755] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8804046e0840
          [ 124.654672] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8800af39af00[0x0, 1, [0xa:0x5:0x0] hash exist]{
          [ 124.668796] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8800af39af50
          [ 124.681882] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff88040c464138osd-zfs-object@ffff88040c464138
          [ 124.698719] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8800af39af00
          [ 124.712451] LustreError: 5487:0:(obd_config.c:558:class_setup()) setup soaked-OST0000 failed (-17)
          [ 124.725499] LustreError: 5487:0:(obd_config.c:1682:class_config_llog_handler()) MGC192.168.1.108@o2ib: cfg command failed: rc = -17
          [ 124.744933] Lustre: cmd=cf003 0:soaked-OST0000 1:dev 2:0 3:f
          [ 124.759606] LustreError: 15c-8: MGC192.168.1.108@o2ib: The configuration from log 'soaked-OST0000' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
          [ 124.792996] LustreError: 5358:0:(obd_mount_server.c:1386:server_start_targets()) failed to start server soaked-OST0000: -17
          [ 124.808813] LustreError: 5358:0:(obd_mount_server.c:1879:server_fill_super()) Unable to start targets: -17
          [ 124.822620] LustreError: 5358:0:(obd_config.c:609:class_cleanup()) Device 3 not setup
          [ 124.836307] Lustre: server umount soaked-OST0000 complete
          [ 124.844943] LustreError: 5358:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount soaked-ost0/ost0 (-17)

          cliffw Cliff White (Inactive) added a comment

          Hit this again when downgrading to the tip of 2.10. Will leave the system in this state in case further information is needed.

          cliffw Cliff White (Inactive) added a comment

          Unfortunately, no. We dumped the lctl log after one failure; the file is attached. The system has been re-formatted, which seems to have removed the problem.

          cliffw Cliff White (Inactive) added a comment

          It was not easy to find where the "hash exist" message comes from. The message is generated in two parts in lu_object_header_print(), and it appears to be reached via lu_site_print->lu_site_obj_print->lu_object_print(). It is printed from ofd_stack_fini() during cleanup if ls_obj_hash() is not empty (apparently because local_oid_storage_fini() did not clean up properly), but that doesn't appear to be the reason why the startup failed.

          Are there earlier messages in the logs that indicate why the mount failed?

          adilger Andreas Dilger added a comment
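One way to check a saved log for earlier messages is to separate the object dump printed by ofd_stack_fini() from everything else. The sketch below assumes the syslog format shown in the description. The names leftover_fids and other_errors are hypothetical helpers, and the regular expression is inferred from the sample lines in this ticket only, not from the Lustre source.

```python
import re

# Matches the header line printed for each object still present in the hash,
# e.g. "header@ffff8804147f6780[0x0, 1, [0x1:0x0:0x0] hash exist]{"
HEADER_RE = re.compile(
    r"header@(?P<addr>[0-9a-f]+)\[[^\[\]]*"
    r"\[(?P<fid>0x[0-9a-f]+:0x[0-9a-f]+:0x[0-9a-f]+)\] hash exist\]"
)


def leftover_fids(log_lines):
    """Return the FIDs of objects reported as still hashed, in log order."""
    fids = []
    for line in log_lines:
        match = HEADER_RE.search(line)
        if match and match.group("fid") not in fids:
            fids.append(match.group("fid"))
    return fids


def other_errors(log_lines):
    """Return LustreError lines that were not printed by ofd_stack_fini()."""
    return [
        line for line in log_lines
        if "LustreError" in line and "ofd_stack_fini()" not in line
    ]


# Three lines copied from the description, used as sample input.
SAMPLE = [
    "Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) "
    "header@ffff8804147f6780[0x0, 1, [0x1:0x0:0x0] hash exist]{",
    "Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) "
    "} header@ffff8804147f6780",
    "Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(obd_config.c:558:class_setup()) "
    "setup soaked-OST0000 failed (-17)",
]

if __name__ == "__main__":
    print(leftover_fids(SAMPLE))
    for line in other_errors(SAMPLE):
        print(line)
```

Running other_errors over the full syslog, starting from before the mount attempt, would show whether anything was logged ahead of the class_setup() failure.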

          People

            wc-triage WC Triage
            cliffw Cliff White (Inactive)