[LU-10881] OST fails to mount after installing 2.10.3 Created: 04/Apr/18  Updated: 11/May/18

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.4
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Cliff White (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: soak
Environment:

Soak cluster, lustre-b2_10-ib build 33


Attachments: Text File soak-2.mount.fail.txt    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Downgraded the cluster from 2.11 to 2.10.3. The OSTs now refuse to mount:

Apr  4 22:20:57 soak-2 sshd[5680]: pam_unix(sshd:session): session opened for user root by (uid=0)
Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8804147f6780[0x0, 1, [0x1:0x0:0x0] hash exist]{
Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8804147f67d0
Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff880403614618osd-zfs-object@ffff880403614618
Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8804147f6780
Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8800aeac7b00[0x0, 1, [0x200000003:0x0:0x0] hash exist]{
Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8800aeac7b50
Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff8803fed72970osd-zfs-object@ffff8803fed72970
Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8800aeac7b00
Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8804147f7a40[0x0, 1, [0x200000003:0x2:0x0] hash exist]{
Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8804147f7a90
Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff880403614750osd-zfs-object@ffff880403614750
Apr  4 22:20:58 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8804147f7a40
Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff88040a271ec0[0x0, 1, [0xa:0x0:0x0] hash exist]{
Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff88040a271f10
Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff88041434c888osd-zfs-object@ffff88041434c888
Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff88040a271ec0
Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff88041722fa40[0x0, 1, [0xa:0x9:0x0] hash exist]{
Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff88041722fa90
Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff880035d05998osd-zfs-object@ffff880035d05998
Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff88041722fa40
Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(obd_config.c:558:class_setup()) setup soaked-OST0000 failed (-17)
Apr  4 22:20:59 soak-2 kernel: LustreError: 5814:0:(obd_config.c:1682:class_config_llog_handler()) MGC192.168.1.108@o2ib: cfg command failed: rc = -17
Apr  4 22:20:59 soak-2 kernel: Lustre:    cmd=cf003 0:soaked-OST0000  1:dev  2:0  3:f
Apr  4 22:20:59 soak-2 kernel: LustreError: 15c-8: MGC192.168.1.108@o2ib: The configuration from log 'soaked-OST0000' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Apr  4 22:20:59 soak-2 kernel: LustreError: 5705:0:(obd_mount_server.c:1386:server_start_targets()) failed to start server soaked-OST0000: -17
Apr  4 22:20:59 soak-2 kernel: LustreError: 5705:0:(obd_mount_server.c:1879:server_fill_super()) Unable to start targets: -17

Will try reformatting the file system.
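
For reference, the "-17" in the messages above is a negated Linux errno value. A minimal Python sketch for decoding such return codes follows. It assumes the standard Linux errno table, and decode_rc is a hypothetical helper name, not part of Lustre or lctl.

```python
import errno
import os


def decode_rc(rc: int) -> str:
    """Map a kernel-style negative return code to 'NAME (message)'.

    Hypothetical helper: Lustre reports failures as negated errno values,
    so -17 is looked up as errno 17.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


# rc = -17 from class_setup() / server_start_targets() above
print(decode_rc(-17))
```

On Linux this reports EEXIST, i.e. the setup of soaked-OST0000 failed because something it tried to create or register already existed.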



 Comments   
Comment by Andreas Dilger [ 05/Apr/18 ]

The source of the "hash exist" message was not obvious. It is generated in two parts in lu_object_header_print(), and it appears to be reached via lu_site_print->lu_site_obj_print->lu_object_print(). It is printed from ofd_stack_fini() during cleanup if ls_obj_hash() is not empty (apparently because local_oid_storage_fini() did not clean up properly), but that does not appear to be the reason why the startup failed.

Are there earlier messages in the logs that indicate why the mount failed?
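
To see which objects were still in the hash at cleanup time, the header lines can be pulled out of the log. This is a minimal Python sketch, assuming the line layout shown in the description; the field names "flags" and "ref" are an interpretation of the two values printed before the FID, and leftover_objects is a hypothetical helper.

```python
import re

# Matches e.g. "header@ffff8804147f6780[0x0, 1, [0x1:0x0:0x0] hash exist]".
# "flags" and "ref" are assumed meanings for the first two bracketed values.
HEADER_RE = re.compile(
    r"header@(?P<addr>[0-9a-f]+)"
    r"\[(?P<flags>0x[0-9a-f]+), (?P<ref>\d+), "
    r"\[(?P<fid>0x[0-9a-f]+:0x[0-9a-f]+:0x[0-9a-f]+)\] hash exist\]"
)


def leftover_objects(log_text):
    """Return (address, fid, ref) for every 'hash exist' header in the log."""
    return [
        (m["addr"], m["fid"], int(m["ref"]))
        for m in HEADER_RE.finditer(log_text)
    ]


# Two entries copied from the soak-2 log in the description.
sample = (
    "LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) "
    "header@ffff8804147f6780[0x0, 1, [0x1:0x0:0x0] hash exist]{\n"
    "LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) "
    "} header@ffff8804147f6780\n"
    "LustreError: 5814:0:(ofd_dev.c:251:ofd_stack_fini()) "
    "header@ffff8800aeac7b00[0x0, 1, [0x200000003:0x0:0x0] hash exist]{\n"
)

for addr, fid, ref in leftover_objects(sample):
    print(addr, fid, ref)
```

The closing "} header@..." lines carry no bracketed fields, so they are skipped and each object is listed once.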

Comment by Cliff White (Inactive) [ 05/Apr/18 ]

Unfortunately, no. We dumped the lctl log after one failure; the file is attached. The system has since been reformatted, which seems to have removed the problem.

Comment by Cliff White (Inactive) [ 11/May/18 ]

Hit this again when downgrading to the tip of 2.10. Will leave the system in this state in case further information is desired.

Comment by Cliff White (Inactive) [ 11/May/18 ]

Console log. The mount attempt was made after a reboot.


[ 102.088947] Lustre: Lustre: Build Version: 2.10.3_132_g6910400
[ 102.282996] LNet: Added LNI 192.168.1.102@o2ib [8/256/0/180]
[ 124.448446] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8804046e0d80[0x0, 1, [0x1:0x0:0x0] hash exist]{
[ 124.462670] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8804046e0dd0
[ 124.474514] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff8800ae6a49c0osd-zfs-object@ffff8800ae6a49c0
[ 124.488780] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8804046e0d80
[ 124.499730] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8801797dccc0[0x0, 1, [0x200000003:0x0:0x0] hash exist]{
[ 124.514673] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8801797dcd10
[ 124.526505] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff8803ff7f1728osd-zfs-object@ffff8803ff7f1728
[ 124.540748] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8801797dccc0
[ 124.551682] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8804046e0540[0x0, 1, [0x200000003:0x2:0x0] hash exist]{
[ 124.566619] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8804046e0590
[ 124.578435] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff8800ae6a4af8osd-zfs-object@ffff8800ae6a4af8
[ 124.592668] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8804046e0540
[ 124.603600] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8804046e0840[0x0, 1, [0xa:0x0:0x0] hash exist]{
[ 124.617728] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8804046e0890
[ 124.629531] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff8800ae6a55f0osd-zfs-object@ffff8800ae6a55f0
[ 124.643755] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8804046e0840
[ 124.654672] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) header@ffff8800af39af00[0x0, 1, [0xa:0x5:0x0] hash exist]{
[ 124.668796] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....local_storage@ffff8800af39af50
[ 124.681882] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) ....osd-zfs@ffff88040c464138osd-zfs-object@ffff88040c464138
[ 124.698719] LustreError: 5487:0:(ofd_dev.c:251:ofd_stack_fini()) } header@ffff8800af39af00
[ 124.712451] LustreError: 5487:0:(obd_config.c:558:class_setup()) setup soaked-OST0000 failed (-17)
[ 124.725499] LustreError: 5487:0:(obd_config.c:1682:class_config_llog_handler()) MGC192.168.1.108@o2ib: cfg command failed: rc = -17
[ 124.744933] Lustre: cmd=cf003 0:soaked-OST0000 1:dev 2:0 3:f
[ 124.759606] LustreError: 15c-8: MGC192.168.1.108@o2ib: The configuration from log 'soaked-OST0000' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[ 124.792996] LustreError: 5358:0:(obd_mount_server.c:1386:server_start_targets()) failed to start server soaked-OST0000: -17
[ 124.808813] LustreError: 5358:0:(obd_mount_server.c:1879:server_fill_super()) Unable to start targets: -17
[ 124.822620] LustreError: 5358:0:(obd_config.c:609:class_cleanup()) Device 3 not setup
[ 124.836307] Lustre: server umount soaked-OST0000 complete
[ 124.844943] LustreError: 5358:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount soaked-ost0/ost0 (-17)

Comment by Cliff White (Inactive) [ 11/May/18 ]

I've also reproduced the failure on soak-3, so it's definitely a code issue. Taking a stack dump and crash dump of soak-3 now; the crash dump will be available on Spirit.
