Prakash, you are correct that this can happen if the MDS is started before the OSS. The message is printed to the console to alert the sysadmin in case the target OST is not starting up properly, but I agree it is a distraction if it is printed due to some transient condition.
That said, when Brian submitted the patch to update this console message, he left in the printing of errors during the initial connection attempt. I think it would make sense to suppress this error when there are only a small number of failed initial connection attempts, but still print something if the connection keeps failing for a long time, so that the message only appears when there is a persistent problem on the connection.
I've pushed an RFC patch http://review.whamcloud.com/10057, but I haven't tested it at all. In particular, I'm not sure whether the same request is reused for each initial connection attempt (in which case rq_nr_resends is properly incremented) or whether a new request is allocated each time (in which case my attempt at squashing the initial connect messages will fail). Bobijam, could you please take a look at this?
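To make the concern about rq_nr_resends concrete, here is a minimal sketch (not the code in the RFC patch) of a "stay quiet for the first few failures" check keyed on a per-request resend counter. The struct fake_request type, the CONNECT_QUIET_ATTEMPTS threshold, and the helper functions are all hypothetical; only the rq_nr_resends field name comes from this discussion. It shows why the approach only works if the same request is resent: with a fresh request per attempt, the counter never grows and the message is never printed.

```c
#include <stdbool.h>

/* Hypothetical threshold: number of failed initial connects to tolerate
 * silently before warning on the console. Not a real Lustre tunable. */
#define CONNECT_QUIET_ATTEMPTS 3

/* Hypothetical stand-in for the real request structure; only the
 * resend counter mentioned in the discussion is modelled. */
struct fake_request {
	int rq_nr_resends;
};

/* Report a failed initial connect only once it has been retried enough. */
static bool should_print_connect_error(const struct fake_request *req)
{
	return req->rq_nr_resends >= CONNECT_QUIET_ATTEMPTS;
}

/* Case 1: one request is reused and resent, so its counter grows.
 * Returns how many of `attempts` failures would be printed. */
static int printed_with_reused_request(int attempts)
{
	struct fake_request req = { .rq_nr_resends = 0 };
	int printed = 0;

	for (int i = 0; i < attempts; i++) {
		if (should_print_connect_error(&req))
			printed++;
		req.rq_nr_resends++;	/* next attempt resends the same request */
	}
	return printed;
}

/* Case 2: a new request is allocated for every attempt, so the counter
 * is always 0 and the threshold is never reached. */
static int printed_with_new_requests(int attempts)
{
	int printed = 0;

	for (int i = 0; i < attempts; i++) {
		struct fake_request req = { .rq_nr_resends = 0 };

		if (should_print_connect_error(&req))
			printed++;
	}
	return printed;
}
```

With 10 failed attempts, the reused-request case stays silent for the first 3 and prints the remaining 7, while the new-request case prints nothing at all, which would hide a genuinely persistent connection failure. If the real code does allocate a new request per attempt, the counter would need to live somewhere that survives across attempts instead.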