[LU-899] Client Connectivity Issues in Complex Lustre Environment Created: 05/Dec/11 Updated: 14/Dec/11 Resolved: 14/Dec/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Dennis Nelson | Assignee: | Cliff White (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
The cluster configuration is as follows: scratch1 Lustre filesystem - 2 MDS, 16 OSS, 4 DDN SFA 10K arrays The scratch1 and scratch2 servers each have 4 IB ports. The ports are used for client connectivity as follows: Production compute clients access scratch1 via the ib0 port. The servers are running CentOS 5.5 (2.6.18-238.12.1.el5) Server Configuration: [root@lfs-mds-1-1 ~]# cat /etc/modprobe.d/lustre.conf [root@lfs-mds-2-1 ~]# cat /etc/modprobe.d/lustre.conf [root@lfs-mds-2-1 ~]# lctl list_nids lfs-mds-2-2: [root@lfs-mds-2-2 ~]# cat /etc/modprobe.d/lustre.conf [root@lfs-mds-2-2 ~]# lctl list_nids |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 6508 |
| Description |
|
Connectivity Issues: client fe2: [root@fe2 ~]# date Client fe2: [root@fe2 ~]# cat /etc/modprobe.d/lustre.conf
[root@fe2 ~]# lctl list_nids [root@fe2 ~]# cat /etc/fstab | grep lustre [root@fe2 ~]# df -h | grep lustre The configuration of the data transfer nodes differs in that they only have 1 active ib port where the login nodes have 3. Even so, they both use the same ib fabric to connect to the production filesystems. The dtn nodes are able to mount the scratch2 filesystem without issue, but cannot mount the scratch1 filesystem. dtn1: [root@dtn1 ~]# cat /etc/modprobe.d/lustre.conf
[root@dtn1 ~]# lctl list_nids [root@dtn1 ~]# lctl ping 10.174.80.40@o2ib2 [root@dtn1 ~]# mount /mnt/lustre2 [root@dtn1 ~]# mount /mnt/lustre1 [root@dtn1 ~]# cat /etc/fstab | grep lustre Finally, the TDS compute nodes cannot access the production filesystems. They have the TDS filesystems mounted (lustre1 and lustre2). |
| Comments |
| Comment by Cliff White (Inactive) [ 05/Dec/11 ] |
|
I do not see any information about your MGS; are you running the MGS co-located with the MDS? It might be better for this configuration to have one separate MGS for all the filesystems. If lctl ping works, it is odd that the mount would fail; it may indeed be a network issue. Also, you might check the MDS disk (tunefs --print) to see if the failover NIDs are correct on the disk. |
| Comment by Dennis Nelson [ 05/Dec/11 ] |
|
The MGS is co-located with the MDS. I did neglect to include the MDT nid information: [root@lfs-mds-1-1 ~]# tunefs.lustre --dryrun /dev/vg_scratch1/mdt Read previous values: Permanent disk data: exiting before disk write. The MGT does not have any NID information. It is my understanding that the client mount command specifies the nids of the systems where the MGT will be mounted. |
| Comment by Dennis Nelson [ 05/Dec/11 ] |
|
The other thing I would point out is that the dtn nodes are able to mount the scratch2 filesystem. They are attempting to mount the scratch1 filesystem over the same ib subnet (10.174.80.xx). Additionally, the login nodes are able to mount both filesystems using that same subnet. I don't understand how the cause could be a network issue when both the client and the server seem to be able to communicate over the subnet without issues. |
| Comment by Cliff White (Inactive) [ 05/Dec/11 ] |
|
You didn't include the scratch2 info, but on scratch1 I notice you have everything listed twice: Parameters: mgsnode=10.174.31.241@o2ib,10.174.79.241@o2ib1,10.174.80.40@o2ib2 mgsnode=10.174.31.251@o2ib,10.174.79.251@o2ib1,10.174.80.41@o2ib2 failover.node=10.174.31.241@o2ib,10.174.79.241@o2ib1,10.174.80.40@o2ib2 failover.node=10.174.31.251@o2ib,10.174.79.251@o2ib1,10.174.80.41@o2ib2 mdt.quota_type=ug I think what you want would be everything listed once: |
| Comment by Dennis Nelson [ 05/Dec/11 ] |
|
I do not know what you mean. I have mgsnode listed twice: once for one MDS, listing the 3 NIDs that are used for scratch1, and a second time for the other MDS, again listing the 3 NIDs that are used for scratch1. None of the NIDs are repeated. |
| Comment by Cliff White (Inactive) [ 06/Dec/11 ] |
|
You have the same NIDs listed as 'mgsnode' and as 'failnode': mgsnode=10.174.31.241@o2ib,10.174.79.241@o2ib1,10.174.80.40@o2ib2 ... failover.node=10.174.31.241@o2ib,10.174.79.241@o2ib1,10.174.80.40@o2ib2 ... mdt.quota_type=ug That is not necessary, and may have something to do with the delay in mounting. A NID should be listed either as mgsnode or as failnode, not both as you have here. A NID only needs to be listed once, as mgsnode and failnode are both used when finding a server. The NIDs in the 'failover.node' list do NOT have to be in the 'mgsnode' list and should not be duplicated in this fashion. |
| Comment by Dennis Nelson [ 06/Dec/11 ] |
|
So, would you suggest using mgsnode or failover.node? Are they identical? It appears that the DDN tools add both of these. |
| Comment by Ashley Pittman (Inactive) [ 06/Dec/11 ] |
|
This is the output of tunefs.lustre --print for the MDT on scratch2. It's from a snapshot taken last week, so it may not be up to date; Dennis, can you check this and update it if it's wrong? checking for existing Lustre data: found CONFIGS/mountdata Read previous values: Permanent disk data: exiting before disk write. |
| Comment by Dennis Nelson [ 06/Dec/11 ] |
|
Yes, it is the same. It has not been modified since it was initially installed. |
| Comment by Ashley Pittman (Inactive) [ 06/Dec/11 ] |
|
To be clear, when Dennis says "The MGS is co-located with the MDS", what we mean is that it's running from a different partition on the same hardware. The NIDs used to access it should be the same, but it is a different partition and external to the MDT. |
| Comment by Cliff White (Inactive) [ 06/Dec/11 ] |
|
The primary NIDs for the node do not need to be listed as 'failnode'. When the MDT is registered with the MGS (which should always be done from the primary), the primary NIDs are recorded automatically. |
| Comment by Cliff White (Inactive) [ 06/Dec/11 ] |
|
You have multiple issues going on here. First, can you attach syslogs from a mount attempt from dtn1? Dec 5 17:33:15 fe2 kernel: Lustre: scratch1-MDT0000-mdc-ffff8817f3d6e400: Connection to service scratch1-MDT0000 via nid 10.174.31.241@o2ib was lost; in progress operations using this service will wait for recovery to complete. Dec 5 17:33:29 fe2 kernel: Lustre: 20703:0:(import.c:517:import_select_connection()) scratch1-MDT0000-mdc-ffff8817f3d6e400: tried all connections, increasing latency to 2s Dec 5 17:34:24 fe2 kernel: Lustre: scratch1-MDT0000-mdc-ffff8817f3d6e400: Connection restored to service scratch1-MDT0000 using nid 10.174.31.241@o2ib. |
| Comment by Dennis Nelson [ 06/Dec/11 ] |
|
Yes, I believe there are multiple issues. Different systems have different symptoms even across the same subnet. I have not seen any indication of a network issue, although, I certainly will not rule out network issues. |
| Comment by Dennis Nelson [ 07/Dec/11 ] |
|
Sorry, I sent the trace for the other case but did not send the trace for dtn1. Here it is. [root@dtn1 ~]# lustre_rmmod |
| Comment by Cliff White (Inactive) [ 07/Dec/11 ] |
|
Ah, the lustre dumps were requested on bug 890 - I need the syslog (/var/log/messages or dmesg) from the dtn1 mount attempt. Thanks again |
| Comment by Cliff White (Inactive) [ 07/Dec/11 ] |
|
Hmm. I am starting to suspect there may be config log issues. I see this: and I don't see it trying any of the alternate addresses. |
| Comment by Dennis Nelson [ 07/Dec/11 ] |
|
So, would a --writeconf for all devices be in order? I was thinking of doing that, but I have not had any downtime to do it. The filesystem is not in production, but others are using it to prepare for acceptance. I can certainly schedule time if you suggest that is the right plan of action. That might also fix the issue that I am seeing in |
| Comment by Dennis Nelson [ 07/Dec/11 ] |
|
Sorry, trying to do many things at once. Here are the syslogs from dtn1: [root@dtn1 ~]# umount -at lustre Dec 7 14:51:49 dtn1 root: Start Test |
| Comment by Cliff White (Inactive) [ 07/Dec/11 ] |
|
We'd really like to know what is wrong before doing the writeconf. It would be good to know what the current config is.
The result should be something like this: Target uuid : config_uuid In my config, 10.67.73.82 is the MDS failover NID; the other NIDs are the OSS/MDS primary NIDs. Check your results and verify the proper NIDs are being given to the clients. If all the NIDs aren't in the config log, a writeconf is needed. If the clients are getting a correct NID list from these config logs, then the issue is most likely networking. |
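For reference, a minimal sketch of one way to dump and inspect a client config log directly on the MGS node. The device path and output file below are illustrative assumptions based on the volume names quoted elsewhere in this ticket; these are not commands requested above:

  # extract the scratch1 client config log from the MGS backing device (read-only)
  debugfs -c -R "dump CONFIGS/scratch1-client /tmp/scratch1-client" /dev/vg_scratch1/mgs
  # decode the binary llog and keep only the uuid lines, as suggested above
  llog_reader /tmp/scratch1-client | grep uuid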
| Comment by Dennis Nelson [ 08/Dec/11 ] |
|
Sorry for the delay. I captured the info last night and before I could upload the files my laptop died. I had to reschedule time to get the system again. |
| Comment by Dennis Nelson [ 08/Dec/11 ] |
|
So, it is my understanding that I should find the client NIDs in the output. I cannot find the dtn1 NID in either, although dtn1 has scratch2 mounted. In fact, I cannot find any of the 10.174.81.xx NIDs in the lustre1 config log. I assume that is part of our problem, but what would cause that? |
| Comment by Cliff White (Inactive) [ 08/Dec/11 ] |
|
No, there are no client NIDs in the config log. As I mentioned previously, there are only server NIDs; we wanted to see if all the server NIDs were being given to the clients. The error from the dtn1 mount would appear to indicate a possible corrupt config log. |
| Comment by Cliff White (Inactive) [ 08/Dec/11 ] |
|
This is being escalated - please attach the full config logs from both systems. Do the same thing as before, but instead of the 'grep uuid', just redirect (>) the whole thing to a file and attach it. |
| Comment by Dennis Nelson [ 08/Dec/11 ] |
|
OK, here they are. |
| Comment by Johann Lombardi (Inactive) [ 08/Dec/11 ] |
|
It seems that the config logs of scratch1 do not properly set up the 3 nids of lfs-mds-1-1. Only 10.174.31.241@o2ib is added to the niduuid before attach/setup. Could you please run the following command on a login node which has the 2 filesystems mounted: The only way to fix this would be to regenerate the config logs with writeconf. |
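The exact command Johann asked for is not preserved above; from the replies that follow, it appears to be the import query shown below. A minimal sketch (the grep pattern is only an illustration, and field names may differ slightly between Lustre versions):

  # show the import state for every MDC on this client
  lctl get_param mdc.*.import
  # the failover_nids and current_connection fields show which server NIDs the client knows and uses
  lctl get_param mdc.*.import | grep -E 'failover_nids|current_connection'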
| Comment by Dennis Nelson [ 08/Dec/11 ] |
|
[root@fe2 ~]# lctl get_param mdc.*.import Any ideas why scratch2 would not show the 10.174.80.[42,43] addresses listed? The way this was designed by the customer was that the login nodes would use the .80 subnet. The login nodes really should only have the single NID; I added the others as a workaround. |
| Comment by Cliff White (Inactive) [ 08/Dec/11 ] |
|
writeconf should fix the issue |
| Comment by Dennis Nelson [ 08/Dec/11 ] |
|
OK, so I understand we are going to need to do a writeconf. I actually asked about doing that already. My customer is going to be asking a lot of questions tomorrow morning, so let me ask a few now. 1. Any idea why this did not work in the first place? Thanks, |
| Comment by Johann Lombardi (Inactive) [ 09/Dec/11 ] |
|
> Any ideas why scratch2 would not show the 10.174.80.[42,43] addresses listed?
The first time a target is mounted, it registers all its configured nids to the MGS.
> 1. Any idea why this did not work in the first place?
Could you please tell us the exact commands you ran when you formatted the MDTs?
> 2. Is there a limit to the numbers of nids that a server has? Are we reaching some limit in LNET?
Not as far as I know. However, configuring multiple failover nids can increase the recovery time significantly.
> 3. What makes us think it will work after doing a writeconf?
Let's first check how you configured the filesystem in the first place before going down this path. |
| Comment by Dennis Nelson [ 09/Dec/11 ] |
|
I used the DDN tools to format the MDTs. It is done in two steps. First, the formatting command uses generic placeholders for the NIDs. Then, a tunefs step is performed: Step 1: Step 2: [root@lfs-mds-1-1 scratch1]# tunefs.lustre --print /dev/vg_scratch1/mdt Read previous values: Permanent disk data: exiting before disk write. |
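For context, the general shape of this two-step pattern is roughly as follows. The placeholder NIDs and flags are taken from the scratch2 commands quoted later in this ticket, and the real NIDs are the scratch1 values quoted earlier; the exact invocation the DDN tooling generates (Step 1 and Step 2 are elided above) may differ:

  # Step 1 (sketch): format with placeholder NIDs; the tooling fills in real values later
  mkfs.lustre --mgsnode=127.0.0.2@tcp --failnode=127.0.0.2@tcp \
      --fsname=scratch1 --mdt /dev/vg_scratch1/mdt

  # Step 2 (sketch): replace the placeholders with the real mgsnode/servicenode NIDs
  tunefs.lustre --erase-params \
      --mgsnode=10.174.31.241@o2ib,10.174.79.241@o2ib1,10.174.80.40@o2ib2 \
      --mgsnode=10.174.31.251@o2ib,10.174.79.251@o2ib1,10.174.80.41@o2ib2 \
      --servicenode=10.174.31.241@o2ib,10.174.79.241@o2ib1,10.174.80.40@o2ib2 \
      --servicenode=10.174.31.251@o2ib,10.174.79.251@o2ib1,10.174.80.41@o2ib2 \
      --param mdt.quota_type=ug \
      /dev/vg_scratch1/mdt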
| Comment by Johann Lombardi (Inactive) [ 09/Dec/11 ] |
|
Could you please also run tunefs.lustre on the MDT of scratch2? |
| Comment by Ashley Pittman (Inactive) [ 09/Dec/11 ] |
|
What options do you recommend for the writeconf? As well as the configuration data, I assume the options --erase-params and --writeconf should both be set? |
| Comment by Dennis Nelson [ 09/Dec/11 ] |
|
Commands used: mkfs.lustre --mgsnode=127.0.0.2@tcp --failnode=127.0.0.2@tcp --fsname=scratch2 --mdt /dev/vg_scratch2/mdt tunefs.lustre --erase-params /dev/vg_scratch2/mgs [root@lfs-mds-2-1 ~]# tunefs.lustre --print /dev/vg_scratch2/mdt Read previous values: Permanent disk data: exiting before disk write. |
| Comment by Johann Lombardi (Inactive) [ 09/Dec/11 ] |
|
> What options do you recommend for the writeconf?
writeconf can now be passed as a mount option. So I would unmount all clients, the MDT and the OSTs of scratch1 (not the shared MGS). Then mount the MDT again with -o writeconf, and then the OSTs with the same mount option. Then, once all targets are up and running, you can mount clients again. I also looked at the commands you used to format the MDS and everything looks good. It is still unclear why the resulting logs are bogus. |
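A minimal sketch of that sequence for scratch1, assuming the MDT device named earlier in this ticket; the OST device and mount-point names are illustrative:

  # on the clients: unmount lustre filesystems (or at least the scratch1 mount point)
  umount -at lustre

  # on the servers: unmount the scratch1 MDT and all scratch1 OSTs (leave the MGS alone)
  umount /mnt/scratch1-mdt                  # illustrative mount point

  # regenerate the config logs: MDT first, then each OST, using -o writeconf
  mount -t lustre -o writeconf /dev/vg_scratch1/mdt /mnt/scratch1-mdt
  mount -t lustre -o writeconf /dev/<ost-device> /mnt/scratch1-ostNN   # repeat for every OST

  # once all targets are up and running, remount the clients normally
  mount -at lustre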
| Comment by Dennis Nelson [ 09/Dec/11 ] |
|
Let me clarify something. The customer specified that each filesystem needed to be fully independent. Given that, there is not a common MDS; there is an MDS/MGT for each filesystem. The MGS service runs co-located on the MDS system. The MGT is a separate LVM volume from the MDT, although they both reside on a single volume group. |
| Comment by Dennis Nelson [ 09/Dec/11 ] |
|
Another question: after looking at the tunefs.lustre output, Cliff suggested that the mgsnode and failover.node definitions were duplicates of each other and did not both need to be there. Now, you are saying that the commands are correct. We use the --servicenode syntax in our tunefs commands, yet tunefs.lustre displays failover.node. Just to confirm, the mgsnode and failover.node (or servicenode) options must both be present, and the servicenode/failover.node entries are different ways of setting the same parameter? In our case, where the MGS and the MDS are always on the same server, mgsnode and servicenode are identical, but that would not always be the case. |
| Comment by Johann Lombardi (Inactive) [ 09/Dec/11 ] |
|
> Let me clarify something. The customer specified that each filesystem needed to be fully independent ...
Understood. This should not change the procedure in any case.
> Just to confirm, the mgsnode and failover.node (or servicenode) options must both be present
Yes. The mgsnode(s) is the nid(s) that will be used by the MDT to connect to the MGS, while failover.node is what will be registered by the MDS with the MGS and used by client nodes to reach the MDS.
> the servicenode/failover.node entries are different ways of setting the same parameter?
Correct.
> In our case where the mgs and the mds are always on the same server, mgsnode and servicenodes are identical but that would not always be the case
Exactly. It can (must) only be skipped if you run a combo MGT/MDT on the same device. |
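As a contrast to the separate-MGS layout used here, a combined MGS/MDT on a single device is the case where --mgsnode can (must) be skipped. A hypothetical sketch; the fsname, NID and device path are illustrative only:

  # combined MGS+MDT on one device: no --mgsnode needed, the MGS is local
  mkfs.lustre --fsname=example --mgs --mdt \
      --servicenode=10.0.0.1@o2ib /dev/vg_example/mgsmdt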
| Comment by Cliff White (Inactive) [ 09/Dec/11 ] |
|
I was unaware you were using --servicenode instead of --failnode, that explains the discrepancy. |
| Comment by Dennis Nelson [ 09/Dec/11 ] |
|
Thanks. I need to give an update to my customer. At what point are we going to decide to do the writeconf and how confident are we that it will work when we do it? |
| Comment by Cliff White (Inactive) [ 09/Dec/11 ] |
|
Per Johann, go ahead and do the writeconf. We will attempt to replicate the issue in our lab with multiple nids. |
| Comment by Johann Lombardi (Inactive) [ 09/Dec/11 ] |
|
I think it would be interesting to collect debug logs of the MGS during the re-registration.
$ lctl set_param subsystem_debug=mgs # only collect debug messages from the MGS
And then run lctl dk > /tmp/logs once the MDS has been successfully mounted. We will try to reproduce on our side too. |
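A minimal sketch of that collection sequence on the MGS node; the buffer-flush step is an assumption added for clarity, not something requested above:

  # restrict debug collection to MGS messages only
  lctl set_param subsystem_debug=mgs
  # flush whatever is already in the kernel debug buffer (assumption: start from a clean buffer)
  lctl dk > /dev/null

  # ... remount the MDT (e.g. with -o writeconf) and let it re-register with the MGS ...

  # dump the debug log accumulated during the re-registration
  lctl dk > /tmp/logs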
| Comment by Dennis Nelson [ 09/Dec/11 ] |
|
OK, one more question. You mentioned that the writeconf could be done as a mount option. Should I do it that way, or should I use the script previously provided? Does it make any difference? Do you have a preference for how we do it? If we do it as a mount option, I think I would need more info on how that works. How do the mgsnode and servicenode options get added when doing it as a mount option? |
| Comment by Johann Lombardi (Inactive) [ 09/Dec/11 ] |
|
> OK, one more question. You mentioned that the writeconf could be done as a mount option.
Right.
> Should I do it that way or should I use the script previously provided?
It is really as you prefer. If your tool supports writeconf, then you can just use it. On my side, I just find it very convenient to pass -o writeconf as a mount option.
> Does it make any difference?
With tunefs.lustre, you can also erase the parameters & restore them. However, I don't think we need to do this here, since we don't intend to change anything (like a nid).
> If we do it as a mount option, I think I would need more info on how that works.
It really works as I mentioned in my comment earlier: unmount everything, then mount the MDT with -o writeconf, and then the OSTs with the same mount option.
> How do the mgsnode and servicenode options get added when doing it as a mount option?
Those parameters are only removed if you use --erase-params (e.g. when you want to change one of the parameters). In this case, I don't think we need to do this and we just want to run a simple writeconf. BTW, if you use OST pools, be aware that running writeconf erases all pool information on the MGS (as well as any other parameters set via lctl conf_param). |
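For completeness, a sketch of the tunefs.lustre alternative, run on each unmounted target; only the MDT device named earlier in this ticket is shown, and no --erase-params is used since no parameters are being changed:

  # mark the target so its config logs are regenerated on next mount;
  # existing parameters (mgsnode, servicenode, ...) are kept
  tunefs.lustre --writeconf /dev/vg_scratch1/mdt
  # then mount the MDT and OSTs normally; the logs are rewritten at registration time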
| Comment by Dennis Nelson [ 10/Dec/11 ] |
|
OK, finally got the time scheduled to do the writeconf: [root@lfs-mds-1-1 ~]# pdsh -a modprobe lustre After the writeconf, failover works. |
| Comment by Dennis Nelson [ 10/Dec/11 ] |
|
[root@fe1 ~]# cat /etc/modprobe.d/lustre.conf
[root@fe1 ~]# df |
| Comment by Dennis Nelson [ 12/Dec/11 ] |
|
After some more testing, I think we have also resolved the issue mounting the filesystems on the login clients. I had tried with the following in /etc/modprobe.d/lustre.conf: What I have found is that if I set it to this: The mounts succeed. It appears that I have one problem remaining: I cannot mount the production filesystems on the TDS (Test and Development System) clients. The TDS clients mount the TDS Lustre filesystems over the client ib1 ports. They are supposed to mount the production filesystems over ib0. The ib2 ports of the production Lustre servers are connected into the ib0 fabric of the TDS cluster. TDS Client (r1i3n15): [root@r1i3n15 ~]# cat /etc/modprobe.d/lustre.conf
[root@r1i3n15 ~]# lctl list_nids [root@r1i3n15 ~]# cat /etc/fstab
When I attempt to mount, the mount command simply hangs, with very little in the client log file: Dec 12 16:26:28 r1i3n15 kernel: Lustre: MGC10.174.79.241@o2ib1: Reactivating import It appears to be communicating with the server, because I initially (inadvertently) used the wrong filesystem name and got this message: Dec 12 15:52:43 r1i3n15 kernel: Lustre: MGC10.174.79.241@o2ib1: Reactivating import It correctly responded that the lustre1-client profile does not exist. |
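The exact lustre.conf contents tried above are not preserved. As a general illustration only: the LNet networks option maps each local IB interface to an o2ib network number, and a mount can only succeed if the client's network number matches the one in the server NID it is asked to use. The interface names and numbers below are hypothetical:

  # /etc/modprobe.d/lustre.conf (illustrative sketch, not the actual file from this system)
  # map ib0 to LNet network o2ib2 and ib1 to o2ib1
  options lnet networks="o2ib2(ib0),o2ib1(ib1)"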
| Comment by Cliff White (Inactive) [ 12/Dec/11 ] |
|
Since you have resolved the initial issue and this new problem is on a different set of servers, please close this bug and open a new bug for the new issue. |
| Comment by Dennis Nelson [ 12/Dec/11 ] |
|
I'll be glad to open a new ticket for this but it is the same set of servers and it was referenced in my initial post. |
| Comment by Cliff White (Inactive) [ 12/Dec/11 ] |
|
I am sorry I am a bit confused as to which servers are which. Dec 12 15:52:43 r1i3n15 kernel: Lustre: MGC10.174.79.241@o2ib1: Reactivating import |
| Comment by Dennis Nelson [ 12/Dec/11 ] |
|
I understand about being confused. As I said, it is a very complex configuration; I think it is going to be a nightmare to support. I threw in the error message about lustre1-client simply because I had made a mistake and put lustre1 instead of scratch1. To me, this indicates that it is communicating with the MGS, since it was able to tell me that the lustre1-client profile does not exist. When I use the right filesystem name, the mount just hangs. Here is the tunefs.lustre --print information after the writeconf. [root@lfs-mds-1-2 ~]# tunefs.lustre --print /dev/vg_scratch1/mdt Read previous values: Permanent disk data: exiting before disk write. |
| Comment by Johann Lombardi (Inactive) [ 12/Dec/11 ] |
|
I looked at the new configuration log (i.e. file log.client) and the nid setup now looks fine: #06 (088)add_uuid nid=10.174.31.241@o2ib(0x500000aae1ff1) 0: 1:10.174.31.241@o2ib while in the previous file: #06 (088)add_uuid nid=10.174.31.241@o2ib(0x500000aae1ff1) 0: 1:10.174.31.241@o2ib While the mount hangs, could you please try to collect import information (lctl get_param mdc.*.import) to check what nid we try to access? |
| Comment by Dennis Nelson [ 12/Dec/11 ] |
|
Note: The filesystems listed as lustre1 and lustre2 are the TDS Lustre filesystems, not production. They use different servers. The problem filesystems are scratch1 and scratch2. [root@r1i3n15 ~]# lctl get_param mdc.*.import I notice that it says it is using 10.174.31.241@o2ib. Currently, the MGS and the MDT are mounted on the other node (10.174.31.251). Also, it just says o2ib, not o2ib1. Previously, I would not have worried about that, but it seemed to make a difference on the production login clients (on the production clients, I tried defining the nid as o2ib0 and they would not mount, yet they did mount when the nid was defined as o2ib2). |
| Comment by Johann Lombardi (Inactive) [ 12/Dec/11 ] |
|
So the failover_nids list looks good. The client tries to reach the MDS via 10.174.31.241@o2ib and it should then try through 10.174.31.251@o2ib. Can you successfully ping 10.174.31.251@o2ib from r1i3n15? |
| Comment by Dennis Nelson [ 12/Dec/11 ] |
|
Sorry, I made a mistake. That is the problem. It is trying to connect through the wrong nid. Copying the original data from above: [root@r1i3n15 ~]# lctl list_nids [root@r1i3n15 ~]# cat /etc/fstab <file system> <mount point> <type> <options> <dump> <pass> The only path to these servers from this client is through the 10.174.79.xxx addresses. Why is it trying 10.174.31.xxx? There is no route for that subnet on these clients: [root@r1i3n15 ~]# netstat -rn |
| Comment by Johann Lombardi (Inactive) [ 13/Dec/11 ] |
|
I have no idea why lnet selected 10.174.31.241@o2ib/10.174.31.251@o2ib instead of 10.174.79.241@o2ib/10.174.79.251@o2ib. |
| Comment by Liang Zhen (Inactive) [ 13/Dec/11 ] |
|
Sorry I'm a little confused about this setting, and have a few questions:
Thanks |
| Comment by Dennis Nelson [ 14/Dec/11 ] |
|
No, these clients cannot lctl ping, or ping, the 10.174.31.241 address. That nid exists on the servers to support the scratch1 filesystem for the production clients. Yes, r1i3n15 can mount the TDS filesystem. |
| Comment by Dennis Nelson [ 14/Dec/11 ] |
|
I realized that I did not answer one question. There is only one MDS on the TDS filesystem and it has only one nid: [root@mds01 ~]# lctl list_nids [root@r1i3n15 ~]# netstat -rn As you can see, there is no route to 10.174.31.241. [root@r1i3n15 ~]# ping 10.174.31.241 [root@r1i3n15 ~]# cat /etc/modprobe.d/lustre.conf
[root@r1i3n15 ~]# df If I unmount the TDS filesystems and change the modprobe.d/lustre.conf file to only include the ib0 port: [root@r1i3n15 ~]# cat /etc/modprobe.d/lustre.conf
I cannot communicate with the MDS. I get this error: [root@r1i3n15 ~]# mount -at lustre Dec 14 12:17:23 r1i3n15 kernel: Lustre: 27084:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1388172991791113 sent from MGC10.174.79.241@o2ib to NID 10.174.79.241@o2ib 0s ago has failed due to network error (5s prior to deadline). From what I can see, there is no indication of a network problem: [root@lfs-mds-1-1 ~]# ibstat [root@r1i3n15 ~]# ibping -G 0x0002c9030010c6b1 Yet, lctl ping fails: [root@r1i3n15 ~]# lctl ping 10.174.79.241@o2ib If I go back to the original configuration: [root@r1i3n15 ~]# cat /etc/modprobe.d/lustre.conf
[root@r1i3n15 ~]# lctl list_nids [root@r1i3n15 ~]# cat /etc/fstab
The TDS filesystems mount (lustre1, lustre2) and the production filesystems (scratch1, scratch2) just hang while performing the mount. |
| Comment by Liang Zhen (Inactive) [ 14/Dec/11 ] |
|
Here is my understanding about your setting, please correct me if I was wrong:

client (rli3n15)            TDS MDS (mds01)             Production MDS lfs-mds-1-1 (scratch1)
--------------------------  --------------------------  --------------------------------------
10.174.96.64@o2ib0 (ib1)    10.174.96.138@o2ib0  [y]    10.174.31.241@o2ib0  [n]
10.174.64.65@o2ib1 (ib0)                                10.174.79.241@o2ib1  [y]

[y] == [yes], means we can reach that NID via "lctl ping" from rli3n15
[n] == [no], means we cannot reach that NID via "lctl ping" from rli3n15

So between rli3n15 and lfs-mds-1-1: I think if you try to mount scratch1 from rli3n15, it will firstly look at all NIDs.

I would suggest to try with this one on rli3n15: and try to mount scratch1,2; if it can work, I would suggest to use this configuration:

client (rli3n15)            TDS MDS (mds01)             Production MDS lfs-mds-1-1 (scratch1)
--------------------------  --------------------------  --------------------------------------
10.174.96.64@o2ib3 (ib1)    10.174.96.138@o2ib3  [y]
10.174.64.65@o2ib1 (ib0)                                10.174.79.241@o2ib1  [y]
                                                        10.174.31.241@o2ib0  [y]

The only change we made here is: |
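A hedged sketch of what that renumbering could look like in /etc/modprobe.d/lustre.conf. The client mapping is taken from the table above, while the interface name on the TDS servers is an assumption; note that changing a server's network number changes its NIDs, so the TDS targets also need a writeconf afterwards (as was in fact done later in this ticket):

  # on the TDS client rli3n15: ib1 stays on the TDS fabric, renumbered to o2ib3;
  # ib0 keeps talking to the production servers over o2ib1
  options lnet networks="o2ib3(ib1),o2ib1(ib0)"

  # on the TDS servers (e.g. mds01): move their IB interface to o2ib3 as well
  # (interface name ib0 here is an assumption)
  options lnet networks="o2ib3(ib0)"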
| Comment by Dennis Nelson [ 14/Dec/11 ] |
|
OK, I tried the following: [root@r1i3n15 ~]# cat /etc/modprobe.d/lustre.conf
[root@r1i3n15 ~]# lctl list_nids [root@r1i3n15 ~]# cat /etc/fstab Now, the production filesystems (scratch1, scratch2) mount and the TDS filesystems fail to mount. [root@r1i3n15 ~]# mount -at lustre |
| Comment by Liang Zhen (Inactive) [ 14/Dec/11 ] |
|
Have you also changed the MDS/MGS and the other servers in the TDS filesystem to o2ib3 as well (i.e. mds01)? Because you are using o2ib3 as the TDS network number, all clients and servers on the TDS network should use that network number (o2ib3). |
| Comment by Dennis Nelson [ 14/Dec/11 ] |
|
Ah, no. I will have to schedule some time with the customer to do that. I have one node that is not currently in the job queue that I can use for testing. To take the whole filesystem down, I will have to schedule it. I will get that scheduled today. |
| Comment by Dennis Nelson [ 14/Dec/11 ] |
|
I made the change on the TDS servers and had to perform a writeconf in order to get it mounted up again. Everything seems to be working now. Thank you very much for all of your help! |
| Comment by Peter Jones [ 14/Dec/11 ] |
|
Dennis, thanks for the update. So can we close both this ticket and LU890? Peter |
| Comment by Dennis Nelson [ 14/Dec/11 ] |
|
Yes. I already suggested that |
| Comment by Peter Jones [ 14/Dec/11 ] |
|
Great - thanks Dennis! |