[LU-4560] Network error on samba share with lustre backend Created: 29/Jan/14  Updated: 27/Feb/14  Resolved: 27/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Critical
Reporter: Supporto Lustre Jnet2000 (Inactive) Assignee: Emoly Liu
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Operating system
Red Hat Enterprise Linux Server release 5.7 (Tikanga)

Samba
samba-client-3.0.33-3.39.el5_8
samba-3.0.33-3.39.el5_8
samba-common-3.0.33-3.39.el5_8

Lustre
lustre-modules-1.8.7-wc1_2.6.18_274.3.1.el5xen
lustre-1.8.7-wc1_2.6.18_274.3.1.el5xen


Attachments: File log-rete-occupata.tgz     Text File scp001.log    
Rank (Obsolete): 12450

 Description   

Our users on Windows 7 64-bit with Office 32-bit often receive the message "network is busy" when saving .doc files to a Samba share (Lustre backend). Sometimes it is sufficient to press "Try again"; sometimes the file must be saved locally. Users on Windows XP 32-bit with Office 2007 32-bit do not experience this problem. When the problem occurs, Samba seems to lose track of the open files. We attach Samba logs at levels 4, 5, and 10.

Regards



 Comments   
Comment by Peter Jones [ 29/Jan/14 ]

Hi Emoly

Could you please look into this one?

Thanks

Peter

Comment by Malcolm Cowe (Inactive) [ 03/Feb/14 ]

While we are not ruling out a potential issue with Lustre, there are a number of potential causes for this behaviour that relate to Samba or to Windows (sometimes both). I realise that you probably have far more experience with Samba and Windows networking than I do, but it would be helpful if we could narrow down the root cause by eliminating possible non-Lustre issues first.

There is a widely reported issue with Windows 7 and Office 2007/2010 that matches the symptoms described in the ticket. It is believed that disabling the SMB2 protocol on both client and server may fix this. For example:

http://community.spiceworks.com/topic/203084-the-network-is-busy-in-excel

with proposed solution to disable SMB2:

http://social.technet.microsoft.com/Forums/office/en-US/46c082ad-2455-4120-beea-c69ff2219ed9/word-2010-slow-opening-files-on-network-share?forum=officeitproprevious

I also found that there is a bug with older builds of Samba on RHEL, as described here:

http://lists.samba.org/archive/samba/2009-March/146830.html

(Read through the whole thread – very interesting).

Other obvious things to look at are name service resolution (/etc/hosts – e.g. make sure the first line is the localhost entry, DNS, NetBIOS, Winbind, etc.) and date/time synchronisation. Slow networks and server loading can also lead to slowness symptoms. Are the files in question shared with many users, or heavily accessed by lots of individuals or clients?
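
The /etc/hosts check mentioned above can be scripted. The following is only a minimal sketch: the helper name `first_entry_is_localhost` is hypothetical, and it assumes the usual hosts layout of an address followed by one or more names, with `#` starting a comment.

```python
def first_entry_is_localhost(hosts_text):
    """Return True if the first non-comment entry in hosts_text names localhost."""
    for raw_line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        fields = line.split()
        # fields[0] is the address; the remaining fields are names and aliases.
        return "localhost" in fields[1:]
    return False  # no entries at all


if __name__ == "__main__":
    with open("/etc/hosts") as handle:
        print(first_entry_is_localhost(handle.read()))
```

Run on the Samba server, it prints True when the first entry is the localhost line and False otherwise. It does not check DNS, NetBIOS, or Winbind resolution.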

Opportunistic locking has been known to create problems with share exports. I don't have a lot of experience with oplocks, but the Samba documentation is comprehensive:

http://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/locking.html

I mention this only because the logs have some oplocks "permission denied" errors.

If you have security = server, there can be problems with the responsiveness of the Samba server when there are a lot of active users. If the connection is interrupted for any reason, then the client has to completely disconnect and reconnect to the server in order to reestablish the share. Security = server also adds significant overhead to the PDC because the connection must remain active for the duration of a user's session.
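
To see which of the settings discussed above are actually set on a server, a small script can list them per section of smb.conf. This is a rough sketch, not a Samba tool: the function name and the `WATCHED` list are hypothetical, the file path is an assumption, and it treats smb.conf as a plain INI file (backslash line continuations and `include` directives are not handled). The parameter names `security`, `oplocks`, `level2 oplocks` and `kernel oplocks` are standard smb.conf parameters.

```python
import configparser

# Parameters relevant to the locking and authentication points raised above.
WATCHED = ("security", "oplocks", "level2 oplocks", "kernel oplocks")


def locking_and_security_settings(smb_conf_text):
    """Return {section: {parameter: value}} for watched parameters that are set."""
    # smb.conf lines are often indented; strip them so configparser does not
    # mistake an indented parameter for a continuation of the previous value.
    normalized = "\n".join(line.strip() for line in smb_conf_text.splitlines())
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(normalized)
    report = {}
    for section in parser.sections():
        found = {
            name: parser.get(section, name)
            for name in WATCHED
            if parser.has_option(section, name)
        }
        if found:
            report[section] = found
    return report


if __name__ == "__main__":
    # Assumed default location on RHEL; adjust if the file lives elsewhere.
    with open("/etc/samba/smb.conf") as handle:
        for section, values in locking_and_security_settings(handle.read()).items():
            print(section, values)
```

Parameters that are not set explicitly are not reported, so an empty result means that the Samba defaults apply.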

Comment by Emoly Liu [ 07/Feb/14 ]

Malcolm, thanks for the information. I just got back to work after a vacation and have started looking into this ticket.

Does this Samba error happen only with the Windows+Office combination? Is it possible to reproduce it on a Linux system?

Thanks.

Comment by Malcolm Cowe (Inactive) [ 07/Feb/14 ]

Emoly,

There may be many causes, but the Windows+Office combination is commonly reported. I am not sure if there is a linux-only reproducer but I think that a windows client is required.

Comment by Emoly Liu [ 10/Feb/14 ]

Malcolm, I built a Samba test network with one CentOS Samba server and one Windows 7 client. Do you know in detail how to reproduce this error? I tried saving some Office documents on the Lustre filesystem and modifying them, but no errors occurred.

Comment by Malcolm Cowe (Inactive) [ 10/Feb/14 ]

Emoly, I don't have a reproducer for this error. When I was searching for a cause, I found many reports of similar symptoms, but from a range of configurations. It might be related to locking or the oplocks feature, where multiple users try to access the same files at the same time. Or it could be one of the other causes I listed in my earlier comment. I suspect that the problem is most likely to occur at scale, when there is more than one client connected.

Comment by Emoly Liu [ 11/Feb/14 ]

Hmm, as you said, this problem is most likely to occur at scale, so it is very hard for me to reproduce it in my simple local Samba test environment. BTW, I am not very familiar with Windows and Samba, so could anyone provide some Lustre logs so that I can analyze whether this issue is related to Lustre?

Comment by Gabriele Paciucci (Inactive) [ 27/Feb/14 ]

After upgrading to Lustre 1.8.9 and Samba 3.6, the problem seems to be fixed. Please close this issue.

Comment by Peter Jones [ 27/Feb/14 ]

ok - thanks Gabriele!
