[LU-5173] During the shutdown/reboot process the system hangs on filesystem umount. Created: 11/Jun/14  Updated: 26/Jun/14  Resolved: 26/Jun/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Supporto Lustre Jnet2000 (Inactive) Assignee: Cliff White (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

redhat 5.7
lustre-1.8.9-2.6.18_348.3.1.el5_g7880158


Attachments: HTML File lustremount     HTML File messages     Text File putty.log    
Rank (Obsolete): 14360

 Description   

When we try to shut down or reboot, the system freezes during the umount of the filesystem.

regards



 Comments   
Comment by Cliff White (Inactive) [ 11/Jun/14 ]

Please provide us with some more detail on this incident. Are you referring to the umount of the clients or of the servers? Please attach the system logs (/var/log/messages) for the affected machines, covering the time period when you were rebooting the system. Please also describe exactly how you are doing the shutdown.

Comment by Supporto Lustre Jnet2000 (Inactive) [ 13/Jun/14 ]

log client

Comment by Supporto Lustre Jnet2000 (Inactive) [ 13/Jun/14 ]

The problem is on the clients. I have attached the client log.

regards

Comment by Cliff White (Inactive) [ 13/Jun/14 ]

Please provide more details of the problem you actually observed. Please provide a timestamp so I can reference the logs.
When exactly did this issue occur? What error messages concerned you?

Comment by Supporto Lustre Jnet2000 (Inactive) [ 16/Jun/14 ]

messages of umount

Comment by Supporto Lustre Jnet2000 (Inactive) [ 16/Jun/14 ]

Hi, I have attached the new log file.

regards

Comment by Cliff White (Inactive) [ 16/Jun/14 ]

You are furnishing quite a bit of data, but no information.
Please provide:

  • The exact time the incident occurred.
  • The exact error messages that concern you.

Thank you.

Comment by Supporto Lustre Jnet2000 (Inactive) [ 17/Jun/14 ]

Sorry, but we don't have much more information. The system hangs at the "unmounting filesystems" message. The attached putty.log contains all the console messages.

Comment by Cliff White (Inactive) [ 17/Jun/14 ]

Do you know what time the system hung? Do you know what day the system hung?
I see no such hang in the console messages you have furnished. I do see that lustremount had to do a force umount, which is not an issue.

2014-06-11T09:58:30.544811+02:00 osiride-lp-002 lustre_mclient: Ambiente macchina osiride-lp-002.utenze.bankit.it ERROR Lustre umount - force umount between 30 sec

Was 09:58:30 on 2014-06-11 the time when you had an issue?

These messages are from the forced umount and are harmless:

2014-06-11T11:15:04.956170+02:00 osiride-lp-002 kernel: LustreError: 9669:0:(ldlm_request.c:1039:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
2014-06-11T11:15:04.956192+02:00 osiride-lp-002 kernel: LustreError: 9669:0:(ldlm_request.c:1597:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
2014-06-11T11:15:04.961406+02:00 osiride-lp-002 kernel: LustreError: 9669:0:(ldlm_request.c:1039:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
2014-06-11T11:15:04.961417+02:00 osiride-lp-002 kernel: LustreError: 9669:0:(ldlm_request.c:1597:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
2014-06-11T11:15:04.972380+02:00 osiride-lp-002 kernel: Lustre: client home-client(ffff810fe05c4400) umount complete

Were you having issues on 2014-06-11 at 11:15:04?

If this issue re-appears, please attempt to determine a timestamp, collect system logs from all servers for the period 4 hours prior to the timestamp and collect relevant log entries from any impacted clients.

Comment by Gabriele Paciucci (Inactive) [ 17/Jun/14 ]

I talked with the partner. The partner will look closely at the unmount script and at its position in the runlevel. Please hold on until the partner comes back with more details.

Comment by Supporto Lustre Jnet2000 (Inactive) [ 26/Jun/14 ]

Dear Gabriele, the problem is not related to the runlevel; it is in the /etc/init.d/lustremount script, which doesn't follow the right skeleton for init files. According to the Red Hat sample script, the file /var/lock/subsys/<servicename> must be created after a successful start, and the same file must be removed after a stop.

See /usr/share/doc/initscripts-<version>/sysvinitfiles
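
A minimal sketch of that lock-file convention, for readers who want to see its shape. It is not the actual lustremount script, which is not included in this ticket. The `do_mount` and `do_umount` helpers are hypothetical placeholders, and the `LOCKDIR` override is an assumption added only so the sketch can be tried outside /var/lock/subsys. On Red Hat systems the rc script invokes a service's stop action at shutdown only when its subsys lock file exists, which is why a start action that never creates the file can leave the filesystem mounted when the generic umount step runs.

```shell
#!/bin/sh
# Illustrative skeleton only; NOT the real /etc/init.d/lustremount.
SERVICE="lustremount"
LOCKDIR="${LOCKDIR:-/var/lock/subsys}"   # overridable for testing (assumption)
LOCKFILE="$LOCKDIR/$SERVICE"

# Hypothetical placeholders: a real script would mount/unmount Lustre here.
do_mount()  { :; }
do_umount() { :; }

start() {
    do_mount || return 1
    # Create the lock file only after a successful start.
    touch "$LOCKFILE"
}

stop() {
    do_umount || return 1
    # Remove the lock file after a successful stop.
    rm -f "$LOCKFILE"
}

status() {
    [ -f "$LOCKFILE" ]
}

case "${1:-}" in
    start)  start ;;
    stop)   stop ;;
    status) if status; then echo "$SERVICE is running"; else echo "$SERVICE is stopped"; fi ;;
    "")     ;;  # no argument: do nothing, so the functions can be sourced
    *)      echo "Usage: $0 {start|stop|status}" >&2; exit 2 ;;
esac
```

With a script shaped like this, `service lustremount start` leaves /var/lock/subsys/lustremount in place, and the stop action removes it again once the unmount has succeeded.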

I fixed the script and now the shutdown/reboot works fine.
Ciao! See you at the next event!
Demetrio

Comment by Gabriele Paciucci (Inactive) [ 26/Jun/14 ]

Thank you, Demetrio.
Cliff, could you please close this ticket?

Comment by Peter Jones [ 26/Jun/14 ]

Thanks Demetrio/Gabriele.
