[LU-8297] mount with missing key leads to leaked MGC device and LBUG Created: 13/Jun/16  Updated: 26/Jul/16  Resolved: 26/Jul/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: John Hammond
Resolution: Fixed Votes: 0
Labels: None

Attachments: File mount.dk    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Mounting a Kerberized Lustre file system when the required lustre_root key is not installed on the client (e3) leaks a partially initialized MGC device. A second mount attempt then reuses that device and hits an LBUG:

e3:~# LOAD=1 /usr/lib64/lustre/tests/llmount.sh
...
e3:~# lctl set_param debug='+mount sec trace'
debug=+mount sec trace
error: set_param: setting /proc/sys/lnet/debug=+mount sec trace: Invalid argument
e3:~# lctl set_param debug='+sec trace'
debug=+sec trace
e3:~# lctl set_param debug_mb=64
debug_mb=64
e3:~# lctl clear; mount e1@tcp:/lustre /mnt/lustre -t lustre -o user_xattr,flock,mgssec=krb5p; lctl dk > mount.dk
mount.lustre: mount e1@tcp:/lustre at /mnt/lustre failed: Connection refused
e3:~# lctl dl
  0 UP mgc MGC192.168.122.181@tcp 93abf5fe-744b-364d-3106-e11ed3e6c2ca 4
e3:~# mount e1@tcp:/lustre /mnt/lustre -t lustre -o user_xattr,flock,mgssec=krb5p

Message from syslogd@e3 at Jun 13 13:00:22 ...
 kernel:LustreError: 4678:0:(import.c:553:import_select_connection()) ASSERTION( dlmexp != ((void *)0) ) failed:

Message from syslogd@e3 at Jun 13 13:00:22 ...
 kernel:LustreError: 4678:0:(import.c:553:import_select_connection()) LBUG
[ 3061.041621] LustreError: 4518:0:(gss_keyring.c:791:gss_sec_lookup_ctx_kr()) failed request key: -126
[ 3061.044555] LustreError: 4518:0:(sec.c:440:sptlrpc_req_get_ctx()) req ffff8800365d4000: fail to get context
[ 3061.047399] LustreError: 4518:0:(obd_mount.c:467:lustre_start_mgc()) connect failed -111
[ 3061.049724] LustreError: 4518:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount  (-111)
[ 3132.373912] LustreError: 4678:0:(import.c:553:import_select_connection()) ASSERTION( dlmexp != ((void *)0) ) failed: 
[ 3132.377365] LustreError: 4678:0:(import.c:553:import_select_connection()) LBUG
[ 3132.379678] Pid: 4678, comm: mount.lustre
[ 3132.379717] 
Call Trace:
[ 3132.379755]  [<ffffffffa04e1853>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
[ 3132.379765]  [<ffffffffa04e1df5>] lbug_with_loc+0x45/0xc0 [libcfs]
[ 3132.379806]  [<ffffffffa08a7c3d>] ptlrpc_connect_import+0x133d/0x1380 [ptlrpc]
[ 3132.379834]  [<ffffffffa0882258>] ptlrpc_recover_import+0x258/0x550 [ptlrpc]
[ 3132.379863]  [<ffffffffa089d8e8>] ? ptlrpc_pinger_ir_up+0x78/0x90 [ptlrpc]
[ 3132.379868]  [<ffffffffa09d38ce>] ? mgc_import_event+0x10e/0x2c0 [mgc]
[ 3132.379895]  [<ffffffffa08aae48>] ptlrpc_reconnect_import+0x78/0x1e0 [ptlrpc]
[ 3132.379908]  [<ffffffffa04f2167>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 3132.379913]  [<ffffffffa09d992a>] mgc_set_info_async+0x6ea/0x1620 [mgc]
[ 3132.379923]  [<ffffffffa04f2167>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[ 3132.379957]  [<ffffffffa0671370>] lustre_start_mgc+0x1570/0x2a90 [obdclass]
[ 3132.379980]  [<ffffffffa0673266>] lustre_fill_super+0x246/0xb80 [obdclass]
[ 3132.380000]  [<ffffffffa0673020>] ? lustre_fill_super+0x0/0xb80 [obdclass]
[ 3132.380027]  [<ffffffff811e1ded>] mount_nodev+0x4d/0xb0
[ 3132.380048]  [<ffffffffa06695e8>] lustre_mount+0x38/0x60 [obdclass]
[ 3132.380052]  [<ffffffff811e2799>] mount_fs+0x39/0x1b0
[ 3132.380056]  [<ffffffff811fe05f>] vfs_kern_mount+0x5f/0xf0
[ 3132.380059]  [<ffffffff812005ae>] do_mount+0x24e/0xa40
[ 3132.380063]  [<ffffffff8116dffe>] ? __get_free_pages+0xe/0x50
[ 3132.380067]  [<ffffffff81200e36>] SyS_mount+0x96/0xf0
[ 3132.380072]  [<ffffffff81645ec9>] system_call_fastpath+0x16/0x1b
[ 3132.380073] 
[ 3132.380345] Kernel panic - not syncing: LBUG
[ 3132.381034] CPU: 0 PID: 4678 Comm: mount.lustre Tainted: G           OE  ------------   3.10.0-327.13.1.el7.x86_64 #1
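The failure sequence above can be modelled in a few lines. This is a minimal Python sketch, not Lustre code. `MgcDevice`, `REGISTRY`, `start_mgc()`, `LBug` and the `key_installed` flag are hypothetical stand-ins for the MGC obd device, the client's device list (what `lctl dl` shows), `lustre_start_mgc()`, the kernel LBUG, and the missing lustre_root key. In the model, the first mount registers a device, fails to connect with -ECONNREFUSED (-111 on Linux), and returns without unregistering it. The second mount finds that stale device, whose export was never set, and fails the equivalent of the `dlmexp != NULL` assertion.

```python
import errno


class LBug(Exception):
    """Hypothetical stand-in for a Lustre LBUG (a kernel panic in the report)."""


class MgcDevice:
    """Hypothetical stand-in for an MGC obd device."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1
        self.export = None  # set only after a successful connect (cf. dlmexp)


REGISTRY = {}  # hypothetical model of the device list shown by `lctl dl`


def connect(dev, key_installed):
    """Simulated connect: fails when the GSS key is unavailable."""
    if not key_installed:
        return -errno.ECONNREFUSED
    dev.export = object()
    return 0


def start_mgc(name, key_installed):
    """Model of the buggy mount path: no cleanup when connect fails."""
    dev = REGISTRY.get(name)
    if dev is not None:
        # Reuse path: in the report this triggers a reconnect that LBUGs
        # because the export is NULL; the model raises LBug instead.
        if dev.export is None:
            raise LBug("ASSERTION( dlmexp != NULL ) failed")
        dev.refcount += 1
        return 0
    dev = MgcDevice(name)
    REGISTRY[name] = dev
    return connect(dev, key_installed)  # bug: device stays registered on error
```

Calling `start_mgc("MGC192.168.122.181@tcp", key_installed=False)` once returns the negative errno and leaves the device in `REGISTRY`, matching the leftover entry in the `lctl dl` output; calling it a second time raises `LBug`.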


 Comments   
Comment by Gerrit Updater [ 17/Jun/16 ]

John L. Hammond (john.hammond@intel.com) uploaded a new patch: http://review.whamcloud.com/20851
Subject: LU-8297 obd: release MGC device if connect fails
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 33b7da4d7a9f38a3807ae489e7f8a5d17369c415

Comment by Gerrit Updater [ 20/Jul/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20851/
Subject: LU-8297 obd: release MGC device if connect fails
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: fb5a79bf1e060a753a2e32e9a7b1f03fa4915ee2
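As the patch subject states, the fix releases the MGC device when the connect fails, which amounts to unwinding the registration on the error path. The following is a minimal Python sketch of that pattern only. The names `MgcDevice`, `REGISTRY`, `start_mgc()` and `key_installed` are hypothetical and are not taken from the Lustre source; the actual change is in the Gerrit review linked above.

```python
import errno


class MgcDevice:
    """Hypothetical stand-in for an MGC obd device."""

    def __init__(self, name):
        self.name = name
        self.export = None  # set only after a successful connect


REGISTRY = {}  # hypothetical model of the client's device list


def connect(dev, key_installed):
    """Simulated connect: fails when the GSS key is unavailable."""
    if not key_installed:
        return -errno.ECONNREFUSED
    dev.export = object()
    return 0


def start_mgc(name, key_installed):
    """Model of the fixed mount path: release the device if connect fails."""
    dev = REGISTRY.get(name)
    if dev is not None:
        if dev.export is None:
            raise RuntimeError("stale MGC device with no export")
        return 0
    dev = MgcDevice(name)
    REGISTRY[name] = dev
    rc = connect(dev, key_installed)
    if rc != 0:
        # Error path: drop the half-initialized device so that a later
        # mount starts from a clean state instead of reusing it.
        del REGISTRY[name]
    return rc
```

With this model, repeated mounts without the key each fail cleanly with -ECONNREFUSED and leave nothing registered, and a mount after the key is installed succeeds.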

Comment by Joseph Gmitter (Inactive) [ 26/Jul/16 ]

Landed to master for 2.9.0

Generated at Sat Feb 10 02:16:19 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.