[LU-4943] Client fails to mount filesystem Created: 23/Apr/14  Updated: 12/Apr/16  Resolved: 04/Nov/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.1
Fix Version/s: Lustre 2.7.0, Lustre 2.5.4

Type: Bug Priority: Major
Reporter: Mahmoud Hanafi Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: patch
Environment:

lustre-client-modules-2.4.1-6nasC OFED3.5
server lustre2.4.1 and 2.1.5 OFED1.5.4


Attachments: File LU-4943-mgc-cleanup.tar.gz     File lu-4943.log.tar.bz2     File r403i1n1.1398271435.out.gz    
Issue Links:
Related
is related to LU-5582 mount.lustre failed due to presence o... Resolved
Severity: 3
Rank (Obsolete): 13682

 Description   

After upgrading to OFED 3.5 we have started to get random mount failures during client boot. Which filesystem fails to mount is random. Here is the client-side debug output.

0000000:00000001:1.0:1398271322.986806:0:7677:0:(mgc_request.c:947:mgc_enqueue()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
10000000:01000000:1.0:1398271322.986808:0:7677:0:(mgc_request.c:1852:mgc_process_log()) Can't get cfg lock: -5
10000000:00000001:1.0:1398271322.986810:0:7677:0:(mgc_request.c:125:config_log_get()) Process entered
10000000:00000001:1.0:1398271322.986811:0:7677:0:(mgc_request.c:129:config_log_get()) Process leaving (rc=0 : 0 : 0)
10000000:00000001:1.0:1398271322.986813:0:7677:0:(mgc_request.c:1713:mgc_process_cfg_log()) Process entered
10000000:00000001:1.0:1398271322.986815:0:7677:0:(mgc_request.c:1774:mgc_process_cfg_log()) Process leaving via out_pop (rc=18446744073709551611 : -5 : 0xfffffffffffffffb)
10000000:00000001:1.0:1398271322.986818:0:7677:0:(mgc_request.c:1811:mgc_process_cfg_log()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
10000000:01000000:1.0:1398271322.986819:0:7677:0:(mgc_request.c:1871:mgc_process_log()) MGC10.151.25.171@o2ib: configuration from log 'nbp3-client' failed (-5).
10000000:00000001:1.0:1398271322.986822:0:7677:0:(mgc_request.c:1883:mgc_process_log()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
10000000:00000001:1.0:1398271322.986824:0:7677:0:(mgc_request.c:136:config_log_put()) Process entered
10000000:00000001:1.0:1398271322.986825:0:7677:0:(mgc_request.c:160:config_log_put()) Process leaving
10000000:00000001:1.0:1398271322.986826:0:7677:0:(mgc_request.c:1982:mgc_process_config()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
00000020:00000001:1.0:1398271322.986829:0:7677:0:(obd_class.h:714:obd_process_config()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)
00000020:00000001:1.0:1398271322.986830:0:7677:0:(lustre_cfg.h:214:lustre_cfg_len()) Process entered
00000020:00000001:1.0:1398271322.986831:0:7677:0:(lustre_cfg.h:220:lustre_cfg_len()) Process leaving (rc=176 : 176 : b0)
00000020:00000001:1.0:1398271322.986833:0:7677:0:(lustre_cfg.h:259:lustre_cfg_free()) Process leaving
00000020:02020000:1.0:1398271322.986834:0:7677:0:(obd_mount.c:119:lustre_process_log()) 15c-8: MGC10.151.25.171@o2ib: The configuration from log 'nbp3-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
00000020:00000001:1.0:1398271323.010020:0:7677:0:(obd_mount.c:122:lustre_process_log()) Process leaving (rc=18446744073709551611 : -5 : fffffffffffffffb)

The complete debug output is attached.



 Comments   
Comment by Peter Jones [ 23/Apr/14 ]

Amir is looking into this issue

Comment by Amir Shehata (Inactive) [ 24/Apr/14 ]

From the logs:

00000100:00020000:1.0:1398271322.963863:0:7677:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff8807f6f84400 x1466193689968652/t0(0) o101->MGC10.151.25.171@o2ib@10.151.25.171@o2ib:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1

This indicates that requests being sent from the client are queued, but never sent within a specified time limit.  Thus -5 (-EIO) is returned.

Looking further back in the log

00000800:00000100:0.0:1398271416.231812:0:7154:0:(o2iblnd_cb.c:2844:kiblnd_cm_callback()) 10.151.25.171@o2ib: ADDR ERROR -110
00000800:00000100:0.0:1398271416.231822:0:7154:0:(o2iblnd_cb.c:2072:kiblnd_peer_connect_failed()) Deleting messages for 10.151.25.171@o2ib: connection failed
...
00000100:00080000:0.0:1398271370.987478:0:7628:0:(import.c:1187:ptlrpc_connect_interpret()) recovery of MGS on MGC10.151.25.171@o2ib_0 failed (-110)

This seems to indicate that the connection between client and server is timing out, -110 (-ETIMEDOUT)

Since clients are using OFED3.5 and servers are using OFED 1.5.4, could there be incompatibility between the versions?

Comment by Mahmoud Hanafi [ 24/Apr/14 ]

Agreed this could be an issue between ofed3.5 and ofed1.5.4.

But there is a problem with how the failure is handled by Lustre. When the filesystem fails to mount, the mgc is put into a stale state, which prevents any further mount attempts. The only way to recover is to unmount all the filesystems to clear the stale mgc; after that, remounting works.

Comment by Peter Jones [ 24/Apr/14 ]

Bobijam

Could you please advise on this one?

Thanks

Peter

Comment by Zhenyu Xu [ 25/Apr/14 ]

What is the address of the MGS? Is 10.151.25.172@o2ib a valid server NID?

Comment by Mahmoud Hanafi [ 25/Apr/14 ]

10.151.25.172? The MGS NID is 10.151.25.171@o2ib. It is a valid MGS server NID address.

Comment by Jay Lan (Inactive) [ 25/Apr/14 ]

10.151.25.172@o2ib is also a valid NID. It is a client.

Comment by Zhenyu Xu [ 28/Apr/14 ]

From the log I can see that the client nbp3-client cannot mount because the first lustre_fill_super() failed with -5 (I could not find the root cause of this failure; I only know that the client MGC cannot get its lock from the MGS to process the MGS llogs). That first mount process registered the mgc device, and the later mounts fail because this existing mgc device is still attached.

$ grep lustre_fill_super r403i1n1.1398271435.out 
18:00000020:00000001:0.0:1398271265.936424:0:7677:0:(obd_mount.c:1223:lustre_fill_super()) Process entered
19:00000020:01200004:0.0:1398271265.936425:0:7677:0:(obd_mount.c:1225:lustre_fill_super()) VFS Op: sb ffff8807f6f9e000
29:00000020:01000004:0.0:1398271265.936437:0:7677:0:(obd_mount.c:1250:lustre_fill_super()) Mounting client nbp3-client
37282:00000020:00000001:1.0:1398271370.987877:0:7677:0:(obd_mount.c:1285:lustre_fill_super()) Process leaving via out (rc=18446744073709551611 : -5 : 0xfffffffffffffffb)
37283:00000020:00020000:1.0:1398271370.987879:0:7677:0:(obd_mount.c:1289:lustre_fill_super()) Unable to mount  (-5)
38596:00000020:00000001:0.0:1398271375.047962:0:7703:0:(obd_mount.c:1223:lustre_fill_super()) Process entered
38597:00000020:01200004:0.0:1398271375.047964:0:7703:0:(obd_mount.c:1225:lustre_fill_super()) VFS Op: sb ffff880822c25c00
38607:00000020:01000004:0.0:1398271375.047985:0:7703:0:(obd_mount.c:1250:lustre_fill_super()) Mounting client nbp3-client
38656:00000020:00000001:0.0:1398271375.080789:0:7703:0:(obd_mount.c:1261:lustre_fill_super()) Process leaving via out (rc=18446744073709551599 : -17 : 0xffffffffffffffef)
38657:00000020:00020000:0.0:1398271375.080790:0:7703:0:(obd_mount.c:1289:lustre_fill_super()) Unable to mount  (-17)
41938:00000020:00000001:0.0:1398271385.134777:0:7706:0:(obd_mount.c:1223:lustre_fill_super()) Process entered
41939:00000020:01200004:0.0:1398271385.134780:0:7706:0:(obd_mount.c:1225:lustre_fill_super()) VFS Op: sb ffff8807eaaa8c00
41949:00000020:01000004:0.0:1398271385.134800:0:7706:0:(obd_mount.c:1250:lustre_fill_super()) Mounting client nbp3-client
41998:00000020:00000001:0.0:1398271385.167599:0:7706:0:(obd_mount.c:1261:lustre_fill_super()) Process leaving via out (rc=18446744073709551599 : -17 : 0xffffffffffffffef)
41999:00000020:00020000:0.0:1398271385.167601:0:7706:0:(obd_mount.c:1289:lustre_fill_super()) Unable to mount  (-17)
48640:00000020:00000001:0.0:1398271405.222478:0:7709:0:(obd_mount.c:1223:lustre_fill_super()) Process entered
48641:00000020:01200004:0.0:1398271405.222480:0:7709:0:(obd_mount.c:1225:lustre_fill_super()) VFS Op: sb ffff88083c939400
48651:00000020:01000004:0.0:1398271405.222501:0:7709:0:(obd_mount.c:1250:lustre_fill_super()) Mounting client nbp3-client
48700:00000020:00000001:0.0:1398271405.255289:0:7709:0:(obd_mount.c:1261:lustre_fill_super()) Process leaving via out (rc=18446744073709551599 : -17 : 0xffffffffffffffef)
48701:00000020:00020000:0.0:1398271405.255291:0:7709:0:(obd_mount.c:1289:lustre_fill_super()) Unable to mount  (-17)
58847:00000020:00000001:0.0:1398271435.314276:0:7712:0:(obd_mount.c:1223:lustre_fill_super()) Process entered
58848:00000020:01200004:0.0:1398271435.314278:0:7712:0:(obd_mount.c:1225:lustre_fill_super()) VFS Op: sb ffff8807f6f76800
58858:00000020:01000004:0.0:1398271435.314300:0:7712:0:(obd_mount.c:1250:lustre_fill_super()) Mounting client nbp3-client
85984:00000020:00000001:1.0:1398271435.639442:0:7712:0:(obd_mount.c:1285:lustre_fill_super()) Process leaving via out (rc=0 : 0 : 0x0)

We should handle the lustre_fill_super() error path so that this device is released on error. As a workaround, you can bring down this client (rmmod all Lustre modules), load the Lustre modules again, and mount the client; you should not need to bring down the whole system.

Comment by Zhenyu Xu [ 28/Apr/14 ]

b2_4 patch tracking at http://review.whamcloud.com/10127

Comment by Zhenyu Xu [ 28/Apr/14 ]

master patch tracking at http://review.whamcloud.com/10129

Comment by parinay v kondekar (Inactive) [ 29/Apr/14 ]
  • To test the patch for b2_4 (Lustre version: 2.4.3-g35d73cf-CHANGED-2.6.32-358.23.2.el6.x86_64), I tried to reproduce the issue with "lctl set_param fail_val=120 ; lctl set_param fail_loc=0x8000050". This introduces the "first lustre_fill_super() failed for -5" condition.
  • The setup is, 2 MDS/MDT, 2 OSS/OST, 1 client, mounting the two file systems ( namely lustre, lustre1)
  • The first file system lustre is mounted. For the second file system, lustre1, before executing "mount", I set fail_val and fail_loc as mentioned above.
  • What happens thereafter is shown in the logs below:
Apr 28 23:49:09 osh-1 kernel: Lustre: Mounted lustre-client
Apr 28 23:49:35 osh-1 kernel: LustreError: 29991:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 50c sleeping for 120000ms
Apr 28 23:49:41 osh-1 kernel: LustreError: 18261:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff8800181b2400 x1466653941460808/t0(0) o101->MGC192.168.177.145@tcp@192.168.177.145@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Apr 28 23:50:32 osh-1 kernel: LustreError: 18261:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff8800181b2400 x1466653941460812/t0(0) o101->MGC192.168.177.145@tcp@192.168.177.145@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
Apr 28 23:50:32 osh-1 kernel: LustreError: 15c-8: MGC192.168.177.145@tcp: The configuration from log 'lustre1-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Apr 28 23:50:32 osh-1 kernel: LustreError: 18261:0:(llite_lib.c:1043:ll_fill_super()) Unable to process log: -5
Apr 28 23:50:32 osh-1 kernel: Lustre: Unmounted lustre1-client
Apr 28 23:51:35 osh-1 kernel: LustreError: 29991:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 50c awake
Apr 28 23:51:35 osh-1 kernel: LustreError: 18261:0:(obd_mount.c:1293:lustre_fill_super()) Unable to mount  (-5)
Apr 28 23:51:36 osh-1 kernel: LustreError: 18278:0:(genops.c:320:class_newdev()) Device MGC192.168.177.145@tcp already exists at 6, won't add
Apr 28 23:51:36 osh-1 kernel: LustreError: 18278:0:(obd_config.c:374:class_attach()) Cannot create device MGC192.168.177.145@tcp of type mgc : -17
Apr 28 23:51:36 osh-1 kernel: LustreError: 18278:0:(obd_mount.c:196:lustre_start_simple()) MGC192.168.177.145@tcp attach error -17
Apr 28 23:51:36 osh-1 kernel: LustreError: 18278:0:(obd_mount.c:1293:lustre_fill_super()) Unable to mount  (-17)
Apr 28 23:51:38 osh-1 kernel: LustreError: 18280:0:(genops.c:320:class_newdev()) Device MGC192.168.177.145@tcp already exists at 6, won't add
Apr 28 23:51:38 osh-1 kernel: LustreError: 18280:0:(obd_config.c:374:class_attach()) Cannot create device MGC192.168.177.145@tcp of type mgc : -17
Apr 28 23:51:38 osh-1 kernel: LustreError: 18280:0:(obd_mount.c:196:lustre_start_simple()) MGC192.168.177.145@tcp attach error -17
Apr 28 23:51:38 osh-1 kernel: LustreError: 18280:0:(obd_mount.c:1293:lustre_fill_super()) Unable to mount  (-17)
Apr 28 23:51:39 osh-1 kernel: LustreError: 18282:0:(genops.c:320:class_newdev()) Device MGC192.168.177.145@tcp already exists at 6, won't add
Apr 28 23:51:39 osh-1 kernel: LustreError: 18282:0:(obd_config.c:374:class_attach()) Cannot create device MGC192.168.177.145@tcp of type mgc : -17
Apr 28 23:51:39 osh-1 kernel: LustreError: 18282:0:(obd_mount.c:196:lustre_start_simple()) MGC192.168.177.145@tcp attach error -17
Apr 28 23:51:39 osh-1 kernel: LustreError: 18282:0:(obd_mount.c:1293:lustre_fill_super()) Unable to mount  (-17)
Apr 28 23:51:41 osh-1 kernel: LustreError: 18284:0:(genops.c:320:class_newdev()) Device MGC192.168.177.145@tcp already exists at 6, won't add
Apr 28 23:51:41 osh-1 kernel: LustreError: 18284:0:(obd_config.c:374:class_attach()) Cannot create device MGC192.168.177.145@tcp of type mgc : -17
Apr 28 23:51:41 osh-1 kernel: LustreError: 18284:0:(obd_mount.c:196:lustre_start_simple()) MGC192.168.177.145@tcp attach error -17
Apr 28 23:51:48 osh-1 kernel: Lustre: Mounted lustre1-client

Thanks

Comment by Zhenyu Xu [ 29/Apr/14 ]

updated the patch

commit message
    LU-4943 obdclass: detach MGC dev on error
    
    lustre_start_mgc() creates the MGC device. If an error happens later
    in ll_fill_super(), this device is left attached, and later mounts
    fail, repeatedly complaining that the MGC device already exists on
    the client node.
    
    It turns out that the device was referenced by the mgc config llog
    data, which is queued in the mgc lock requeue thread for retrying to
    get the mgc lock; in the normal case this llog reference is only
    released in mgc_blocking_ast() when the system is unmounted.
    
    This patch handles the error path when the mgc connect fails: it
    releases the config llog data reference in config_log_end(), since
    no mgc_blocking_ast() will happen in this case.

Comment by parinay v kondekar (Inactive) [ 06/May/14 ]
  • I tried the latest patch set (6). The logs:
     66 10000000:01000000:0.0:1399359209.816771:0:9520:0:(mgc_request.c:1864:mgc_process_log()) Can't get cfg lock: -5
     67 10000000:01000000:0.0:1399359209.816775:0:9520:0:(mgc_request.c:1883:mgc_process_log()) MGC192.168.177.145@tcp: configuration from log 'lustre1-client' failed (-5).
     68 00000020:02020000:0.0:1399359209.816779:0:9520:0:(obd_mount.c:119:lustre_process_log()) 15c-8: MGC192.168.177.145@tcp: The configuration from log 'lustre1-client' failed (-5). This may be the result of commun    ication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
     69 00000080:00020000:0.0:1399359209.816787:0:9520:0:(llite_lib.c:1043:ll_fill_super()) Unable to process log: -5
     70 10000000:01000000:0.0:1399359209.816800:0:9520:0:(mgc_request.c:147:config_log_put()) dropping config log lustre1-cliir
     71 10000000:01000000:0.0:1399359209.816802:0:9520:0:(mgc_request.c:147:config_log_put()) dropping config log lustre1-client
     72 10000000:01000000:0.0:1399359209.816803:0:9520:0:(mgc_request.c:419:config_log_end()) end config log lustre1-client (0)
     73 00000080:02000400:0.0:1399359209.816921:0:9520:0:(llite_lib.c:1132:ll_put_super()) Unmounted lustre1-client
     74 00000020:01000000:0.0:1399359209.816924:0:9520:0:(obd_config.c:911:class_del_profile()) Del profile lustre1-client
     75 00000020:01000004:0.0:1399359209.817022:0:9520:0:(obd_mount.c:785:lustre_common_put_super()) dropping sb ffff88001bfea800
     76 00000100:00080000:0.0:1399359209.817026:0:9520:0:(pinger.c:499:ptlrpc_pinger_del_import()) removing pingable import 547e172a-7802-ef66-bd18-f473852ac6b1->MGS
     77 00010000:00080000:0.0:1399359227.816973:0:9416:0:(ldlm_request.c:1311:ldlm_cli_update_pool()) @@@ Zero SLV or Limit found (SLV: 0, Limit: 25600)  req@ffff88000ee7ac00 x1467333992251936/t0(0) o400->lustre-OST0    001-osc-ffff880019eaac00@192.168.177.129@tcp:28/4 lens 224/192 e 0 to 0 dl 1399359234 ref 1 fl Rpc:RN/0/0 rc 0/0
     78 00010000:00080000:0.0:1399359227.816992:0:9416:0:(ldlm_request.c:1311:ldlm_cli_update_pool()) @@@ Zero SLV or Limit found (SLV: 0, Limit: 25600)  req@ffff88000ee7a800 x1467333992251932/t0(0) o400->lustre-OST0    000-osc-ffff880019eaac00@192.168.177.129@tcp:28/4 lens 224/192 e 0 to 0 dl 1399359234 ref 1 fl Rpc:RN/0/0 rc 0/0
     79 00010000:00080000:0.0:1399359252.816299:0:9416:0:(ldlm_request.c:1311:ldlm_cli_update_pool()) @@@ Zero SLV or Limit found (SLV: 0, Limit: 25600)  req@ffff880006505800 x1467333992251952/t0(0) o400->lustre-OST0    001-osc-ffff880019eaac00@192.168.177.129@tcp:28/4 lens 224/192 e 0 to 0 dl 1399359259 ref 1 fl Rpc:RN/0/0 rc 0/0
     80 00010000:00080000:0.0:1399359252.816315:0:9416:0:(ldlm_request.c:1311:ldlm_cli_update_pool()) @@@ Zero SLV or Limit found (SLV: 0, Limit: 25600)  req@ffff88000ee7a800 x1467333992251948/t0(0) o400->lustre-OST0    000-osc-ffff880019eaac00@192.168.177.129@tcp:28/4 lens 224/192 e 0 to 0 dl 1399359259 ref 1 fl Rpc:RN/0/0 rc 0/0
     81 00000001:00020000:0.0:1399359272.816677:0:9414:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 50c awake
     82 00000100:00080000:0.0:1399359272.816729:0:9414:0:(import.c:816:ptlrpc_connect_interpret()) MGC192.168.177.145@tcp: connect to target with instance 0
     83 10000000:01000000:0.0:1399359272.816738:0:9414:0:(mgc_request.c:1160:mgc_import_event()) import event 0x808005
     84 00000100:00080000:0.0:1399359272.816741:0:9414:0:(import.c:871:ptlrpc_connect_interpret()) ffff88001d374800 MGS: changing import state from CONNECTING to FULL
     85 10000000:01000000:0.0:1399359272.816745:0:9414:0:(mgc_request.c:1160:mgc_import_event()) import event 0x808004
     86 00000100:00080000:0.0:1399359272.816756:0:9414:0:(pinger.c:239:ptlrpc_pinger_ir_up()) IR up
     87 00000100:00080000:0.0:1399359272.816763:0:9414:0:(import.c:1106:ptlrpc_connect_interpret()) MGC192.168.177.145@tcp: Resetting ns_connect_flags to server flags: 0x11005000020
     88 00000100:00080000:0.0:1399359272.816822:0:9520:0:(import.c:1482:ptlrpc_disconnect_import()) ffff88001d374800 MGS: changing import state from FULL to CONNECTING
     89 00000100:00080000:0.0:1399359272.817942:0:9520:0:(import.c:1495:ptlrpc_disconnect_import()) ffff88001d374800 MGS: changing import state from CONNECTING to CLOSED
     90 00000100:00080000:0.0:1399359272.817948:0:9520:0:(import.c:204:ptlrpc_deactivate_and_unlock_import()) setting import MGS INVALID
     91 10000000:01000000:0.0:1399359272.817951:0:9520:0:(mgc_request.c:1160:mgc_import_event()) import event 0x808002
     92 10000000:01000000:0.0:1399359272.817954:0:9520:0:(mgc_request.c:1160:mgc_import_event()) import event 0x808003
     93 00000020:00000080:0.0:1399359272.817961:0:9520:0:(genops.c:1225:class_disconnect()) disconnect: cookie 0xa9a320b358ce08f3
     94 00000020:00000080:0.0:1399359272.817965:0:9520:0:(genops.c:825:class_export_put()) final put ffff880019ec0000/547e172a-7802-ef66-bd18-f473852ac6b1
     95 00000020:01000000:0.0:1399359272.817972:0:9520:0:(obd_config.c:1727:class_manual_cleanup()) Manual cleanup of MGC192.168.177.145@tcp (flags='')
     96 00000020:00000080:0.0:1399359272.817977:0:9520:0:(obd_config.c:1068:class_process_config()) processing cmd: cf004
     97 00000020:00000080:0.0:1399359272.818061:0:9520:0:(obd_config.c:674:class_cleanup()) MGC192.168.177.145@tcp: forcing exports to disconnect: 2
     98 00000020:00080000:0.0:1399359272.818065:0:9520:0:(genops.c:1541:print_export_data()) MGC192.168.177.145@tcp: ACTIVE ffff880019eaa000 547e172a-7802-ef66-bd18-f473852ac6b1 (no nid) 3 (0 0 0) 0 0 0 0: (null)  0
     99 00000020:00080000:0.0:1399359272.818073:0:9520:0:(genops.c:1541:print_export_data()) MGC192.168.177.145@tcp: ZOMBIE ffff880019ec0000 547e172a-7802-ef66-bd18-f473852ac6b1 192.168.177.145@tcp 0 (0 0 0) 1 0 0 0:     (null)  0
    100 00000020:00080000:0.0:1399359272.818077:0:9520:0:(genops.c:1314:class_disconnect_exports()) OBD device 6 (ffff88001b13e3b8) has exports, disconnecting them
    101 00000020:00080000:0.0:1399359272.818080:0:9520:0:(genops.c:1277:class_disconnect_export_list()) exp ffff880019eaa000 export uuid == obd uuid, don't discon
    102 10000000:01000000:0.0:1399359272.818083:0:9520:0:(obd_class.h:673:obd_cleanup_client_import()) MGC192.168.177.145@tcp: client import never connected
    103 10000000:01000000:0.0:1399359272.818085:0:9520:0:(mgc_request.c:1160:mgc_import_event()) import event 0x808003
    104 00000020:00000080:0.0:1399359272.818107:0:9520:0:(obd_config.c:1068:class_process_config()) processing cmd: cf002
    105 00000020:00000080:0.0:1399359272.818109:0:9520:0:(obd_config.c:599:class_detach()) detach on obd MGC192.168.177.145@tcp (uuid 547e172a-7802-ef66-bd18-f473852ac6b1)
    106 00000020:00000080:0.0:1399359272.818115:0:9520:0:(obd_config.c:1068:class_process_config()) processing cmd: cf006
    107 00000020:00000080:0.0:1399359272.818117:0:9520:0:(obd_config.c:1087:class_process_config()) removing mappings for uuid MGC192.168.177.145@tcp_0
    108 00000020:01000004:0.0:1399359272.818121:0:9520:0:(obd_mount.c:654:lustre_put_lsi()) put ffff88001bfea800 1
    109 00000020:01000004:0.0:1399359272.818123:0:9520:0:(obd_mount.c:604:lustre_free_lsi()) Freeing lsi ffff88001b314800
    110 00000020:00020000:0.0:1399359272.818133:0:9520:0:(obd_mount.c:1289:lustre_fill_super()) Unable to mount  (-5)
    111 00000020:00000080:0.0:1399359272.818509:0:9402:0:(genops.c:955:class_import_destroy()) destroying import ffff88001d374800 for MGC192.168.177.145@tcp
    112 00000020:00000080:0.0:1399359272.818514:0:9402:0:(genops.c:779:class_export_destroy()) destroying export ffff880019ec0000/547e172a-7802-ef66-bd18-f473852ac6b1 for MGC192.168.177.145@tcp
    113 00000020:01200004:0.0:1399359274.021599:0:9559:0:(obd_mount.c:1225:lustre_fill_super()) VFS Op: sb ffff88001e3db800
    114 00000020:01000004:0.0:1399359274.021606:0:9559:0:(obd_mount.c:809:lmd_print())   mount data:
    115 00000020:01000004:0.0:1399359274.021606:0:9559:0:(obd_mount.c:811:lmd_print()) profile: lustre1-client
    116 00000020:01000004:0.0:1399359274.021606:0:9559:0:(obd_mount.c:812:lmd_print()) device:  192.168.177.145@tcp:/lustre1
    117 00000020:01000004:0.0:1399359274.021607:0:9559:0:(obd_mount.c:813:lmd_print()) flags:   2
    118 00000020:01000004:0.0:1399359274.021607:0:9559:0:(obd_mount.c:1250:lustre_fill_super()) Mounting client lustre1-client
    119 00000020:01000004:0.0:1399359274.021615:0:9559:0:(obd_mount.c:333:lustre_start_mgc()) Start MGC 'MGC192.168.177.145@tcp'
    120 00000020:00000080:0.0:1399359274.021617:0:9559:0:(obd_config.c:1068:class_process_config()) processing cmd: cf005
    121 00000020:00000080:0.0:1399359274.021619:0:9559:0:(obd_config.c:1079:class_process_config()) adding mapping from uuid MGC192.168.177.145@tcp_0 to nid 0x20000c0a8b191 (192.168.177.145@tcp)
    122 00000020:01000004:0.0:1399359274.021633:0:9559:0:(obd_mount.c:192:lustre_start_simple()) Starting obd MGC192.168.177.145@tcp (typ=mgc)
    123 00000020:00000080:0.0:1399359274.021634:0:9559:0:(obd_config.c:1068:class_process_config()) processing cmd: cf001
    124 00000020:00000080:0.0:1399359274.021635:0:9559:0:(obd_config.c:366:class_attach()) attach type mgc name: MGC192.168.177.145@tcp uuid: 36f7e595-bc95-0a5c-6291-bb650109f125
    125 00000020:00020000:0.0:1399359274.021660:0:9559:0:(genops.c:320:class_newdev()) Device MGC192.168.177.145@tcp already exists at 6, won't add
    126 00000020:00020000:0.0:1399359274.021666:0:9559:0:(obd_config.c:374:class_attach()) Cannot create device MGC192.168.177.145@tcp of type mgc : -17
    127 00000020:00020000:0.0:1399359274.021668:0:9559:0:(obd_mount.c:196:lustre_start_simple()) MGC192.168.177.145@tcp attach error -17
    128 00000020:01000004:0.0:1399359274.021671:0:9559:0:(obd_mount.c:654:lustre_put_lsi()) put ffff88001e3db800 1
    129 00000020:01000004:0.0:1399359274.021673:0:9559:0:(obd_mount.c:604:lustre_free_lsi()) Freeing lsi ffff88001bfea800
    130 00000020:00020000:0.0:1399359274.021674:0:9559:0:(obd_mount.c:1289:lustre_fill_super()) Unable to mount  (-17)
    
  • messages
    LustreError: 9414:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 50c sleeping for 120000ms
    LustreError: 9520:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880006505800 x1467333992251884/t0(0) o101->MGC192.168.177.145@tcp@192.168.177.145@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
    LustreError: 9520:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880006505800 x1467333992251888/t0(0) o101->MGC192.168.177.145@tcp@192.168.177.145@tcp:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
    LustreError: 15c-8: MGC192.168.177.145@tcp: The configuration from log 'lustre1-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
    LustreError: 9520:0:(llite_lib.c:1043:ll_fill_super()) Unable to process log: -5
    Lustre: Unmounted lustre1-client
    LustreError: 9414:0:(fail.c:137:__cfs_fail_timeout_set()) cfs_fail_timeout id 50c awake
    LustreError: 9520:0:(obd_mount.c:1289:lustre_fill_super()) Unable to mount  (-5)
    LustreError: 9559:0:(genops.c:320:class_newdev()) Device MGC192.168.177.145@tcp already exists at 6, won't add
    LustreError: 9559:0:(obd_config.c:374:class_attach()) Cannot create device MGC192.168.177.145@tcp of type mgc : -17
    LustreError: 9559:0:(obd_mount.c:196:lustre_start_simple()) MGC192.168.177.145@tcp attach error -17
    
Comment by Zhenyu Xu [ 06/May/14 ]

Would you please collect a -1 (full debug) log and upload it here? I've tried the patch with your reproduction procedure and it works.

Comment by parinay v kondekar (Inactive) [ 06/May/14 ]

attached are the logs requested. Let me know, if I am missing something.

Thanks

Comment by Zhenyu Xu [ 06/May/14 ]

In the -1 log, at line 8711 the mgc device reference count is 1:

00000020:00000040:0.0:1399389218.949028:0:4386:0:(obd_config.c:733:class_decref()) Decref MGC192.168.177.145@tcp (ffff88001a5861b8) now 1

and not until line 9810 does its reference count drop to 0, after the lustre1-sptlrpc llog finally quits the mgc requeue thread:

00000020:00000040:0.0:1399389225.336730:0:4386:0:(obd_config.c:733:class_decref()) Decref MGC192.168.177.145@tcp (ffff88001a5861b8) now 0

All the lustre_start_mgc() calls between them return -17, since the llog request holds a reference on the mgc device.

$ grep -n "lustre_start_mgc().*-17" lu-4943.log
8825:00000020:00000001:0.0:1399389220.091202:0:4506:0:(obd_mount.c:404:lustre_start_mgc()) Process leaving via out_free (rc=18446744073709551599 : -17 : 0xffffffffffffffef)
8829:00000020:00000001:0.0:1399389220.091209:0:4506:0:(obd_mount.c:495:lustre_start_mgc()) Process leaving (rc=18446744073709551599 : -17 : ffffffffffffffef)
8917:00000020:00000001:0.0:1399389220.794620:0:4508:0:(obd_mount.c:404:lustre_start_mgc()) Process leaving via out_free (rc=18446744073709551599 : -17 : 0xffffffffffffffef)
8921:00000020:00000001:0.0:1399389220.794622:0:4508:0:(obd_mount.c:495:lustre_start_mgc()) Process leaving (rc=18446744073709551599 : -17 : ffffffffffffffef)
9041:00000020:00000001:0.0:1399389221.706351:0:4510:0:(obd_mount.c:404:lustre_start_mgc()) Process leaving via out_free (rc=18446744073709551599 : -17 : 0xffffffffffffffef)
9045:00000020:00000001:0.0:1399389221.706357:0:4510:0:(obd_mount.c:495:lustre_start_mgc()) Process leaving (rc=18446744073709551599 : -17 : ffffffffffffffef)

So I think the patch should work, we just need to wait longer.

Comment by parinay v kondekar (Inactive) [ 07/May/14 ]
  • The second mount works after a wait.
  • Do you think a large number of Lustre clients would affect this wait? (IMO it would increase it, as the refcount would be higher.)
  • Can we change any timeouts to control/reduce this wait time?
    e.g.
    #define MGC_TIMEOUT_MIN_SECONDS   5
    #define MGC_TIMEOUT_RAND_CENTISEC 0x1ff /* ~500 */
    

Please feel free to correct me, if wrong.

Thanks

Comment by Zhenyu Xu [ 07/May/14 ]

Do you think a large number of Lustre clients would affect this wait? (IMO it would increase it, as the refcount would be higher.)

Basically no. Later Lustre clients only increase the mgc's obd::u::cli::cl_mgc_refcount, which is decreased when they fail to mount via lustre_common_put_super()->lustre_stop_mgc(); this does not affect the cld in the requeue thread. Of course, if that large number of clients takes too long to error out, it will delay the mgc device cleanup.

I think shortening these mgc timeout values can reduce the wait time, but this really is a rare case. When it does happen, we can unmount the Lustre devices and unload all Lustre modules to expedite the cleanup; after fixing whatever prevented the mgc from connecting, we can restart the mount procedure.

Comment by Ryan Haasken [ 09/May/14 ]

I have tested patch-set 7 of the master version of the patch: http://review.whamcloud.com/#/c/10129/

Using Parinay's reproducer steps, the patch does not seem to resolve the issue. I have uploaded LU-4943-mgc-cleanup.tar.gz, which contains the test script and the full dk logs.

Note that after I unmounted the first file system, the mgc from the first failed mount of the second file system was cleaned up.

Comment by Zhenyu Xu [ 12/May/14 ]

The explanation of this phenomenon (the 2nd mgc was only released once the first client was unmounted) is that the mgc requeue thread exists to fetch updated MGS config logs when necessary. If the whole system's config has not changed, the requeue thread is not woken, so the 2nd mgc's llog data stays queued in the thread.

In your test, the umount of the 1st client stirred up the client's mgc requeue thread, which released the 2nd mgc. You can also trigger the stirring simply with 'lctl set_param -P debug="$olddebug"'.

Comment by Ryan Haasken [ 12/May/14 ]

So this command (or another command affecting config data) has to be run on the MGS of the mounted file system, is that right?

lctl set_param -P debug="$oldDebug"

In my testing, running this command on the MGS for the first, mounted file system caused the client's mgc device from the second file system to be cleaned up after about 5 seconds. Running this command on the MGS for the second file system did not affect the stale mgc device. Does that sound right? Is there some better fix in the works that does not require us to "stir up" the client's mgc requeue thread? This does not seem like an optimal solution.

Parinay, you did not mention this step in your reproducer. Did you do something else to "stir up" the mgc requeue thread?

Comment by Ryan Haasken [ 12/May/14 ]

I just tested the reproducer steps on a pristine master client without your patch, and I got the same results as I did with your patch. Here are the steps.

On the client:

mount -t lustre centss01:/centss01 /mnt/centss01 -o rw,flock,lazystatfs
lctl set_param fail_loc=0x8000050c
lctl set_param fail_val=120
# The following mount attempts will fail, first with rc=5, then with rc=17
mount -t lustre centss06:/centss06 /mnt/centss06 -o rw,flock,lazystatfs
mount -t lustre centss06:/centss06 /mnt/centss06 -o rw,flock,lazystatfs

On the MGS for the first file system, execute the following command

lctl set_param -P debug="ioctl neterror warning error emerg ha config console"

Then wait about a minute to be safe, and execute the mount command on the client again. This time it will succeed.

mount -t lustre centss06:/centss06 /mnt/centss06 -o rw,flock,lazystatfs

These steps give the same behavior between a master+patch client and a pristine master client. Is there some reproducer which distinguishes between a client with the patch and a client without the patch?

Comment by Zhenyu Xu [ 13/May/14 ]

1. Yes, it is better to execute it on the MGS; since the 2nd file system's client never established its mgc->mgs connection, stirring it up from that side won't work.
2. The mgc requeue mechanism requires that the requeue thread not be woken unless a system config change happens somewhere in the whole filesystem. So if a client unfortunately fails to establish its mgc->mgs connection, this is the way to release its mgc device. Given that this is a really rare case, I don't think the solution is too much of a burden.
3. There are several places where the mgc device setup can fail, for example: a) before any llog is inserted into the requeue thread; b) after the llog is inserted, when llog processing fails because of a connection failure.

Comment by parinay v kondekar (Inactive) [ 13/May/14 ]
  • No, I followed your reproducer exactly; I did nothing extra to "stir up" the mgc requeue thread.
  • I have created a new patch and will upload it here soon.

Thanks

Comment by Zhenyu Xu [ 13/May/14 ]

It's timing that makes the difference. If the llog hasn't settled into the requeue thread yet, my patch removes it from the requeue thread's list and releases the mgc device earlier. But if the llog has already passed to the bottom of the requeue processing and the thread is waiting for a system config change, my patch won't speed up the release; we need to manually stir up the requeue thread for another loop to release the mgc device.

Comment by parinay v kondekar (Inactive) [ 04/Jun/14 ]

http://review.whamcloud.com/#/c/10569/

Comment by Ryan Haasken [ 09/Jun/14 ]

I have tested the master patch, and I've found that it fixes this issue.

Parinay, please update the patch when you get a chance to fix the style issues. Then I'll inspect it again and approve. Thanks!

Comment by Mahmoud Hanafi [ 07/Aug/14 ]

This can be closed from our end.

Comment by Peter Jones [ 07/Aug/14 ]

Mahmoud

To be clear - which patches (if any) did you end up using to meet your requirements?

Thanks

Peter

Comment by Ryan Haasken [ 07/Aug/14 ]

From our end, we need http://review.whamcloud.com/#/c/10569/ to land, but there are some style issues with it right now.

Comment by Mahmoud Hanafi [ 07/Aug/14 ]

we used http://review.whamcloud.com/10127

Comment by Peter Jones [ 07/Aug/14 ]

Thanks Mahmoud. I suggest that we keep this ticket open until at least the master patch - http://review.whamcloud.com/#/c/10129/ - lands.

Ryan it would probably be best to decouple the upstreaming of http://review.whamcloud.com/#/c/10569/ from this NASA support issue and use a unique JIRA ticket reference on the next push.

Comment by Ryan Haasken [ 12/Aug/14 ]

Are the two changes compatible with each other?

Comment by Zhenyu Xu [ 13/Aug/14 ]

yes, I think they are compatible.

Comment by Ryan Haasken [ 04/Sep/14 ]

Mahmoud, do you know which patch-set of the change http://review.whamcloud.com/#/c/10129/ you used? The newest version of the patch looks like it addresses the problem similarly to the way http://review.whamcloud.com/#/c/10569/ does.

Also, there is a problem with the current patch-set (PS10) of http://review.whamcloud.com/#/c/10129/ . With this patch-set applied, mount.lustre hangs with the following trace in dmesg:

INFO: task mount.lustre:12025 blocked for more than 120 seconds.
      Not tainted 2.6.32.431.23.3.el6_lustre #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount.lustre  D 0000000000000001     0 12025  12024 0x00000080
 ffff88013bb15828 0000000000000086 ffff88013bb157b8 ffffffff81069f15
 ffff88013bb15798 ffff88013dc08ad8 ffff8800283168e8 ffff880028316880
 ffff88013a56fab8 ffff88013bb15fd8 000000000000fbc8 ffff88013a56fab8
Call Trace:
 [<ffffffff81069f15>] ? enqueue_entity+0x125/0x450
 [<ffffffff8152a365>] schedule_timeout+0x215/0x2e0
 [<ffffffff81069f15>] ? enqueue_entity+0x125/0x450
 [<ffffffff81529fe3>] wait_for_common+0x123/0x180
 [<ffffffff81061d00>] ? default_wake_function+0x0/0x20
 [<ffffffff8152a0fd>] wait_for_completion+0x1d/0x20
 [<ffffffffa11fd0a8>] mgc_setup+0x4c8/0x5a0 [mgc]
 [<ffffffffa0c4de8b>] obd_setup+0x19b/0x290 [obdclass]
 [<ffffffffa0afc181>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa0af63a8>] ? libcfs_log_return+0x28/0x40 [libcfs]
 [<ffffffffa0c4e188>] class_setup+0x208/0x870 [obdclass]
 [<ffffffffa0c56a6c>] class_process_config+0xc6c/0x1ad0 [obdclass]
 [<ffffffffa0af63a8>] ? libcfs_log_return+0x28/0x40 [libcfs]
 [<ffffffffa0c5baab>] ? lustre_cfg_new+0x40b/0x6f0 [obdclass]
 [<ffffffffa0c5bee8>] do_lcfg+0x158/0x450 [obdclass]
 [<ffffffff8128daa0>] ? sprintf+0x40/0x50
 [<ffffffffa0c5c274>] lustre_start_simple+0x94/0x200 [obdclass]
 [<ffffffffa0c60993>] lustre_start_mgc+0xbd3/0x1e00 [obdclass]
 [<ffffffffa0af63a8>] ? libcfs_log_return+0x28/0x40 [libcfs]
 [<ffffffffa0afc181>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [<ffffffffa0c61ccc>] lustre_fill_super+0x10c/0x550 [obdclass]
 [<ffffffffa0c61bc0>] ? lustre_fill_super+0x0/0x550 [obdclass]
 [<ffffffff8118c5df>] get_sb_nodev+0x5f/0xa0
 [<ffffffffa0c59995>] lustre_get_sb+0x25/0x30 [obdclass]
 [<ffffffff8118bc3b>] vfs_kern_mount+0x7b/0x1b0
 [<ffffffff8118bde2>] do_kern_mount+0x52/0x130
 [<ffffffff811ad7bb>] do_mount+0x2fb/0x930
 [<ffffffff811ade80>] sys_mount+0x90/0xe0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

I think this is because rq_start is never completed in mgc_requeue_thread().

Comment by Jay Lan (Inactive) [ 04/Sep/14 ]

Ryan, we use http://review.whamcloud.com/10127 (for b2_4 branch) instead.

The latest PS for #10129 (for master branch) is PS10 as you pointed out, but the latest PS for #10127 is still PS7. We use PS7 of #10127.

Comment by Ryan Haasken [ 04/Sep/14 ]

Thanks Jay. I have been testing on master, so that may explain why PS7 didn't fix the problem for me. PS10 of #10129 takes a different approach than the earlier patch-sets. It now takes a similar approach to #10569. PS10 of #10129 is broken, but when I fixed it locally and rebuilt, it resolved the problem in the same way that #10569 did.

It seems to me that we only need either #10129 or #10569. Can anybody confirm this?

Comment by Zhenyu Xu [ 05/Sep/14 ]

Updated http://review.whamcloud.com/#/c/10127/ (b2_4) to be in sync with master.

Comment by Ryan Haasken [ 06/Oct/14 ]

Is the test failure in replay-ost-single on http://review.whamcloud.com/#/c/10129/ related to the patch? It doesn't seem like it to me, but I don't see a bug matching that failure.

Comment by Ryan Haasken [ 15/Oct/14 ]

The patch for master has landed.

This issue also exists in 2.5. Here is a port for b2_5: http://review.whamcloud.com/#/c/12303

Comment by Ryan Haasken [ 16/Oct/14 ]

It looks like we may have gotten the same spurious Maloo failures on the b2_5 patch as we did on other branches. Can somebody restart Maloo?

Comment by Peter Jones [ 04/Nov/14 ]

Landed for 2.7

Comment by Ryan Haasken [ 25/Nov/14 ]

I didn't notice that there was already a b2_5 version of this fix, so http://review.whamcloud.com/#/c/12303/ has been abandoned in favor of http://review.whamcloud.com/#/c/11765

Comment by Gerrit Updater [ 04/Dec/14 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/11765/
Subject: LU-4943 obdclass: detach MGC dev on error
Project: fs/lustre-release
Branch: b2_5
Current Patch Set:
Commit: 8d1e9394d3a984e257e1e4b0f46f16b7ff2183cd

Generated at Sat Feb 10 01:47:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.