[LU-441] ll_fill_super()) Unable to process log: -108 Created: 21/Jun/11  Updated: 07/May/15  Resolved: 07/May/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.8, Lustre 1.8.7, Lustre 1.8.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Jian Yu Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Lustre Branch: v1_8_6_RC2
Lustre Build: http://newbuild.whamcloud.com/job/lustre-b1_8/80/
e2fsprogs Build: http://newbuild.whamcloud.com/job/e2fsprogs-master/40/
Distro/Arch: RHEL6/x86_64(patchless client, in-kernel OFED, kernel version: 2.6.32-131.2.1.el6)
RHEL5/x86_64(server, OFED 1.5.3.1, kernel version: 2.6.18-238.12.1.el5_lustre)
ENABLE_QUOTA=yes
FAILURE_MODE=HARD

MGS/MDS Nodes: client-10-ib(active), client-12-ib(passive)
                                 \  /
                            1 combined MGS/MDT

OSS Nodes:     fat-amd-1-ib(active), fat-amd-2-ib(active)
                                 \  /
                                 OST1 (active in fat-amd-1-ib)
                                 OST2 (active in fat-amd-2-ib)
                                 OST3 (active in fat-amd-1-ib)
                                 OST4 (active in fat-amd-2-ib)
                                 OST5 (active in fat-amd-1-ib)
                                 OST6 (active in fat-amd-2-ib)

Client Nodes: fat-amd-3-ib, client-6-ib


Issue Links:
Related
is related to LU-630 mount failure after MGS connection lo... Resolved
Severity: 3
Bugzilla ID: 20997
Rank (Obsolete): 5211

 Description   

replay-single test 0c failed as follows:

== test 0c: expired recovery with no clients == 22:09:59
Filesystem           1K-blocks      Used Available Use% Mounted on
client-10-ib@o2ib:client-12-ib@o2ib:/lustre
                      11811168    485956  10724828   5% /mnt/lustre
Failing mds on node client-12-ib
+ pm -h powerman --off client-12
Command completed successfully
affected facets: mds
+ pm -h powerman --on client-12
Command completed successfully
df pid is 14399
Failover mds to client-10-ib
22:10:44 (1308633044) waiting for client-10-ib network 900 secs ...
22:10:44 (1308633044) network interface is UP
Starting mds: -o user_xattr,acl  /dev/disk/by-id/scsi-1IET_00010001 /mnt/mds
client-10-ib: lnet.debug=0x33f1504
client-10-ib: lnet.subsystem_debug=0xffb7e3ff
client-10-ib: lnet.debug_mb=48
Started lustre-MDT0000
Starting client: fat-amd-3-ib: -o user_xattr,acl,flock client-10-ib@o2ib:client-12-ib@o2ib:/lustre /mnt/lustre
mount.lustre: mount client-10-ib@o2ib:client-12-ib@o2ib:/lustre at /mnt/lustre failed: Cannot send after transport endpoint shutdown
 replay-single test_0c: @@@@@@ FAIL: mount fails

Dmesg on the client node fat-amd-3:

Lustre: 9996:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1372200800092968 sent from MGC192.168.4.10@o2ib to NID 192.168.4.10@o2ib 0s ago has failed due to network error (5s prior to deadline).
  req@ffff8800d409d800 x1372200800092968/t0 o250->MGS@MGC192.168.4.10@o2ib_0:26/25 lens 368/584 e 0 to 1 dl 1308633093 ref 1 fl Rpc:N/0/0 rc 0/0
Lustre: 9996:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 6 previous similar messages
LustreError: 14511:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff8800d409d400 x1372200800092971/t0 o501->MGS@MGC192.168.4.10@o2ib_1:26/25 lens 264/432 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
LustreError: 15c-8: MGC192.168.4.10@o2ib: The configuration from log 'lustre-client' failed (-108). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 14511:0:(llite_lib.c:1095:ll_fill_super()) Unable to process log: -108
Lustre: client ffff8801192e9800 umount complete
LustreError: 14511:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount  (-108)
Lustre: DEBUG MARKER: replay-single test_0c: @@@@@@ FAIL: mount fails

Maloo report: https://maloo.whamcloud.com/test_sets/ed5b5ff0-9bca-11e0-9a27-52540025f9af

This is a known issue on the Lustre b1_8 branch: bug 20997
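For reference, the -108 reported by ll_fill_super() is ESHUTDOWN ("Cannot send after transport endpoint shutdown"): the MGC gave up talking to the MGS while fetching the 'lustre-client' configuration log after the failover. A minimal manual diagnostic sketch from the client, using standard lctl/mount.lustre commands (node names and the mount target are taken from this ticket; the client-12-ib NID is an assumption based on the node naming, not from the logs):

# On the client (fat-amd-3-ib), check whether LNet can still reach either MGS NID.
lctl list_nids                      # local NIDs on the client
lctl ping 192.168.4.10@o2ib         # primary MGS NID (client-10-ib), as seen in the dmesg above
lctl ping 192.168.4.12@o2ib         # failover MGS NID (client-12-ib); assumed NID
# If the MGS responds, retry the same mount the test used:
mount -t lustre -o user_xattr,acl,flock \
    client-10-ib@o2ib:client-12-ib@o2ib:/lustre /mnt/lustre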



 Comments   
Comment by Jian Yu [ 22/Jun/11 ]

recovery-small test 57 failed with the same issue (the subsequent sub-tests failed because test 57 failed):
https://maloo.whamcloud.com/test_sets/60e96394-9c12-11e0-9a27-52540025f9af

replay-vbr test 0c failed with the same issue (the subsequent sub-tests failed because test 0c failed):
https://maloo.whamcloud.com/test_sets/41c3eabe-9c8c-11e0-9a27-52540025f9af

Comment by Jian Yu [ 23/Jun/11 ]

Lustre Branch: v1_8_6_RC3
Lustre Build: http://newbuild.whamcloud.com/job/lustre-b1_8/90/
e2fsprogs Build: http://newbuild.whamcloud.com/job/e2fsprogs-master/42/
Distro/Arch: RHEL6/x86_64(patchless client, in-kernel OFED, kernel version: 2.6.32-131.2.1.el6)
                    RHEL5/x86_64(server, OFED 1.5.3.1, kernel version: 2.6.18-238.12.1.el5_lustre)
ENABLE_QUOTA=yes
FAILURE_MODE=HARD

MGS/MDS Nodes: client-10-ib(active), client-12-ib(passive)
                                 \  /
                            1 combined MGS/MDT

OSS Nodes:     fat-amd-1-ib(active), fat-amd-2-ib(active)
                                 \  /
                                 OST1 (active in fat-amd-1-ib)
                                 OST2 (active in fat-amd-2-ib)
                                 OST3 (active in fat-amd-1-ib)
                                 OST4 (active in fat-amd-2-ib)
                                 OST5 (active in fat-amd-1-ib)
                                 OST6 (active in fat-amd-2-ib)

Client Nodes:  fat-amd-3-ib, client-[6,7,16,21,24]-ib

After running the recovery-double-scale test, mounting the local client on fat-amd-3-ib failed as follows:

++ sh -c 'mount -t lustre -o user_xattr,acl,flock client-10-ib@o2ib:client-12-ib@o2ib:/lustre /mnt/lustre'
mount.lustre: mount client-10-ib@o2ib:client-12-ib@o2ib:/lustre at /mnt/lustre failed: Cannot send after transport endpoint shutdown
+ return 108

Maloo report: https://maloo.whamcloud.com/test_sets/83e9914e-9d65-11e0-9a27-52540025f9af
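A quick way to confirm what the return code 108 above means on the client node (a minimal sketch; assumes the system Python on the RHEL6 client is available):

# Translate errno 108; it resolves to ESHUTDOWN,
# i.e. "Cannot send after transport endpoint shutdown", matching the mount.lustre error text.
python -c 'import errno, os; print(errno.errorcode[108] + ": " + os.strerror(108))'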

Comment by Jian Yu [ 05/Aug/11 ]

A clean upgrade from Lustre 1.8.6-wc1 to 2.0.66.0 also hit this issue:
https://maloo.whamcloud.com/test_sets/d18a7578-bf78-11e0-8bdf-52540025f9af

Comment by Jian Yu [ 04/Sep/11 ]

A clean upgrade from Lustre 1.8.5/1.8.6-wc1 to 2.1.0 also hit this issue:
https://maloo.whamcloud.com/test_sets/3b960258-d76a-11e0-8d02-52540025f9af

Comment by Jian Yu [ 13/Oct/11 ]

Lustre Tag: v1_8_7_WC1_RC1
Lustre Build: http://newbuild.whamcloud.com/job/lustre-b1_8/142/
e2fsprogs Build: http://newbuild.whamcloud.com/job/e2fsprogs-master/65/
Distro/Arch: RHEL5/x86_64(server, OFED 1.5.3.2, ext4-based ldiskfs), RHEL6/x86_64(client, in-kernel OFED)
ENABLE_QUOTA=yes
FAILURE_MODE=HARD

recovery-double-scale test: https://maloo.whamcloud.com/test_sets/625e6856-f53f-11e0-908b-52540025f9af

Comment by Jian Yu [ 24/Feb/12 ]

A clean upgrade from Lustre 1.8.7-wc1 to 2.1.1 also hit this issue:
https://maloo.whamcloud.com/test_sets/e25eacbe-5eda-11e1-ab6b-5254004bbbd3

Comment by Jian Yu [ 14/May/12 ]

Lustre Tag: v1_8_8_WC1_RC1
Lustre Build: http://build.whamcloud.com/job/lustre-b1_8/195/
Distro/Arch: RHEL5.8/x86_64(server), RHEL6.2/x86_64(client)
Network: TCP (1GigE)
ENABLE_QUOTA=yes
FAILURE_MODE=HARD

recovery-double-scale test: https://maloo.whamcloud.com/test_sets/b1c8d1a8-9d8f-11e1-a1d8-52540035b04c
recovery-random-scale test: https://maloo.whamcloud.com/test_sets/f899ce66-9d8f-11e1-a1d8-52540035b04c

Comment by Isaac Huang (Inactive) [ 31/Aug/12 ]

This is likely a duplicate of LU-630; please see also:
http://jira.whamcloud.com/browse/LU-1809?focusedCommentId=44054&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-44054

Comment by Andreas Dilger [ 07/May/15 ]

Haven't seen this in a long time.
