[LU-2336] mds llog_write_rec 'No space left on device' Created: 15/Nov/12  Updated: 15/Mar/14  Resolved: 08/Mar/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.2.0, Lustre 2.1.2
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Major
Reporter: ETHz Support (Inactive) Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: metadata, mn1, shh
Environment:

[root@n-mds1 ~]# cat /proc/fs/lustre/version
lustre: 2.2.0
kernel: patchless_client
build: 2.2.0-RC2--PRISTINE-2.6.32-220.4.2.el6_lustre.x86_64

[root@n-mds1 ~]# uname -r
2.6.32-220.4.2.el6_lustre.x86_64

[root@n-mds1 ~]# rpm -qa|grep lustre
lustre-ldiskfs-3.3.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
lustre-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
kernel-firmware-2.6.32-220.4.2.el6_lustre.x86_64
lustre-modules-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
kernel-headers-2.6.32-220.4.2.el6_lustre.x86_64
kernel-2.6.32-220.4.2.el6_lustre.x86_64
kernel-devel-2.6.32-220.4.2.el6_lustre.x86_64


Severity: 3
Epic: metadata
Rank (Obsolete): 5568

 Description   

I noticed that our MDS logs this message once in a while:
LustreError: 3354:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff8809f8de6780
LustreError: 20199:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff8809c6b299c0
LustreError: 32022:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff8804dd524d80
LustreError: 32015:0:(llog_cat.c:298:llog_cat_add_rec()) llog_write_rec -28: lh=ffff8804dd76af00



 Comments   
Comment by ETHz Support (Inactive) [ 15/Nov/12 ]

Could this be a problem? It is not related to the MDS crashes that I described in LU-2323.

thanks in advance

Comment by Peter Jones [ 15/Nov/12 ]

Hongchao

Could you please comment on this one?

Thanks

Peter

Comment by Hongchao Zhang [ 16/Nov/12 ]

No, this should not be a problem. The PIDs in these messages are all different: when the current llog file is full, the write fails with -ENOSPC (-28) and a new llog file is created.
If two of these messages appeared in a row with the same PID, that could indicate a real problem. The relevant code is:

        /* now let's try to add the record */
        rc = llog_write_rec(env, loghandle, rec, reccookie, 1, buf, -1, th);
        if (rc < 0)
                CERROR("llog_write_rec %d: lh=%p\n", rc, loghandle);
        cfs_up_write(&loghandle->lgh_lock);
        if (rc == -ENOSPC) {
                /* try to use next log */
                loghandle = llog_cat_current_log(cathandle, th);
                LASSERT(!IS_ERR(loghandle));
                /* new llog can be created concurrently */
                if (!llog_exist(loghandle)) {
                        rc = llog_cat_new_log(env, cathandle, loghandle, th);
                        if (rc < 0) {
                                cfs_up_write(&loghandle->lgh_lock);
                                RETURN(rc);
                        }
                }
                /* now let's try to add the record */
                rc = llog_write_rec(env, loghandle, rec, reccookie, 1, buf,
                                    -1, th);
                if (rc < 0)
                        CERROR("llog_write_rec %d: lh=%p\n", rc, loghandle);
                cfs_up_write(&loghandle->lgh_lock);
        }

Comment by Prakash Surya (Inactive) [ 13/Dec/12 ]

We're seeing this in production here at LLNL, and it is confusing our admins. If the message is harmless, it really should be removed (or at least masked behind a CDEBUG flag).

Comment by Hongchao Zhang [ 23/Jan/13 ]

The patch is tracked at http://review.whamcloud.com/#change,5146

Comment by Jodi Levi (Inactive) [ 19/Apr/13 ]

With change 5146 landed, can this ticket be closed?

Comment by John Fuchs-Chesney (Inactive) [ 08/Mar/14 ]

Patch provided and landed.

Generated at Sat Feb 10 01:24:21 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.