Details
- Type: Bug
- Resolution: Duplicate
- Priority: Critical
- Affects Version: Lustre 2.1.1
Description
It seems the MDT catalog file may be damaged on our test filesystem. We were doing recovery testing with the patch for LU-1352. Sometime after power-cycling the MDS and letting it go through recovery, clients started getting EFAULT writing to Lustre. These failures are accompanied by the following console errors on the MDS.
Jun 28 12:08:45 zwicky-mds2 kernel: LustreError: 11841:0:(llog_cat.c:81:llog_cat_new_log()) no free catalog slots for log...
Jun 28 12:08:45 zwicky-mds2 kernel: LustreError: 11841:0:(llog_cat.c:81:llog_cat_new_log()) Skipped 3 previous similar messages
Jun 28 12:08:45 zwicky-mds2 kernel: LustreError: 11841:0:(llog_obd.c:454:llog_obd_origin_add()) write one catalog record failed: -28
Jun 28 12:08:45 zwicky-mds2 kernel: LustreError: 11841:0:(llog_obd.c:454:llog_obd_origin_add()) Skipped 3 previous similar messages
Jun 28 12:08:45 zwicky-mds2 kernel: LustreError: 11841:0:(mdd_object.c:1330:mdd_changelog_data_store()) changelog failed: rc=-28 op17 t[0x200de60af:0x17913:0x0]
I mentioned this in LU-1570, but I figured a new ticket was needed.
-17 = EEXIST, so I would suspect it is complaining about a file in CONFIGS, but you reported that was cleared out as well.
You are correct that bringing the OSTs back online should cause the OST recovery logs to be cleaned up. Along with the message in LU-1570, it seems there is something at your site that is consuming more llogs than normal. Each llog should allow up to 64k unlinks to be stored for recovery, and up to 64k llogs in a catalog PER OST, though new llog files are started for each boot. That means 4B unlinks, or 64k reboots, or combinations thereof, per OST before the catalog wraps back to zero (per LU-1570). The logs should be deleted sequentially after the MDT->OST orphan recovery completes when the OST reconnects, freeing up their slots in the catalog file. It is possible that something in this code was broken in 2.x and hasn't been noticed until now, since it would take a long time to see the symptoms.
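As a rough sanity check of those numbers (taking the 64k limits above at face value, not re-verified against the llog code):

# up to 64k llog files per catalog, each holding up to 64k unlink records
echo $(( 65536 * 65536 ))    # 4294967296, i.e. ~4B unlink records per OST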
A simple test to reproduce this would be to create and delete files in a loop (~1M total) on a specific OST, using "mkdir $LUSTRE/test_ostN; lfs setstripe -i N $LUSTRE/test_ostN; createmany -o $LUSTRE/test_ostN/f 1000000" (where "N" is some OST number), and see if the number of llog files in the MDT OBJECTS/ directory is increasing steadily over time (beyond 2 or 3 files per OST). I don't recall specifically, but it may need an unmount and remount of the OST for the llog files to be cleaned up.
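For example, the whole loop might look roughly like the following (the mount point, OST index, and MDT device are placeholders; createmany/unlinkmany are the helpers shipped with lustre-tests, and the debugfs step is just one read-only way to count the llog files on an ldiskfs MDT):

# on a client (assuming a mount at /mnt/lustre and OST index 0):
mkdir /mnt/lustre/test_ost0
lfs setstripe -i 0 /mnt/lustre/test_ost0
for i in $(seq 1 100); do
    createmany -o /mnt/lustre/test_ost0/f 10000
    unlinkmany /mnt/lustre/test_ost0/f 10000
done

# on the MDS, repeated while the loop runs (MDT device is a placeholder):
debugfs -c -R 'ls -l OBJECTS' /dev/mdt_dev | wc -l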
Failing that test, try creating a large number of files (~1M) in $LUSTRE/test_ostN, then unmount OST N and delete all the files. This should succeed without error, but many llog entries will be stored in the llog files. The llog files should be cleaned up when this OST is mounted again.
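Something along these lines (again, all paths and device names are placeholders, and the exact mount commands depend on the setup):

# on a client: create ~1M files striped to OST N (here N=0)
mkdir /mnt/lustre/test_ost0
lfs setstripe -i 0 /mnt/lustre/test_ost0
createmany -o /mnt/lustre/test_ost0/f 1000000

# on the OSS: take OST 0 offline
umount /mnt/lustre/ost0

# on the client: delete the files while the OST is down; the unlinks should
# be recorded as llog entries on the MDT until the OST comes back
unlinkmany /mnt/lustre/test_ost0/f 1000000

# on the OSS: remount the OST; after orphan recovery the llog files on the
# MDT should be processed and cleaned up
mount -t lustre /dev/ost0_dev /mnt/lustre/ost0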