So, someone deleted the log file of a running process because the partition was running out of space. That is a seemingly logical thing to do, but not a wise one.

What ended up happening was that the filesystem kept filling up, and deleting the log file made no difference whatsoever. So what happened? When the log file was deleted, the running process still had it open. Removing a file only removes its directory entry; the open file handle does not just magically disappear, and the kernel keeps the file's data blocks allocated until the last handle is closed. So the process continued writing more data through the file handle, consuming more space, even though the log file itself was already “gone”.
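The behaviour described above is easy to reproduce in a throwaway shell. The following is only a minimal sketch, assuming a Linux system with /proc mounted; the scratch file from mktemp and the use of descriptor 3 are illustrative choices, not anything from the original incident.

```shell
#!/bin/sh
# Sketch: a deleted file stays alive while a descriptor is still open.
tmp=$(mktemp)              # hypothetical scratch file standing in for the log
exec 3>"$tmp"              # open fd 3 for writing, much like a logger would
rm "$tmp"                  # removes the directory entry only
echo "still writing" >&3   # the write still succeeds after the rm

# On Linux the data remains reachable through /proc while fd 3 is open.
if [ -e "/proc/$$/fd/3" ]; then
    content=$(cat "/proc/$$/fd/3")
    echo "read back: $content"
fi

exec 3>&-                  # closing the last descriptor lets the space be freed
```

The name is gone as soon as rm returns, yet the write and the read-back both work, which is exactly why the partition kept filling up.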

When I looked at the output of df, I saw that the / partition was still 93% used:

[root@carbon]# df -kl
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/cciss/c0d0p3      8123200   7138844    565060  93% /
/dev/cciss/c0d0p2      8123200    723544   6980360  10% /var
/dev/cciss/c0d0p1       147764     33592    106543  24% /boot
tmpfs                  6232424         0   6232424   0% /dev/shm

…but du gave a different output:

[root@carbon /]# du -s -h -x ./*
5.6M    ./bin
28M     ./boot
94M     ./data
92K     ./dev
82M     ./etc
52K     ./home
261M    ./lib
16K     ./lost+found
8.0K    ./media
8.0K    ./mnt
131M    ./opt
0       ./proc
61M     ./root
24M     ./sbin
8.0K    ./selinux
8.0K    ./srv
0       ./sys
66M     ./tmp
1.1G    ./usr
561M    ./var

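There was no way 7GB could be in use in /: everything du listed adds up to less than 2.5GB, and that includes /boot and /var, which are separate partitions. Looking at the output of lsof, I saw the deleted log file: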

[root@carbon]# lsof|grep deleted
logger     7346      root    3w      REG      104,3 5281543076     131513 /data/apps/var/log/error_log (deleted)

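lsof's size column shows 5281543076 bytes (about 4.9GB) for that one deleted file, which accounts for the gap between df and du. Restarting the process took care of the problem, but the whole thing highlights the obvious: do not muck with active files. cat /dev/null > error_log would have been a better choice, but not necessarily 100% safe: a writer that did not open the file in append mode keeps writing at its old offset.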
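If the file has already been deleted and restarting the process is not an option, one approach on Linux is to truncate the file through the process's entry under /proc. The sketch below demonstrates this on a scratch file in the current shell; the scratch file and descriptor 3 are assumptions made for the demo, and it was not run on the system from this post.

```shell
#!/bin/sh
# Sketch (Linux only): reclaim space from a deleted-but-open file via /proc.
tmp=$(mktemp)                       # hypothetical scratch file
exec 3>>"$tmp"                      # open fd 3 in append mode
printf 'some log data\n' >&3        # write 14 bytes
rm "$tmp"                           # delete the name; the data stays allocated

before=$(wc -c < "/proc/$$/fd/3")   # size while the deleted file is still held
: > "/proc/$$/fd/3"                 # truncate the deleted file to zero length
after=$(wc -c < "/proc/$$/fd/3")    # size after truncation

echo "before: $before bytes, after: $after bytes"
exec 3>&-                           # close the descriptor
```

In the case above, the PID and descriptor number would come from the lsof output (7346 and 3), so the equivalent command would presumably be : > /proc/7346/fd/3, run as root. The same append-mode caveat applies as with truncating a file that still has a name. For finding such files, lsof +L1 lists open files whose link count is below one, which is a little more precise than grepping for "deleted".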

Related linkage:

Finding open files with lsof

What happens to a deleted file still subject to redirection on linux