So, someone deleted the log file of a running process because the partition was running out of space. That seems like a logical thing to do, but it is not a wise one.
What ended up happening was that the filesystem kept filling up, and deleting the log file made no difference whatsoever. So what happened? When the log file was deleted, the running process still had the log file's file handle open. Removing the log file does not make that file handle magically disappear: rm only removes the name from the directory, and the space is not released until the last open handle is closed. So the process continued writing more data through the file handle, consuming more space, even though the log file itself was already "gone".
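The behaviour is easy to reproduce. Below is a minimal Python sketch, assuming a POSIX system; the temporary file and the function name write_after_unlink are made up for illustration and stand in for the real error_log and the process writing to it.

```python
import os
import tempfile


def write_after_unlink():
    """Delete a file while holding it open, then keep writing to it.

    Returns (path_still_exists, bytes_held_by_the_open_handle).
    """
    # Hypothetical stand-in for the real log file.
    fd, path = tempfile.mkstemp(prefix="error_log_")
    try:
        os.write(fd, b"before delete\n")   # 14 bytes
        os.unlink(path)                    # what "rm error_log" does
        os.write(fd, b"after delete\n")    # 13 bytes, still succeeds
        size = os.fstat(fd).st_size        # the unlinked file keeps growing
        return os.path.exists(path), size
    finally:
        os.close(fd)                       # only now can the space be freed


if __name__ == "__main__":
    print(write_after_unlink())
```

The name is gone as soon as unlink returns, but the data written before and after the delete is all still there until the handle is closed.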
When I looked at the output of df, I saw that the / partition was still 93% used:
    [root@carbon]# df -kl
    Filesystem        1K-blocks    Used Available Use% Mounted on
    /dev/cciss/c0d0p3   8123200 7138844    565060  93% /
    /dev/cciss/c0d0p2   8123200  723544   6980360  10% /var
    /dev/cciss/c0d0p1    147764   33592    106543  24% /boot
    tmpfs               6232424       0   6232424   0% /dev/shm
…but du gave a different output:
    [root@carbon /]# du -s -h -x ./*
    5.6M  ./bin
    28M   ./boot
    94M   ./data
    92K   ./dev
    82M   ./etc
    52K   ./home
    261M  ./lib
    16K   ./lost+found
    8.0K  ./media
    8.0K  ./mnt
    131M  ./opt
    0     ./proc
    61M   ./root
    24M   ./sbin
    8.0K  ./selinux
    8.0K  ./srv
    0     ./sys
    66M   ./tmp
    1.1G  ./usr
    561M  ./var
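There was no way 7GB could be in use in /: the du figures add up to only about 2.4GB, and that includes /var and /boot, which are separate partitions. Looking at the output of lsof, I saw the deleted log file, still held open and roughly 5GB in size:
    [root@carbon]# lsof|grep deleted
    logger 7346 root 3w REG 104,3 5281543076 131513 /data/apps/var/log/error_log (deleted)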
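If lsof is not installed, roughly the same check can be done on Linux by reading the symlinks under /proc/<pid>/fd, where the kernel appends " (deleted)" to the target of a descriptor whose file has been unlinked. The sketch below assumes that Linux behaviour; the function name deleted_open_files and its proc_root parameter are mine, not part of any tool. It is cruder than lsof: a file whose real name ends in " (deleted)" would be reported too, and without root it only sees your own processes.

```python
import os


def deleted_open_files(proc_root="/proc"):
    """Yield (pid, fd, target) for open descriptors whose file was unlinked."""
    for pid in os.listdir(proc_root):
        if not pid.isdigit():              # skip /proc/self, /proc/meminfo, ...
            continue
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:                    # process exited or permission denied
            continue
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:                # descriptor closed in the meantime
                continue
            if target.endswith(" (deleted)"):
                yield int(pid), int(fd), target


if __name__ == "__main__":
    for pid, fd, target in deleted_open_files():
        print(pid, fd, target)
```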
Restarting the process took care of the problem, but the whole thing highlights the obvious: do not muck with active files. Truncating the file in place with cat /dev/null > error_log would have been a better choice, though not necessarily 100% safe.
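One reason truncation is not always clean, as far as I can tell, is the writer's file offset. If the process opened the log without O_APPEND, its offset stays where it was, so the next write lands far past the new end of the file and the file's apparent size jumps right back (as a sparse file, with a hole at the start). The Python sketch below illustrates this under POSIX assumptions; size_after_truncate is a hypothetical helper and os.truncate stands in for the shell redirect.

```python
import os
import tempfile


def size_after_truncate(append):
    """Write 1000 bytes, truncate the file in place, then write 10 more.

    Returns the file's apparent size (st_size) afterwards.
    """
    fd, path = tempfile.mkstemp(prefix="error_log_")  # hypothetical log file
    os.close(fd)
    flags = os.O_WRONLY | (os.O_APPEND if append else 0)
    writer = os.open(path, flags)          # the long-running "process"
    try:
        os.write(writer, b"x" * 1000)
        os.truncate(path, 0)               # like: cat /dev/null > error_log
        os.write(writer, b"y" * 10)
        return os.stat(path).st_size
    finally:
        os.close(writer)
        os.unlink(path)


if __name__ == "__main__":
    print("O_APPEND writer:   ", size_after_truncate(True))
    print("plain offset writer:", size_after_truncate(False))
```

With O_APPEND the file ends up 10 bytes long; without it, st_size reads 1010 again even though only the last 10 bytes hold data.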