Like any agency owner, I wear multiple hats: marketer, accountant, server administrator, etc. And while I have managed servers and the support is excellent, I do dig on my own sometimes when there's an issue. In this instance I was getting some 500 errors on sites while I was doing some routine checks, and it took me a while to understand the hierarchy of errors and how to work with the logs you need to diagnose them. So I figured I'd write a post about the steps I took to diagnose the errors I was seeing.
Here's how I diagnosed my WordPress errors, and I'm hoping it will help someone else do the same. Please understand that I'm NOT a professional CentOS admin -- I'm a working user who knows how to do what I need to do -- so your mileage may vary. And of course, please BACK UP anything and everything before you make system changes.
Have a full-time 500 error?
This article assumes that your WordPress site behaves strangely only some of the time. If you load your site and you always get a 500 server error, then it's likely an issue with your .htaccess configuration instead.
The first thing you need to determine is what's causing the problem. I'm also a dentist and the most important thing I do all day is determine WHAT caused my patient's problem -- if I don't know WHY something happened, then I can't fix it. [You could always set WP_DEBUG to true in wp-config.php, which is fine for a site or two, but if you have multiple sites, it will take a lot of time to turn debug on and then off for all of them.]
The first place I looked was the Apache log for the entire server, because I wanted to see if the problem was happening on many accounts, or just a few. If there was a problem at the Apache level, then it's likely that many, if not all, of the accounts would be affected, and not just a few.
You can find that log here: /etc/apache2/logs/error_log. The first time you look at the log your eyes will fall out of your head (if you haven't looked at one before), and there are plenty of tutorials about how to read these logs, so I won't rehash all that here. But what I will tell you is that there are many online services that will let you upload the logs and, loosely speaking, "translate" them into more of a table, so you can see what's going on and filter out the data you don't need to look at. I found that loggly.com and sumologic.com were really good for this. I used the free versions: I downloaded the log from the server, uploaded it to the services, and did my digging.
From my very superficial digging, I saw that only a few accounts were affected - that is, in this specific instance, I kept seeing the same accounts pop up over and over again in the log. I still wasn't sure what the problem was, but the loggly.com and sumologic.com tools helped me sort and organize the log data. Once I saw that the errors were linked to specific accounts, I then looked at these accounts in particular.
While you can pull Apache logs for just a single user, I used the entire server log in the step above because I didn't want to be too focused in my initial search. If I had looked at the Apache logs for only 3 accounts and all three had issues, I might have mistakenly concluded that Apache was the issue. So by looking at the entire Apache log first, I was able to narrow down my focus.
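If you'd rather do a first pass without uploading anything, the "which accounts keep popping up" check can be roughed out in a few lines of Python. This is only a sketch: it assumes account names appear in the log lines as cPanel-style home paths (/home/username/...), and the function name count_errors_by_account is made up for this example.

```python
import re
from collections import Counter

# Assumption: accounts show up in log lines as cPanel-style home paths,
# e.g. /home/<username>/public_html/... Adjust the pattern for your server.
ACCOUNT_RE = re.compile(r"/home/([^/\s]+)/")


def count_errors_by_account(lines):
    """Tally how many log lines mention each account's home directory."""
    counts = Counter()
    for line in lines:
        match = ACCOUNT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

To try it on a downloaded copy of the log, open the file and pass the file object straight in, for example count_errors_by_account(open("error_log", errors="replace")).most_common(10). If two or three usernames dominate the list, that points at those accounts rather than at Apache as a whole.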
So I went to each site individually and looked at the PHP error log. In this instance we were using cPanel, and the error log is here: /logs/[username].php.error.log. Again, this is a fairly complex chunk of data to look at, so I used the above tools again to help organize it. In this case, I started by searching the data for only those entries that showed a fatal PHP error. Of course PHP warnings are important, too, but fatal errors need to be addressed first.
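The same "fatal errors first" filter can be sketched in Python if you want to run it locally. It assumes the log uses PHP's usual entry format, where the severity appears as "PHP Fatal error:" after a bracketed timestamp. The helper names fatal_messages and most_common_fatals are invented for this example.

```python
import re
from collections import Counter

# Assumption: entries follow PHP's usual error_log format, with the
# severity written as "PHP Fatal error:" ahead of the message text.
FATAL_RE = re.compile(r"PHP Fatal error:\s*(.*)")


def fatal_messages(lines):
    """Return the message text of every fatal-error entry."""
    messages = []
    for line in lines:
        match = FATAL_RE.search(line)
        if match:
            messages.append(match.group(1).strip())
    return messages


def most_common_fatals(lines, top=5):
    """Group identical fatal messages so the repeat offenders stand out."""
    return Counter(fatal_messages(lines)).most_common(top)
```

Grouping identical messages matters more than the raw count: one fatal error repeated a few hundred times is usually a single problem to fix.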
If you do a search for fatal PHP errors and you come up empty, then I can't help you here (sorry about that). But since your WordPress install is behaving erratically (or you wouldn't be reading this), and your Apache logs point to only a few sites, then it's likely that you'll find a fatal PHP error. And once you've found the error, you can troubleshoot it specifically, whether it's a variant of a max workers error such as Apache's server reached MaxRequestWorkers (PHP-FPM logs a similar warning about pm.max_children), or the dreaded Allowed memory size of 1234567 bytes exhausted (tried to allocate 1234567)... error.
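To sort log lines into those two buckets quickly, something like the following Python sketch could work. The patterns are assumptions based on the usual wording of these messages: PHP's "Allowed memory size of N bytes exhausted (tried to allocate M bytes)", Apache's MaxRequestWorkers notice, and PHP-FPM's pm.max_children warning. The classify function is a made-up helper, not part of any tool mentioned here.

```python
import re

# Assumed wording of PHP's memory-exhaustion message.
MEMORY_RE = re.compile(
    r"Allowed memory size of (\d+) bytes exhausted "
    r"\(tried to allocate (\d+) bytes\)"
)
# Assumed markers for worker-limit messages (Apache and PHP-FPM).
WORKER_MARKERS = ("MaxRequestWorkers", "pm.max_children")


def classify(line):
    """Return ("memory", limit_in_mb), ("workers", None) or ("other", None)."""
    match = MEMORY_RE.search(line)
    if match:
        limit_mb = int(match.group(1)) / (1024 * 1024)
        return ("memory", limit_mb)
    if any(marker in line for marker in WORKER_MARKERS):
        return ("workers", None)
    return ("other", None)
```

For memory errors the function also reports the limit that was in force, converted to megabytes, which tells you the value you would be raising it from.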
You can now do a search for the solution to your specific problem. In a nutshell, the allowed memory error likely requires you to boost the memory setting in php.ini (memory_limit) and/or wp-config.php (WP_MEMORY_LIMIT), and the max workers issue is something to discuss with your host, as it has to do with how many Apache workers or PHP-FPM processes can run at once. Sometimes you just need more RAM on your server, but before you spend money on RAM, you need to diagnose the problem.
I think I fixed it - what's next?
Once you fix the error, and assuming your site is loading properly now, you need to go back into the logs to see if the error is really gone. If your site was going up and down, you might just be looking at it at a "good time", so you do need to check the logs to see if the problem is persisting. One option is to grab the log files again at a later time, and run through the same steps to upload and analyze them. This approach is certainly a good one, but wouldn't it be nice to have all of your relevant logs automagically pulled from your server(s) to somewhere where you can look at them all at once -- in real time and historically? Yeah. It is.
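For the manual re-check, one way to see whether the error is still occurring is to count only the fatal errors logged after the moment you applied the fix. Here is a minimal Python sketch of that idea. It assumes PHP's usual bracketed timestamp at the start of each entry (for example [12-Mar-2024 10:15:33 UTC]), it ignores the timezone label, and fatals_since is a made-up name.

```python
import re
from datetime import datetime

# Assumption: each entry starts with a timestamp like [12-Mar-2024 10:15:33 UTC].
# The timezone label after the time is ignored in this sketch.
STAMP_RE = re.compile(r"^\[(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2})")


def fatals_since(lines, fixed_at):
    """Return fatal-error lines logged at or after the datetime `fixed_at`."""
    recent = []
    for line in lines:
        if "PHP Fatal error" not in line:
            continue
        match = STAMP_RE.match(line)
        if not match:
            continue  # skip lines without a recognizable timestamp
        logged_at = datetime.strptime(match.group(1), "%d-%b-%Y %H:%M:%S")
        if logged_at >= fixed_at:
            recent.append(line.rstrip("\n"))
    return recent
```

Pass in the lines of a freshly downloaded log and the time you made the change, expressed in the same timezone the log uses. An empty list after a day or two of normal traffic is a much better sign than a single successful page load.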
After figuring out by hand what was wrong with my site, I decided to give automagic-log-pulling (is that a new phrase?) a try. And while loggly and sumologic.com can handle this task, I ended up trying out elastic.co for my needs simply because they had the lowest price point, and I was able to set up the service very quickly and easily. That's not to say that loggly and sumologic aren't good products -- they just felt more enterprise-like to me, and if I had 5,000 sites with TBs of logs, I might have gone in that direction.
In the next post I'm going to talk about how I set up Elastic.co to pull specific logs from my servers and the saved sorts I now use to check in on my boxes.
Note: I have no affiliate relationship with any provider listed above, and I am not being paid or otherwise compensated for this post. The opinions expressed above are entirely my own.