{"id":2157,"date":"2014-05-13T12:59:46","date_gmt":"2014-05-13T17:59:46","guid":{"rendered":"http:\/\/www.poweradmin.com\/blog\/?p=2157"},"modified":"2015-04-24T16:22:56","modified_gmt":"2015-04-24T21:22:56","slug":"monitoring-best-practices-2","status":"publish","type":"post","link":"https:\/\/www.poweradmin.com\/blog\/monitoring-best-practices-2\/","title":{"rendered":"Server Monitoring Best Practices"},"content":{"rendered":"<p><span style=\"font-family: verdana,geneva; font-size: 11pt;\"><a href=\"\/blog\/wp-content\/uploads\/2014\/05\/monitoring-servers.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright wp-image-2205\" style=\"margin-left: 20px; border-width: 3px; margin-right: 20px; border: 2px solid black;\" src=\"\/blog\/wp-content\/uploads\/2014\/05\/monitoring-servers.jpg\" alt=\"Server Monitoring Best Practices\" width=\"172\" height=\"138\"><\/a>We often get asked about suggested practices for monitoring servers and it\u2019s a legitimate request \u2013 there are so many moving parts it\u2019s hard to know where to start. There are two things you want your monitoring to do for you:<\/span><\/p>\n<ul>\n<li><span style=\"font-family: verdana,geneva; font-size: 11pt;\"><a title=\"Monitor Systems 24x7\" href=\"\/servermonitor\/?ref=blog\">Watch systems 24\u00d77<\/a> and alert you if there is a problem<\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: 11pt;\">Show you current and historical data (usually charts) to help you get a feel for overall health and future needs<\/span><\/li>\n<\/ul>\n<p><span style=\"font-family: verdana,geneva; font-size: 11pt;\">ALL of the suggestions below are for the general case. 
There are definitely specific situations where one or more recommendations won\u2019t apply (maybe <a title=\"Performance Counter Monitor\" href=\"https:\/\/www.poweradmin.com\/help\/sm_5_4\/monitor_perfmon_counter.aspx?ref=blog\">high memory usage<\/a> is desired on a database server for example), so consider your situation as you consider the recommendations.<\/span><\/p>\n<h2><span style=\"font-family: verdana,geneva;\"><a title=\"Email Alerts\" href=\"https:\/\/www.poweradmin.com\/help\/sm_5_4\/action_email.aspx?ref=blog\"><span style=\"font-size: 16pt;\">Alerting<\/span><\/a><\/span><\/h2>\n<p><span style=\"font-family: verdana,geneva; font-size: 11pt;\"><a href=\"\/blog\/wp-content\/uploads\/2014\/05\/alerts.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-2209 size-full\" src=\"\/blog\/wp-content\/uploads\/2014\/05\/alerts.jpg\" alt=\"Server Monitor Alerts\" width=\"180\" height=\"135\"><\/a>For alerting, it\u2019s a good idea to think of what issues are absolutely critical and must be handled now (corporate web site is down) vs things that need attention, but can wait a bit (disk space is under 10% free).<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: 11pt;\">For critical alerts, email is a start, but probably not enough. You want a pager or phone to beep at someone to get their attention. SMS texts, iPhone push notifications, etc. would be a good idea.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: 11pt;\">Non-critical alerts can go to email. 
Sometimes emails get deleted or forgotten, so it\u2019s a good idea to have some sort of reminder or event escalation (where alerts get sent higher up the chain of command the longer the issue is left unresolved).<\/span><\/p>\n<h2><span style=\"font-family: verdana,geneva; font-size: 16pt;\">Basic Monitoring<\/span><\/h2>\n<p><span style=\"font-family: verdana,geneva; font-size: 11pt;\">The core of most monitoring products is <a title=\"Ping Monitor\" href=\"https:\/\/www.poweradmin.com\/help\/sm_5_4\/monitor_ping.aspx?ref=blog\">ping<\/a>, <a title=\"CPU Monitor\" href=\"https:\/\/www.poweradmin.com\/help\/sm_5_4\/monitor_perfmon_counter.aspx?ref=blog\">CPU<\/a>, <a title=\"Memory Usage Monitor\" href=\"https:\/\/www.poweradmin.com\/help\/sm_5_4\/monitor_perfmon_counter.aspx?ref=blog\">memory<\/a>, <a title=\"Disk Space Monitor\" href=\"https:\/\/www.poweradmin.com\/help\/sm_5_4\/monitor_disk_space.aspx?ref=blog\">disk<\/a>, and <a title=\"Web Page Monitor\" href=\"https:\/\/www.poweradmin.com\/help\/sm_5_4\/monitor_web_page.aspx?ref=blog\">web pages<\/a>, so we\u2019ll start there.<\/span><\/p>\n<div style=\"background-color: #ffffff; padding: 0px 20px 20px 20px;\">\n<h3><span style=\"font-family: verdana,geneva; font-size: 14pt; color: #808080;\">Ping<\/span><\/h3>\n<p><span style=\"font-family: verdana,geneva;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright\" style=\"margin-left: 20px;\" src=\"\/blog\/wp-content\/uploads\/2014\/05\/landing_ping_report.png\" alt=\"Ping Monitoring\" width=\"357\" height=\"152\"> <span style=\"font-size: 11pt;\">All of the rest of the monitoring isn\u2019t worth much if the server or device isn\u2019t up and running. Pinging fairly often (at least once a minute) helps you stay on top of problems as they happen. The trick is to not get hit with a lot of false-positives, which can easily happen on a busy network. 
So make sure you\u2019re only alerted after a few pings in a row have failed.<\/span><\/span><\/p>\n<div style=\"margin: 20px; padding: 20px; border: 1px solid #888888; background-color: #ffffd0;\"><span style=\"font-family: verdana,geneva; font-size: 11pt;\"><b>Alert Setting:<\/b> Check once a minute, alert if response &gt; 300ms, and there are 3 errors in 4 minutes.<\/span><br>\n<span style=\"font-family: verdana,geneva; font-size: 11pt;\"> <b>Chart Setting:<\/b> Show peak response times for the past 24 hours<\/span><\/div>\n<\/div>\n<div style=\"background-color: #ffffff; padding: 0px 20px 20px 20px;\">\n<h3><span style=\"font-family: verdana,geneva; font-size: 14pt; color: #808080;\">CPU<\/span><\/h3>\n<p><span style=\"font-family: verdana,geneva;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft\" style=\"margin-right: 20px;\" src=\"\/blog\/wp-content\/uploads\/2014\/05\/cputime.png\" alt=\"CPU Monitoring\" width=\"357\" height=\"152\"> <span style=\"font-size: 11pt;\">Monitor the CPU usage (normally a percentage of total possible CPU output). It\u2019s normal for it to go up and down depending on the load. Having a very low average value means your server isn\u2019t being utilized much, and that server might be a good candidate for virtualization. If the value is quite high (90%) for an extended period, the CPU might be a bottleneck. 
If it\u2019s at 100% for very long at all, the system is probably not functioning well.<\/span><\/span><\/p>\n<div style=\"margin: 20px; padding: 20px; border: 1px solid #888888; background-color: #ffffd0;\"><span style=\"font-family: verdana,geneva; font-size: 11pt;\"><b>Alert Setting:<\/b> Alert on sustained usage &gt; 90%<\/span><br>\n<span style=\"font-family: verdana,geneva; font-size: 11pt;\"> <b>Chart Setting:<\/b> Show average usage for the past 3 days to spot any unusual patterns<\/span><\/div>\n<\/div>\n<div style=\"background-color: #ffffff; padding: 0px 20px 20px 20px;\">\n<h3><span style=\"font-family: verdana,geneva; font-size: 14pt; color: #808080;\">Memory<\/span><\/h3>\n<p><span style=\"font-family: verdana,geneva;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright\" style=\"margin-left: 20px;\" src=\"\/blog\/wp-content\/uploads\/2014\/05\/mem.png\" alt=\"Monitoring Memory\" width=\"357\" height=\"152\"> <span style=\"font-size: 11pt;\">Measuring memory can be tricky since there are so many definitions to consider. Total physical RAM in use? (You\u2019d like to have 100% in use!) Total memory allocated (which can be greater than physical RAM)? Amount of allocated memory swapped out to disk?<\/span><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: 11pt;\">Personally, I like to know what percentage of memory is in use out of the total that is available (i.e., RAM plus swap\/page file). On Windows, this is the Memory\\% Committed Bytes In Use counter, which is defined as:<\/span><\/p>\n<p style=\"padding: 20px; background-color: #eeeeee;\"><span style=\"font-family: verdana,geneva; font-size: 11pt;\">\u201c% Committed Bytes In Use is the ratio of <b>Memory\\Committed Bytes<\/b> to the <b>Memory\\Commit Limit<\/b>. Committed memory is the physical memory in use for which space has been reserved in the paging file should it need to be written to disk. The commit limit is determined by the size of the paging file. 
If the paging file is enlarged, the commit limit increases, and the ratio is reduced). This counter displays the current percentage value only; it is not an average.\u201d<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: 11pt;\">If the % of total memory is high, you might be swapping to disk a lot and thus getting lower server performance. You can check this by also monitoring how much of the swap\/page file is in use.<\/span><\/p>\n<div style=\"margin: 20px; padding: 20px; border: 1px solid #888888; background-color: #ffffd0;\"><span style=\"font-family: verdana,geneva; font-size: 11pt;\"><b>Alert Setting:<\/b> Alert on sustained % memory used &gt; 90%<\/span><br>\n<span style=\"font-family: verdana,geneva; font-size: 11pt;\"> Alert on swap\/page file use &gt; 70%<\/span><br>\n<span style=\"font-family: verdana,geneva; font-size: 11pt;\"> <b>Chart Setting:<\/b> Show 5-minute maximum for the past 3 days to spot any unusual patterns<\/span><\/div>\n<\/div>\n<div style=\"background-color: #ffffff; padding: 0px 20px 20px 20px;\">\n<h3><span style=\"font-family: verdana,geneva; font-size: 14pt; color: #808080;\">Disk<\/span><\/h3>\n<p><span style=\"font-family: verdana,geneva; font-size: 11pt;\">I\u2019ve experienced cases where an OS has crashed because there was no free disk space. Certainly databases, <a title=\"Mail Server Monitor\" href=\"https:\/\/www.poweradmin.com\/help\/sm_5_4\/monitor_mail_server.aspx?ref=blog\">mail servers<\/a>, etc. don\u2019t function well when they can\u2019t write their data to disk. 
Low disk space is a critical problem, but usually (hopefully) a slow moving one so you have time to fix it.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright\" style=\"margin-left: 20px;\" src=\"\/blog\/wp-content\/uploads\/2014\/05\/predictfull1.png\" alt=\"Monitor Disk Space\" width=\"357\" height=\"116\"> <span style=\"font-size: 11pt;\">One useful feature to watch for is trend analysis where the monitoring product looks at disk growth rates and tries to <a title=\"Disk Space Monitoring \u2013 Predict When Full\" href=\"\/blog\/disk-space-monitoring-predict-full-disks\/\">predict when you\u2019ll run out of disk space<\/a>. This gives you an early heads up so you can be proactive rather than reactive.<\/span><\/span><\/p>\n<div style=\"margin: 20px; padding: 20px; border: 1px solid #888888; background-color: #ffffd0;\"><span style=\"font-family: verdana,geneva; font-size: 11pt;\"><b>Alert Settings:<\/b> Alert when free disk space &lt; 10%<\/span><br>\n<span style=\"font-family: verdana,geneva; font-size: 11pt;\"> <b>Chart Settings:<\/b> Because disk space normally changes slowly, chart 30 days so you can visually see trends<\/span><\/div>\n<\/div>\n<div style=\"background-color: #ffffff; padding: 0px 20px 20px 20px;\">\n<h3><span style=\"font-family: verdana,geneva; font-size: 14pt; color: #808080;\">Web Page Performance<\/span><\/h3>\n<p><span style=\"font-family: verdana,geneva;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright\" style=\"margin-left: 20px;\" src=\"\/blog\/wp-content\/uploads\/2014\/05\/webresponse.png\" alt=\"Web Page Performance\" width=\"357\" height=\"152\"> <span style=\"font-size: 11pt;\">If you or your company has a website, knowing the website is up is pretty darn important. 
A web page monitor should be able to check:<\/span><\/span><\/p>\n<ul>\n<li><span style=\"font-family: verdana,geneva; font-size: 11pt;\">Is the site up?<\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: 11pt;\">Is it responding as quickly as expected?<\/span><\/li>\n<li><span style=\"font-family: verdana,geneva; font-size: 11pt;\">Are there any errors on the page?<\/span><\/li>\n<\/ul>\n<p><span style=\"font-family: verdana,geneva; font-size: 11pt;\">Some monitoring products can also check resources (i.e., that a file is where it should be), <a title=\"SSL Certificate Hints\" href=\"\/help\/sslhints\/?ref=blog\">SSL certificate expiration<\/a>, etc.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: 11pt;\">You\u2019ll need to decide how important the website is. If it\u2019s absolutely critical to your business, checking once every couple of minutes makes sense. If it\u2019s a personal blog, maybe once an hour is OK.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva;\"><img decoding=\"async\" class=\"alignleft\" style=\"margin-right: 20px;\" src=\"\/blog\/wp-content\/uploads\/2014\/05\/sslexp.png\" alt=\"\"> <span style=\"font-size: 11pt;\">Hint: Since checking a page frequently can skew your traffic stats, have a separate page (maybe in a separate folder) used just for polling if you can. That way it\u2019s easy to filter those requests out of the stats. Or if you need to hit the main page, consider adding something to the URL like ?MONITOR=true for the same reason.<\/span><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: 11pt;\">Hint 2: Some people want to check that the web server is able to access the database. I recommend having one page that hits the database and then outputs \u201cOK\u201d or \u201c<a title=\"Monitor Databases\" href=\"https:\/\/www.poweradmin.com\/help\/sm_5_4\/howto_monitor_database.aspx?ref=blog\">DATABASE ERROR<\/a>\u201d. 
Then your web page monitor can check that page and alert if it sees \u201cDATABASE ERROR\u201d.<\/span><\/p>\n<div style=\"margin: 20px; padding: 20px; border: 1px solid #888888; background-color: #ffffd0;\"><span style=\"font-family: verdana,geneva; font-size: 11pt;\"><b>Alert Setting:<\/b> Check once every couple minutes or per hour, depending on critical nature of the site. Pick a threshold for page load time that seems appropriate (alert if longer than 4 seconds for example)<\/span><br>\n<span style=\"font-family: verdana,geneva; font-size: 11pt;\"> <b>Chart Setting:<\/b> Maximum response time over the past 24 hours<\/span><\/div>\n<\/div>\n<h2><span style=\"font-family: verdana,geneva; font-size: 16pt;\">Advanced Monitoring<\/span><\/h2>\n<p><span style=\"font-family: verdana,geneva; font-size: 11pt;\">The next article in this series will explore some advanced monitoring scenarios, like watching <a title=\"Event Log Monitor\" href=\"https:\/\/www.poweradmin.com\/help\/sm_5_4\/monitor_event_log.aspx?ref=blog\">Event Logs<\/a> for specific events (user login for example), watching log files for errors, and more.<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva; font-size: 11pt;\">If you are looking for a product that can do all of the above, we just happen to know about <a href=\"\/servermonitor\/?ref=blog\">a good one<\/a> \ud83d\ude42<\/span><\/p>\n<p><span style=\"font-family: verdana,geneva;\"><em><span style=\"color: #999999;\">Photo Credit: <a href=\"http:\/\/www.flickr.com\/photos\/81295370@N00\/423471686\/\" rel=\"nofollow\" target=\"_blank\"><span style=\"color: #999999;\">midom<\/span><img class=\"extlink-icon\" src=\"https:\/\/www.poweradmin.com\/blog\/wp-content\/plugins\/external-links-nofollow-open-in-new-tab-favicon\/images\/extlink.png\"><\/a> via <a href=\"http:\/\/compfight.com\" rel=\"nofollow\" target=\"_blank\"><span style=\"color: #999999;\">Compfight<\/span><img class=\"extlink-icon\" 
src=\"https:\/\/www.poweradmin.com\/blog\/wp-content\/plugins\/external-links-nofollow-open-in-new-tab-favicon\/images\/extlink.png\"><\/a> <a href=\"https:\/\/creativecommons.org\/licenses\/by\/2.0\/\" rel=\"nofollow\" target=\"_blank\"><span style=\"color: #999999;\">cc<\/span><img class=\"extlink-icon\" src=\"https:\/\/www.poweradmin.com\/blog\/wp-content\/plugins\/external-links-nofollow-open-in-new-tab-favicon\/images\/extlink.png\"><\/a><\/span><\/em><\/span><br>\n<span style=\"font-family: verdana,geneva;\"> <em><span style=\"color: #999999;\"> Photo Credit: <a href=\"http:\/\/www.flickr.com\/photos\/60141638@N06\/8407332232\/\" rel=\"nofollow\" target=\"_blank\"><span style=\"color: #999999;\">One Way Stock<\/span><img class=\"extlink-icon\" src=\"https:\/\/www.poweradmin.com\/blog\/wp-content\/plugins\/external-links-nofollow-open-in-new-tab-favicon\/images\/extlink.png\"><\/a> via <a href=\"http:\/\/compfight.com\" rel=\"nofollow\" target=\"_blank\"><span style=\"color: #999999;\">Compfight<\/span><img class=\"extlink-icon\" src=\"https:\/\/www.poweradmin.com\/blog\/wp-content\/plugins\/external-links-nofollow-open-in-new-tab-favicon\/images\/extlink.png\"><\/a> <a href=\"https:\/\/creativecommons.org\/licenses\/by-nd\/2.0\/\" rel=\"nofollow\" target=\"_blank\"><span style=\"color: #999999;\">cc<\/span><img class=\"extlink-icon\" src=\"https:\/\/www.poweradmin.com\/blog\/wp-content\/plugins\/external-links-nofollow-open-in-new-tab-favicon\/images\/extlink.png\"><\/a><\/span><\/em><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>We often get asked about suggested practices for monitoring servers and it\u2019s a legitimate request \u2013 there are so many moving parts it\u2019s hard to know where to start. 
There are two things you want your monitoring to do for you: Watch systems 24\u00d77 and alert you if there is a problem Show you current [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":2205,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,5,10,9,8],"tags":[],"class_list":["post-2157","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-general-it","category-how-to","category-power-admin","category-technical","category-windows"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.poweradmin.com\/blog\/wp-json\/wp\/v2\/posts\/2157","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.poweradmin.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.poweradmin.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.poweradmin.com\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.poweradmin.com\/blog\/wp-json\/wp\/v2\/comments?post=2157"}],"version-history":[{"count":5,"href":"https:\/\/www.poweradmin.com\/blog\/wp-json\/wp\/v2\/posts\/2157\/revisions"}],"predecessor-version":[{"id":3511,"href":"https:\/\/www.poweradmin.com\/blog\/wp-json\/wp\/v2\/posts\/2157\/revisions\/3511"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.poweradmin.com\/blog\/wp-json\/wp\/v2\/media\/2205"}],"wp:attachment":[{"href":"https:\/\/www.poweradmin.com\/blog\/wp-json\/wp\/v2\/media?parent=2157"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.poweradmin.com\/blog\/wp-json\/wp\/v2\/categories?post=2157"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.poweradmin.com\/blog\/wp-json\/wp\/v2\/tags?post=2157"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}