As developers and system administrators, we often rely on commercial monitoring tools for our servers. However, sometimes you need a lightweight, customizable solution that you can fully control. In this guide, I'll show you how to create five practical Bash scripts that you can use to monitor various aspects of your Linux server performance.
These scripts are ideal for:
- Small to medium-sized server environments
- Developers looking to understand server monitoring basics
- Anyone who wants to automate responses to common server issues
Let's dive in!
1. CPU and Memory Usage Monitor
One of the most critical aspects of server performance is CPU and memory usage. Here's a script that monitors both and sends email alerts when thresholds are exceeded:
#!/bin/bash
# cpu_mem_monitor.sh - Monitors CPU and memory usage and sends alerts
# Configuration
THRESHOLD_CPU=80 # Alert when CPU usage exceeds 80%
THRESHOLD_MEM=80 # Alert when memory usage exceeds 80%
EMAIL="admin@yourdomain.com"
LOG_FILE="/var/log/server-monitor.log"
# Create log file if it doesn't exist
if [ ! -f "$LOG_FILE" ]; then
    touch "$LOG_FILE"
    chmod 644 "$LOG_FILE"
fi
log_message() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}
send_alert() {
    log_message "$1"
    echo "$1" | mail -s "Server Alert: $(hostname)" "$EMAIL"
}
# Get CPU usage (excluding idle time)
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | awk '{print 100 - $1}')
CPU_USAGE_INT=${CPU_USAGE%.*}
# Get memory usage percentage
MEM_USAGE=$(free | grep Mem | awk '{print $3/$2 * 100.0}')
MEM_USAGE_INT=${MEM_USAGE%.*}
log_message "CPU: ${CPU_USAGE}%, Memory: ${MEM_USAGE}%"
# Check if CPU usage exceeds threshold
if [ "$CPU_USAGE_INT" -gt "$THRESHOLD_CPU" ]; then
    send_alert "HIGH CPU ALERT: Usage at ${CPU_USAGE}% (Threshold: ${THRESHOLD_CPU}%)"
fi
# Check if memory usage exceeds threshold
if [ "$MEM_USAGE_INT" -gt "$THRESHOLD_MEM" ]; then
    send_alert "HIGH MEMORY ALERT: Usage at ${MEM_USAGE}% (Threshold: ${THRESHOLD_MEM}%)"
fi
How to Use This Script:
- Save it as cpu_mem_monitor.sh
- Make it executable: chmod +x cpu_mem_monitor.sh
- Set up a cron job to run it at regular intervals:
# Run every 5 minutes
*/5 * * * * /path/to/cpu_mem_monitor.sh
Make sure you have mailutils or a similar package installed to enable email functionality: sudo apt-get install mailutils
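The top-based CPU extraction is the trickiest line in the script. A quick way to sanity-check it is to run the same sed/awk pipeline against a canned sample of top's summary line (the sample values below are illustrative, not captured from a real host):

```shell
#!/bin/bash
# Sanity-check the CPU parsing pipeline against a canned "top" summary line.
# SAMPLE is an illustrative example of the Cpu(s) line, not real output.
SAMPLE='%Cpu(s):  7.4 us,  2.1 sy,  0.0 ni, 89.2 id,  1.0 wa,  0.0 hi,  0.3 si,  0.0 st'

# Same extraction as the monitor: isolate the idle percentage, subtract from 100
CPU_USAGE=$(echo "$SAMPLE" | sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | awk '{print 100 - $1}')
echo "Parsed CPU usage: ${CPU_USAGE}%"   # 89.2% idle -> 10.8% busy
```

If top's output format on your distribution differs, testing the pipeline this way tells you immediately whether the sed pattern still isolates the idle field.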
2. Disk Space Monitor
Running out of disk space can cause serious issues for your services. This script monitors your disk space and sends alerts when it gets too low:
#!/bin/bash
# disk_monitor.sh - Monitors disk space usage and sends alerts
# Configuration
THRESHOLD=85 # Alert when usage exceeds 85%
EMAIL="admin@yourdomain.com"
LOG_FILE="/var/log/disk-monitor.log"
EXCLUDE_LIST=("tmpfs" "devtmpfs" "squashfs") # Filesystems to exclude
# Create log file if it doesn't exist
if [ ! -f "$LOG_FILE" ]; then
    touch "$LOG_FILE"
    chmod 644 "$LOG_FILE"
fi
log_message() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}
send_alert() {
    log_message "$1"
    echo -e "$1" | mail -s "Disk Space Alert: $(hostname)" "$EMAIL"
}
# Get disk usage for all mounted filesystems (-P keeps each entry on one line;
# pseudo-filesystems are filtered out via EXCLUDE_LIST below)
DISK_USAGE=$(df -hP)
log_message "Running disk space check"
# Check each filesystem
ALERTS=""
while read -r line; do
    # Skip the header line
    if echo "$line" | grep -q "Filesystem"; then
        continue
    fi
    # Extract filesystem details
    FILESYSTEM=$(echo "$line" | awk '{print $1}')
    MOUNT_POINT=$(echo "$line" | awk '{print $NF}')
    USAGE_PERCENT=$(echo "$line" | awk '{print $5}' | sed 's/%//')
    # Skip excluded filesystems
    SKIP=0
    for EXCL in "${EXCLUDE_LIST[@]}"; do
        if echo "$FILESYSTEM" | grep -q "$EXCL"; then
            SKIP=1
            break
        fi
    done
    if [ "$SKIP" -eq 1 ]; then
        continue
    fi
    # Check if usage exceeds threshold
    if [ "$USAGE_PERCENT" -gt "$THRESHOLD" ]; then
        DETAIL=$(echo "$line" | awk '{print "Used: "$3" Free: "$4" Usage: "$5}')
        ALERTS="${ALERTS}${MOUNT_POINT} - ${DETAIL}\n"
    fi
done <<< "$DISK_USAGE"
# Send alert if needed
if [ -n "$ALERTS" ]; then
    ALERT_MESSAGE="DISK SPACE ALERT: The following filesystems exceed ${THRESHOLD}% usage:\n\n${ALERTS}"
    send_alert "$ALERT_MESSAGE"
fi
How to Use This Script:
- Save it as disk_monitor.sh
- Make it executable: chmod +x disk_monitor.sh
- Set up a cron job to run it daily or hourly:
# Run once a day at 8am
0 8 * * * /path/to/disk_monitor.sh
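Before wiring the script into cron, you can dry-run the per-line parsing against a fabricated df line (the device name and numbers below are made up):

```shell
#!/bin/bash
# Dry-run the disk parsing logic on a fabricated "df -hP" line (values are made up).
LINE='/dev/sda1        50G   45G   5G   90% /'
THRESHOLD=85

# Same field extraction as the monitor loop
MOUNT_POINT=$(echo "$LINE" | awk '{print $NF}')
USAGE_PERCENT=$(echo "$LINE" | awk '{print $5}' | sed 's/%//')

if [ "$USAGE_PERCENT" -gt "$THRESHOLD" ]; then
    echo "ALERT: $MOUNT_POINT at ${USAGE_PERCENT}%"
fi
```

Because the percentage is always column 5 and the mount point always the last column in POSIX df output, the extraction is stable even when device names vary in length.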
3. Critical Log File Monitor
This script scans your system logs for important error patterns and alerts you when they occur:
#!/bin/bash
# log_monitor.sh - Monitors critical log files for important patterns
# Configuration
EMAIL="admin@yourdomain.com"
LOG_FILE="/var/log/logmonitor.log"
# Log files to monitor with their patterns
declare -A LOG_PATTERNS
LOG_PATTERNS["/var/log/syslog"]="(error|critical|fatal|failed|failure)"
LOG_PATTERNS["/var/log/auth.log"]="(failed|invalid|error|authentication failure)"
LOG_PATTERNS["/var/log/apache2/error.log"]="(fatal|error|critical)"
LOG_PATTERNS["/var/log/mysql/error.log"]="(ERROR|CRITICAL)"
# Create log file if it doesn't exist
if [ ! -f "$LOG_FILE" ]; then
    touch "$LOG_FILE"
    chmod 644 "$LOG_FILE"
fi
log_message() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}
send_alert() {
    log_message "$1"
    echo -e "$1" | mail -s "Log Alert: $(hostname)" "$EMAIL"
}
# Store timestamp of last run to only check new logs
TIMESTAMP_FILE="/tmp/logmonitor_timestamp"
if [ -f "$TIMESTAMP_FILE" ]; then
    LAST_RUN=$(cat "$TIMESTAMP_FILE")
else
    # If running for the first time, check the last 10 minutes
    LAST_RUN=$(date -d "10 minutes ago" "+%Y-%m-%d %H:%M:%S")
fi
# Update timestamp file with current time
date "+%Y-%m-%d %H:%M:%S" > "$TIMESTAMP_FILE"
# Function to convert a date to seconds since epoch for comparison
date_to_seconds() {
    date -d "$1" +%s
}
LAST_RUN_SECONDS=$(date_to_seconds "$LAST_RUN")
CURRENT_TIME=$(date "+%Y-%m-%d %H:%M:%S")
CURRENT_SECONDS=$(date_to_seconds "$CURRENT_TIME")
log_message "Checking logs from $LAST_RUN to $CURRENT_TIME"
# Check each log file for patterns
ALERTS=""
for LOG_PATH in "${!LOG_PATTERNS[@]}"; do
    if [ -f "$LOG_PATH" ]; then
        PATTERN="${LOG_PATTERNS[$LOG_PATH]}"
        # Get new log entries matching the pattern
        NEW_ENTRIES=$(grep -i -E "$PATTERN" "$LOG_PATH" | while read -r line; do
            # Try to extract a timestamp from the log line (formats vary)
            if [[ "$line" =~ ^[A-Za-z]{3}\ [0-9]{1,2}\ [0-9]{2}:[0-9]{2}:[0-9]{2} ]]; then
                # Traditional syslog format (no year, so assume the current year)
                LOG_DATE=$(echo "$line" | cut -d ' ' -f 1-3)
                FULL_DATE=$(date -d "$LOG_DATE $(date +%Y)" "+%Y-%m-%d %H:%M:%S")
            elif [[ "$line" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}\ [0-9]{2}:[0-9]{2}:[0-9]{2} ]]; then
                # ISO format
                FULL_DATE=$(echo "$line" | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}')
            else
                # Skip entries whose timestamp can't be parsed
                continue
            fi
            # Convert the log entry date to seconds
            ENTRY_SECONDS=$(date_to_seconds "$FULL_DATE")
            # Only keep entries newer than the last run
            if [ "$ENTRY_SECONDS" -ge "$LAST_RUN_SECONDS" ]; then
                echo "$line"
            fi
        done)
        if [ -n "$NEW_ENTRIES" ]; then
            ENTRY_COUNT=$(echo "$NEW_ENTRIES" | wc -l)
            ALERTS="${ALERTS}$LOG_PATH - $ENTRY_COUNT new critical entries found:\n"
            ALERTS="${ALERTS}$(echo "$NEW_ENTRIES" | head -n 15)\n"
            # If more than 15 entries, indicate there are more
            if [ "$ENTRY_COUNT" -gt 15 ]; then
                ALERTS="${ALERTS}... and $((ENTRY_COUNT - 15)) more entries (check the full log)\n"
            fi
            ALERTS="${ALERTS}\n"
        fi
    else
        log_message "Warning: Log file $LOG_PATH not found"
    fi
done
# Send alert if needed
if [ -n "$ALERTS" ]; then
    ALERT_MESSAGE="LOG ALERT: Critical entries found in system logs:\n\n${ALERTS}"
    send_alert "$ALERT_MESSAGE"
fi
How to Use This Script:
- Save it as log_monitor.sh
- Make it executable: chmod +x log_monitor.sh
- Set up a cron job to run it hourly:
# Run every hour
0 * * * * /path/to/log_monitor.sh
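The timestamp comparison hinges on GNU date's -d flag. Here is the conversion in isolation, with TZ pinned to UTC so the epoch values are reproducible:

```shell
#!/bin/bash
# Demonstrate the date-to-epoch conversion used for log filtering.
# TZ is pinned to UTC so the output is the same on any machine.
date_to_seconds() {
    TZ=UTC date -d "$1" +%s
}

A=$(date_to_seconds "2024-01-01 00:00:00")
B=$(date_to_seconds "2024-01-01 00:10:00")
echo "$A $B"        # 1704067200 1704067800
echo $(( B - A ))   # 600 — entries within this window would be reported
```

Note that -d is a GNU extension; on BSD or macOS you would need `date -j -f` instead, which is one reason to test this helper on the target system first.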
4. Network Connection Monitor
This script monitors network connections and alerts you about unusual connection counts:
#!/bin/bash
# network_monitor.sh - Monitors network connections and detects unusual patterns
# Configuration
EMAIL="admin@yourdomain.com"
LOG_FILE="/var/log/network-monitor.log"
MAX_CONNECTIONS=200 # Alert threshold for total connections
MAX_PER_IP=50 # Alert threshold for connections per IP
# Create log file if it doesn't exist
if [ ! -f "$LOG_FILE" ]; then
    touch "$LOG_FILE"
    chmod 644 "$LOG_FILE"
fi
log_message() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}
send_alert() {
    log_message "$1"
    echo -e "$1" | mail -s "Network Alert: $(hostname)" "$EMAIL"
}
# Count total current connections
TOTAL_CONNECTIONS=$(netstat -an | grep ESTABLISHED | wc -l)
log_message "Total active connections: $TOTAL_CONNECTIONS"
# Check if total connections exceed threshold
if [ "$TOTAL_CONNECTIONS" -gt "$MAX_CONNECTIONS" ]; then
    # Column 4 of netstat output is the local address, so these are the
    # local (service) ports receiving the connections
    TOP_PORTS=$(netstat -an | grep ESTABLISHED | awk '{print $4}' | cut -d: -f2 | sort | uniq -c | sort -nr | head -5)
    TOP_SERVICES=$(for port in $(echo "$TOP_PORTS" | awk '{print $2}'); do
        service=$(grep -w "$port/tcp" /etc/services 2>/dev/null | head -1 | awk '{print $1}')
        if [ -z "$service" ]; then service="unknown"; fi
        echo "Port $port: $service"
    done)
    ALERT_MESSAGE="HIGH CONNECTION ALERT: $TOTAL_CONNECTIONS active connections (Threshold: $MAX_CONNECTIONS)\n\nTop local ports:\n$TOP_PORTS\n\nService identification:\n$TOP_SERVICES"
    send_alert "$ALERT_MESSAGE"
fi
# Check for excessive connections from single IPs
IP_CONNECTIONS=$(netstat -an | grep ESTABLISHED | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr)
SUSPICIOUS_IPS=""
while read -r line; do
    COUNT=$(echo "$line" | awk '{print $1}')
    IP=$(echo "$line" | awk '{print $2}')
    if [ "$COUNT" -gt "$MAX_PER_IP" ]; then
        IP_DETAILS=$(whois "$IP" 2>/dev/null | grep -E "Organization|OrgName|netname|country" | head -4 | tr '\n' ' ')
        SUSPICIOUS_IPS="${SUSPICIOUS_IPS}$IP - $COUNT connections - $IP_DETAILS\n"
    fi
done <<< "$IP_CONNECTIONS"
# Send alert if suspicious IPs found
if [ -n "$SUSPICIOUS_IPS" ]; then
    ALERT_MESSAGE="SUSPICIOUS CONNECTION ALERT: The following IPs have excessive connection counts (Threshold: $MAX_PER_IP):\n\n$SUSPICIOUS_IPS"
    send_alert "$ALERT_MESSAGE"
fi
How to Use This Script:
- Save it as network_monitor.sh
- Make it executable: chmod +x network_monitor.sh
- Install required tools: sudo apt-get install net-tools whois
- Set up a cron job to run it periodically:
# Run every 15 minutes
*/15 * * * * /path/to/network_monitor.sh
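The per-IP counting is just sort | uniq -c. You can verify the pipeline shape with a hand-written list of addresses (the IPs below are documentation placeholders, not real hosts):

```shell
#!/bin/bash
# Verify the connection-counting pipeline with placeholder IPs.
IPS='10.0.0.5
192.0.2.7
10.0.0.5
10.0.0.5
192.0.2.7'

# Same shape as the monitor: count occurrences, busiest address first
TOP=$(echo "$IPS" | sort | uniq -c | sort -nr | head -1)
COUNT=$(echo "$TOP" | awk '{print $1}')
IP=$(echo "$TOP" | awk '{print $2}')
echo "$IP has $COUNT connections"   # 10.0.0.5 has 3 connections
```

One caveat for the real script: `cut -d: -f1` assumes IPv4-style addresses, since IPv6 addresses contain colons themselves; if you monitor IPv6 traffic you'll want a different field split.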
5. Service Status Monitor
This script ensures your critical services are running and can automatically restart them if they fail:
#!/bin/bash
# service_monitor.sh - Monitors critical services and restarts them if needed
# Configuration
EMAIL="admin@yourdomain.com"
LOG_FILE="/var/log/service-monitor.log"
# Define services to monitor and whether to restart them
# Format: "service_name:restart_flag (0 or 1)"
SERVICES=(
    "nginx:1"
    "mysql:1"
    "ssh:1"
    "apache2:1"
    "mongodb:0"  # Monitor but don't restart automatically
)
# Create log file if it doesn't exist
if [ ! -f "$LOG_FILE" ]; then
    touch "$LOG_FILE"
    chmod 644 "$LOG_FILE"
fi
log_message() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}
send_alert() {
    log_message "$1"
    echo -e "$1" | mail -s "Service Alert: $(hostname)" "$EMAIL"
}
# Check each service
log_message "Checking service status"
FAILED_SERVICES=""
RESTARTED_SERVICES=""
for SERVICE_DEF in "${SERVICES[@]}"; do
    # Split definition into service name and restart flag
    SERVICE_NAME=$(echo "$SERVICE_DEF" | cut -d: -f1)
    RESTART_FLAG=$(echo "$SERVICE_DEF" | cut -d: -f2)
    # Check if the service is active
    if systemctl is-active --quiet "$SERVICE_NAME"; then
        log_message "$SERVICE_NAME is running"
    else
        log_message "$SERVICE_NAME is not running"
        FAILED_SERVICES="${FAILED_SERVICES}$SERVICE_NAME "
        if [ "$RESTART_FLAG" -eq 1 ]; then
            log_message "Attempting to restart $SERVICE_NAME"
            systemctl restart "$SERVICE_NAME"
            # Check whether the restart succeeded
            sleep 2
            if systemctl is-active --quiet "$SERVICE_NAME"; then
                log_message "$SERVICE_NAME successfully restarted"
                RESTARTED_SERVICES="${RESTARTED_SERVICES}$SERVICE_NAME "
            else
                log_message "Failed to restart $SERVICE_NAME"
            fi
        fi
    fi
done
# Send alerts if needed
if [ -n "$FAILED_SERVICES" ]; then
    ALERT_MESSAGE="SERVICE ALERT: The following services were detected as not running:\n$FAILED_SERVICES\n\n"
    if [ -n "$RESTARTED_SERVICES" ]; then
        ALERT_MESSAGE="${ALERT_MESSAGE}Successfully restarted services: $RESTARTED_SERVICES\n\n"
    fi
    # List services that are still down
    STILL_FAILED=""
    for SERVICE in $FAILED_SERVICES; do
        if ! systemctl is-active --quiet "$SERVICE"; then
            STILL_FAILED="${STILL_FAILED}$SERVICE "
        fi
    done
    if [ -n "$STILL_FAILED" ]; then
        ALERT_MESSAGE="${ALERT_MESSAGE}Services still not running: $STILL_FAILED\n\n"
        ALERT_MESSAGE="${ALERT_MESSAGE}Please check these services manually."
    fi
    send_alert "$ALERT_MESSAGE"
fi
How to Use This Script:
- Save it as service_monitor.sh
- Make it executable: chmod +x service_monitor.sh
- Edit the SERVICES array to include the services you want to monitor
- Set up a cron job to run it frequently:
# Run every 5 minutes
*/5 * * * * /path/to/service_monitor.sh
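The service:flag split uses cut; bash parameter expansion does the same without spawning extra processes, which matters little at this scale but is a handy idiom to know:

```shell
#!/bin/bash
# Split a "service:restart_flag" definition two equivalent ways.
SERVICE_DEF="nginx:1"

# cut, as used in the script above
NAME_CUT=$(echo "$SERVICE_DEF" | cut -d: -f1)
FLAG_CUT=$(echo "$SERVICE_DEF" | cut -d: -f2)

# pure-bash parameter expansion: strip from the first ":" onward / up to the last ":"
NAME_PE="${SERVICE_DEF%%:*}"
FLAG_PE="${SERVICE_DEF##*:}"

echo "$NAME_CUT $FLAG_CUT $NAME_PE $FLAG_PE"   # nginx 1 nginx 1
```

Either form works; parameter expansion is the more idiomatic choice inside a loop that runs every few minutes.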
Setting Up as a Comprehensive Monitoring System
These scripts work well independently, but they're even more powerful when used together as part of a comprehensive monitoring system. Here's how to set everything up:
Step 1: Create a Central Directory
sudo mkdir -p /opt/server-monitor/scripts
sudo mkdir -p /opt/server-monitor/logs
Step 2: Copy All Scripts
Save all five scripts to /opt/server-monitor/scripts/ and make them executable:
sudo chmod +x /opt/server-monitor/scripts/*.sh
Step 3: Create a Central Configuration File
Create /opt/server-monitor/config.sh:
#!/bin/bash
# Central configuration for all monitoring scripts
# Email settings
EMAIL_ADMIN="admin@yourdomain.com"
EMAIL_ALERTS="alerts@yourdomain.com"
# Thresholds
CPU_THRESHOLD=80
MEM_THRESHOLD=80
DISK_THRESHOLD=85
NET_MAX_CONNECTIONS=200
NET_MAX_PER_IP=50
# Log file locations
LOG_DIR="/opt/server-monitor/logs"
CPU_MEM_LOG="$LOG_DIR/cpu-mem-monitor.log"
DISK_LOG="$LOG_DIR/disk-monitor.log"
LOG_MONITOR_LOG="$LOG_DIR/log-monitor.log"
NETWORK_LOG="$LOG_DIR/network-monitor.log"
SERVICE_LOG="$LOG_DIR/service-monitor.log"
# Services to monitor (service:restart_flag)
SERVICES=(
    "nginx:1"
    "mysql:1"
    "ssh:1"
    "apache2:1"
    "mongodb:0"
)
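For the central config to pay off, each monitoring script should source it rather than carry its own hardcoded values. A sketch of that pattern, with defaults so the script still runs standalone if the config file is missing (the path and variable names follow the config above):

```shell
#!/bin/bash
# Source the central config if present, then fall back to defaults.
CONFIG="/opt/server-monitor/config.sh"
if [ -f "$CONFIG" ]; then
    # shellcheck source=/dev/null
    source "$CONFIG"
fi

# ":=" assigns the default only when the variable is unset or empty,
# so values from config.sh always win when it exists
: "${CPU_THRESHOLD:=80}"
: "${MEM_THRESHOLD:=80}"
echo "CPU threshold: $CPU_THRESHOLD, MEM threshold: $MEM_THRESHOLD"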
Step 4: Set Up a Main Wrapper Script
Create /opt/server-monitor/monitor.sh:
#!/bin/bash
# Main monitoring wrapper script
# Source configuration
source /opt/server-monitor/config.sh
# Create log directory if it doesn't exist
if [ ! -d "$LOG_DIR" ]; then
    mkdir -p "$LOG_DIR"
    chmod 755 "$LOG_DIR"
fi
# Run all monitoring scripts
/opt/server-monitor/scripts/cpu_mem_monitor.sh
/opt/server-monitor/scripts/disk_monitor.sh
/opt/server-monitor/scripts/log_monitor.sh
/opt/server-monitor/scripts/network_monitor.sh
/opt/server-monitor/scripts/service_monitor.sh
# Generate daily report (optional) — only on the first run of the 6 AM hour,
# since this wrapper fires every 10 minutes and would otherwise send six reports
if [ "$(date +%H)" = "06" ] && [ "$(date +%M)" -lt 10 ]; then
    REPORT="$LOG_DIR/daily_report.txt"
    echo "Daily System Health Report - $(date '+%Y-%m-%d')" > "$REPORT"
    echo "----------------------------------------" >> "$REPORT"
    echo "CPU/Memory Summary:" >> "$REPORT"
    grep "CPU:" "$CPU_MEM_LOG" | tail -24 >> "$REPORT"
    echo -e "\nDisk Usage:" >> "$REPORT"
    df -h >> "$REPORT"
    echo -e "\nService Status:" >> "$REPORT"
    for SERVICE_DEF in "${SERVICES[@]}"; do
        systemctl status "${SERVICE_DEF%%:*}" | head -3 >> "$REPORT"
    done
    # Send the daily report
    mail -s "Daily System Health Report - $(hostname)" "$EMAIL_ADMIN" < "$REPORT"
fi
Step 5: Set Up the Cron Jobs
sudo crontab -e
Add these lines:
# Run the main monitoring script every 10 minutes
*/10 * * * * /opt/server-monitor/monitor.sh
# Run disk check only once per day (to avoid excessive alerts)
0 7 * * * /opt/server-monitor/scripts/disk_monitor.sh
# Rotate logs weekly (truncate in place with truncate; a shell redirection like
# "> {}" would be intercepted by the shell before find runs and never empty the logs)
0 0 * * 0 find /opt/server-monitor/logs -type f -name "*.log" -exec cp {} {}.old \; -exec truncate -s 0 {} \;
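The find-based rotation keeps only one previous generation. If logrotate is installed (it usually is on Debian/Ubuntu), a drop-in config is more robust; a sketch, assuming the standard /etc/logrotate.d mechanism:

```
# /etc/logrotate.d/server-monitor (sketch)
# Rotate the monitor logs weekly, keep 4 generations, compress old ones
/opt/server-monitor/logs/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
```

With this in place you can drop the rotation line from the crontab entirely, since logrotate runs from the system's own daily cron job.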
Conclusion
These five scripts provide a solid foundation for monitoring your Linux servers without relying on complex external tools. You can customize them to fit your specific needs and environment, and they're particularly useful for:
- Small to medium-sized deployments
- Development and staging environments
- Personal projects where commercial monitoring is overkill
- Learning how server monitoring works under the hood
As you become more comfortable with these scripts, you can extend them to perform more sophisticated monitoring and even integrate them with other tools like Prometheus or Grafana for visualization.
Remember that while these scripts are powerful, they're not a complete replacement for enterprise monitoring solutions in large, critical production environments. However, they can complement those solutions by providing a lightweight, customizable alternative that you fully control.
Next Steps
- Add a simple web UI to display the monitoring data
- Set up SMS alerts for critical issues
- Create a central monitoring server that collects data from multiple servers
- Add graphing capabilities using tools like gnuplot
What other monitoring scripts would you find useful? Let me know in the comments!