How to Monitor Failed Systemd Services on Linux Server - Complete Failure Detection Guide
Are you wondering how to automatically detect when systemd services fail on your Linux server? Need to receive immediate alerts when critical services like Nginx, MySQL, or Redis crash or fail to start? This comprehensive guide shows you multiple methods to monitor failed systemd services, detect service failures automatically, troubleshoot service issues, and maintain system stability on your Linux server.
Why Monitoring Failed Systemd Services Matters
Systemd services are the backbone of modern Linux servers, running everything from web servers and databases to application services and background daemons. When a critical service fails, the applications that depend on it can become unavailable, causing costly downtime and service interruptions. Monitoring failed systemd services helps you detect problems as soon as they happen, troubleshoot issues quickly, and maintain high availability for your Linux server infrastructure. Automated failure detection helps prevent extended outages and lets you resolve service issues before they affect users.
Method 1: Check Failed Systemd Services with systemctl
The systemctl command provides built-in functionality to check for failed systemd services on Linux servers. This is the most straightforward method to identify services that have failed to start or have crashed.
Basic Failed Services Check
To see all failed systemd services:
# List all failed services
systemctl --failed
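# List failed units without the pager (useful in scripts and over SSH)
systemctl --failed --no-pager
# Check whether any services are failed (list-units exits 0 either way, so test its output instead)
if [ -z "$(systemctl list-units --type=service --state=failed --no-legend --plain)" ]; then echo "No failed services"; else echo "Failed services detected"; fi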
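The --failed flag shows only units in the failed state, making it easy to identify problematic services at a glance. Besides services, this includes other unit types such as timers, sockets, and mounts; add --type=service to limit the list to services.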
Check Specific Service Status
To check if a specific service has failed:
# Check if specific service is failed
systemctl is-failed nginx
# Check multiple services at once
systemctl is-failed nginx mysql redis postgresql
# Get detailed status of failed service
systemctl status servicename
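The is-failed command prints the current state of each unit (failed, active, inactive, and so on) and exits with status 0 if at least one of the listed units has failed, or non-zero otherwise, so you can use it directly in scripts.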
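A minimal sketch of how that exit status can drive a script, assuming bash; the function name check_services is hypothetical. It passes --quiet so is-failed only sets the exit status, then prints one line per failed unit and returns 1 if any of them failed.

```shell
#!/usr/bin/env bash
# Sketch: print each failed unit among the arguments; return 1 if any failed.
# `systemctl is-failed --quiet UNIT` exits 0 only when UNIT is in the "failed" state.
check_services() {
  local unit any_failed=0
  for unit in "$@"; do
    if systemctl is-failed --quiet "$unit"; then
      echo "FAILED: $unit"
      any_failed=1
    fi
  done
  return "$any_failed"
}
```

For example, check_services nginx mysql redis || echo "critical service down" lists each failed unit and then prints the warning.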
List All Services with Status
To see the status of all services and identify failed ones:
# List all services with their status
systemctl list-units --type=service --all --no-pager
# List only failed and inactive services
systemctl list-units --type=service --state=failed,inactive
# List only services whose unit file loaded successfully (LOAD state "loaded")
systemctl list-units --type=service --state=loaded
Method 2: Advanced Failed Service Detection Techniques
Beyond basic checks, you can use advanced techniques to detect and analyze failed systemd services more effectively.
Count Failed Services
To get a count of failed services for monitoring thresholds:
# Count failed services (--no-legend and --plain drop the header and footer lines,
# so each remaining line is one failed unit)
systemctl list-units --type=service --state=failed --no-legend --plain | wc -l
# Check if count exceeds threshold
FAILED_COUNT=$(systemctl list-units --type=service --state=failed --no-legend --plain | wc -l)
if [ "$FAILED_COUNT" -gt 0 ]; then
  echo "Warning: $FAILED_COUNT failed services detected"
fi
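The same count can be wrapped in a reusable check. This is a sketch assuming bash; failed_over_threshold is a hypothetical name, and its argument (default 0) is the number of failed services you are willing to tolerate before the check returns a non-zero status.

```shell
#!/usr/bin/env bash
# Sketch: print the number of failed services and return non-zero when it
# exceeds THRESHOLD (default 0). --no-legend and --plain drop the header,
# footer and status bullet, so each remaining line is one failed unit.
failed_over_threshold() {
  local threshold="${1:-0}" count
  count=$(systemctl list-units --type=service --state=failed --no-legend --plain | wc -l)
  echo "$count failed service(s)"
  [ "$count" -le "$threshold" ]
}
```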
Check Service Failure Reasons
To understand why services failed:
# List failed units without truncating long names or descriptions (-l is short for --full)
systemctl --failed --no-pager -l
# Check service logs for failure reasons
journalctl -u servicename --since "1 hour ago" | tail -50
# Show when the service last became active and when it last stopped or failed
systemctl show servicename --property=ActiveEnterTimestamp,InactiveEnterTimestamp
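To see systemd's own classification of each failure in one pass, a sketch like the following loops over the failed services and prints their Result, ExecMainStatus, and InactiveEnterTimestamp properties. Result reports values such as exit-code, signal, timeout, or start-limit-hit, and ExecMainStatus holds the main process's exit status (or signal number). It assumes bash, and the function name failure_report is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: one summary line per failed service: the unit name followed by
# the requested KEY=VALUE properties.
failure_report() {
  local unit props
  for unit in $(systemctl list-units --type=service --state=failed --no-legend --plain | awk '{print $1}'); do
    # systemctl show prints KEY=VALUE lines; join them with spaces
    props=$(systemctl show "$unit" --property=Result,ExecMainStatus,InactiveEnterTimestamp | tr '\n' ' ')
    echo "$unit: ${props% }"
  done
}
```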
Monitor Service Restart Frequency
To detect services that are failing repeatedly:
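# Check how many times systemd has automatically restarted the service (Restart= policy)
systemctl show servicename --property=NRestarts
# List services sorted by restart count (highest first)
for unit in $(systemctl list-units --type=service --all --no-legend --plain | awk '{print $1}'); do
  echo "$(systemctl show "$unit" --property=NRestarts --value) $unit"
done | sort -rn | head -20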
# Monitor service state changes
journalctl -u servicename --since "24 hours ago" | grep -i "started\|stopped\|failed"
Method 3: Automated Failed Service Monitoring with Zuzia.app
Manually checking for failed systemd services works for occasional troubleshooting, but for production Linux servers, you need automated monitoring that alerts you immediately when services fail. Zuzia.app provides comprehensive failed service monitoring through scheduled command execution.
Setting Up Automated Failed Service Monitoring
1. Add Scheduled Task in Zuzia.app Dashboard
   - Navigate to your server in Zuzia.app
   - Click "Add Scheduled Task"
   - Choose "Command Execution" as the task type
2. Configure Failed Service Check Command
   - Enter command: systemctl --failed --no-pager
   - Set execution frequency: Every 15 minutes for active monitoring
   - Configure alert conditions: Alert when any service is failed
   - Set up comparison with previous runs to detect new failures
3. Set Up Notifications
   - Choose notification channels (email, webhook, Slack, etc.)
   - Configure alert thresholds (e.g., alert if any service fails)
   - Set up escalation rules for critical services
   - Configure different alert levels for different service types
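If you prefer a compact output for the scheduled task, a script along these lines prints OK when nothing has failed, or FAILED: followed by the sorted unit names otherwise, and exits non-zero on failure. This is a sketch assuming bash; the function name failed_summary is hypothetical, and how Zuzia.app evaluates the output or exit code is not covered here. The short, stable output mainly makes run-to-run comparisons easy.

```shell
#!/usr/bin/env bash
# Sketch: a single, stable status line for a scheduled check.
# Prints "OK" (exit 0) or "FAILED: <units>" (exit 1).
failed_summary() {
  local units
  units=$(systemctl list-units --type=service --state=failed --no-legend --plain \
    | awk '{print $1}' | sort | tr '\n' ' ')
  if [ -z "$units" ]; then
    echo "OK"
    return 0
  fi
  echo "FAILED: ${units% }"   # strip the trailing space left by tr
  return 1
}
```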
Monitor Critical Services Specifically
For mission-critical services, create dedicated monitoring tasks:
# Check critical web services
systemctl is-failed nginx apache2 php-fpm
# Check database services
systemctl is-failed mysql postgresql redis
# Check application services
systemctl is-failed application-name worker-service
Zuzia.app stores all command outputs in its database, allowing you to track service failures over time, identify patterns in service issues, and detect recurring problems before they cause extended outages.
Method 4: Troubleshoot Failed Systemd Services
Once you've detected failed services, you need to troubleshoot and resolve the issues.
View Service Logs
To understand why a service failed:
# View recent service logs
journalctl -u servicename --since "1 hour ago"
# View service logs with follow mode
journalctl -u servicename -f
# View last 100 log entries
journalctl -u servicename -n 100
# View logs with priority filtering
journalctl -u servicename -p err
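When several services have failed, a small loop can collect the error-level entries for all of them at once. This sketch assumes bash; the function name failed_service_errors is hypothetical, and its optional argument is passed straight to journalctl --since (default "1 hour ago").

```shell
#!/usr/bin/env bash
# Sketch: print error-level journal entries (priority err and more severe)
# for every failed service. $1 is passed to journalctl --since.
failed_service_errors() {
  local since="${1:-1 hour ago}" unit
  for unit in $(systemctl list-units --type=service --state=failed --no-legend --plain | awk '{print $1}'); do
    echo "=== $unit ==="
    journalctl -u "$unit" -p err --since "$since" --no-pager
  done
}
```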
Check Service Dependencies
To identify if service failures are caused by dependencies:
# Show service dependencies
systemctl list-dependencies servicename
# Check if required services are running
systemctl is-active required-service
# Show service requirements
systemctl show servicename --property=Requires,Wants,After
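A sketch for checking hard dependencies automatically, assuming bash; inactive_requirements is a hypothetical name. It reads the unit's Requires= list with systemctl show --value and reports every entry that is not active. Wants= dependencies are left out because they are optional by design.

```shell
#!/usr/bin/env bash
# Sketch: report Requires= dependencies of UNIT that are not active.
# Returns 1 if at least one required unit is not active.
inactive_requirements() {
  local unit="$1" dep missing=0
  # --value prints only the space-separated list of required units
  for dep in $(systemctl show "$unit" --property=Requires --value); do
    if ! systemctl is-active --quiet "$dep"; then
      echo "$unit requires $dep, which is $(systemctl is-active "$dep")"
      missing=1
    fi
  done
  return "$missing"
}
```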
Restart Failed Services
To attempt to restart failed services:
# Restart a failed service
systemctl restart servicename
# Reset failed state and restart
systemctl reset-failed servicename
systemctl restart servicename
# After editing a unit file, reload systemd's unit definitions, then restart
systemctl daemon-reload
systemctl restart servicename
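Restarting is only half the job, because a service can start and crash again seconds later. The sketch below (bash, hypothetical function name restart_and_verify, must run as root) clears the failed state, restarts the unit, waits a few seconds (default 5, an arbitrary choice), and then confirms the unit is still active. Note that reset-failed also resets the unit's start rate-limit counter.

```shell
#!/usr/bin/env bash
# Sketch: reset, restart, wait, and verify that UNIT stays active.
# Usage: restart_and_verify UNIT [SECONDS_TO_WAIT]
restart_and_verify() {
  local unit="$1"
  # Ignore errors here; the restart below is what matters
  systemctl reset-failed "$unit" 2>/dev/null || true
  if ! systemctl restart "$unit"; then
    echo "restart of $unit failed" >&2
    return 1
  fi
  sleep "${2:-5}"   # give the service a moment to crash if it is going to
  if systemctl is-active --quiet "$unit"; then
    echo "$unit is active"
  else
    echo "$unit did not stay active" >&2
    return 1
  fi
}
```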
Real-World Use Cases for Failed Service Monitoring
Web Server Failure Detection
For web servers, monitor critical services:
# Check web server services
systemctl is-failed nginx apache2
# Monitor PHP processor
systemctl is-failed php-fpm php8.1-fpm
# Check SSL certificate renewal (the timer schedules renewals; a failed renewal run marks certbot.service as failed)
systemctl is-failed certbot.timer certbot.service
Database Service Monitoring
For database servers, track database service failures:
# Monitor MySQL service
systemctl is-failed mysql mysqld
# Check PostgreSQL service
systemctl is-failed postgresql postgresql@14-main
# Monitor Redis service
systemctl is-failed redis redis-server
Application Service Monitoring
For application servers, monitor application services:
# Check application services
systemctl is-failed application-name
# Monitor worker services
systemctl is-failed worker-queue worker-background
# Check API services
systemctl is-failed api-service api-gateway
Best Practices for Failed Service Monitoring
1. Monitor Failed Services Frequently
Check for failed services every 15-30 minutes for routine monitoring, and more frequently (every 5 minutes) for critical services, so failures are caught within minutes and you can respond quickly. Use Zuzia.app automated monitoring to run these checks on a schedule without manual intervention.
2. Set Up Immediate Alerts
Configure alerts to trigger immediately when any service fails. Don't wait for scheduled checks - use real-time monitoring where possible. Set up different alert levels for different service types (critical, important, optional).
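For real-time alerts, systemd itself can react the moment a unit fails: the OnFailure= directive starts another unit whenever this one enters the failed state. The sketch below writes such a drop-in plus a template notification unit. The names notify-failure@.service and /usr/local/bin/notify-failure are hypothetical placeholders for whatever sends your alert, and the function name is hypothetical too. The optional second argument only exists so the files can be written somewhere other than /etc/systemd/system, for example when testing.

```shell
#!/usr/bin/env bash
# Sketch: make systemd start notify-failure@<unit>.service when UNIT fails.
# Usage: install_onfailure_hook UNIT [UNIT_DIR]
install_onfailure_hook() {
  local unit="$1" dir="${2:-/etc/systemd/system}"
  mkdir -p "$dir/$unit.d"
  # Drop-in: %n expands to the failing unit's full name, e.g. nginx.service
  cat > "$dir/$unit.d/10-onfailure.conf" <<'EOF'
[Unit]
OnFailure=notify-failure@%n.service
EOF
  # Template unit: %i is the instance name, i.e. the unit that failed
  cat > "$dir/notify-failure@.service" <<'EOF'
[Unit]
Description=Failure notification for %i

[Service]
Type=oneshot
ExecStart=/usr/local/bin/notify-failure %i
EOF
}
```

After running install_onfailure_hook nginx.service as root, run systemctl daemon-reload so systemd picks up the new files.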
3. Track Failure Patterns
Monitor service failures over time to identify patterns. Services that fail repeatedly might indicate configuration issues, resource constraints, or dependency problems. Use Zuzia.app's historical data to track failure frequency and identify root causes.
4. Automate Service Recovery
For non-critical services, consider automated recovery scripts that attempt to restart failed services. Always test recovery scripts thoroughly and monitor their effectiveness. For critical services, require manual intervention to prevent automatic actions that might cause data loss.
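For non-critical services, systemd's own restart handling is often simpler than a custom recovery script. The sketch below writes a drop-in that restarts the unit on failure but allows at most 3 starts per hour; once that limit is hit, the unit stays failed (Result=start-limit-hit), so your failed-service checks still catch it. The function name is hypothetical and the values are examples, not recommendations; run systemctl daemon-reload afterwards.

```shell
#!/usr/bin/env bash
# Sketch: Restart=on-failure with a start limit of 3 starts per 3600 seconds.
# Usage: install_restart_policy UNIT [UNIT_DIR]
install_restart_policy() {
  local unit="$1" dir="${2:-/etc/systemd/system}"
  mkdir -p "$dir/$unit.d"
  cat > "$dir/$unit.d/20-restart.conf" <<'EOF'
[Unit]
StartLimitIntervalSec=3600
StartLimitBurst=3

[Service]
Restart=on-failure
RestartSec=10
EOF
}
```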
5. Document Service Dependencies
Maintain documentation about service dependencies and relationships. This helps you understand the impact of service failures and troubleshoot issues more effectively. Update documentation when service configurations change.
Troubleshooting Common Failed Service Issues
Service Fails to Start
If a service fails to start:
# Check service status
systemctl status servicename
# View startup logs
journalctl -u servicename --since "10 minutes ago"
# Check service configuration
systemctl cat servicename
# Verify service file syntax
systemd-analyze verify servicename.service
Service Crashes Repeatedly
If a service keeps crashing:
# Check restart count
systemctl show servicename --property=NRestarts
# View crash logs
journalctl -u servicename --since "1 hour ago" | grep -i "error\|fail\|crash"
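# Check resource limits (MemoryMax replaces the deprecated MemoryLimit; the CPU quota is reported as CPUQuotaPerSecUSec)
systemctl show servicename --property=MemoryMax,MemoryHigh,CPUQuotaPerSecUSec,TasksMax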
# Monitor service in real-time
journalctl -u servicename -f
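To see how close a crashing service is to its memory ceiling, a sketch like this compares MemoryCurrent with MemoryMax, both reported in bytes. It assumes bash and cgroup v2 style limits (MemoryMax); memory_usage is a hypothetical name. MemoryMax=infinity means no limit, and when memory accounting is off MemoryCurrent shows [not set] (some systemd versions may print the maximum 64-bit value instead); the sketch treats both as unavailable.

```shell
#!/usr/bin/env bash
# Sketch: print a unit's current memory use relative to its MemoryMax limit.
memory_usage() {
  local unit="$1" cur max
  cur=$(systemctl show "$unit" --property=MemoryCurrent --value)
  max=$(systemctl show "$unit" --property=MemoryMax --value)
  # Non-numeric or maximum-uint64 value: accounting is not available
  case "$cur" in
    ''|*[!0-9]*|18446744073709551615) echo "$unit: memory accounting unavailable"; return 0 ;;
  esac
  # Non-numeric limit (typically "infinity"): no ceiling to compare against
  case "$max" in
    ''|*[!0-9]*) echo "$unit: $cur bytes used, no MemoryMax limit"; return 0 ;;
  esac
  echo "$unit: $cur of $max bytes ($((cur * 100 / max))%)"
}
```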
Service Dependencies Not Met
If a service fails due to dependencies:
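# Check the units this service depends on
systemctl list-dependencies servicename
# Check which units depend on this service (reverse dependencies)
systemctl list-dependencies servicename --reverse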
# Verify dependency services are running
systemctl is-active required-service
# Check service ordering
systemctl show servicename --property=After,Before,Requires
FAQ: Common Questions About Monitoring Failed Systemd Services
How often should I check for failed systemd services on my Linux server?
We recommend checking for failed services every 15-30 minutes for routine monitoring, and every 5 minutes for critical services, so failures are caught within minutes and you can respond quickly. Use Zuzia.app automated monitoring to run these checks on a schedule without manual intervention.
What should I do when a systemd service fails?
When a service fails, first check the service logs using journalctl -u servicename to understand why it failed. Then check service dependencies and configuration. Attempt to restart the service with systemctl restart servicename. If the service continues to fail, investigate the root cause (configuration errors, resource constraints, dependency issues) before attempting further restarts.
Can I monitor specific critical services instead of all services?
Yes, you can check specific services using systemctl is-failed servicename or by creating separate monitoring tasks in Zuzia.app for each critical service. This allows you to set different alert thresholds and notification channels for different service types, and keeps alerts focused on the services that matter most.
How do I find out why a systemd service failed?
Use journalctl -u servicename --since "1 hour ago" to view recent service logs. Look for error messages, exceptions, or failure reasons in the logs. You can also check service status with systemctl status servicename which shows recent log entries. Check service dependencies with systemctl list-dependencies servicename to see if required services are running.
Can I automatically restart failed services?
Yes, you can create scripts that detect failed services and restart them automatically, but use caution. For critical services, automatic restarts might cause data loss or corruption. Always test restart scripts thoroughly and monitor their effectiveness. Consider implementing restart limits (e.g., maximum 3 restarts per hour) to prevent restart loops; systemd's Restart=, StartLimitIntervalSec=, and StartLimitBurst= unit settings can enforce such a limit without a custom script.
How can I prevent services from failing in the first place?
Monitor service resource usage (CPU, memory, disk), set appropriate resource limits, ensure dependencies are properly configured, keep service configurations updated, and monitor service logs for warning signs. Use Zuzia.app to track service health metrics over time and identify issues before they cause failures. Regular maintenance and updates also help prevent service failures.
Does Zuzia.app track service failure history and patterns?
Yes, Zuzia.app stores all command outputs in its database, allowing you to track service failures over time and identify patterns. You can view historical data to see which services fail most frequently, when failures occur, and how often services need to be restarted. This helps you identify root causes and prevent recurring failures.