Metrics and Monitoring
Metrics and the /metrics endpoint are experimental features in the current version of webPDF. They are disabled by default and must be explicitly enabled in the configuration.
webPDF provides comprehensive metrics collection and export capabilities for monitoring application performance, system resources, and business operations. The metrics system is built on Micrometer and supports both Prometheus and JMX registries.
Overview
The metrics architecture implements a 4-layer monitoring system that measures performance at different stages of request processing. REST web services pass through all four layers, while SOAP web services use only the last two:
- REST (Jersey): Uses all 4 layers (HTTP, API, Service, Thread Pool)
- SOAP (JAX-WS): Uses Layer 3 (Service) and Layer 4 (Thread Pool) only
The HTTP Transport and REST API layers (Layer 1 and 2) are specific to REST endpoints and do not apply to SOAP web services.
Layered Architecture
Layer 1: HTTP Transport http_server_* [REST only]
- Full request cycle: authentication, CORS, routing
- Measures complete request/response time
↓ framework overhead
Layer 2: REST API api_operation_* [REST only]
- Jersey endpoint execution and request/response processing
↓ serialization overhead
Layer 3: Business Logic service_method_* [REST + SOAP]
- Web service operations (converter, PDF/A, signature, etc.)
- Core processing logic
↓ queue wait time
Layer 4: Thread Pool threadpool_* [REST + SOAP]
- Worker execution and queue management
- Actual worker processing time
This layered approach allows precise identification of performance bottlenecks at each stage. Each layer's metrics measure different aspects, with Layer 1 being the most comprehensive (full request) and Layer 4 focusing on the actual worker execution.
Quick Start
- Enable metrics in conf/application.xml:
<application>
<metrics enabled="true">
<prometheus enabled="true"/>
</metrics>
</application>
- Configure authentication (required by default):
# Environment settings
export WEBPDF_METRICS_AUTH_USERNAME=prometheus
export WEBPDF_METRICS_AUTH_PASSWORD=your-secure-password
- Restart the webPDF server.
- Access the metrics endpoint:
curl -u prometheus:your-secure-password http://localhost:8080/webPDF/metrics
- Configure Prometheus to scrape the endpoint (see Integration Guide)
Configuration
Metrics can be configured via conf/application.xml or environment variables.
For complete configuration details including parameters, presets, and environment settings, see the Configuration Guide.
Accessing Metrics
Prometheus Endpoint
Metrics are exposed via HTTP at:
http://<HOST_URL>/webPDF/metrics
The endpoint URL includes the context path (default: webPDF). If you have changed the context path, adjust the URL accordingly.
Authentication is required by default. See the Authentication Guide for configuration details.
Example:
curl -u prometheus:your-password http://localhost:8080/webPDF/metrics
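The same request can be issued from a script. The following is a minimal sketch using only the Python standard library; it assumes Basic Auth is the active authentication mode and that the default context path is in use. The function names `build_metrics_request` and `fetch_metrics` are illustrative and not part of webPDF.

```python
import base64
import urllib.request


def build_metrics_request(base_url, username, password, context_path="webPDF"):
    """Build a GET request for the metrics endpoint with a Basic Auth header."""
    # The endpoint URL is <base_url>/<context_path>/metrics.
    url = f"{base_url.rstrip('/')}/{context_path.strip('/')}/metrics"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_metrics(request, timeout=10.0):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_metrics(build_metrics_request("http://localhost:8080", "prometheus", "your-password"))` against a running server should return the same text that the curl command prints. If the context path has been changed, pass it as `context_path`.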
JMX Access
Metrics are also available via JMX under the configured domain (default: webpdf.metrics).
See Java Management Extension (JMX) for JMX configuration and access.
Available Metrics
webPDF collects metrics across multiple layers and components:
HTTP Transport Layer
- Request duration, throughput, and active requests
- Request/response payload sizes
- Error rates (4xx/5xx)
REST API Layer
- Endpoint execution timing
- Operation-specific metrics
Business Logic Layer
- Service method execution time
- Request counts and error rates
- Per-operation metrics (converter, PDF/A, signature, etc.)
Thread Pool Layer
- Thread pool utilization
- Queue depth and capacity
- Rejected tasks (critical alert indicator)
Event Pipeline (Cluster)
- Event consumer processing metrics
- Dead letter queue (DLQ) metrics
- Reprocessing statistics
JVM & System
- Application version information
- Memory usage (heap/non-heap)
- Garbage collection metrics
- JVM runtime information
- CPU usage, disk space, file descriptors
- Thread counts, class loader stats
See Metric Reference for complete details on all 58+ metrics.
Memory Usage Estimation
The metrics system provides memory forecasting based on configuration:
- Base overhead: ~50 MB (registries, JVM metrics, system metrics)
- Per timer (with percentiles): ~1 MB
- Per timer (without percentiles): ~0.1 MB
- Per gauge/counter: ~0.05 MB
Example calculation for 50 endpoints:
- HTTP timers: 50 × 1 MB = 50 MB
- API timers: 50 × 1 MB = 50 MB
- Service timers: 10 × 1 MB = 10 MB
- Thread pool gauges: 10 × 0.05 MB = 0.5 MB
- Base overhead: 50 MB
- Total: ~160 MB
Disabling percentiles lowers the timer cost to 110 × 0.1 MB = 11 MB, which reduces the total to roughly 62 MB.
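The estimate can be reproduced for other deployment sizes with a few lines of code. This sketch applies the per-meter figures listed above; the function name and its parameters are illustrative, and the results are rough planning numbers rather than measurements.

```python
# Approximate per-meter costs in MB, taken from the list above.
BASE_OVERHEAD_MB = 50.0
TIMER_WITH_PERCENTILES_MB = 1.0
TIMER_WITHOUT_PERCENTILES_MB = 0.1
GAUGE_OR_COUNTER_MB = 0.05


def estimate_metrics_memory_mb(timers, gauges_and_counters, percentiles=True):
    """Return the estimated metrics memory footprint in MB."""
    per_timer = TIMER_WITH_PERCENTILES_MB if percentiles else TIMER_WITHOUT_PERCENTILES_MB
    return BASE_OVERHEAD_MB + timers * per_timer + gauges_and_counters * GAUGE_OR_COUNTER_MB


# The example above: 50 HTTP + 50 API + 10 service timers, 10 thread pool gauges.
with_percentiles = estimate_metrics_memory_mb(110, 10, percentiles=True)
without_percentiles = estimate_metrics_memory_mb(110, 10, percentiles=False)
```

For the 50-endpoint example this gives about 160.5 MB with percentiles and about 61.5 MB without, which lands near the rounded figures quoted above.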
Health Monitoring
The metrics system includes automatic health monitoring that logs periodic snapshots of meter count and heap usage:
Metrics health snapshot: meters=1523 (delta=12), heap=256 MB / 2048 MB (delta=8 MB)
Warnings are emitted when:
- Meter count exceeds the configured warning threshold (default: 10000, see Configuration Guide)
- Heap usage exceeds 80% of maximum
- Unusual growth detected (>1000 meters or >256 MB per interval)
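For log analysis outside webPDF, the snapshot line can be parsed and checked against the same conditions. This is a sketch under assumptions: it relies on the log format matching the sample line shown above (including the possibility of negative deltas, which is a guess), and it re-implements the listed thresholds externally rather than reflecting webPDF's internal logic.

```python
import re

# Matches the sample snapshot line; the exact format is assumed from that sample.
SNAPSHOT_PATTERN = re.compile(
    r"meters=(?P<meters>\d+) \(delta=(?P<meter_delta>-?\d+)\), "
    r"heap=(?P<heap_used>\d+) MB / (?P<heap_max>\d+) MB \(delta=(?P<heap_delta>-?\d+) MB\)"
)


def parse_snapshot(line):
    """Return the snapshot fields as integers, or None if the line does not match."""
    match = SNAPSHOT_PATTERN.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def snapshot_warnings(snapshot, meter_threshold=10000):
    """Apply the documented warning conditions to one parsed snapshot."""
    warnings = []
    if snapshot["meters"] > meter_threshold:
        warnings.append("meter count above threshold")
    if snapshot["heap_used"] > 0.8 * snapshot["heap_max"]:
        warnings.append("heap usage above 80%")
    if snapshot["meter_delta"] > 1000 or snapshot["heap_delta"] > 256:
        warnings.append("unusual growth")
    return warnings
```

Applied to the sample line above, `snapshot_warnings(parse_snapshot(line))` returns an empty list, since 1523 meters and 256 MB of 2048 MB are well below every threshold.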
Health Monitoring Configuration
Health logging can be configured via environment settings:
| Environment Variable | Default | Description |
|---|---|---|
| WEBPDF_METRICS_HEALTH_LOG_INTERVAL | 300000 | Health snapshot interval in milliseconds (min: 30000) |
| WEBPDF_METRICS_HEALTH_WARN_COOLDOWN | 3600000 | Warning cooldown period in milliseconds (min: 60000) |
Example:
# Log health snapshots every 5 minutes, warn at most once per hour
export WEBPDF_METRICS_HEALTH_LOG_INTERVAL=300000
export WEBPDF_METRICS_HEALTH_WARN_COOLDOWN=3600000
Health monitoring starts automatically when metrics are enabled. The logger runs as a daemon thread and logs to the standard application logger.
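A deployment script can check these two settings against the documented minimums before the server starts. The sketch below is a hypothetical pre-flight check: how webPDF itself reacts to values below the minimum is not stated here, so the function only reports problems and does not try to imitate the server's behavior.

```python
# (default, minimum) in milliseconds, taken from the table above.
DEFAULTS_AND_MINIMUMS = {
    "WEBPDF_METRICS_HEALTH_LOG_INTERVAL": (300000, 30000),
    "WEBPDF_METRICS_HEALTH_WARN_COOLDOWN": (3600000, 60000),
}


def check_health_settings(env):
    """Return a list of problems found in the given environment mapping."""
    problems = []
    for name, (default, minimum) in DEFAULTS_AND_MINIMUMS.items():
        raw = env.get(name, str(default))
        try:
            value = int(raw)
        except ValueError:
            problems.append(f"{name}={raw!r} is not an integer")
            continue
        if value < minimum:
            problems.append(f"{name}={value} is below the documented minimum {minimum}")
    return problems
```

Passing `os.environ` (after `import os`) checks the current process environment; an empty result means both values are either unset or within the documented range.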
Quick Links
- Configuration Guide - Configure metrics, layers, presets, and environment settings
- Authentication Guide - Configure Basic Auth or Bearer Token authentication
- Metric Reference - Complete list of all 58+ metrics with descriptions
- Integration Guide - Set up Prometheus, Grafana, and alerts
- Troubleshooting - Solve common issues
Common Use Cases
Monitor Request Performance
Track request latency and throughput:
# P95 latency
http_server_requests{quantile="0.95"}
# Request rate (per second)
rate(http_server_requests_count[5m])
# Error rate
rate(http_server_errors_server_total[5m])
Monitor Resource Utilization
Track thread pool and system resources:
# Thread pool utilization
(threadpool_active / threadpool_pool_size) * 100
# Queue depth
threadpool_queue_size
# Heap usage percentage
(jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"}) * 100
Detect Performance Bottlenecks
Compare metrics across layers:
# HTTP overhead (auth, CORS)
http_server_requests{quantile="0.95"} - api_operation_duration{quantile="0.95"}
# Serialization overhead
api_operation_duration{quantile="0.95"} - service_method_duration{quantile="0.95"}
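The layer comparison extends naturally to all four layers. The sketch below takes one latency value per layer (for example, the p95 values read from a dashboard) and attributes the differences to the transitions named in the layered architecture. The function names are illustrative. Note that the difference between two percentiles is only an estimate of the overhead, because percentiles of different distributions do not subtract exactly.

```python
def layer_overheads(http_s, api_s, service_s, worker_s):
    """Split an end-to-end latency into the per-transition overheads, in seconds."""
    return {
        "framework": http_s - api_s,         # Layer 1 -> Layer 2 (auth, CORS, routing)
        "serialization": api_s - service_s,  # Layer 2 -> Layer 3
        "queue_wait": service_s - worker_s,  # Layer 3 -> Layer 4
    }


def largest_overhead(overheads):
    """Return the name of the transition that contributes the most time."""
    return max(overheads, key=overheads.get)


# Hypothetical p95 values in seconds for one REST operation.
example = layer_overheads(http_s=1.0, api_s=0.875, service_s=0.75, worker_s=0.25)
```

With these hypothetical numbers, `largest_overhead(example)` points at `queue_wait`, which would suggest looking at thread pool sizing before anything else.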
Alert on Critical Conditions
# Queue overflow (immediate action required!)
increase(threadpool_rejected_total[5m]) > 0
# High error rate
(sum(rate(http_server_errors_server_total[5m])) / sum(rate(http_server_requests_count[5m]))) > 0.05
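To make the error-rate condition concrete, the sketch below computes the same ratio from two samples of each counter. It is a simplification: Prometheus's `rate()` also extrapolates to the window boundaries, while this version only handles a counter reset by treating a drop as a restart from zero. The function names and sample values are illustrative.

```python
def counter_increase(previous, current):
    """Increase between two counter samples; a drop is treated as a reset to zero."""
    return current - previous if current >= previous else current


def error_ratio(errors_before, errors_after, requests_before, requests_after):
    """Fraction of requests in the window that ended in a server error."""
    requests = counter_increase(requests_before, requests_after)
    if requests == 0:
        return 0.0  # No traffic in the window, so there is nothing to alert on.
    return counter_increase(errors_before, errors_after) / requests


# Hypothetical samples taken 5 minutes apart.
ratio = error_ratio(errors_before=10, errors_after=55,
                    requests_before=1000, requests_after=1600)
should_alert = ratio > 0.05
```

Here 45 errors out of 600 requests give a ratio of 0.075, which is above the 5% threshold used in the alert expression.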
Best Practices
Production Deployments
- ✅ Disable percentiles to save memory (500-1000 MB reduction)
- ✅ Set cardinality limits to prevent metric explosion
- ✅ Enable authentication for security
- ✅ Use HTTPS for metrics endpoint (TLS configuration)
- ✅ Monitor health logs for warnings
- ✅ Set up alerts for critical conditions (queue overflow, high error rate)
Development/Testing
- ✅ Enable all metrics for detailed analysis
- ✅ Disable authentication for easy access (local only!)
- ✅ Enable startup logging to see all metrics
- ✅ Use percentiles for latency analysis
Performance
- ✅ Disable the HTTP layer if overhead is a concern (highest impact)
- ✅ Limit percentiles to 1-2 values (e.g., only p95)
- ✅ Use minimal preset for resource-constrained environments
See Also
- Server Addresses - All webPDF URLs including metrics endpoint
- Java Management Extension (JMX) - JMX monitoring
- Server Configuration - Configuration file details
- Admin Portal - Web-based configuration