Version: 10.0

Metrics and Monitoring

Experimental feature

Metrics and the /metrics endpoint are experimental features in the current version of webPDF. They are disabled by default and must be explicitly enabled in the configuration.

webPDF provides comprehensive metrics collection and export capabilities for monitoring application performance, system resources, and business operations. The metrics system is built on Micrometer and supports both Prometheus and JMX registries.

Overview

The metrics architecture implements a four-layer monitoring system that measures performance at different stages of request processing. REST web services use all four layers; SOAP web services use only the lower two.

Web Service Type Coverage

REST (Jersey): Uses all 4 layers (HTTP, API, Service, Thread Pool)
SOAP (JAX-WS): Uses Layer 3 (Service) and Layer 4 (Thread Pool) only

The HTTP Transport and REST API layers (Layer 1 and 2) are specific to REST endpoints and do not apply to SOAP web services.

Layered Architecture

Layer 1: HTTP Transport http_server_* [REST only]

Full request cycle: authentication, CORS, routing
Measures complete request/response time

↓ framework overhead

Layer 2: REST API api_operation_* [REST only]

Jersey endpoint execution and request/response processing

↓ serialization overhead

Layer 3: Business Logic service_method_* [REST + SOAP]

Web service operations (converter, PDF/A, signature, etc.)
Core processing logic

↓ queue wait time

Layer 4: Thread Pool threadpool_* [REST + SOAP]

Worker execution and queue management
Actual worker processing time

This layered approach allows precise identification of performance bottlenecks at each stage. Each layer's metrics measure different aspects, with Layer 1 being the most comprehensive (full request) and Layer 4 focusing on the actual worker execution.
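As an illustration of how the layers relate, the sketch below subtracts the durations of adjacent layers to attribute time to each stage of a single REST request. The function name and the sample durations are hypothetical; only the layer order and the overhead labels are taken from the diagram above.

```python
def layer_breakdown(http_s, api_s, service_s, worker_s):
    """Attribute request time to each stage from per-layer durations (seconds).

    The arguments are the durations observed at Layer 1 (HTTP), Layer 2
    (REST API), Layer 3 (service) and Layer 4 (thread pool worker) for the
    same request.
    """
    return {
        "framework_overhead": http_s - api_s,         # Layer 1 minus Layer 2
        "serialization_overhead": api_s - service_s,  # Layer 2 minus Layer 3
        "queue_wait": service_s - worker_s,           # Layer 3 minus Layer 4
        "worker_processing": worker_s,                # Layer 4 itself
    }


# Hypothetical durations for one request, in seconds.
breakdown = layer_breakdown(http_s=1.5, api_s=1.25, service_s=1.0, worker_s=0.75)
slowest = max(breakdown, key=breakdown.get)
print(breakdown)
print("largest share:", slowest)
```

For SOAP requests only the last two entries apply, because Layers 1 and 2 are not recorded.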

Quick Start

Enable metrics in conf/application.xml:

<application>
    <metrics enabled="true">
        <prometheus enabled="true"/>
    </metrics>
</application>

Configure authentication (required by default):

# Environment settings
export WEBPDF_METRICS_AUTH_USERNAME=prometheus
export WEBPDF_METRICS_AUTH_PASSWORD=your-secure-password

Restart the webPDF server.

Access the metrics endpoint:

curl -u prometheus:your-secure-password http://localhost:8080/webPDF/metrics

Configure Prometheus to scrape the endpoint (see Integration Guide)

Configuration

Metrics can be configured via conf/application.xml or environment variables.

For complete configuration details including parameters, presets, and environment settings, see the Configuration Guide.

Accessing Metrics

Prometheus Endpoint

Metrics are exposed via HTTP at:

http://<HOST_URL>/webPDF/metrics

Context Path

The endpoint URL includes the context path (default: webPDF). If you have changed the context path, adjust the URL accordingly.

Authentication is required by default. See the Authentication Guide for configuration details.

Example:

curl -u prometheus:your-password http://localhost:8080/webPDF/metrics
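The same request can be made from a script. The following is a minimal sketch using only the Python standard library, assuming Basic authentication and the default context path. The helper names are hypothetical, and the parser is deliberately simplified: it assumes no spaces inside label values and no trailing timestamps.

```python
import base64
import urllib.request


def build_metrics_request(base_url, username, password):
    """Build a GET request for <base_url>/metrics with a Basic auth header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(f"{base_url.rstrip('/')}/metrics")
    request.add_header("Authorization", f"Basic {token}")
    return request


def parse_samples(text):
    """Parse Prometheus text exposition lines into {series: value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


def fetch_metrics(base_url, username, password, timeout=10):
    """Fetch and parse the metrics endpoint; needs a running webPDF server."""
    request = build_metrics_request(base_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_samples(response.read().decode("utf-8"))


# Example call against a local server (not executed here):
# samples = fetch_metrics("http://localhost:8080/webPDF", "prometheus", "your-password")
```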

JMX Access

Metrics are also available via JMX under the configured domain (default: webpdf.metrics).

See Java Management Extension (JMX) for JMX configuration and access.

Available Metrics

webPDF collects metrics across multiple layers and components:

HTTP Transport Layer

Request duration, throughput, and active requests
Request/response payload sizes
Error rates (4xx/5xx)

REST API Layer

Endpoint execution timing
Operation-specific metrics

Business Logic Layer

Service method execution time
Request counts and error rates
Per-operation metrics (converter, PDF/A, signature, etc.)

Thread Pool Layer

Thread pool utilization
Queue depth and capacity
Rejected tasks (critical alert indicator)

Event Pipeline (Cluster)

Event consumer processing metrics
Dead letter queue (DLQ) metrics
Reprocessing statistics

JVM & System

Application version information
Memory usage (heap/non-heap)
Garbage collection metrics
JVM runtime information
CPU usage, disk space, file descriptors
Thread counts, class loader stats

See Metric Reference for complete details on all 58+ metrics.

Memory Usage Estimation

The metrics system provides memory forecasting based on configuration:

Base overhead: ~50 MB (registries, JVM metrics, system metrics)
Per timer (with percentiles): ~1 MB
Per timer (without percentiles): ~0.1 MB
Per gauge/counter: ~0.05 MB

Example calculation for 50 endpoints:

HTTP timers: 50 × 1 MB = 50 MB
API timers: 50 × 1 MB = 50 MB
Service timers: 10 × 1 MB = 10 MB
Thread pool gauges: 10 × 0.05 MB = 0.5 MB
Base overhead: 50 MB
Total: ~160 MB

Disabling percentiles reduces the timer cost to 110 × 0.1 MB = 11 MB, for a total of about 62 MB.
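The calculation above can be reproduced with a few lines of code. This is a rough sketch that uses only the per-item figures listed in this section; the function and constant names are hypothetical and do not correspond to any webPDF API.

```python
# Per-item estimates from this section, in MB.
BASE_OVERHEAD_MB = 50.0
TIMER_WITH_PERCENTILES_MB = 1.0
TIMER_WITHOUT_PERCENTILES_MB = 0.1
GAUGE_OR_COUNTER_MB = 0.05


def estimate_metrics_memory_mb(timers, gauges_and_counters, percentiles=True):
    """Estimate metrics memory from the number of timers and gauges/counters."""
    per_timer = TIMER_WITH_PERCENTILES_MB if percentiles else TIMER_WITHOUT_PERCENTILES_MB
    return (
        BASE_OVERHEAD_MB
        + timers * per_timer
        + gauges_and_counters * GAUGE_OR_COUNTER_MB
    )


# 50 HTTP timers + 50 API timers + 10 service timers, 10 thread pool gauges.
timers = 50 + 50 + 10
with_percentiles = estimate_metrics_memory_mb(timers, 10, percentiles=True)
without_percentiles = estimate_metrics_memory_mb(timers, 10, percentiles=False)
print(f"with percentiles: {with_percentiles:.1f} MB")
print(f"without percentiles: {without_percentiles:.1f} MB")
```

For the 50-endpoint example this yields 160.5 MB with percentiles and about 61.5 MB without, close to the rounded figures above.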

Health Monitoring

The metrics system includes automatic health monitoring that logs periodic snapshots of meter count and heap usage:

Metrics health snapshot: meters=1523 (delta=12), heap=256 MB / 2048 MB (delta=8 MB)

Warnings are emitted when:

Meter count exceeds the configured warning threshold (default: 10000, see Configuration Guide)
Heap usage exceeds 80% of maximum
Unusual growth detected (>1000 meters or >256 MB per interval)
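When health snapshots are collected from log files, the warning conditions listed above can be re-evaluated offline. The sketch below parses the sample log line shown in this section and applies the three documented thresholds. The regular expression assumes the exact message layout of that sample, and the function names are hypothetical.

```python
import re

# Matches the snapshot message layout shown in the example log line.
SNAPSHOT_PATTERN = re.compile(
    r"meters=(?P<meters>\d+) \(delta=(?P<meter_delta>-?\d+)\), "
    r"heap=(?P<heap_used>\d+) MB / (?P<heap_max>\d+) MB "
    r"\(delta=(?P<heap_delta>-?\d+) MB\)"
)


def parse_snapshot(line):
    """Return the snapshot fields as integers, or None if the line does not match."""
    match = SNAPSHOT_PATTERN.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


def snapshot_warnings(snapshot, meter_threshold=10000):
    """Apply the documented warning conditions to one parsed snapshot."""
    warnings = []
    if snapshot["meters"] > meter_threshold:
        warnings.append("meter count above threshold")
    if snapshot["heap_used"] > 0.8 * snapshot["heap_max"]:
        warnings.append("heap usage above 80%")
    if snapshot["meter_delta"] > 1000 or snapshot["heap_delta"] > 256:
        warnings.append("unusual growth")
    return warnings


line = "Metrics health snapshot: meters=1523 (delta=12), heap=256 MB / 2048 MB (delta=8 MB)"
snapshot = parse_snapshot(line)
print(snapshot, snapshot_warnings(snapshot))
```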

Health Monitoring Configuration

Health logging can be configured via environment settings:

Environment Variable	Default	Description
`WEBPDF_METRICS_HEALTH_LOG_INTERVAL`	`300000`	Health snapshot interval in milliseconds (min: 30000)
`WEBPDF_METRICS_HEALTH_WARN_COOLDOWN`	`3600000`	Warning cooldown period in milliseconds (min: 60000)

Example:

# Log health snapshots every 5 minutes, warn at most once per hour
export WEBPDF_METRICS_HEALTH_LOG_INTERVAL=300000
export WEBPDF_METRICS_HEALTH_WARN_COOLDOWN=3600000

Note

Health monitoring starts automatically when metrics are enabled. The logger runs as a daemon thread and logs to the standard application logger.
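Before restarting the server, it can be useful to check that the two health settings respect the documented minimums. The following sketch only reports problems; how webPDF itself handles values below the minimum is not described here, so no clamping or fallback behavior is assumed. The function name is hypothetical.

```python
import os

# Variable name -> (documented default, documented minimum), in milliseconds.
HEALTH_SETTINGS = {
    "WEBPDF_METRICS_HEALTH_LOG_INTERVAL": (300000, 30000),
    "WEBPDF_METRICS_HEALTH_WARN_COOLDOWN": (3600000, 60000),
}


def check_health_settings(environ=None):
    """Return a list of problems found in the health monitoring variables."""
    environ = os.environ if environ is None else environ
    problems = []
    for name, (default, minimum) in HEALTH_SETTINGS.items():
        raw = environ.get(name, str(default))
        try:
            value = int(raw)
        except ValueError:
            problems.append(f"{name}={raw!r} is not an integer")
            continue
        if value < minimum:
            problems.append(f"{name}={value} is below the documented minimum {minimum}")
    return problems


for problem in check_health_settings():
    print(problem)
```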

Quick Links

Configuration Guide - Configure metrics, layers, presets, and environment settings
Authentication Guide - Configure Basic Auth or Bearer Token authentication
Metric Reference - Complete list of all 58+ metrics with descriptions
Integration Guide - Set up Prometheus, Grafana, and alerts
Troubleshooting - Solve common issues

Common Use Cases

Monitor Request Performance

Track request latency and throughput:

# P95 latency
http_server_requests{quantile="0.95"}

# Request rate (per second)
rate(http_server_requests_count[5m])

# Error rate
rate(http_server_errors_server_total[5m])

Monitor Resource Utilization

Track thread pool and system resources:

# Thread pool utilization
(threadpool_active / threadpool_pool_size) * 100

# Queue depth
threadpool_queue_size

# Heap usage percentage
(jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"}) * 100

Detect Performance Bottlenecks

Compare metrics across layers:

# HTTP overhead (auth, CORS); approximate, because percentiles are not additive
http_server_requests{quantile="0.95"} - api_operation_duration{quantile="0.95"}

# Serialization overhead
api_operation_duration{quantile="0.95"} - service_method_duration{quantile="0.95"}

Alert on Critical Conditions

# Queue overflow (immediate action required!)
increase(threadpool_rejected_total[5m]) > 0

# High error rate
(sum(rate(http_server_errors_server_total[5m])) / sum(rate(http_server_requests_count[5m]))) > 0.05
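Outside of an alerting rule, the same expressions can be checked ad hoc through the Prometheus HTTP API (`/api/v1/query`). The sketch below relies on the fact that a PromQL comparison returns an empty result when the condition does not hold. The Prometheus address and the helper names are assumptions; the expressions are the two shown above.

```python
import json
import urllib.parse
import urllib.request

ALERT_QUERIES = {
    "queue_overflow": "increase(threadpool_rejected_total[5m]) > 0",
    "high_error_rate": (
        "(sum(rate(http_server_errors_server_total[5m])) "
        "/ sum(rate(http_server_requests_count[5m]))) > 0.05"
    ),
}


def query_url(prometheus_url, expression):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": expression})
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{query}"


def is_firing(response_body):
    """Return True if the comparison query produced at least one sample."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return len(payload["data"]["result"]) > 0


def check_alerts(prometheus_url, timeout=10):
    """Evaluate every alert expression; needs a reachable Prometheus server."""
    results = {}
    for name, expression in ALERT_QUERIES.items():
        url = query_url(prometheus_url, expression)
        with urllib.request.urlopen(url, timeout=timeout) as response:
            results[name] = is_firing(response.read().decode("utf-8"))
    return results


# Example call (assumes Prometheus on its default port; not executed here):
# print(check_alerts("http://localhost:9090"))
```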

Best Practices

Production Deployments

✅ Disable percentiles to save memory (500-1000 MB reduction)
✅ Set cardinality limits to prevent metric explosion
✅ Enable authentication for security
✅ Use HTTPS for metrics endpoint (TLS configuration)
✅ Monitor health logs for warnings
✅ Set up alerts for critical conditions (queue overflow, high error rate)

Development/Testing

✅ Enable all metrics for detailed analysis
✅ Disable authentication for easy access (local only!)
✅ Enable startup logging to see all metrics
✅ Use percentiles for latency analysis

Performance

✅ Disable the HTTP layer if overhead is a concern (highest impact)
✅ Limit percentiles to 1-2 values (e.g., only p95)
✅ Use minimal preset for resource-constrained environments
