Metrics Integration
This guide shows how to integrate webPDF metrics with monitoring systems like Prometheus and Grafana.
Prometheus Integration
Prometheus is the recommended monitoring system for webPDF metrics. It scrapes the metrics endpoint periodically and stores time-series data.
Basic Configuration
Configure Prometheus to scrape the metrics endpoint with authentication:
Using Bearer Token Authentication
prometheus.yml:
scrape_configs:
  - job_name: 'webpdf'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/webPDF/metrics'
    bearer_token: 'your-secure-api-token'
    scrape_interval: 15s
Using Basic Authentication
prometheus.yml:
scrape_configs:
  - job_name: 'webpdf'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/webPDF/metrics'
    basic_auth:
      username: 'prometheus'
      password: 'your-secure-password'
    scrape_interval: 15s
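Before pointing Prometheus at the endpoint, you can issue the same authenticated request by hand to confirm the credentials work. A minimal Python sketch; the URL, token, and password are placeholders to replace with your real values:

```python
import base64
import urllib.request

# Placeholder values - substitute your real host and credentials.
METRICS_URL = "http://localhost:8080/webPDF/metrics"

# Bearer-token variant: the same header Prometheus sends for bearer_token.
bearer_req = urllib.request.Request(
    METRICS_URL,
    headers={"Authorization": "Bearer your-secure-api-token"},
)

# Basic-auth variant: base64-encoded "username:password".
credentials = base64.b64encode(b"prometheus:your-secure-password").decode()
basic_req = urllib.request.Request(
    METRICS_URL,
    headers={"Authorization": f"Basic {credentials}"},
)

# Uncomment to run against a live instance:
# with urllib.request.urlopen(bearer_req, timeout=10) as resp:
#     print(resp.status, resp.read(200))
```

A 200 response with metric lines in the body means the credentials are correct; a 401 points at the token or password.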
Production Configuration
For production deployments, store credentials in separate files:
File-Based Basic Authentication
prometheus.yml:
scrape_configs:
  - job_name: 'webpdf'
    static_configs:
      - targets: ['webpdf-prod-01:8080', 'webpdf-prod-02:8080']
    metrics_path: '/webPDF/metrics'
    basic_auth:
      username: 'prometheus'
      password_file: '/etc/prometheus/webpdf_password.txt'
    scrape_interval: 15s
    scrape_timeout: 10s
/etc/prometheus/webpdf_password.txt:
your-secure-password
Set restrictive permissions:
chmod 600 /etc/prometheus/webpdf_password.txt
chown prometheus:prometheus /etc/prometheus/webpdf_password.txt
File-Based Bearer Token
prometheus.yml:
scrape_configs:
  - job_name: 'webpdf'
    static_configs:
      - targets: ['webpdf-prod-01:8080', 'webpdf-prod-02:8080']
    metrics_path: '/webPDF/metrics'
    bearer_token_file: '/etc/prometheus/webpdf_token.txt'
    scrape_interval: 15s
    scrape_timeout: 10s
/etc/prometheus/webpdf_token.txt:
your-secure-api-token
Multiple Instances
For monitoring multiple webPDF instances, use labels to identify them:
scrape_configs:
  - job_name: 'webpdf'
    static_configs:
      - targets: ['webpdf-prod-01:8080']
        labels:
          environment: 'production'
          datacenter: 'us-east-1'
          instance_name: 'webpdf-prod-01'
      - targets: ['webpdf-prod-02:8080']
        labels:
          environment: 'production'
          datacenter: 'us-west-1'
          instance_name: 'webpdf-prod-02'
      - targets: ['webpdf-staging:8080']
        labels:
          environment: 'staging'
          datacenter: 'us-east-1'
          instance_name: 'webpdf-staging'
    metrics_path: '/webPDF/metrics'
    basic_auth:
      username: 'prometheus'
      password_file: '/etc/prometheus/webpdf_password.txt'
    scrape_interval: 15s
HTTPS Configuration
When webPDF uses TLS, configure HTTPS scraping:
scrape_configs:
  - job_name: 'webpdf'
    scheme: https
    static_configs:
      - targets: ['webpdf.example.com:8443']
    metrics_path: '/webPDF/metrics'
    bearer_token_file: '/etc/prometheus/webpdf_token.txt'
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt
      # For self-signed certificates (not recommended):
      # insecure_skip_verify: true
    scrape_interval: 15s
Verify Scraping
Open the Prometheus targets page to verify that scraping is working:
http://prometheus-server:9090/targets
Look for:
- State: UP (green) = successful scraping
- State: DOWN (red) = scraping failed, check authentication/network
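You can also fetch the endpoint directly and inspect the payload. Scraped output uses the Prometheus text exposition format; a small sketch that parses simple sample lines (the metric values below are invented for illustration):

```python
# Parse simple lines of the Prometheus text exposition format:
#   metric_name{label="value",...} <number>
def parse_metrics(payload: str) -> dict:
    samples = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        name_part, _, value = line.rpartition(" ")
        samples[name_part] = float(value)
    return samples

# Hypothetical excerpt of a webPDF scrape (values are illustrative):
payload = """\
# HELP http_server_requests_count Total HTTP requests
# TYPE http_server_requests_count counter
http_server_requests_count{endpoint="/rest/converter"} 1523
http_server_requests_count{endpoint="/rest/toolbox"} 847
"""
samples = parse_metrics(payload)
print(samples['http_server_requests_count{endpoint="/rest/converter"}'])  # 1523.0
```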
Grafana Integration
Grafana provides visualization and dashboards for Prometheus metrics.
Add Prometheus Data Source
- Open Grafana: http://grafana-server:3000
- Navigate to Configuration → Data Sources
- Click Add data source
- Select Prometheus
- Configure:
  - URL: http://prometheus-server:9090
  - Access: Server (default)
- Click Save & Test
Example Dashboard Panels
Request Rate
# Total request rate (requests per second)
rate(http_server_requests_count[5m])
# Request rate by endpoint
sum by (endpoint) (rate(http_server_requests_count[5m]))
# Request rate by status code
sum by (status) (rate(http_server_requests_count[5m]))
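rate() estimates the per-second increase of a counter over the lookback window. The underlying arithmetic between two samples can be sketched in Python (timestamps and counter values are invented for illustration):

```python
# Two scrapes of a monotonic counter, 15 seconds apart (invented values).
t1, v1 = 1700000000, 1500.0   # earlier scrape: timestamp, counter value
t2, v2 = 1700000015, 1560.0   # later scrape

# Per-second rate over the interval, as rate() computes between samples.
per_second = (v2 - v1) / (t2 - t1)
print(per_second)  # 4.0 requests/second
```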
Error Rate
# Total error rate (4xx + 5xx)
sum(rate(http_server_errors_client_total[5m])) + sum(rate(http_server_errors_server_total[5m]))
# Server error percentage
(sum(rate(http_server_errors_server_total[5m])) / sum(rate(http_server_requests_count[5m]))) * 100
# Errors by endpoint
sum by (endpoint) (rate(http_server_errors_server_total[5m]))
Response Time (Latency)
# P95 latency
http_server_requests{quantile="0.95"}
# P99 latency
http_server_requests{quantile="0.99"}
# Average latency (5min window)
rate(http_server_requests_sum[5m]) / rate(http_server_requests_count[5m])
# Latency by endpoint
http_server_requests{quantile="0.95", endpoint="/rest/converter"}
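The average-latency query divides the rate of http_server_requests_sum (total seconds spent handling requests) by the rate of http_server_requests_count (total requests). With invented counter deltas over a 5-minute window:

```python
# Invented counter increases over a 300-second window.
sum_delta = 90.0      # seconds of total request time accumulated
count_delta = 1200.0  # requests served in the window

# rate(sum)/rate(count): both rates share the same window length,
# so the result reduces to sum_delta / count_delta.
avg_latency = (sum_delta / 300) / (count_delta / 300)
print(avg_latency)  # 0.075 seconds per request
```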
Thread Pool Utilization
# Thread pool utilization percentage
(threadpool_active / threadpool_pool_size) * 100
# Queue depth
threadpool_queue_size
# Queue utilization percentage
(threadpool_queue_size / threadpool_queue_capacity) * 100
# Rejected tasks (critical!)
increase(threadpool_rejected_total[5m])
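The same utilization arithmetic, sketched with invented gauge readings from a single scrape:

```python
# Invented gauge readings (not real webPDF values).
active, pool_size = 18, 20
queue_size, queue_capacity = 45, 100

# Percent utilization, as in the PromQL expressions above.
pool_utilization = 100 * active / pool_size        # 90.0
queue_utilization = 100 * queue_size / queue_capacity  # 45.0

# A pool above 90% with a filling queue is approaching rejection territory.
print(pool_utilization, queue_utilization)
```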
Memory Usage
# Heap memory usage percentage
(jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"}) * 100
# Heap used (MB)
jvm_memory_used_bytes{area="heap"} / 1024 / 1024
# GC pause time rate
rate(jvm_gc_pause_sum[5m])
CPU Usage
# Process CPU usage (percentage)
process_cpu_usage * 100
# System CPU usage (percentage)
system_cpu_usage * 100
Service Method Performance
# Service method request rate
sum by (method) (rate(service_method_requests_total[5m]))
# Service method error rate
sum by (method) (rate(service_method_errors_total[5m]))
# Service method duration P95
service_method_duration{quantile="0.95"}
Alert Rules
Create alerts in Prometheus to notify on critical conditions:
prometheus-alerts.yml:
groups:
  - name: webpdf_alerts
    interval: 30s
    rules:
      # High error rate alert
      - alert: WebPDFHighErrorRate
        expr: |
          (sum(rate(http_server_errors_server_total[5m])) / sum(rate(http_server_requests_count[5m]))) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "webPDF high error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }} (threshold: 5%)"
      # Thread pool saturation
      - alert: WebPDFThreadPoolSaturated
        expr: |
          (threadpool_active / threadpool_pool_size) > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "webPDF thread pool {{ $labels.pool }} saturated"
          description: "Thread pool utilization is {{ $value | humanizePercentage }}"
      # Queue overflow (critical!)
      - alert: WebPDFQueueOverflow
        expr: |
          increase(threadpool_rejected_total[5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "webPDF queue overflow - requests being rejected!"
          description: "{{ $value }} tasks rejected in pool {{ $labels.pool }}"
      # High memory usage
      - alert: WebPDFHighMemoryUsage
        expr: |
          (jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"}) > 0.85
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "webPDF high memory usage"
          description: "Heap usage is {{ $value | humanizePercentage }}"
      # High P99 latency
      - alert: WebPDFHighLatency
        expr: |
          http_server_requests{quantile="0.99"} > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "webPDF high latency detected"
          description: "P99 latency is {{ $value }}s for {{ $labels.endpoint }}"
      # Service is down
      - alert: WebPDFServiceDown
        expr: |
          up{job="webpdf"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "webPDF service is down"
          description: "Instance {{ $labels.instance }} is not responding"
      # DLQ buildup
      - alert: WebPDFDLQBuildup
        expr: |
          dlq_store_entries{status="pending"} > 100
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "webPDF DLQ has many pending entries"
          description: "{{ $value }} pending DLQ entries need attention"
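The for: clause means an alert only fires once its expression has been continuously true for that long; shorter spikes stay in the pending state. That evaluation logic can be sketched as follows (the evaluation history below is invented):

```python
# Simulate 'for: 5m' with 30-second evaluation intervals: the expression
# must be true for 10 consecutive evaluations before the alert fires.
def firing(history: list, for_intervals: int) -> bool:
    """history is a list of booleans, oldest first."""
    if len(history) < for_intervals:
        return False
    return all(history[-for_intervals:])

# True for only 8 of the required 10 evaluations: still pending.
print(firing([True] * 8, 10))       # False

# Continuously true for 10 evaluations: the alert fires.
print(firing([True] * 10, 10))      # True
```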
Load alerts into Prometheus:
# prometheus.yml
rule_files:
  - 'prometheus-alerts.yml'
Other Monitoring Systems
JMX Integration
Metrics are also exposed via JMX under the domain webpdf.metrics (configurable).
See Java Management Extensions (JMX) for details.
JConsole
jconsole localhost:9999
Navigate to MBeans → webpdf.metrics
VisualVM
- Install VisualVM
- Connect to webPDF JVM
- Open MBeans tab
- Navigate to webpdf.metrics
Datadog Integration
Use Prometheus scraping with Datadog Agent:
datadog.yaml:
prometheus_url: http://localhost:9090
metrics:
- http_server_*
- threadpool_*
- jvm_*
- service_method_*
New Relic Integration
Use Prometheus Remote Write:
prometheus.yml:
remote_write:
  - url: https://metric-api.newrelic.com/prometheus/v1/write?prometheus_server=webpdf
    bearer_token_file: /etc/prometheus/newrelic_token.txt
Performance Considerations
Scrape Interval
- 15 seconds: Good balance for production (default recommendation)
- 30 seconds: Reduced load, suitable for less critical monitoring
- 5 seconds: High-resolution monitoring, increases load
Cardinality Management
High cardinality (many unique tag combinations) multiplies the number of time series Prometheus must store; cap it in the webPDF metrics configuration:
<metrics enabled="true">
<application maxEndpoints="100"
maxUris="500"
maxStatusCodes="20"/>
</metrics>
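These limits bound worst-case cardinality. As a rough upper bound, a single metric tagged with both endpoint and status can produce up to maxEndpoints × maxStatusCodes distinct series; this is a simplification, since actual cardinality depends on which tag combinations really occur:

```python
# Limits from the configuration above.
max_endpoints = 100
max_status_codes = 20

# Worst-case series count for one metric carrying both tags.
worst_case_series = max_endpoints * max_status_codes
print(worst_case_series)  # 2000
```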
See Configuration for details.
Retention
Configure Prometheus retention based on needs:
# 30 days retention
prometheus --storage.tsdb.retention.time=30d
# 100GB maximum storage
prometheus --storage.tsdb.retention.size=100GB
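Retention needs can be estimated from series count and scrape interval. Prometheus typically stores on the order of 1-2 bytes per sample after compression; a back-of-envelope sketch in which the series count and bytes-per-sample figure are assumptions to adjust for your deployment:

```python
series = 5000            # assumed number of active time series
scrape_interval_s = 15
bytes_per_sample = 2     # conservative end of the ~1-2 bytes/sample estimate
retention_days = 30

samples_per_day = 86400 / scrape_interval_s * series
total_bytes = samples_per_day * retention_days * bytes_per_sample
print(f"{total_bytes / 1024**3:.2f} GiB")  # 1.61 GiB
```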
Troubleshooting Integration
Prometheus Can't Scrape
- Check authentication: Verify credentials in prometheus.yml
- Check network: curl -u username:password http://webpdf:8080/webPDF/metrics
- Check TLS: Verify certificate configuration if using HTTPS
- Check logs: Review Prometheus logs for error messages
No Data in Grafana
- Check data source: Test Prometheus connection in Grafana
- Check query: Verify PromQL syntax in panel query
- Check time range: Ensure time range includes scraped data
- Check labels: Verify label names match your metrics
High Prometheus Memory Usage
- Reduce cardinality: Set limits in webPDF metrics configuration
- Reduce retention: Lower --storage.tsdb.retention.time
- Disable unused metrics: Disable metric layers that are not needed
See Also
- Metrics Overview - Main metrics documentation
- Configuration Guide - Configure metrics, layers, and presets
- Authentication - Secure metrics endpoint
- Metric Reference - Complete list of all metrics
- Troubleshooting - Common issues