Metrics Troubleshooting
This guide helps diagnose and resolve common issues with webPDF metrics.
High Memory Usage
Metrics can consume significant memory, especially with percentiles enabled.
Symptoms
- High JVM heap usage
- Frequent garbage collections
- OutOfMemoryError in logs
- Server startup warnings about metric memory usage
Diagnosis
Check server logs for memory forecasts:
Metrics memory forecast (heuristic): 2048 MB (warning threshold: 1024 MB)
Metrics health snapshot: meters=15234 (delta=1200), heap=1800 MB / 2048 MB (delta=256 MB)
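The snapshot values can be pulled out of the log with a short script, for example to track them across several log files. The following is a minimal sketch that assumes log lines follow exactly the format of the sample above; the function name parse_health_snapshot is hypothetical and not part of webPDF.

```python
import re

# Pattern derived from the sample log line above; the real format may differ.
SNAPSHOT_RE = re.compile(
    r"meters=(?P<meters>\d+) \(delta=(?P<meters_delta>-?\d+)\), "
    r"heap=(?P<heap_used_mb>\d+) MB / (?P<heap_max_mb>\d+) MB "
    r"\(delta=(?P<heap_delta_mb>-?\d+) MB\)"
)


def parse_health_snapshot(line):
    """Return the numeric fields of a health snapshot line, or None if it does not match."""
    match = SNAPSHOT_RE.search(line)
    if match is None:
        return None
    fields = {name: int(value) for name, value in match.groupdict().items()}
    # Heap usage as a percentage of the maximum heap, rounded to one decimal.
    fields["heap_pct"] = round(100 * fields["heap_used_mb"] / fields["heap_max_mb"], 1)
    return fields
```

Applied to the sample snapshot line, this yields meters=15234 and a heap usage of about 88%.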
Solutions
1. Disable Percentiles (Most Effective)
Percentiles consume ~1 MB per timer. Disabling them can reduce memory usage by 50-80%.
conf/application.xml:
<application>
<metrics enabled="true">
<application percentilesEnabled="false" />
</metrics>
</application>
Memory savings: ~500-1000 MB for a typical deployment
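As a rough worked example of the ~1 MB per timer figure: the sketch below multiplies a timer count by that figure and compares the result with the 1024 MB warning threshold seen in the sample log. It assumes the cost scales linearly with the number of timers, which is a simplification; both function names are hypothetical, and the server's own forecast heuristic may work differently.

```python
def percentile_memory_mb(timer_count, mb_per_timer=1.0):
    """Rough estimate of memory used by percentile data, in MB.

    Assumes the ~1 MB per timer figure applies uniformly to every timer.
    """
    return timer_count * mb_per_timer


def exceeds_warning_threshold(timer_count, threshold_mb=1024):
    """True if the rough estimate is above the warning threshold (in MB)."""
    return percentile_memory_mb(timer_count) > threshold_mb
```

Under these assumptions, a deployment with 800 timers would hold about 800 MB of percentile data, which falls inside the 500-1000 MB savings range quoted above.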
2. Set Cardinality Limits
Limit unique tag combinations to prevent metric explosion:
<application>
<metrics enabled="true">
<application maxEndpoints="100"
maxUris="500"
maxStatusCodes="20" />
</metrics>
</application>
Memory savings: ~200-500 MB depending on traffic patterns
3. Disable Unused Layers
Turn off metric layers you don't need:
<application>
<metrics enabled="true">
<httpMetrics enabled="false" /> <!-- Highest overhead -->
<apiMetrics enabled="true" />
<serviceMetrics enabled="true" />
<threadPoolMetrics enabled="true" />
</metrics>
</application>
To disable event consumer and DLQ metrics, set WEBPDF_METRICS_EVENT_PIPELINE_ENABLED=false.
Memory savings: ~100-300 MB per disabled layer
4. Increase JVM Heap
If metrics are essential, increase heap size:
webPDF.vmoptions / webPDF.service.vmoptions:
-Xms2048m
-Xmx4096m
See Java Configuration for details.
Missing Metrics
The metrics endpoint is accessible but returns no data or only partial data.
Symptoms
- Empty response from /webPDF/metrics
- Some metrics missing
- Prometheus shows "No data"
Diagnosis Steps
1. Check Global Enable
Verify in logs:
Metrics manager globally disabled by configuration
Fix (via XML):
<application>
<metrics enabled="true">
</metrics>
</application>
Or via Environment Settings:
export WEBPDF_METRICS_ENABLED=true
2. Check Registry Configuration
Verify in logs:
No registry enabled - Metrics will not be exported
Fix:
<application>
<metrics enabled="true">
<prometheus enabled="true" />
</metrics>
</application>
3. Check Layer Configuration
Specific metrics missing? Check layer settings:
<application>
<metrics enabled="true">
<httpMetrics enabled="true" /> <!-- http_server_* metrics -->
<apiMetrics enabled="true" /> <!-- api_operation_* metrics -->
<serviceMetrics enabled="true" /> <!-- service_method_* metrics -->
<threadPoolMetrics enabled="true" /> <!-- threadpool_* metrics -->
<jvmMetrics enabled="true" /> <!-- jvm_* metrics -->
<systemMetrics enabled="true" /> <!-- system_*, process_* -->
</metrics>
</application>
Event consumer and DLQ metrics are controlled by WEBPDF_METRICS_EVENT_PIPELINE_ENABLED.
4. Check Endpoint Configuration
Verify endpoint is enabled:
<application>
<metrics enabled="true">
<application endpointEnabled="true" />
</metrics>
</application>
5. Verify Authentication
See Authentication Issues below.
Authentication Issues
The metrics endpoint cannot be accessed because authentication fails.
401 Unauthorized
Symptoms
$ curl http://localhost:8080/webPDF/metrics
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="Metrics", Bearer
Unauthorized: Invalid credentials
Diagnosis
- Check configured authentication methods:
Server logs:
Metrics authentication enabled: Basic Auth (username=prometheus)
# or
Metrics authentication enabled: Bearer Token
# or
Metrics authentication is required but no credentials configured!
- Verify credentials:
Environment variables:
# Linux/Mac
echo $WEBPDF_METRICS_AUTH_USERNAME
echo $WEBPDF_METRICS_AUTH_PASSWORD
echo $WEBPDF_METRICS_AUTH_TOKEN
# Windows
echo %WEBPDF_METRICS_AUTH_USERNAME%
echo %WEBPDF_METRICS_AUTH_PASSWORD%
echo %WEBPDF_METRICS_AUTH_TOKEN%
Solutions
Set credentials via environment settings:
export WEBPDF_METRICS_AUTH_USERNAME=prometheus
export WEBPDF_METRICS_AUTH_PASSWORD=secure-password
Test with curl:
# Basic Auth
curl -u prometheus:secure-password http://localhost:8080/webPDF/metrics
# Bearer Token
curl -H "Authorization: Bearer your-token" http://localhost:8080/webPDF/metrics
Restart the webPDF server after configuration changes.
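The curl checks above can also be scripted, for instance in a monitoring or smoke-test job. This is a minimal sketch using only the Python standard library; the helper names build_metrics_request and fetch_status are hypothetical, and the URL and credentials are the placeholder values from the curl examples.

```python
import base64
import urllib.error
import urllib.request


def build_metrics_request(url, username=None, password=None, token=None):
    """Build a request that carries either a Bearer token or Basic credentials."""
    request = urllib.request.Request(url)
    if token is not None:
        request.add_header("Authorization", f"Bearer {token}")
    elif username is not None and password is not None:
        raw = f"{username}:{password}".encode("utf-8")
        encoded = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", "Basic " + encoded)
    return request


def fetch_status(request, timeout=5):
    """Send the request and return the HTTP status code, e.g. 200 or 401."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
```

Calling fetch_status(build_metrics_request("http://localhost:8080/webPDF/metrics", username="prometheus", password="secure-password")) should return 200 once the credentials match the server configuration.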
Wrong Authentication Method
Symptoms
Using Bearer Token but only Basic Auth is configured (or vice versa).
Diagnosis
Check WWW-Authenticate header in 401 response:
- Basic realm="Metrics" only → Only Basic Auth configured
- Bearer only → Only Bearer Token configured
- Both → Either method accepted
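The header check can be automated with a few lines of code. The sketch below splits the header value on commas, which is sufficient for the header shown in the 401 example but is not a full HTTP challenge parser (a realm containing a comma would confuse it); the function name accepted_auth_methods is hypothetical.

```python
def accepted_auth_methods(www_authenticate):
    """Return the set of auth schemes ('basic', 'bearer') named in a WWW-Authenticate value."""
    methods = set()
    for challenge in www_authenticate.split(","):
        # The scheme is the first whitespace-delimited token of each challenge.
        scheme = challenge.strip().split(" ", 1)[0].lower()
        if scheme in ("basic", "bearer"):
            methods.add(scheme)
    return methods
```

If the result contains both schemes, either method is accepted; if it contains only one, switch the client to that method or configure the other one on the server.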
Solution
Use the configured authentication method or configure the method you want to use.
Disable Authentication (Development Only)
<application>
<metrics enabled="true">
<auth enabled="false" />
</metrics>
</application>
This exposes metrics publicly. Use it only in development or trusted environments.
Performance Impact
Metrics collection adds overhead to request processing.
Symptoms
- Increased request latency
- Higher CPU usage
- More memory allocations
Expected Overhead
| Layer | Overhead per Request | Component |
|---|---|---|
| HTTP metrics | ~0.1-0.2 ms | HTTP request instrumentation |
| API metrics | ~0.05-0.1 ms | REST API instrumentation |
| Service metrics | ~0.02-0.05 ms | Service operation instrumentation |
| Thread pool metrics | Negligible | Gauge reads |
Total expected overhead: ~0.2-0.4 ms per request (< 1% for a typical 50 ms request)
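The total can be recomputed for any combination of layers and any typical request duration. This sketch simply adds up the per-layer ranges from the table above; the dictionary keys and function names are illustrative, and the thread pool layer is left out because its overhead is listed as negligible.

```python
# Per-request overhead ranges in milliseconds, taken from the table above.
LAYER_OVERHEAD_MS = {
    "http": (0.1, 0.2),
    "api": (0.05, 0.1),
    "service": (0.02, 0.05),
}


def total_overhead_ms(layers):
    """Return the (low, high) overhead in ms for the given enabled layers."""
    low = sum(LAYER_OVERHEAD_MS[name][0] for name in layers)
    high = sum(LAYER_OVERHEAD_MS[name][1] for name in layers)
    return low, high


def overhead_percent(overhead_ms, request_ms):
    """Overhead as a percentage of the request duration."""
    return 100 * overhead_ms / request_ms
```

With all three layers enabled, the sum is roughly 0.17 to 0.35 ms, which is at most about 0.7% of a 50 ms request. Disabling the HTTP layer leaves roughly 0.07 to 0.15 ms.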
Diagnosis
Compare request duration with/without metrics:
With metrics:
http_server_requests{quantile="0.95"}
Disable metrics temporarily:
<metrics enabled="false" />
Restart and measure performance difference.
Solutions
1. Disable HTTP Layer (Highest Overhead)
HTTP metrics have the most overhead due to frequent execution:
<metrics enabled="true">
<httpMetrics enabled="false" />
<apiMetrics enabled="true" />
<serviceMetrics enabled="true" />
<threadPoolMetrics enabled="true" />
</metrics>
Overhead reduction: ~50-60% of metrics overhead
2. Reduce Percentile Count
Fewer percentiles mean faster recording:
<metrics enabled="true">
<application percentilesEnabled="true">
<percentiles>
<percentile>0.95</percentile>
</percentiles>
</application>
</metrics>
3. Consider Minimal Configuration
Only essential metrics:
<application>
<metrics enabled="true">
<httpMetrics enabled="false" />
<apiMetrics enabled="false" />
<serviceMetrics enabled="true" />
<threadPoolMetrics enabled="true" />
<jvmMetrics enabled="false" />
<systemMetrics enabled="false" />
<application percentilesEnabled="false" />
</metrics>
</application>
Prometheus Scraping Issues
Prometheus cannot scrape metrics or reports errors while scraping.
Target Down in Prometheus
Symptoms
Prometheus targets page shows webPDF as DOWN (red).
Diagnosis
Check network connectivity:
curl -v http://webpdf:8080/webPDF/metrics
Check authentication:
curl -u prometheus:password http://webpdf:8080/webPDF/metrics
Check Prometheus logs:
level=error msg="Error scraping target" target=webpdf err="authentication required"
Solutions
- Fix authentication in prometheus.yml:
basic_auth:
username: 'prometheus'
password: 'correct-password'
- Fix network/firewall:
- Ensure Prometheus can reach webPDF server
- Check firewall rules
- Verify DNS resolution
- Fix TLS configuration:
scheme: https
tls_config:
ca_file: /path/to/ca.crt
Scrape Timeout
Symptoms
level=warn msg="Scrape took longer than timeout" target=webpdf
Solutions
Increase scrape timeout:
scrape_configs:
- job_name: 'webpdf'
scrape_timeout: 15s # Increase from default 10s
scrape_interval: 30s
Or reduce metric count:
- Set cardinality limits
- Disable unused layers
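When raising scrape_timeout, keep in mind that Prometheus does not accept a scrape_timeout larger than the scrape_interval of the same job. The sketch below checks that relationship before a configuration is rolled out. It handles only single-unit durations such as 15s or 1m, not compound values like 1m30s, and the function names are hypothetical.

```python
# Seconds per unit for the subset of duration units handled here.
UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}


def duration_seconds(value):
    """Convert a single-unit duration such as '15s' or '1m' to seconds."""
    for unit in ("ms", "s", "m", "h"):  # 'ms' must be tried before 's' and 'm'
        number = value[: -len(unit)]
        if value.endswith(unit) and number.isdigit():
            return int(number) * UNIT_SECONDS[unit]
    raise ValueError(f"unsupported duration: {value!r}")


def timeout_is_valid(scrape_timeout, scrape_interval):
    """True if the timeout does not exceed the interval."""
    return duration_seconds(scrape_timeout) <= duration_seconds(scrape_interval)
```

For the values used above, timeout_is_valid("15s", "30s") holds, whereas a timeout of 45s with a 30s interval would be rejected.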
Health Monitoring Issues
Metrics health logger reports warnings.
Meter Count Warning
Log Message
Metrics health warning: meters=12500 (threshold=10000), heap=1200 MB / 2048 MB
Meaning
The meter count exceeds the configured warning threshold, which can indicate a memory problem.
Solutions
- Increase warning threshold (if memory is okay):
System property:
-Dwebpdf.metrics.application.metric.count.warning.threshold=15000
Or in conf/application.xml:
<metrics enabled="true">
<application metricCountWarningThreshold="15000" />
</metrics>
- Reduce metric count:
- Set cardinality limits
- Disable unused layers
High Heap Usage Warning
Log Message
Metrics health warning: heap=1700 MB / 2048 MB (83% usage)
Solutions
See High Memory Usage above.
Unusual Growth Warning
Log Message
Metrics health snapshot: meters=15000 (delta=1200), heap=1800 MB / 2048 MB (delta=512 MB)
Metrics health warning: growth: metersDelta=1200, heapDelta=512 MB
Meaning
A rapid increase in meter count or heap usage, which can indicate a metric explosion.
Diagnosis
Check for unbounded cardinality:
- URI tags with unique IDs (/user/123, /user/456, ...)
- Endpoint tags with dynamic values
- Status codes beyond typical range
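One way to spot unbounded URI tags is to group the uri label values in the scraped output by their first path segment and look for segments with many distinct values. The following is a minimal sketch that assumes the label is named uri, as in the examples in this guide; the function names and the threshold of 50 distinct values are arbitrary, hypothetical choices.

```python
import re
from collections import defaultdict

# Matches uri="..." label values in Prometheus text output.
URI_LABEL_RE = re.compile(r'uri="([^"]*)"')


def uris_by_prefix(metrics_text):
    """Group distinct uri label values by their first path segment."""
    groups = defaultdict(set)
    for uri in URI_LABEL_RE.findall(metrics_text):
        first_segment = uri.strip("/").split("/")[0]
        groups["/" + first_segment].add(uri)
    return groups


def suspicious_prefixes(metrics_text, threshold=50):
    """Return prefixes that have more than `threshold` distinct uri values."""
    groups = uris_by_prefix(metrics_text)
    return sorted(prefix for prefix, uris in groups.items() if len(uris) > threshold)
```

A prefix such as /user with hundreds of distinct values suggests concrete IDs in the tag, while a templated URI such as /user/{id} contributes only a single value.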
Solutions
Set cardinality limits:
<metrics enabled="true">
<application maxEndpoints="100"
maxUris="500"
maxStatusCodes="20" />
</metrics>
Use templated URIs:
Ensure REST resources use path templates such as /user/{id} instead of concrete values such as
/user/123 or /user/456.
Metric Cardinality Explosion
Too many unique tag combinations create excessive metrics.
Symptoms
- Rapid memory growth
- Hundreds of thousands of metrics
- Prometheus high memory usage
- Slow metric queries
Common Causes
- Concrete IDs in URIs:
http_server_requests{uri="/document/abc-123-def"}
http_server_requests{uri="/document/xyz-456-ghi"}
...thousands more...
Should be: {uri="/document/{id}"}
- Dynamic endpoint names: Each unique endpoint creates a new time series
- Unbounded status codes: Custom status codes creating many unique values
Solutions
1. Enable Cardinality Limits
Prevents unbounded growth:
<metrics enabled="true">
<application maxEndpoints="100"
maxUris="500"
maxStatusCodes="20" />
</metrics>
When a limit is reached, new unique values are rejected; metrics for values that are already known continue.
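The limit behavior can be pictured as a bounded set of known tag values. This is an illustrative sketch of the concept only, not webPDF's actual implementation; the class name CardinalityLimiter is hypothetical.

```python
class CardinalityLimiter:
    """Admit at most `max_values` distinct tag values; known values always pass."""

    def __init__(self, max_values):
        self.max_values = max_values
        self._seen = set()

    def allow(self, value):
        """Return True if a metric for this tag value may be recorded."""
        if value in self._seen:
            return True  # existing metrics continue to be recorded
        if len(self._seen) >= self.max_values:
            return False  # limit reached: the new unique value is denied
        self._seen.add(value)
        return True
```

With a limit of 2, the values /a and /b are admitted, /c is denied, and later requests for /a are still recorded.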
2. Fix URI Templates
Ensure REST resources use path templates such as /document/{id} instead of concrete identifiers
such as /document/abc-123-def.
3. Monitor Cardinality
Check metrics count:
curl -s http://localhost:8080/webPDF/metrics | grep -c "^http_server_requests"
- Should be: hundreds to low thousands
- Warning: tens of thousands
- Critical: hundreds of thousands
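The same check can be scripted against saved scrape output. This sketch counts the lines that start with a metric name, as the grep command does, and maps the count to the levels above. The exact cut-offs of 10,000 and 100,000 are assumptions chosen to approximate "tens of thousands" and "hundreds of thousands", and the function names are hypothetical.

```python
def count_series(metrics_text, metric_prefix="http_server_requests"):
    """Count sample lines whose metric name starts with the prefix (comment lines start with '#')."""
    return sum(1 for line in metrics_text.splitlines() if line.startswith(metric_prefix))


def classify_series_count(count, warning_at=10_000, critical_at=100_000):
    """Map a series count to 'ok', 'warning' or 'critical' using assumed cut-offs."""
    if count >= critical_at:
        return "critical"
    if count >= warning_at:
        return "warning"
    return "ok"
```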
Log Messages Reference
Startup Messages
✓ Using the metrics configuration: MetricsConfiguration{enabled=true, ...}
✓ Metrics startup memory baseline (measured): before=150 MB, after=180 MB, delta=30 MB
✓ Metrics authentication enabled: Basic Auth (username=prometheus)
✓ Metrics health logger started (interval=300s, warnCooldown=3600s)
Warning Messages
⚠ Metrics configuration warnings: More than 5 percentiles configured - high memory usage expected
⚠ Metrics memory forecast (heuristic): 1536 MB (warning threshold: 1024 MB) - consider disabling percentiles
⚠ Metrics count 12000 exceeds warning threshold 10000
⚠ Metrics authentication disabled (recommended for development mode only)
⚠ Metrics authentication is required but no credentials configured!
Error Messages
✗ Metrics manager globally disabled by configuration
✗ No registry available - Metrics disabled
✗ MetricsManager not found in ServletContext - authentication disabled
Getting Help
If issues persist after trying these solutions:
- Check server logs for detailed error messages
- Enable debug logging for metrics components
- Review configuration in conf/application.xml
- Test endpoint directly with curl
- Verify Prometheus configuration and logs
See Also
- Metrics Overview - Main metrics documentation
- Configuration Guide - Configure metrics, layers, and presets
- Authentication - Configure credentials
- Metric Reference - Complete metrics list
- Integration - Prometheus and Grafana setup