Metrics Reference
This page provides a complete reference of all metrics collected by webPDF. Metrics are organized by layer and component.
Application Information
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| webpdf | Gauge | version, version_date, beta | - | Constant metric with webPDF version metadata |
Architecture Overview
webPDF implements a 4-layer monitoring architecture, shown below. The values for Duration and other measurements on this page are examples only.
Layer                      Duration  Overhead  What's Measured
─────────────────────────────────────────────────────────────────────────
HTTP (http_server)         50ms      100%      Full request cycle
    ↓ (5ms auth/CORS)
API (api_operation)        45ms      90%       5ms: Auth, CORS, routing
    ↓ (5ms Jersey)
Service (service_method)   40ms      80%       5ms: Jersey serialization
    ↓ (5ms queue wait)
Worker Execution           35ms      70%       5ms: Queue wait time
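The relationship between the layers in the overview can be sketched in a few lines of Python. This is only an illustration built on the example durations above. The function name `layer_breakdown` and the label `worker_execution` are hypothetical and not part of webPDF.

```python
# Example durations in milliseconds, copied from the overview above.
# They are illustrative values, not measurements.
LAYER_DURATIONS_MS = [
    ("http_server", 50.0),
    ("api_operation", 45.0),
    ("service_method", 40.0),
    ("worker_execution", 35.0),  # hypothetical label for the worker row
]


def layer_breakdown(layers):
    """Return (name, duration, share_of_total, own_time) for each layer.

    share_of_total is the layer duration relative to the outermost layer.
    own_time is the time spent in a layer that the next inner layer
    does not see (for the innermost layer it is the full duration).
    """
    total = layers[0][1]  # the outermost layer covers the full request
    rows = []
    for index, (name, duration) in enumerate(layers):
        if index + 1 < len(layers):
            own_time = duration - layers[index + 1][1]
        else:
            own_time = duration
        rows.append((name, duration, duration / total, own_time))
    return rows


for name, duration, share, own_time in layer_breakdown(LAYER_DURATIONS_MS):
    print(f"{name:<18} {duration:>5.0f} ms  {share:>4.0%}  own: {own_time:.0f} ms")
```

With real data, the durations would come from the Timer metrics of each layer rather than from constants.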
HTTP Transport Layer
Prefix: http_server
Measures: Complete HTTP request lifecycle including network I/O, authentication, CORS, and routing overhead
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| http_server_requests | Timer | endpoint, method, uri, status | seconds | Total HTTP request duration from network arrival to response |
| http_server_requests_active | Gauge | endpoint, method, uri | requests | Currently active HTTP requests in processing |
| http_server_request_size | DistributionSummary | endpoint, method, uri | bytes | HTTP request body size |
| http_server_response_size | DistributionSummary | endpoint, method, uri, status | bytes | HTTP response body size |
| http_server_errors_client | Counter | endpoint, method, uri | errors | Client errors (4xx status codes) |
| http_server_errors_server | Counter | endpoint, method, uri | errors | Server errors (5xx status codes) |
Example Tags:
- endpoint: /rest/converter, /rest/pdfa, /soap/signature
- method: GET, POST, PUT, DELETE
- uri: /rest/converter, /rest/pdfa/{id}
- status: 200, 400, 401, 500
REST API Layer
Prefix: api_operation
Measures: Jersey REST endpoint execution including request deserialization, resource method invocation, and response serialization (excludes auth/CORS overhead)
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| api_operation_duration | Timer | endpoint, operation | seconds | REST endpoint processing time |
Example Tags:
- endpoint: ConverterResource, PdfaResource, SignatureResource
- operation: convert, validatePdfa, signDocument
Business Logic Layer
Prefix: service_method
Measures: Execution of business logic (web service operations) including processing time, request count, and errors (excludes Jersey overhead)
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| service_method_duration | Timer | method | seconds | Service operation duration from worker submission to response |
| service_method_requests_total | Counter | method | requests | Total number of service requests executed |
| service_method_errors_total | Counter | method | errors | Total number of service execution errors |
Method Tag Values:
- converter - Document conversion (Office, images, etc.)
- pdfa - PDF/A conversion and validation
- signature - Digital signatures
- barcode - Barcode generation and recognition
- ocr - Optical character recognition
- toolbox - PDF manipulation operations
- urlconverter - URL to PDF conversion
Thread Pool Layer
Prefix: threadpool
Measures: Thread pool utilization, queue depth, and worker lifecycle metrics (live gauges, no timing)
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| threadpool_active | Gauge | pool | threads | Currently executing worker threads |
| threadpool_pool_size | Gauge | pool | threads | Configured maximum thread pool capacity |
| threadpool_largest_pool_size | Gauge | pool | threads | Peak number of concurrent threads observed |
| threadpool_completed_total | Gauge | pool | tasks | Total number of completed worker tasks |
| threadpool_rejected_total | Gauge | pool | tasks | Tasks rejected due to full queue ⚠️ |
| threadpool_queue_size | Gauge | pool | tasks | Current number of tasks waiting in queue |
| threadpool_queue_remaining_capacity | Gauge | pool | tasks | Available queue slots for new tasks |
| threadpool_queue_capacity | Gauge | pool | tasks | Maximum configured queue capacity |
Pool Tag Values:
converter, pdfa, barcode, ocr, signature, toolbox, urlconverter
Queue Monitoring:
queue_size = 0 → No backlog, immediate processing
queue_size > 0 → Tasks waiting for a free worker, potential latency
queue_size == capacity → Queue full, the next submission will be rejected
rejected_total > 0 → CRITICAL: System overload, scale up required
threadpool_rejected_total > 0 indicates queue overflow and request rejection (503 errors). Immediate action required: scale up or investigate bottleneck.
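The queue rules above can be expressed as a small classification function. The following Python sketch is an illustration only. The function name `classify_queue` and the returned state names are hypothetical. The inputs correspond to `threadpool_queue_size`, `threadpool_queue_capacity`, and `threadpool_rejected_total` for a single pool.

```python
def classify_queue(queue_size, queue_capacity, rejected_total):
    """Map thread pool gauge values for one pool to a coarse state.

    The state names are hypothetical labels chosen for this sketch.
    """
    # Rejections take priority: tasks were turned away at least once.
    if rejected_total > 0:
        return "critical"
    # Queue is at its configured capacity; the next submission would be rejected.
    if queue_capacity > 0 and queue_size >= queue_capacity:
        return "full"
    # Tasks are waiting for a free worker, so latency may increase.
    if queue_size > 0:
        return "backlog"
    # No backlog, tasks are processed immediately.
    return "idle"


print(classify_queue(queue_size=0, queue_capacity=100, rejected_total=0))
print(classify_queue(queue_size=100, queue_capacity=100, rejected_total=0))
```

Because `threadpool_rejected_total` only grows, a monitoring setup would typically compare it with an earlier sample instead of checking it against zero. The check against zero is kept here to mirror the rule as stated above.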
Event Pipeline Metrics
Prefix: event_consumer, dlq
Measures: Event processing and dead letter queue operations
Event Consumer Metrics
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| event_consumer_read_batches_total | Counter | - | batches | Total consumer readGroup batches |
| event_consumer_read_messages_total | Counter | - | messages | Total messages returned by consumer readGroup |
| event_consumer_messages_total | Counter | outcome | messages | Event consumer outcomes by result class |
| event_consumer_file_storage_delete_total | Counter | result | operations | File storage deletion results in event consumer |
| event_consumer_errors_total | Counter | type | errors | Consumer loop errors by type |
Outcome Tag Values:
- processed - Successfully processed
- retried - Temporary failure, retry scheduled
- acked_invalid - Invalid message acknowledged
- dlq - Moved to dead letter queue
- max_deliveries - Maximum delivery attempts reached
File Storage Result Tag Values:
- success - File deleted successfully
- retryable_failure - Temporary failure, will retry
- permanent_failure - Permanent failure, cannot delete
- skipped_non_cloud - Skipped (not cloud storage)
Error Type Tag Values:
- redis_connection - Redis connection error
- redis_timeout - Redis operation timeout
- redis_exception - Other Redis exception
- unexpected - Unexpected error
Dead Letter Queue (DLQ) Metrics
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| dlq_published_total | Counter | origin | entries | DLQ publications by origin |
| dlq_reprocess_claimed_total | Counter | - | entries | Total claimed replay entries |
| dlq_reprocess_result_total | Counter | result | operations | DLQ replay outcomes |
| dlq_lineage_blocked_on_publish_total | Counter | - | operations | DLQ publish operations blocked by lineage protection |
| dlq_archive_failures_total | Counter | sink | operations | DLQ archive failures by sink type |
| dlq_reprocess_cleanup_removed_total | Counter | - | entries | Removed DLQ entries during retention cleanup |
| dlq_store_entries | Gauge | status | entries | Current replay store entries by status |
| dlq_store_blocked_lineages | Gauge | - | lineages | Current number of permanently blocked replay lineages |
| dlq_store_oldest_pending_age_seconds | Gauge | - | seconds | Age of oldest pending DLQ entry |
Origin Tag Values:
- event_writer - Published by event writer
- event_consumer - Published by event consumer
Reprocess Result Tag Values:
- succeeded - Reprocessing succeeded
- retry - Temporary failure, retry scheduled
- permanent_failed - Permanent failure
- skipped_blocked_lineage - Reprocessing skipped because the entry belongs to a blocked lineage
Archive Sink Tag Values:
- best_effort - Best-effort archive sink
- local_mandatory - Local mandatory archive sink
Store Status Tag Values:
- pending - Waiting for reprocessing
- processing - Currently being reprocessed
- succeeded - Reprocessing succeeded
- failed_permanent - Permanent failure
JVM Metrics
Prefix: jvm
Measures: Java Virtual Machine performance
Memory Metrics
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| jvm_memory_used_bytes | Gauge | area, id | bytes | Used memory by memory pool |
| jvm_memory_committed_bytes | Gauge | area, id | bytes | Committed memory by memory pool |
| jvm_memory_max_bytes | Gauge | area, id | bytes | Maximum memory by memory pool |
Area Tag Values: heap, nonheap
ID Tag Values: PS Eden Space, PS Old Gen, PS Survivor Space, Code Cache, Metaspace, etc.
Garbage Collection Metrics
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| jvm_gc_pause | Timer | action, cause | seconds | Garbage collection pause duration |
| jvm_gc_memory_allocated_bytes_total | Counter | - | bytes | Total memory allocated |
| jvm_gc_memory_promoted_bytes_total | Counter | - | bytes | Memory promoted to old generation |
Thread Metrics
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| jvm_threads_live | Gauge | - | threads | Current live thread count |
| jvm_threads_daemon | Gauge | - | threads | Current daemon thread count |
| jvm_threads_peak | Gauge | - | threads | Peak live thread count |
| jvm_threads_states | Gauge | state | threads | Current thread count by state |
State Tag Values: runnable, blocked, waiting, timed-waiting, new, terminated
Class Loader Metrics
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| jvm_classes_loaded | Gauge | - | classes | Currently loaded class count |
| jvm_classes_unloaded_total | Counter | - | classes | Total unloaded class count |
JVM Info Metrics
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| jvm_info | Gauge | runtime, vendor, version | - | Constant info metric with JVM runtime metadata |
System Metrics
Prefix: system, process
Measures: Operating system and process metrics
CPU Metrics
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| system_cpu_usage | Gauge | - | ratio (0-1) | System-wide CPU usage |
| system_cpu_count | Gauge | - | cores | Number of available processors |
| process_cpu_usage | Gauge | - | ratio (0-1) | Process CPU usage |
Uptime Metrics
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| process_uptime_seconds | Gauge | - | seconds | Process uptime since start |
| process_start_time_seconds | Gauge | - | seconds | Process start time (Unix epoch) |
Disk Space Metrics
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| disk_free_bytes | Gauge | path | bytes | Free disk space on filesystem |
| disk_total_bytes | Gauge | path | bytes | Total disk space on filesystem |
File Descriptor Metrics
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| process_files_open | Gauge | - | files | Currently open file descriptors |
| process_files_max | Gauge | - | files | Maximum file descriptors |
Log4j2 Metrics
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| log4j2_events_total | Counter | level | events | Log events by level |
Level Tag Values: trace, debug, info, warn, error, fatal
Tomcat Metrics
Prefix: tomcat
Measures: Embedded Apache Tomcat server metrics
Session Metrics
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| tomcat_sessions_active_current | Gauge | - | sessions | Current active sessions |
| tomcat_sessions_active_max | Gauge | - | sessions | Maximum active sessions observed |
| tomcat_sessions_created_total | Counter | - | sessions | Total created sessions |
| tomcat_sessions_expired_total | Counter | - | sessions | Total expired sessions |
| tomcat_sessions_rejected_total | Counter | - | sessions | Total rejected sessions |
Thread Metrics
| Metric | Type | Tags | Unit | Description |
|---|---|---|---|---|
| tomcat_threads_current | Gauge | name | threads | Current thread count in thread pool |
| tomcat_threads_busy | Gauge | name | threads | Busy thread count in thread pool |
Name Tag: http-nio-8080 (depends on connector configuration)
Metric Correlation
Understanding Performance Bottlenecks
Compare metrics across layers to identify bottlenecks:
If api_operation_duration >> service_method_duration:
- High serialization/deserialization overhead
- Consider optimizing JSON parsing or reducing payload size
If service_method_duration is high while threadpool_active is low:
- Few workers are busy even though business logic is slow
- Processing time is likely dominated by external dependencies
If threadpool_queue_size > 0:
- Requests waiting for available worker threads
- service_method_duration increases due to queue wait time
- Consider scaling up thread pool
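Two of the comparisons above lend themselves to a simple automated check: the gap between `api_operation_duration` and `service_method_duration`, and a non-empty queue. The sketch below is a minimal illustration, not a webPDF feature. The function name `diagnose`, the finding labels, and the `ratio_threshold` of 2.0 (one possible reading of ">>") are assumptions made for this example. The durations are expected as mean values in seconds.

```python
def diagnose(api_seconds, service_seconds, queue_size, ratio_threshold=2.0):
    """Return a list of hypothetical finding labels for one observation.

    api_seconds:     mean api_operation_duration in seconds
    service_seconds: mean service_method_duration in seconds
    queue_size:      current threadpool_queue_size
    ratio_threshold: assumed factor that counts as "much greater than"
    """
    findings = []
    # API layer takes far longer than the service layer it wraps:
    # the difference is spent on (de)serialization in the REST layer.
    if service_seconds > 0 and api_seconds / service_seconds >= ratio_threshold:
        findings.append("serialization_overhead")
    # Tasks are queued, so service durations include queue wait time.
    if queue_size > 0:
        findings.append("queue_wait")
    return findings


print(diagnose(api_seconds=0.090, service_seconds=0.040, queue_size=0))
print(diagnose(api_seconds=0.045, service_seconds=0.040, queue_size=3))
```

The threshold is deliberately a parameter, since what counts as a significant gap depends on payload sizes and the operations in use.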
Common Alert Patterns
Queue Buildup:
threadpool_queue_size > 0
→ Requests waiting for worker threads
→ service_method_duration increases
→ Consider scaling thread pool
Thread Pool Saturation:
threadpool_active == threadpool_pool_size
→ All threads busy, new requests must queue
→ Scale up or investigate slow workers
System Overload (Critical):
threadpool_rejected_total > 0
→ Queue full, requests being rejected
→ http_server_errors_server increases (503)
→ IMMEDIATE ACTION REQUIRED
High Memory Usage:
jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"} > 0.8
→ Memory pressure, possible GC thrashing
→ Review heap settings or optimize memory usage
DLQ Buildup:
dlq_store_entries{status="pending"} > 100
→ Many failed events waiting for reprocessing
→ Check event consumer errors
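The alert patterns above can be combined into one evaluation over a snapshot of metric values. The following Python sketch assumes the values have already been collected into a flat dictionary. The dictionary keys (for example `jvm_heap_used_bytes` and `dlq_pending_entries`), the function name `evaluate_alerts`, and the alert labels are hypothetical names for this illustration. The thresholds 0.8 and 100 are the ones used in the patterns above.

```python
def evaluate_alerts(sample):
    """Evaluate the alert patterns against one snapshot of metric values.

    `sample` is a plain dict with hypothetical keys; in a real setup the
    values would be read from the metrics endpoint or a monitoring system.
    """
    alerts = []
    # System overload: tasks have been rejected because the queue was full.
    if sample["threadpool_rejected_total"] > 0:
        alerts.append("system_overload")
    # Saturation: every thread is busy, new requests must queue.
    if sample["threadpool_active"] >= sample["threadpool_pool_size"]:
        alerts.append("thread_pool_saturation")
    # Queue buildup: requests are waiting for worker threads.
    if sample["threadpool_queue_size"] > 0:
        alerts.append("queue_buildup")
    # High memory usage: heap used / heap max above 0.8.
    # A non-positive maximum means the limit is undefined, so skip the check.
    heap_max = sample["jvm_heap_max_bytes"]
    if heap_max > 0 and sample["jvm_heap_used_bytes"] / heap_max > 0.8:
        alerts.append("high_memory_usage")
    # DLQ buildup: more than 100 entries pending reprocessing.
    if sample["dlq_pending_entries"] > 100:
        alerts.append("dlq_buildup")
    return alerts


healthy = {
    "threadpool_rejected_total": 0,
    "threadpool_active": 2,
    "threadpool_pool_size": 8,
    "threadpool_queue_size": 0,
    "jvm_heap_used_bytes": 100,
    "jvm_heap_max_bytes": 1000,
    "dlq_pending_entries": 0,
}
print(evaluate_alerts(healthy))
```

In a Prometheus and Grafana setup, the same conditions would normally be written as alerting rules. The function form is used here only to make the logic explicit.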
See Also
- Metrics Overview - Configuration and setup
- Authentication - Secure metrics endpoint
- Integration - Prometheus and Grafana setup
- Troubleshooting - Common issues and solutions