Observability & Monitoring
Built-in server metrics export and monitoring capabilities
The Lit Status server includes built-in observability features with comprehensive metrics export capabilities for integration with external monitoring systems.
Overview
The observability features provide:
- Server-Side Metrics Export: Built-in Prometheus and JSON format metrics export
- Client-Side OpenTelemetry: Distributed tracing, metrics, and structured logging for SDK users
- Function Metrics: Comprehensive execution tracking and analysis
- Time-Series Data: Historical metrics with configurable granularity
- Filtering: Advanced filtering by network, product, and function
- Performance Monitoring: Response time and throughput analysis
- Error Tracking: Detailed error rates and failure analysis
Built-in Metrics Export
The Lit Status server includes a comprehensive metrics export endpoint at /metrics/export that supports both Prometheus and JSON formats.
Prometheus Format Export
```shell
# Export all metrics in Prometheus format
curl -H "X-API-Key: your-key" \
  "http://localhost:3000/metrics/export?format=prometheus"

# Export with filters
curl -H "X-API-Key: your-key" \
  "http://localhost:3000/metrics/export?format=prometheus&network=mainnet&product=lit-node"
```

Sample Prometheus Output:
```
# HELP lit_status_function_total_executions Total number of function executions
# TYPE lit_status_function_total_executions counter
lit_status_function_total_executions{function="sendTransaction",network="mainnet",product="lit-node"} 1250

# HELP lit_status_function_uptime Function uptime percentage
# TYPE lit_status_function_uptime gauge
lit_status_function_uptime{function="sendTransaction",network="mainnet",product="lit-node"} 96.0

# HELP lit_status_function_response_time Average response time in milliseconds
# TYPE lit_status_function_response_time gauge
lit_status_function_response_time{function="sendTransaction",network="mainnet",product="lit-node"} 245.5
```

JSON Format Export
```shell
# Export all metrics in JSON format
curl -H "X-API-Key: your-key" \
  "http://localhost:3000/metrics/export?format=json"
```

Sample JSON Output:
```json
{
  "metadata": {
    "exportTime": "2024-01-15T10:30:00.000Z",
    "totalFunctions": 15,
    "filters": {
      "network": null,
      "product": null,
      "function": null
    }
  },
  "metrics": [
    {
      "function": {
        "id": "clw123456789",
        "name": "sendTransaction",
        "network": "mainnet",
        "product": "lit-node"
      },
      "metrics": {
        "totalExecutions": 1250,
        "successfulExecutions": 1200,
        "failedExecutions": 50,
        "averageResponseTime": 245.5,
        "uptime": 96.0,
        "lastExecutionTime": "2024-01-15T10:35:00.000Z"
      }
    }
  ]
}
```

Available Filters
Query Parameters
- `format` - `prometheus` or `json` (default: `prometheus`)
- `network` - Filter by specific network
- `product` - Filter by specific product
- `function` - Filter by specific function name
- `includeInactive` - Include inactive functions (default: `false`)
- `startDate` - Start date for time range (ISO string)
- `endDate` - End date for time range (ISO string)
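As a quick illustration, these parameters can be assembled with `URLSearchParams`; the sketch below builds a filtered export URL (`buildExportUrl` is a hypothetical helper, not part of the SDK):

```typescript
// Hypothetical helper (not part of the SDK): assemble a filtered export URL
// from the query parameters documented above.
function buildExportUrl(
  base: string,
  opts: { format?: "prometheus" | "json"; network?: string; product?: string } = {},
): string {
  const params = new URLSearchParams();
  if (opts.format) params.set("format", opts.format);
  if (opts.network) params.set("network", opts.network);
  if (opts.product) params.set("product", opts.product);
  const query = params.toString();
  return query ? `${base}/metrics/export?${query}` : `${base}/metrics/export`;
}

// e.g. JSON metrics for mainnet only
const url = buildExportUrl("http://localhost:3000", { format: "json", network: "mainnet" });
```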
Get Available Filter Values
```shell
curl -H "X-API-Key: your-key" \
  "http://localhost:3000/metrics/filters"
```

Response:
```json
{
  "networks": ["mainnet", "testnet", "goerli"],
  "products": ["lit-node", "vincent-registry", "my-app"],
  "functions": ["sendTransaction", "checkBalance", "authenticate"],
  "totalFunctions": 15,
  "activeFunctions": 12,
  "inactiveFunctions": 3
}
```

Integration with External Systems
Prometheus Integration
Configure Prometheus to scrape metrics:
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'lit-status'
    static_configs:
      - targets: ['localhost:3000']
    metrics_path: '/metrics/export'
    params:
      format: ['prometheus']
    # Note: Prometheus sends this as an "Authorization: Bearer" header
    bearer_token: 'your-api-key'
    scrape_interval: 30s
```

Grafana Dashboard Setup
1. Add Prometheus Data Source
- Go to Configuration → Data Sources
- Add Prometheus data source
- Set URL: `http://localhost:9090`
- Click Save & Test
2. Create Dashboard Panels
Function Uptime Panel

```
lit_status_function_uptime{function="$function",network="$network"}
```

Execution Rate Panel

```
rate(lit_status_function_total_executions{function="$function"}[5m])
```

Response Time Panel

```
lit_status_function_response_time{function="$function",network="$network"}
```

Error Rate Panel

```
rate(lit_status_function_failed_executions{function="$function"}[5m]) /
rate(lit_status_function_total_executions{function="$function"}[5m])
```

Time-Series Metrics
For detailed time-series analysis, use the dedicated endpoint:
```shell
# Get hourly time-series data
curl -H "X-API-Key: your-key" \
  "http://localhost:3000/functions/{functionId}/metrics/timeseries?granularity=hour"
```

This provides bucketed data perfect for charting and trend analysis.
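The hourly granularity can be pictured as grouping raw execution records by their ISO-timestamp hour. A minimal sketch of that kind of bucketing (the record shape here is illustrative, not the endpoint's actual response schema):

```typescript
// Illustrative only: group execution records into hourly buckets, the same
// style of aggregation the timeseries endpoint returns.
interface Execution {
  timestamp: string; // ISO 8601, e.g. "2024-01-15T10:05:00.000Z"
  success: boolean;
}

function bucketByHour(executions: Execution[]): Map<string, { total: number; failed: number }> {
  const buckets = new Map<string, { total: number; failed: number }>();
  for (const e of executions) {
    // Truncate to the hour: "2024-01-15T10" -> "2024-01-15T10:00:00Z"
    const hour = e.timestamp.slice(0, 13) + ":00:00Z";
    const b = buckets.get(hour) ?? { total: 0, failed: 0 };
    b.total += 1;
    if (!e.success) b.failed += 1;
    buckets.set(hour, b);
  }
  return buckets;
}
```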
Performance Monitoring
Key Metrics to Monitor
Response Time Percentiles

```
histogram_quantile(0.95, lit_status_function_response_time)
histogram_quantile(0.99, lit_status_function_response_time)
```

(These percentile queries assume the response-time metric is exported with histogram buckets; against the plain gauge shown above, `histogram_quantile` yields no data.)

Throughput

```
sum(rate(lit_status_function_total_executions[5m])) by (network, product)
```

Error Rates

```
sum(rate(lit_status_function_failed_executions[5m])) /
sum(rate(lit_status_function_total_executions[5m]))
```

Alerting Rules
Set up alerts for:
- Error rate > 5%
- Response time > 500ms (95th percentile)
- Throughput drop > 50%
- Function unavailable for > 5 minutes
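Assuming Prometheus scrapes the export endpoint as configured above, the error-rate and response-time thresholds can be sketched as alerting rules (alert names and `for` durations are illustrative; the 95th-percentile variant would additionally require histogram data):

```yaml
# alerts.yml -- illustrative rules for the thresholds above; tune per environment
groups:
  - name: lit-status
    rules:
      - alert: LitStatusHighErrorRate
        expr: |
          sum(rate(lit_status_function_failed_executions[5m])) by (function)
            / sum(rate(lit_status_function_total_executions[5m])) by (function) > 0.05
        for: 5m
      - alert: LitStatusSlowResponses
        expr: lit_status_function_response_time > 500
        for: 5m
```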
Custom Monitoring Scripts
Automated Health Checks
```shell
#!/bin/bash
# health-check.sh

API_KEY="your-api-key"
BASE_URL="http://localhost:3000"

# Check server health
health=$(curl -s -H "X-API-Key: $API_KEY" "$BASE_URL/health")
status=$(echo "$health" | jq -r '.status')

if [ "$status" != "ok" ]; then
  echo "ALERT: Lit Status server unhealthy"
  echo "$health"
  exit 1
fi

echo "✅ Server healthy"
```

Metrics Collection Script
```shell
#!/bin/bash
# collect-metrics.sh

API_KEY="your-api-key"
BASE_URL="http://localhost:3000"
OUTPUT_DIR="/var/log/lit-status"

# Create timestamped metrics export
mkdir -p "$OUTPUT_DIR"
timestamp=$(date +%Y%m%d_%H%M%S)
curl -s -H "X-API-Key: $API_KEY" \
  "$BASE_URL/metrics/export?format=json" \
  > "$OUTPUT_DIR/metrics_$timestamp.json"

echo "✅ Metrics exported to $OUTPUT_DIR/metrics_$timestamp.json"
```

Production Considerations
Resource Monitoring
Monitor these server metrics:
- Database connections: Keep pool usage < 80%
- Memory usage: Monitor for memory leaks
- Response times: Alert if > 500ms consistently
- Error rates: Alert if > 5% for any function
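A hedged sketch of checking one function's entry from the JSON export against the error-rate and response-time thresholds above (field names follow the sample JSON output earlier; the helper itself is illustrative, not part of the SDK):

```typescript
// Illustrative check of one function's exported metrics against the
// thresholds above (error rate > 5%, average response time > 500 ms).
interface FunctionMetrics {
  totalExecutions: number;
  failedExecutions: number;
  averageResponseTime: number;
}

function findViolations(m: FunctionMetrics): string[] {
  const violations: string[] = [];
  const errorRate = m.totalExecutions > 0 ? m.failedExecutions / m.totalExecutions : 0;
  if (errorRate > 0.05) {
    violations.push(`error rate ${(errorRate * 100).toFixed(1)}%`);
  }
  if (m.averageResponseTime > 500) {
    violations.push(`avg response time ${m.averageResponseTime}ms`);
  }
  return violations;
}
```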
Data Retention
Configure appropriate retention policies:
- Function logs: 30 days for detailed data
- Aggregated metrics: 1 year for trend analysis
- Exported metrics: Archive monthly for compliance
Security
```shell
# Use environment variables for API keys
export LIT_STATUS_API_KEY="your-secure-api-key"

# Restrict metrics access to monitoring systems only
curl -H "X-API-Key: $LIT_STATUS_API_KEY" \
  "http://localhost:3000/metrics/export?format=prometheus"
```

Scaling Considerations
For high-volume environments:
- Use read-only API keys for metrics collection
- Implement metrics caching if needed
- Consider database connection pooling
- Set up load balancing for multiple server instances
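If metrics caching is needed, a small TTL wrapper around the export call is often enough. A minimal sketch (the `cached` helper and 30-second TTL are illustrative, not SDK features):

```typescript
// Minimal TTL-cache sketch (illustrative, not part of the SDK): wrap any
// async fetcher; repeated calls within ttlMs reuse the cached value.
function cached<T>(fetcher: () => Promise<T>, ttlMs: number): () => Promise<T> {
  let value: T | undefined;
  let expiresAt = 0;
  return async () => {
    const now = Date.now();
    if (value === undefined || now >= expiresAt) {
      value = await fetcher();
      expiresAt = now + ttlMs;
    }
    return value;
  };
}

// Example: cache the JSON metrics export for 30 seconds (URL is illustrative).
const getMetrics = cached(
  () =>
    fetch("http://localhost:3000/metrics/export?format=json", {
      headers: { "X-API-Key": "your-key" },
    }).then((r) => r.json()),
  30_000,
);
```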
Troubleshooting
Common Issues
Missing Metrics
```shell
# Verify function registration
curl -H "X-API-Key: your-key" \
  "http://localhost:3000/functions"

# Check if functions are active
curl -H "X-API-Key: your-key" \
  "http://localhost:3000/functions?includeInactive=true"
```

Empty Data
```shell
# Verify executions are being logged
curl -H "X-API-Key: your-key" \
  "http://localhost:3000/functions/{functionId}/metrics"
```

Performance Issues
```shell
# Check server health and response times
time curl -H "X-API-Key: your-key" \
  "http://localhost:3000/health"
```

Client-Side OpenTelemetry Integration
The Lit Status SDK includes optional OpenTelemetry integration that provides comprehensive client-side observability for your function executions. This complements the server-side metrics export with distributed tracing, client-side metrics, and structured logging.
Overview
When enabled in the SDK, OpenTelemetry integration provides:
- Distributed Tracing: Track execution flows across your application
- Client-Side Metrics: Monitor success rates, execution times, and error counts from the client perspective
- Structured Logging: Rich contextual logs with function metadata
- Correlation: Link client-side spans with server-side metrics for complete observability
Automatic Instrumentation
The OpenTelemetry integration automatically instruments your executeAndLog calls without requiring code changes:
```typescript
import { createLitStatusClient } from '@lit-protocol/lit-status-sdk';

const client = createLitStatusClient({
  url: 'http://localhost:3000',
  apiKey: 'your-api-key',
  openTelemetry: {
    enabled: true,
    serviceName: 'my-application',
    otlpEndpoint: 'http://localhost:4318', // Optional
  }
});

// This call now automatically includes telemetry
const { result, log } = await client.executeAndLog(functionId, async () => {
  return await processData();
});
```

Generated Telemetry Data
Client-Side Metrics
The SDK creates client-side metrics that complement the server-side export:
Function Execution Metrics:
- `lit_function_executions_total`: Total executions from the client perspective
- `lit_function_execution_duration_ms`: Client-side execution duration
- `lit_function_errors_total`: Client-side error count
Labels:
- `function_name`: Function identifier
- `network`: Network context
- `product`: Product context
- `status`: `success` or `error`
- `error_type`: Error class name (for errors)
Distributed Traces
Each executeAndLog call creates a trace span:
```
execute_functionName
├── lit.function.id: "func-123"
├── lit.function.name: "processTransaction"
├── lit.network: "ethereum"
├── lit.product: "dapp-backend"
├── lit.duration_ms: 245.6
└── Status: OK | ERROR
```

Structured Logs
Rich contextual logs for each execution:
```json
{
  "timestamp": "2024-01-15T10:30:45.123Z",
  "level": "INFO",
  "message": "Function processTransaction completed in 245.60ms",
  "attributes": {
    "lit.function.id": "func-123",
    "lit.function.name": "processTransaction",
    "lit.network": "ethereum",
    "lit.product": "dapp-backend",
    "lit.duration_ms": 245.6,
    "lit.success": true
  }
}
```

Observability Stack Integration
Development (Console Export)
For development and debugging:
```typescript
const client = createLitStatusClient({
  url: 'http://localhost:3000',
  apiKey: 'dev-key',
  openTelemetry: {
    enabled: true,
    exportToConsole: true, // See telemetry in terminal
  }
});
```

Production (OTLP Collector)
For production observability platforms:
```typescript
const client = createLitStatusClient({
  url: 'https://api.lit.example.com',
  apiKey: process.env.LIT_API_KEY,
  openTelemetry: {
    enabled: true,
    serviceName: 'production-app',
    otlpEndpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
    exportToConsole: false,
  }
});
```

Complete Observability Architecture
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Your App     │    │   Lit Status    │    │   PostgreSQL    │
│                 │───▶│     Server      │───▶│    Database     │
│  (SDK + OTEL)   │    │   (Metrics)     │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         │ OTLP                 │ Prometheus/JSON      │ SQL Queries
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ OTLP Collector  │    │   Monitoring    │    │    Grafana /    │
│                 │───▶│    Platform     │◀───│   Dashboards    │
│ (Traces/Metrics)│    │ (Jaeger/Grafana)│    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

Benefits of Combined Observability
Complete Visibility
- Client Perspective: See function execution from the application side
- Server Perspective: Monitor server-side performance and storage
- End-to-End Traces: Follow requests across client and server boundaries
Enhanced Debugging
- Client-Side Errors: Catch errors before they reach the server
- Network Issues: Identify connectivity and latency problems
- Performance Analysis: Compare client vs server execution times
Production Monitoring
- Distributed Systems: Monitor microservices and distributed applications
- Error Attribution: Determine if errors originate from client or server
- Capacity Planning: Understand both client load and server capacity
Integration with Existing Monitoring
The client-side OpenTelemetry metrics complement the existing server-side metrics export:
Server-Side Metrics (via /metrics/export):
- Stored execution history from database
- Aggregated success rates and response times
- Historical trending and analysis
Client-Side Metrics (via OpenTelemetry):
- Real-time execution telemetry
- Client-specific error tracking
- Distributed tracing context
Combined Benefits:
- Complete observability coverage
- Client and server correlation
- Historical and real-time data
- Multiple export formats (Prometheus, OTLP)
Next Steps
- Enable Client-Side Telemetry: Add OpenTelemetry configuration to your SDK clients
- Set Up OTLP Collector: Deploy collector for production telemetry
- Configure Dashboards: Create visualizations combining client and server metrics
- Set Up Alerting: Monitor both client-side and server-side metrics for comprehensive coverage
For detailed setup instructions, see the SDK OpenTelemetry Integration section.