Observability & Monitoring

Built-in server metrics export and monitoring capabilities

The Lit Status server includes built-in observability features with comprehensive metrics export capabilities for integration with external monitoring systems.

Overview

The observability features provide:

  • Server-Side Metrics Export: Built-in Prometheus and JSON format metrics export
  • Client-Side OpenTelemetry: Distributed tracing, metrics, and structured logging for SDK users
  • Function Metrics: Comprehensive execution tracking and analysis
  • Time-Series Data: Historical metrics with configurable granularity
  • Filtering: Advanced filtering by network, product, and function
  • Performance Monitoring: Response time and throughput analysis
  • Error Tracking: Detailed error rates and failure analysis

Built-in Metrics Export

The Lit Status server includes a comprehensive metrics export endpoint at /metrics/export that supports both Prometheus and JSON formats.

Prometheus Format Export

# Export all metrics in Prometheus format
curl -H "X-API-Key: your-key" \
     "http://localhost:3000/metrics/export?format=prometheus"

# Export with filters
curl -H "X-API-Key: your-key" \
     "http://localhost:3000/metrics/export?format=prometheus&network=mainnet&product=lit-node"

Sample Prometheus Output:

# HELP lit_status_function_total_executions Total number of function executions
# TYPE lit_status_function_total_executions counter
lit_status_function_total_executions{function="sendTransaction",network="mainnet",product="lit-node"} 1250

# HELP lit_status_function_uptime Function uptime percentage
# TYPE lit_status_function_uptime gauge
lit_status_function_uptime{function="sendTransaction",network="mainnet",product="lit-node"} 96.0

# HELP lit_status_function_response_time Average response time in milliseconds
# TYPE lit_status_function_response_time gauge
lit_status_function_response_time{function="sendTransaction",network="mainnet",product="lit-node"} 245.5

JSON Format Export

# Export all metrics in JSON format
curl -H "X-API-Key: your-key" \
     "http://localhost:3000/metrics/export?format=json"

Sample JSON Output:

{
  "metadata": {
    "exportTime": "2024-01-15T10:30:00.000Z",
    "totalFunctions": 15,
    "filters": {
      "network": null,
      "product": null,
      "function": null
    }
  },
  "metrics": [
    {
      "function": {
        "id": "clw123456789",
        "name": "sendTransaction",
        "network": "mainnet",
        "product": "lit-node"
      },
      "metrics": {
        "totalExecutions": 1250,
        "successfulExecutions": 1200,
        "failedExecutions": 50,
        "averageResponseTime": 245.5,
        "uptime": 96.0,
        "lastExecutionTime": "2024-01-15T10:35:00.000Z"
      }
    }
  ]
}
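
The counters in the sample above can be turned into derived rates inside a collection script. A minimal sketch using POSIX awk with the sample values (the variable names are ours):

```shell
#!/bin/sh
# Compute a success rate from the exported counters (values from the sample above).
successful=1200
total=1250

awk -v s="$successful" -v t="$total" \
    'BEGIN { printf "success rate: %.1f%%\n", 100 * s / t }'
# → success rate: 96.0%
```

Note that this matches the `uptime` value (96.0) in the sample, since uptime is reported as the percentage of successful executions.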

Available Filters

Query Parameters

  • format - prometheus or json (default: prometheus)
  • network - Filter by specific network
  • product - Filter by specific product
  • function - Filter by specific function name
  • includeInactive - Include inactive functions (default: false)
  • startDate - Start date for time range (ISO string)
  • endDate - End date for time range (ISO string)
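
Several of these parameters can be combined in one request. A sketch that assembles the query string (parameter names are from the list above; the values are examples):

```shell
#!/bin/sh
# Assemble a filtered metrics-export URL from the query parameters listed above.
BASE_URL="http://localhost:3000"
FORMAT="json"
NETWORK="mainnet"
PRODUCT="lit-node"
START_DATE="2024-01-01T00:00:00.000Z"
END_DATE="2024-01-31T23:59:59.999Z"

URL="$BASE_URL/metrics/export?format=$FORMAT&network=$NETWORK&product=$PRODUCT"
URL="$URL&startDate=$START_DATE&endDate=$END_DATE&includeInactive=false"
echo "$URL"

# Then fetch with:
# curl -H "X-API-Key: your-key" "$URL"
```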

Get Available Filter Values

curl -H "X-API-Key: your-key" \
     "http://localhost:3000/metrics/filters"

Response:

{
  "networks": ["mainnet", "testnet", "goerli"],
  "products": ["lit-node", "vincent-registry", "my-app"],
  "functions": ["sendTransaction", "checkBalance", "authenticate"],
  "totalFunctions": 15,
  "activeFunctions": 12,
  "inactiveFunctions": 3
}
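
This response can drive per-segment exports. A sketch that fetches one export file per network (the network list is hard-coded from the sample above; in practice, parse it out of the /metrics/filters response):

```shell
#!/bin/sh
# One export file per network (list taken from the sample response above).
NETWORKS="mainnet testnet goerli"

for network in $NETWORKS; do
    outfile="metrics_${network}.json"
    url="http://localhost:3000/metrics/export?format=json&network=$network"
    echo "$url -> $outfile"
    # curl -s -H "X-API-Key: your-key" "$url" > "$outfile"
done
```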

Integration with External Systems

Prometheus Integration

Configure Prometheus to scrape metrics:

# prometheus.yml
scrape_configs:
  - job_name: 'lit-status'
    static_configs:
      - targets: ['localhost:3000']
    metrics_path: '/metrics/export'
    params:
      format: ['prometheus']
    # Note: bearer_token sends an "Authorization: Bearer ..." header.
    # If your deployment only accepts the X-API-Key header, put a proxy in
    # front of the server that rewrites the header, or configure the server
    # to also accept bearer authentication.
    bearer_token: 'your-api-key'
    scrape_interval: 30s

Grafana Dashboard Setup

1. Add Prometheus Data Source

  1. Go to Configuration → Data Sources
  2. Add Prometheus data source
  3. Set URL: http://localhost:9090
  4. Click Save & Test

2. Create Dashboard Panels

Function Uptime Panel
lit_status_function_uptime{function="$function",network="$network"}
Execution Rate Panel
rate(lit_status_function_total_executions{function="$function"}[5m])
Response Time Panel
lit_status_function_response_time{function="$function",network="$network"}
Error Rate Panel
rate(lit_status_function_failed_executions{function="$function"}[5m]) / 
rate(lit_status_function_total_executions{function="$function"}[5m])

Time-Series Metrics

For detailed time-series analysis, use the dedicated endpoint:

# Get hourly time-series data
curl -H "X-API-Key: your-key" \
     "http://localhost:3000/functions/{functionId}/metrics/timeseries?granularity=hour"

This provides bucketed data perfect for charting and trend analysis.

Performance Monitoring

Key Metrics to Monitor

Response Time Percentiles

Note that the export exposes lit_status_function_response_time as a gauge of average response time, not a histogram, so histogram_quantile does not apply. Percentiles across functions can be approximated with the quantile aggregation:

quantile(0.95, lit_status_function_response_time)
quantile(0.99, lit_status_function_response_time)

Throughput

sum(rate(lit_status_function_total_executions[5m])) by (network, product)

Error Rates

sum(rate(lit_status_function_failed_executions[5m])) / 
sum(rate(lit_status_function_total_executions[5m]))

Alerting Rules

Set up alerts for:

  • Error rate > 5%
  • Response time > 500ms (95th percentile)
  • Throughput drop > 50%
  • Function unavailable for > 5 minutes
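
The thresholds above can be encoded as Prometheus alerting rules. A sketch, assuming the metric names from the export section (the file name, group name, and alert names are ours):

```yaml
# alert-rules.yml (sketch; load via rule_files in prometheus.yml)
groups:
  - name: lit-status
    rules:
      - alert: LitFunctionHighErrorRate
        expr: |
          sum(rate(lit_status_function_failed_executions[5m]))
            / sum(rate(lit_status_function_total_executions[5m])) > 0.05
        for: 5m
        annotations:
          summary: "Function error rate above 5%"
      - alert: LitFunctionSlowResponses
        # The export provides an average-response-time gauge, so this alerts
        # on the average rather than a true 95th percentile.
        expr: lit_status_function_response_time > 500
        for: 5m
        annotations:
          summary: "Average response time above 500ms"
```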

Custom Monitoring Scripts

Automated Health Checks

#!/bin/bash
# health-check.sh

API_KEY="your-api-key"
BASE_URL="http://localhost:3000"

# Check server health
health=$(curl -s -H "X-API-Key: $API_KEY" "$BASE_URL/health")
status=$(echo "$health" | jq -r '.status')

if [ "$status" != "ok" ]; then
    echo "ALERT: Lit Status server unhealthy"
    echo "$health"
    exit 1
fi

echo "✅ Server healthy"

Metrics Collection Script

#!/bin/bash
# collect-metrics.sh

API_KEY="your-api-key"
BASE_URL="http://localhost:3000"
OUTPUT_DIR="/var/log/lit-status"

# Create timestamped metrics export
timestamp=$(date +%Y%m%d_%H%M%S)
curl -s -H "X-API-Key: $API_KEY" \
     "$BASE_URL/metrics/export?format=json" \
     > "$OUTPUT_DIR/metrics_$timestamp.json"

echo "✅ Metrics exported to $OUTPUT_DIR/metrics_$timestamp.json"

Production Considerations

Resource Monitoring

Monitor these server metrics:

  • Database connections: Keep pool usage < 80%
  • Memory usage: Monitor for memory leaks
  • Response times: Alert if > 500ms consistently
  • Error rates: Alert if > 5% for any function

Data Retention

Configure appropriate retention policies:

  • Function logs: 30 days for detailed data
  • Aggregated metrics: 1 year for trend analysis
  • Exported metrics: Archive monthly for compliance
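
One way to enforce the export-archive window from a cron job, assuming the OUTPUT_DIR layout used by the collection script above (prune_exports is our name):

```shell
#!/bin/sh
# Delete exported metrics files older than a retention window.
# Usage: prune_exports <dir> <retention-days>
prune_exports() {
    find "$1" -name 'metrics_*.json' -type f -mtime +"$2" -delete
}

# Example cron entry (daily at 02:00), calling:
# prune_exports /var/log/lit-status 30
```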

Security

# Use environment variables for API keys
export LIT_STATUS_API_KEY="your-secure-api-key"

# Restrict metrics access to monitoring systems only
curl -H "X-API-Key: $LIT_STATUS_API_KEY" \
     "http://localhost:3000/metrics/export?format=prometheus"

Scaling Considerations

For high-volume environments:

  • Use read-only API keys for metrics collection
  • Implement metrics caching if needed
  • Consider database connection pooling
  • Set up load balancing for multiple server instances
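
Metrics caching, mentioned above, can be as simple as a short-lived file cache in the collection layer. A sketch (fetch_remote, CACHE_FILE, and CACHE_TTL are our names; adjust to your environment):

```shell
#!/bin/sh
# Cache the JSON export on disk so repeated collectors don't hit the server.
CACHE_FILE="${CACHE_FILE:-/tmp/lit-metrics-cache.json}"
CACHE_TTL="${CACHE_TTL:-60}"   # seconds

fetch_remote() {
    curl -s -H "X-API-Key: $LIT_STATUS_API_KEY" \
         "http://localhost:3000/metrics/export?format=json"
}

fetch_metrics() {
    now=$(date +%s)
    if [ -f "$CACHE_FILE" ]; then
        # Linux stat first, macOS/BSD stat as fallback
        mtime=$(stat -c %Y "$CACHE_FILE" 2>/dev/null || stat -f %m "$CACHE_FILE")
        if [ $(( now - mtime )) -lt "$CACHE_TTL" ]; then
            cat "$CACHE_FILE"
            return 0
        fi
    fi
    fetch_remote > "$CACHE_FILE"
    cat "$CACHE_FILE"
}
```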

Troubleshooting

Common Issues

Missing Metrics

# Verify function registration
curl -H "X-API-Key: your-key" \
     "http://localhost:3000/functions"

# Check if functions are active
curl -H "X-API-Key: your-key" \
     "http://localhost:3000/functions?includeInactive=true"

Empty Data

# Verify executions are being logged
curl -H "X-API-Key: your-key" \
     "http://localhost:3000/functions/{functionId}/metrics"

Performance Issues

# Check server health and response times
time curl -H "X-API-Key: your-key" \
     "http://localhost:3000/health"

Client-Side OpenTelemetry Integration

The Lit Status SDK includes optional OpenTelemetry integration that provides comprehensive client-side observability for your function executions. This complements the server-side metrics export with distributed tracing, client-side metrics, and structured logging.

Overview

When enabled in the SDK, OpenTelemetry integration provides:

  • 🔍 Distributed Tracing: Track execution flows across your application
  • 📊 Client-Side Metrics: Monitor success rates, execution times, and error counts from the client perspective
  • 📝 Structured Logging: Rich contextual logs with function metadata
  • 🔗 Correlation: Link client-side spans with server-side metrics for complete observability

Automatic Instrumentation

The OpenTelemetry integration automatically instruments your executeAndLog calls without requiring code changes:

import { createLitStatusClient } from '@lit-protocol/lit-status-sdk';

const client = createLitStatusClient({
  url: 'http://localhost:3000',
  apiKey: 'your-api-key',
  openTelemetry: {
    enabled: true,
    serviceName: 'my-application',
    otlpEndpoint: 'http://localhost:4318', // Optional
  }
});

// This call now automatically includes telemetry
const { result, log } = await client.executeAndLog(functionId, async () => {
  return await processData();
});

Generated Telemetry Data

Client-Side Metrics

The SDK integration emits client-side metrics that complement the server-side export:

Function Execution Metrics:

  • lit_function_executions_total: Total executions from client perspective
  • lit_function_execution_duration_ms: Client-side execution duration
  • lit_function_errors_total: Client-side error count

Labels:

  • function_name: Function identifier
  • network: Network context
  • product: Product context
  • status: success or error
  • error_type: Error class name (for errors)

Distributed Traces

Each executeAndLog call creates a trace span:

execute_functionName
├── lit.function.id: "func-123"
├── lit.function.name: "processTransaction"
├── lit.network: "ethereum"
├── lit.product: "dapp-backend"
├── lit.duration_ms: 245.6
└── Status: OK | ERROR

Structured Logs

Rich contextual logs for each execution:

{
  "timestamp": "2024-01-15T10:30:45.123Z",
  "level": "INFO",
  "message": "Function processTransaction completed in 245.60ms",
  "attributes": {
    "lit.function.id": "func-123",
    "lit.function.name": "processTransaction",
    "lit.network": "ethereum", 
    "lit.product": "dapp-backend",
    "lit.duration_ms": 245.6,
    "lit.success": true
  }
}

Observability Stack Integration

Development (Console Export)

For development and debugging:

const client = createLitStatusClient({
  url: 'http://localhost:3000',
  apiKey: 'dev-key',
  openTelemetry: {
    enabled: true,
    exportToConsole: true, // See telemetry in terminal
  }
});

Production (OTLP Collector)

For production observability platforms:

const client = createLitStatusClient({
  url: 'https://api.lit.example.com',
  apiKey: process.env.LIT_API_KEY,
  openTelemetry: {
    enabled: true,
    serviceName: 'production-app',
    otlpEndpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
    exportToConsole: false,
  }
});

Complete Observability Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Your App      │    │   Lit Status    │    │   PostgreSQL    │
│                 │───▶│     Server      │───▶│    Database     │
│ (SDK + OTEL)    │    │   (Metrics)     │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         │ OTLP                  │ Prometheus/JSON       │ SQL Queries
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ OTLP Collector  │    │  Monitoring     │    │   Grafana /     │
│                 │───▶│   Platform      │◀───│   Dashboards    │
│ (Traces/Metrics)│    │ (Jaeger/Grafana)│    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Benefits of Combined Observability

Complete Visibility

  • Client Perspective: See function execution from the application side
  • Server Perspective: Monitor server-side performance and storage
  • End-to-End Traces: Follow requests across client and server boundaries

Enhanced Debugging

  • Client-Side Errors: Catch errors before they reach the server
  • Network Issues: Identify connectivity and latency problems
  • Performance Analysis: Compare client vs server execution times

Production Monitoring

  • Distributed Systems: Monitor microservices and distributed applications
  • Error Attribution: Determine if errors originate from client or server
  • Capacity Planning: Understand both client load and server capacity

Integration with Existing Monitoring

The client-side OpenTelemetry metrics complement the existing server-side metrics export:

Server-Side Metrics (via /metrics/export):

  • Stored execution history from database
  • Aggregated success rates and response times
  • Historical trending and analysis

Client-Side Metrics (via OpenTelemetry):

  • Real-time execution telemetry
  • Client-specific error tracking
  • Distributed tracing context

Combined Benefits:

  • Complete observability coverage
  • Client and server correlation
  • Historical and real-time data
  • Multiple export formats (Prometheus, OTLP)

Next Steps

  1. Enable Client-Side Telemetry: Add OpenTelemetry configuration to your SDK clients
  2. Set Up OTLP Collector: Deploy collector for production telemetry
  3. Configure Dashboards: Create visualizations combining client and server metrics
  4. Set Up Alerting: Monitor both client-side and server-side metrics for comprehensive coverage

For detailed setup instructions, see the SDK OpenTelemetry Integration section.