Observability & Monitoring

Built-in server metrics export and monitoring capabilities

The Lit Status server includes built-in observability features with comprehensive metrics export capabilities for integration with external monitoring systems.

Overview

The observability features provide:

  • Server-Side Metrics Export: Built-in Prometheus and JSON format metrics export
  • Client-Side OpenTelemetry: Distributed tracing, metrics, and structured logging for SDK users
  • Function Metrics: Comprehensive execution tracking and analysis
  • Time-Series Data: Historical metrics with configurable granularity
  • Filtering: Advanced filtering by network, product, and function
  • Performance Monitoring: Response time and throughput analysis
  • Error Tracking: Detailed error rates and failure analysis

Built-in Metrics Export

The Lit Status server includes a comprehensive metrics export endpoint at /metrics/export that supports both Prometheus and JSON formats.

Prometheus Format Export

# Export all metrics in Prometheus format
curl -H "X-API-Key: your-key" \
     "http://localhost:3000/metrics/export?format=prometheus"

# Export with filters
curl -H "X-API-Key: your-key" \
     "http://localhost:3000/metrics/export?format=prometheus&network=mainnet&product=lit-node"

Sample Prometheus Output:

# HELP lit_status_function_total_executions Total number of function executions
# TYPE lit_status_function_total_executions counter
lit_status_function_total_executions{function="sendTransaction",network="mainnet",product="lit-node"} 1250

# HELP lit_status_function_uptime Function uptime percentage
# TYPE lit_status_function_uptime gauge
lit_status_function_uptime{function="sendTransaction",network="mainnet",product="lit-node"} 96.0

# HELP lit_status_function_response_time Average response time in milliseconds
# TYPE lit_status_function_response_time gauge
lit_status_function_response_time{function="sendTransaction",network="mainnet",product="lit-node"} 245.5

JSON Format Export

# Export all metrics in JSON format
curl -H "X-API-Key: your-key" \
     "http://localhost:3000/metrics/export?format=json"

Sample JSON Output:

{
  "metadata": {
    "exportTime": "2024-01-15T10:30:00.000Z",
    "totalFunctions": 15,
    "filters": {
      "network": null,
      "product": null,
      "function": null
    }
  },
  "metrics": [
    {
      "function": {
        "id": "clw123456789",
        "name": "sendTransaction",
        "network": "mainnet",
        "product": "lit-node"
      },
      "metrics": {
        "totalExecutions": 1250,
        "successfulExecutions": 1200,
        "failedExecutions": 50,
        "averageResponseTime": 245.5,
        "uptime": 96.0,
        "lastExecutionTime": "2024-01-15T10:35:00.000Z"
      }
    }
  ]
}
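
The counters in the sample above can be turned into derived rates inside a collection script. A minimal sketch using POSIX awk with the sample values (the variable names are ours):

```shell
#!/bin/sh
# Compute a success rate from the exported counters (values from the sample above).
successful=1200
total=1250

awk -v s="$successful" -v t="$total" \
    'BEGIN { printf "success rate: %.1f%%\n", 100 * s / t }'
# → success rate: 96.0%
```

Note that this matches the `uptime` value (96.0) in the sample, since uptime is reported as the percentage of successful executions.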

Available Filters

Query Parameters

  • format - prometheus or json (default: prometheus)
  • network - Filter by specific network
  • product - Filter by specific product
  • function - Filter by specific function name
  • includeInactive - Include inactive functions (default: false)
  • startDate - Start date for time range (ISO string)
  • endDate - End date for time range (ISO string)
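
Several of these parameters can be combined in one request. A sketch that assembles the query string (parameter names are from the list above; the values are examples):

```shell
#!/bin/sh
# Assemble a filtered metrics-export URL from the query parameters listed above.
BASE_URL="http://localhost:3000"
FORMAT="json"
NETWORK="mainnet"
PRODUCT="lit-node"
START_DATE="2024-01-01T00:00:00.000Z"
END_DATE="2024-01-31T23:59:59.999Z"

URL="$BASE_URL/metrics/export?format=$FORMAT&network=$NETWORK&product=$PRODUCT"
URL="$URL&startDate=$START_DATE&endDate=$END_DATE&includeInactive=false"
echo "$URL"

# Then fetch with:
# curl -H "X-API-Key: your-key" "$URL"
```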

Get Available Filter Values

curl -H "X-API-Key: your-key" \
     "http://localhost:3000/metrics/filters"

Response:

{
  "networks": ["mainnet", "testnet", "goerli"],
  "products": ["lit-node", "vincent-registry", "my-app"],
  "functions": ["sendTransaction", "checkBalance", "authenticate"],
  "totalFunctions": 15,
  "activeFunctions": 12,
  "inactiveFunctions": 3
}
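
This response can drive per-segment exports. A sketch that fetches one export file per network (the network list is hard-coded from the sample above; in practice, parse it out of the /metrics/filters response):

```shell
#!/bin/sh
# One export file per network (list taken from the sample response above).
NETWORKS="mainnet testnet goerli"

for network in $NETWORKS; do
    outfile="metrics_${network}.json"
    url="http://localhost:3000/metrics/export?format=json&network=$network"
    echo "$url -> $outfile"
    # curl -s -H "X-API-Key: your-key" "$url" > "$outfile"
done
```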

Integration with External Systems

Prometheus Integration

Configure Prometheus to scrape metrics:

# prometheus.yml
scrape_configs:
  - job_name: 'lit-status'
    static_configs:
      - targets: ['localhost:3000']
    metrics_path: '/metrics/export'
    params:
      format: ['prometheus']
    # Note: bearer_token sends an "Authorization: Bearer ..." header.
    # If your deployment only accepts the X-API-Key header, put a proxy in
    # front of the server that rewrites the header, or configure the server
    # to also accept bearer authentication.
    bearer_token: 'your-api-key'
    scrape_interval: 30s

Grafana Dashboard Setup

1. Add Prometheus Data Source

  1. Go to Configuration → Data Sources
  2. Add Prometheus data source
  3. Set URL: http://localhost:9090
  4. Click Save & Test

2. Create Dashboard Panels

Function Uptime Panel
lit_status_function_uptime{function="$function",network="$network"}
Execution Rate Panel
rate(lit_status_function_total_executions{function="$function"}[5m])
Response Time Panel
lit_status_function_response_time{function="$function",network="$network"}
Error Rate Panel
rate(lit_status_function_failed_executions{function="$function"}[5m]) / 
rate(lit_status_function_total_executions{function="$function"}[5m])

Time-Series Metrics

For detailed time-series analysis, use the dedicated endpoint:

# Get hourly time-series data
curl -H "X-API-Key: your-key" \
     "http://localhost:3000/functions/{functionId}/metrics/timeseries?granularity=hour"

This provides bucketed data perfect for charting and trend analysis.

Performance Monitoring

Key Metrics to Monitor

Response Time Percentiles

Note that the export exposes lit_status_function_response_time as a gauge of average response time, not a histogram, so histogram_quantile does not apply. Percentiles across functions can be approximated with the quantile aggregation:

quantile(0.95, lit_status_function_response_time)
quantile(0.99, lit_status_function_response_time)

Throughput

sum(rate(lit_status_function_total_executions[5m])) by (network, product)

Error Rates

sum(rate(lit_status_function_failed_executions[5m])) / 
sum(rate(lit_status_function_total_executions[5m]))

Alerting Rules

Set up alerts for:

  • Error rate > 5%
  • Response time > 500ms (95th percentile)
  • Throughput drop > 50%
  • Function unavailable for > 5 minutes
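
The thresholds above can be encoded as Prometheus alerting rules. A sketch, assuming the metric names from the export section (the file name, group name, and alert names are ours):

```yaml
# alert-rules.yml (sketch; load via rule_files in prometheus.yml)
groups:
  - name: lit-status
    rules:
      - alert: LitFunctionHighErrorRate
        expr: |
          sum(rate(lit_status_function_failed_executions[5m]))
            / sum(rate(lit_status_function_total_executions[5m])) > 0.05
        for: 5m
        annotations:
          summary: "Function error rate above 5%"
      - alert: LitFunctionSlowResponses
        # The export provides an average-response-time gauge, so this alerts
        # on the average rather than a true 95th percentile.
        expr: lit_status_function_response_time > 500
        for: 5m
        annotations:
          summary: "Average response time above 500ms"
```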

Custom Monitoring Scripts

Automated Health Checks

#!/bin/bash
# health-check.sh

API_KEY="your-api-key"
BASE_URL="http://localhost:3000"

# Check server health
health=$(curl -s -H "X-API-Key: $API_KEY" "$BASE_URL/health")
status=$(echo "$health" | jq -r '.status')

if [ "$status" != "ok" ]; then
    echo "ALERT: Lit Status server unhealthy"
    echo "$health"
    exit 1
fi

echo "✅ Server healthy"

Metrics Collection Script

#!/bin/bash
# collect-metrics.sh

API_KEY="your-api-key"
BASE_URL="http://localhost:3000"
OUTPUT_DIR="/var/log/lit-status"

# Create timestamped metrics export
timestamp=$(date +%Y%m%d_%H%M%S)
curl -s -H "X-API-Key: $API_KEY" \
     "$BASE_URL/metrics/export?format=json" \
     > "$OUTPUT_DIR/metrics_$timestamp.json"

echo "✅ Metrics exported to $OUTPUT_DIR/metrics_$timestamp.json"

Production Considerations

Resource Monitoring

Monitor these server metrics:

  • Database connections: Keep pool usage < 80%
  • Memory usage: Monitor for memory leaks
  • Response times: Alert if > 500ms consistently
  • Error rates: Alert if > 5% for any function

Data Retention

Configure appropriate retention policies:

  • Function logs: 30 days for detailed data
  • Aggregated metrics: 1 year for trend analysis
  • Exported metrics: Archive monthly for compliance
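
One way to enforce the export-archive window from a cron job, assuming the OUTPUT_DIR layout used by the collection script above (prune_exports is our name):

```shell
#!/bin/sh
# Delete exported metrics files older than a retention window.
# Usage: prune_exports <dir> <retention-days>
prune_exports() {
    find "$1" -name 'metrics_*.json' -type f -mtime +"$2" -delete
}

# Example cron entry (daily at 02:00), calling:
# prune_exports /var/log/lit-status 30
```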

Security

# Use environment variables for API keys
export LIT_STATUS_API_KEY="your-secure-api-key"

# Restrict metrics access to monitoring systems only
curl -H "X-API-Key: $LIT_STATUS_API_KEY" \
     "http://localhost:3000/metrics/export?format=prometheus"

Scaling Considerations

For high-volume environments:

  • Use read-only API keys for metrics collection
  • Implement metrics caching if needed
  • Consider database connection pooling
  • Set up load balancing for multiple server instances
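
Metrics caching, mentioned above, can be as simple as a short-lived file cache in the collection layer. A sketch (fetch_remote, CACHE_FILE, and CACHE_TTL are our names; adjust to your environment):

```shell
#!/bin/sh
# Cache the JSON export on disk so repeated collectors don't hit the server.
CACHE_FILE="${CACHE_FILE:-/tmp/lit-metrics-cache.json}"
CACHE_TTL="${CACHE_TTL:-60}"   # seconds

fetch_remote() {
    curl -s -H "X-API-Key: $LIT_STATUS_API_KEY" \
         "http://localhost:3000/metrics/export?format=json"
}

fetch_metrics() {
    now=$(date +%s)
    if [ -f "$CACHE_FILE" ]; then
        # Linux stat first, macOS/BSD stat as fallback
        mtime=$(stat -c %Y "$CACHE_FILE" 2>/dev/null || stat -f %m "$CACHE_FILE")
        if [ $(( now - mtime )) -lt "$CACHE_TTL" ]; then
            cat "$CACHE_FILE"
            return 0
        fi
    fi
    fetch_remote > "$CACHE_FILE"
    cat "$CACHE_FILE"
}
```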

Troubleshooting

Common Issues

Missing Metrics

# Verify function registration
curl -H "X-API-Key: your-key" \
     "http://localhost:3000/functions"

# Check if functions are active
curl -H "X-API-Key: your-key" \
     "http://localhost:3000/functions?includeInactive=true"

Empty Data

# Verify executions are being logged
curl -H "X-API-Key: your-key" \
     "http://localhost:3000/functions/{functionId}/metrics"

Performance Issues

# Check server health and response times
time curl -H "X-API-Key: your-key" \
     "http://localhost:3000/health"

Client-Side OpenTelemetry Integration

The Lit Status SDK includes optional OpenTelemetry integration that provides comprehensive client-side observability for your function executions. This complements the server-side metrics export with distributed tracing, client-side metrics, and structured logging.

Overview

When enabled in the SDK, OpenTelemetry integration provides:

  • 🔍 Distributed Tracing: Track execution flows across your application
  • 📊 Client-Side Metrics: Monitor success rates, execution times, and error counts from the client perspective
  • 📝 Structured Logging: Rich contextual logs with function metadata
  • 🔗 Correlation: Link client-side spans with server-side metrics for complete observability

Automatic Instrumentation

The OpenTelemetry integration automatically instruments your executeAndLog calls without requiring code changes:

import { createLitStatusClient } from '@lit-protocol/lit-status-sdk';

const client = createLitStatusClient({
  url: 'http://localhost:3000',
  apiKey: 'your-api-key',
  openTelemetry: {
    enabled: true,
    serviceName: 'my-application',
    otlpEndpoint: 'http://localhost:4318', // Optional
  }
});

// This call now automatically includes telemetry
const { result, log } = await client.executeAndLog(functionId, async () => {
  return await processData();
});

Generated Telemetry Data

Client-Side Metrics

The SDK integration emits client-side metrics that complement the server-side export:

Function Execution Metrics:

  • lit_function_executions_total: Total executions from client perspective
  • lit_function_execution_duration_ms: Client-side execution duration
  • lit_function_errors_total: Client-side error count

Labels:

  • function_name: Function identifier
  • network: Network context
  • product: Product context
  • status: success or error
  • error_type: Error class name (for errors)

Distributed Traces

Each executeAndLog call creates a trace span:

execute_functionName
├── lit.function.id: "func-123"
├── lit.function.name: "processTransaction"
├── lit.network: "ethereum"
├── lit.product: "dapp-backend"
├── lit.duration_ms: 245.6
└── Status: OK | ERROR

Structured Logs

Rich contextual logs for each execution:

{
  "timestamp": "2024-01-15T10:30:45.123Z",
  "level": "INFO",
  "message": "Function processTransaction completed in 245.60ms",
  "attributes": {
    "lit.function.id": "func-123",
    "lit.function.name": "processTransaction",
    "lit.network": "ethereum", 
    "lit.product": "dapp-backend",
    "lit.duration_ms": 245.6,
    "lit.success": true
  }
}

Observability Stack Integration

Development (Console Export)

For development and debugging:

const client = createLitStatusClient({
  url: 'http://localhost:3000',
  apiKey: 'dev-key',
  openTelemetry: {
    enabled: true,
    exportToConsole: true, // See telemetry in terminal
  }
});

Production (OTLP Collector)

For production observability platforms:

const client = createLitStatusClient({
  url: 'https://api.lit.example.com',
  apiKey: process.env.LIT_API_KEY,
  openTelemetry: {
    enabled: true,
    serviceName: 'production-app',
    otlpEndpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
    exportToConsole: false,
  }
});

Complete Observability Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Your App      │    │   Lit Status    │    │   PostgreSQL    │
│                 │───▶│     Server      │───▶│    Database     │
│ (SDK + OTEL)    │    │   (Metrics)     │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         │ OTLP                  │ Prometheus/JSON       │ SQL Queries
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ OTLP Collector  │    │  Monitoring     │    │   Grafana /     │
│                 │───▶│   Platform      │◀───│   Dashboards    │
│ (Traces/Metrics)│    │ (Jaeger/Grafana)│    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Benefits of Combined Observability

Complete Visibility

  • Client Perspective: See function execution from the application side
  • Server Perspective: Monitor server-side performance and storage
  • End-to-End Traces: Follow requests across client and server boundaries

Enhanced Debugging

  • Client-Side Errors: Catch errors before they reach the server
  • Network Issues: Identify connectivity and latency problems
  • Performance Analysis: Compare client vs server execution times

Production Monitoring

  • Distributed Systems: Monitor microservices and distributed applications
  • Error Attribution: Determine if errors originate from client or server
  • Capacity Planning: Understand both client load and server capacity

Integration with Existing Monitoring

The client-side OpenTelemetry metrics complement the existing server-side metrics export:

Server-Side Metrics (via /metrics/export):

  • Stored execution history from database
  • Aggregated success rates and response times
  • Historical trending and analysis

Client-Side Metrics (via OpenTelemetry):

  • Real-time execution telemetry
  • Client-specific error tracking
  • Distributed tracing context

Combined Benefits:

  • Complete observability coverage
  • Client and server correlation
  • Historical and real-time data
  • Multiple export formats (Prometheus, OTLP)

Next Steps

  1. Enable Client-Side Telemetry: Add OpenTelemetry configuration to your SDK clients
  2. Set Up OTLP Collector: Deploy collector for production telemetry
  3. Configure Dashboards: Create visualizations combining client and server metrics
  4. Set Up Alerting: Monitor both client-side and server-side metrics for comprehensive coverage

For detailed setup instructions, see the SDK OpenTelemetry Integration section.