Core - Authentication & JWT Management¶
Port: 5000
Database: Redis (session storage) + PostgreSQL (optional)
Repository: hivematrix-core
Version: 1.0
Table of Contents¶
- Overview
- Architecture
- Core Responsibilities
- Technology Stack
- Production Features
- API Reference
- Authentication Endpoints
- Token Management
- Service-to-Service
- Public Endpoints
- Authentication Flows
- User Login Flow
- Token Validation Flow
- Logout Flow
- Service-to-Service Flow
- Session Management
- Redis-Backed Sessions
- Session Lifecycle
- Session Revocation
- Permission Levels
- Configuration
- Environment Variables
- RSA Key Generation
- Security
- Rate Limiting
- Error Handling
- SSL/TLS
- Monitoring & Observability
- Health Checks
- Logging
- Metrics
- Development
- Running Locally
- Testing
- API Documentation
- Troubleshooting
- See Also
Overview¶
Core is the central authentication authority for the entire HiveMatrix platform. It serves as the bridge between Keycloak (OAuth2 identity provider) and HiveMatrix services, converting OAuth tokens into platform-specific JWTs with session management and revocation support.
Core Responsibilities¶
- OAuth2 Token Exchange
  - Receives Keycloak OAuth2 access tokens from Nexus
  - Validates tokens with Keycloak's userinfo endpoint
  - Converts to HiveMatrix-signed JWTs
- JWT Token Management
  - Issues RS256-signed JWTs with embedded session IDs
  - Validates token signatures and expiration
  - Tracks active sessions with revocation capability
  - Distributes public keys via JWKS endpoint
- Session Management
  - Creates persistent sessions in Redis (with in-memory fallback)
  - Tracks session expiration (1-hour default)
  - Supports explicit revocation (logout)
  - Probabilistic cleanup of expired sessions
- Permission Management
  - Maps Keycloak groups to permission levels
  - Embeds permission levels in JWT claims
  - Supports four-tier permission model
- Service-to-Service Authentication
  - Issues short-lived service tokens (5 minutes)
  - Validates calling and target services
  - Enables trusted inter-service communication
Architecture¶
Technology Stack¶
- Framework: Flask 3.0.0
- Session Storage: Redis (with in-memory fallback)
- OAuth Library: Authlib
- JWT Library: PyJWT with cryptography
- Rate Limiting: Flask-Limiter with Redis backend
- Logging: Structured JSON logging with correlation IDs
- API Documentation: Flasgger (OpenAPI/Swagger)
- Health Checks: Custom HealthChecker library
Production Features¶
Core includes several production-ready features introduced in version 4.1:
Redis Session Persistence¶
- Sessions stored in Redis survive service restarts
- Automatic TTL expiration (1 hour)
- Graceful fallback to thread-safe in-memory storage if Redis unavailable
- Session count tracking for monitoring
Per-User Rate Limiting¶
- Rate limits applied per user (JWT subject) instead of IP address
- Prevents shared IP abuse scenarios
- Configurable limits per endpoint:
  - /login: 10 requests/minute
  - /auth: 20 requests/minute
  - /api/token/exchange: 20 requests/minute
  - Internal endpoints (validate, service-token): exempt
- Falls back to IP-based limiting for unauthenticated requests
Structured Logging¶
- JSON-formatted logs with correlation IDs for distributed tracing
- Request/response logging with timing
- Error context preservation
- Centralized logging to Helm service
- Configurable log levels (DEBUG, INFO, WARNING, ERROR)
- Can be disabled for development (ENABLE_JSON_LOGGING=false)
RFC 7807 Problem Details¶
- Standardized machine-readable error responses
- Consistent error format across all endpoints
- Includes error type, title, detail, status, and instance
- Enables automated error handling by clients
OpenAPI/Swagger Documentation¶
- Auto-generated API documentation at /docs
- Interactive API testing interface
- Request/response schemas
- Authentication examples
- Available at http://localhost:5000/docs
Comprehensive Health Checks¶
- Component-level health monitoring at /health
- Checks: Redis, disk space, response latency
- Status: healthy (200), degraded (503), or unhealthy (503)
- Useful for Kubernetes readiness/liveness probes
API Reference¶
Core exposes a RESTful API with the following endpoints:
Authentication Endpoints¶
GET /login¶
Initiates the OAuth2 login flow by redirecting to Keycloak.
Query Parameters:
- next (optional): URL to redirect to after successful login
Rate Limit: 10 requests/minute
Response: 302 Redirect to Keycloak authorization endpoint
Example:
GET /auth¶
OAuth2 callback endpoint. Handles authorization code from Keycloak.
Query Parameters:
- code (required, provided by Keycloak): Authorization code
- error (optional): Error code if authentication failed
Rate Limit: 20 requests/minute
Response:
- Success: 302 Redirect to next_url or user's preferred home page with JWT token
- Error: 302 Redirect back to /login
User Home Page Logic:
1. Attempts to fetch user's preferred home page from Codex (/api/public/user/home-page)
2. Falls back through: beacon → knowledgetree → codex → helm
3. Redirects to {nexus_url}/{service}/?token={jwt}
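The home page logic above can be sketched as follows. This is a minimal illustration, not Core's actual code: the function names are hypothetical, and the idea that a fallback is chosen by checking which services are available is an assumption.

```python
from typing import Iterable, Optional

# Fallback order taken from the list above.
FALLBACK_ORDER = ("beacon", "knowledgetree", "codex", "helm")


def choose_home_service(preferred: Optional[str], available: Iterable[str]) -> Optional[str]:
    """Return the preferred service if it is available, else the first fallback that is."""
    available_set = set(available)
    if preferred and preferred in available_set:
        return preferred
    for service in FALLBACK_ORDER:
        if service in available_set:
            return service
    return None


def build_redirect_url(nexus_url: str, service: str, token: str) -> str:
    """Build the post-login redirect: {nexus_url}/{service}/?token={jwt}."""
    return f"{nexus_url.rstrip('/')}/{service}/?token={token}"
```

For example, `choose_home_service("ledger", ["codex", "helm"])` falls through to `"codex"` because the preferred service is not in the available set.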
Example Response:
POST /logout¶
Ends the user session and revokes tokens.
Query Parameters:
- redirect (optional): URL to redirect after logout (default: /)
Actions Performed:
1. Revokes refresh token with Keycloak
2. Revokes access token with Keycloak
3. Clears Flask session
4. Redirects to Keycloak logout endpoint
5. Sets cache control headers to prevent back button issues
Response: 302 Redirect with cleared session cookie
Example:
Token Management¶
POST /api/token/exchange¶
Exchanges a Keycloak OAuth2 access token for a HiveMatrix JWT.
Rate Limit: 20 requests/minute
Request Body: a JSON object with an access_token field, for example {"access_token": "keycloak_token_here"}.
Alternative: Send access token in Authorization: Bearer <token> header
Response (200 OK): a JSON object whose token field contains the HiveMatrix JWT.
JWT Payload:
{
"iss": "hivematrix-core",
"sub": "user-uuid-1234-5678-9abc-def012345678",
"jti": "session-abc123...",
"name": "John Doe",
"email": "john.doe@example.com",
"preferred_username": "johndoe",
"permission_level": "admin",
"groups": ["/admins", "/technicians"],
"iat": 1700000000,
"exp": 1700003600
}
Process:
1. Validates Keycloak token with userinfo endpoint (via Nexus proxy)
2. Extracts user information and group membership
3. Determines permission level based on groups
4. Creates session in Redis with session ID
5. Mints HiveMatrix JWT with session ID as jti claim
6. Returns JWT to caller (typically Nexus)
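Step 5 of the process can be illustrated by assembling the claims shown in the JWT payload above. This is a sketch only: `build_user_claims` is a hypothetical helper, the shape of `userinfo` is assumed to mirror the payload fields, and RS256 signing is deliberately left out.

```python
import time
from typing import Any, Dict, Optional

SESSION_LIFETIME_SECONDS = 3600  # 1-hour default stated in this document


def build_user_claims(userinfo: Dict[str, Any], session_id: str,
                      permission_level: str, now: Optional[int] = None) -> Dict[str, Any]:
    """Assemble the JWT payload fields; signing (RS256) is not covered here."""
    issued_at = int(time.time()) if now is None else now
    return {
        "iss": "hivematrix-core",
        "sub": userinfo["sub"],
        "jti": session_id,  # links the token to a revocable session
        "name": userinfo.get("name"),
        "email": userinfo.get("email"),
        "preferred_username": userinfo.get("preferred_username"),
        "permission_level": permission_level,
        "groups": userinfo.get("groups", []),
        "iat": issued_at,
        "exp": issued_at + SESSION_LIFETIME_SECONDS,
    }
```

The resulting dictionary would then be signed with Core's private key by a JWT library such as PyJWT.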
Error Responses:
- 400 Bad Request: No access token provided
- 401 Unauthorized: Invalid Keycloak token
- 500 Internal Server Error: Token exchange failed
Example:
TOKEN=$(curl -s -X POST http://localhost:5000/api/token/exchange \
-H "Content-Type: application/json" \
-d '{"access_token": "keycloak_token_here"}' | jq -r '.token')
POST /api/token/validate¶
Validates a HiveMatrix JWT and checks session status.
Rate Limit: Exempt (called frequently by services)
Request Body: a JSON object with a token field, for example {"token": "<hivematrix_jwt>"}.
Alternative: Send token in Authorization: Bearer <token> header
Response (200 OK):
{
"valid": true,
"user": {
"sub": "user-uuid-1234",
"name": "John Doe",
"email": "john.doe@example.com",
"preferred_username": "johndoe",
"permission_level": "admin",
"groups": ["/admins"]
}
}
Validation Checks:
1. JWT signature verification (RS256)
2. Token expiration check
3. Session ID (jti) existence in Redis
4. Session not expired (1-hour TTL)
5. Session not revoked (logout)
Error Responses:
- 400 Bad Request: No token provided
- 401 Unauthorized: Token invalid, expired, or session revoked
Example:
curl -X POST http://localhost:5000/api/token/validate \
-H "Content-Type: application/json" \
-d '{"token": "'"$TOKEN"'"}'
POST /api/token/revoke¶
Revokes a session (logout). Token will fail validation after revocation.
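Request Body: a JSON object with a token field, for example {"token": "<hivematrix_jwt>"}.
Response (200 OK): a JSON confirmation message, for example {"message": "Session revoked"}.
Process:
1. Decodes JWT to extract session ID (jti claim)
2. Marks session as revoked in Redis
3. Session remains in Redis until TTL expires (for audit)
4. Subsequent validation requests will fail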
Error Responses:
- 400 Bad Request: No token provided or no session ID
- 401 Unauthorized: Invalid token
- 404 Not Found: Session not found or already revoked
Example:
curl -X POST http://localhost:5000/api/token/revoke \
-H "Content-Type: application/json" \
-d '{"token": "'"$TOKEN"'"}'
Service-to-Service¶
POST /service-token¶
Generates a short-lived JWT for service-to-service communication.
Rate Limit: Exempt (protected by token caching in service_client.py)
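Request Body: a JSON object naming both services, for example {"calling_service": "codex", "target_service": "ledger"}.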
Validation:
- Service names must match pattern: ^[a-z0-9_-]{1,50}$
- Both services must exist in services.json
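The two validation rules above could be checked along these lines. The function name and error messages are hypothetical, and `known` stands in for the service names loaded from services.json.

```python
import re
from typing import Iterable, Optional

SERVICE_NAME_PATTERN = re.compile(r"^[a-z0-9_-]{1,50}$")


def validate_service_request(calling: str, target: str, known: Iterable[str]) -> Optional[str]:
    """Return None if the request is acceptable, else a short error message."""
    known_set = set(known)
    for label, name in (("calling_service", calling), ("target_service", target)):
        # fullmatch rejects a trailing newline, which "$" alone would let through
        if not isinstance(name, str) or not SERVICE_NAME_PATTERN.fullmatch(name):
            return f"{label} has an invalid format"
        if name not in known_set:
            return f"{label} is not a known service"
    return None
```

Checking the format before the lookup keeps malformed input away from any later processing of the name.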
Response (200 OK): a JSON object whose token field contains the service JWT.
Service Token Payload:
{
"iss": "hivematrix-core",
"sub": "service:codex",
"calling_service": "codex",
"target_service": "ledger",
"type": "service",
"iat": 1700000000,
"exp": 1700000300
}
Token Characteristics:
- Expiration: 5 minutes (300 seconds)
- Type: service (distinguishes from user tokens)
- Subject: service:{calling_service}
- Purpose: Enables trusted inter-service API calls
Error Responses:
- 400 Bad Request: Missing parameters, invalid format, or unknown service
Example:
SERVICE_TOKEN=$(curl -s -X POST http://localhost:5000/service-token \
-H "Content-Type: application/json" \
-d '{"calling_service": "codex", "target_service": "ledger"}' | jq -r '.token')
Public Endpoints¶
GET /.well-known/jwks.json¶
JSON Web Key Set (JWKS) endpoint. Publishes Core's public RSA key.
Rate Limit: Exempt (public endpoint)
Response (200 OK):
{
"keys": [
{
"kty": "RSA",
"alg": "RS256",
"kid": "hivematrix-signing-key-1",
"use": "sig",
"n": "0vx7agoebGcQSuuPiLJXZptN9nndrQmbXEps2aiAFbWhM78LhWx...",
"e": "AQAB"
}
]
}
Purpose:
- Services fetch this on startup to verify JWT signatures
- Standard RFC 7517 JWKS format
- No authentication required (public key is public information)
Usage by Services:
from jwt import PyJWKClient
jwks_client = PyJWKClient('http://localhost:5000/.well-known/jwks.json')
signing_key = jwks_client.get_signing_key_from_jwt(token)
GET /health¶
Comprehensive health check endpoint.
Rate Limit: Exempt
Response (200 OK - Healthy):
{
"service": "core",
"status": "healthy",
"timestamp": "2025-11-22T10:30:00.000Z",
"checks": {
"redis": {
"status": "healthy",
"latency_ms": 2,
"connected_clients": 5,
"used_memory_mb": 2.45
},
"disk": {
"status": "healthy",
"usage_percent": 45.67,
"free_gb": 123.45,
"total_gb": 250.0
}
}
}
Response (503 Service Unavailable - Degraded):
{
"service": "core",
"status": "degraded",
"timestamp": "2025-11-22T10:30:00.000Z",
"checks": {
"redis": {
"status": "unhealthy",
"error": "Connection refused"
},
"disk": {
"status": "healthy",
"usage_percent": 45.67,
"free_gb": 123.45,
"total_gb": 250.0
}
}
}
Health Status Determination:
- healthy (200): All checks passing
- degraded (503): Non-critical issues (e.g., Redis down but in-memory fallback working)
- unhealthy (503): Critical component failure (e.g., disk full)
Checks Performed:
1. Redis: Connectivity, latency, memory usage, client count
2. Disk Space: Usage percentage, free space
  - Healthy: < 85%
  - Degraded: 85-95%
  - Unhealthy: >= 95%
GET /docs¶
Interactive OpenAPI/Swagger API documentation.
Response: HTML page with interactive API explorer
Features:
- Browse all endpoints with descriptions
- View request/response schemas
- Test API calls directly from browser
- Authentication examples
- Download OpenAPI spec at /apispec.json
Access: http://localhost:5000/docs
Authentication Flows¶
User Login Flow¶
Complete OAuth2 authorization code flow:
Participants: Browser, Nexus, Keycloak, Core
1. Browser → Nexus: GET /
2. Nexus → Browser: No session; redirect to Core login
3. Browser → Core: GET /login?next=/
4. Core → Browser: Redirect to Keycloak
5. Browser → Keycloak: GET /realms/.../auth
6. Keycloak → Browser: Login form (username/password)
7. Browser → Keycloak: POST credentials
8. Keycloak → Browser: Redirect to /auth with code
9. Browser → Core: GET /auth?code=xxx
10. Core → Keycloak: Exchange code for access token
11. Keycloak → Core: Access token
12. Core → Keycloak: Validate with /userinfo endpoint
13. Keycloak → Core: User info + groups
14. Core: Create session in Redis, mint JWT with jti=session_id
15. Core → Nexus: Get user's home page from Codex
16. Core → Browser: Redirect to home page with JWT
17. Browser → Nexus: GET /beacon/?token=jwt
18. Nexus: Store JWT in session cookie
19. Nexus → Browser: Authenticated page
Key Points:
- User never enters credentials on HiveMatrix pages (handled by Keycloak)
- Core never stores passwords (delegated to Keycloak)
- JWT includes session ID for revocation capability
- Permission level determined from Keycloak group membership
Token Validation Flow¶
On every authenticated request to any HiveMatrix service:
Participants: Browser, Nexus, Codex, Core
1. Browser → Nexus: GET /codex/
2. Nexus: Extract JWT from session
3. Nexus → Core: POST /api/token/validate {token: jwt}
4. Core: Verify JWT signature with public key
5. Core: Check session in Redis (exists, not expired, not revoked)
6. Core → Nexus: {valid: true, user: {...}}
7. Nexus → Codex: Proxy request with Authorization: Bearer <jwt>
8. Codex: Verify JWT independently using public key
9. Codex → Nexus: Response
10. Nexus → Browser: Response
Validation Sequence:
1. Nexus validates token with Core
2. Core checks signature, expiration, and session status
3. If valid, Nexus proxies request with JWT to backend service
4. Backend service independently verifies JWT signature using Core's public key
5. Backend service can trust JWT claims (permission_level, groups, etc.)
Why Dual Validation?
- Nexus → Core: Checks session status (revocation, expiration)
- Backend → JWT: Verifies signature (prevents tampering)
- Provides defense in depth and distributed trust model
Logout Flow¶
Complete session termination:
Participants: Browser, Nexus, Keycloak, Core
1. Browser → Nexus: Click Logout
2. Nexus: Extract JWT from session
3. Nexus → Core: POST /api/token/revoke {token: jwt}
4. Core: Extract jti from JWT
5. Core: Mark session as revoked in Redis
6. Core → Nexus: {message: "Session revoked"}
7. Nexus → Core: Redirect to Core /logout
8. Core → Keycloak: Revoke tokens with Keycloak
9. Core → Nexus: Redirect to Keycloak logout endpoint
10. Nexus → Browser: Redirect to Keycloak logout
11. Browser → Keycloak: GET /realms/.../logout
12. Keycloak → Browser: Clear Keycloak session; redirect to post_logout_redirect_uri
13. Browser: Back to home page (logged out)
Logout Steps:
1. Nexus revokes HiveMatrix session with Core
2. Core marks session as revoked in Redis (jti)
3. Core revokes Keycloak tokens (refresh + access)
4. Core clears Flask session cookie
5. Core redirects to Keycloak logout endpoint
6. Keycloak clears its session
7. User redirected to home page (logged out)
Security Considerations:
- Session revocation is immediate (subsequent validation fails)
- Revoked sessions remain in Redis until TTL expires (audit trail)
- Cache control headers prevent back button from showing cached pages
- Session cookie explicitly deleted
Service-to-Service Flow¶
How services authenticate with each other:
Participants: Codex, Core, Ledger
1. Codex: Need to call Ledger API
2. Codex → Core: POST /service-token {calling_service: "codex", target_service: "ledger"}
3. Core: Validate service names against services.json
4. Core: Mint service token (5 min expiry, type="service")
5. Core → Codex: {token: <service_jwt>}
6. Codex → Ledger: GET /api/invoices with Authorization: Bearer <service_jwt>
7. Ledger: Verify JWT signature using Core's public key
8. Ledger: Check type="service"; set g.is_service_call=True
9. Ledger: Bypass user permission checks (services are trusted)
10. Ledger → Codex: {invoices: [...]}
Service Token Characteristics:
- Short-lived: 5 minutes (prevents long-term abuse if leaked)
- Scoped: Includes calling and target service names
- Trusted: Backend services bypass user permission checks
- Validated: Service names must exist in services.json
Usage in Code:
from app.service_client import call_service
# Automatically handles token request and API call
response = call_service('ledger', '/api/invoices')
invoices = response.json()
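The /service-token endpoint is described as protected by token caching in service_client.py. How that cache works is not shown here, so the following is only a sketch of one plausible approach. The class name, the `fetch_token` callback, and the 30-second refresh margin are all assumptions.

```python
import time
from typing import Callable, Dict, Tuple

TOKEN_LIFETIME_SECONDS = 300   # service tokens expire after 5 minutes
REFRESH_MARGIN_SECONDS = 30    # hypothetical safety margin before expiry


class ServiceTokenCache:
    """Cache one token per target service and refresh it shortly before expiry."""

    def __init__(self, fetch_token: Callable[[str], str],
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch_token = fetch_token  # e.g. a function that POSTs to /service-token
        self._clock = clock
        self._tokens: Dict[str, Tuple[str, float]] = {}

    def get(self, target_service: str) -> str:
        now = self._clock()
        cached = self._tokens.get(target_service)
        if cached is not None and now < cached[1] - REFRESH_MARGIN_SECONDS:
            return cached[0]
        token = self._fetch_token(target_service)
        self._tokens[target_service] = (token, now + TOKEN_LIFETIME_SECONDS)
        return token
```

With a cache like this, a service calling Ledger many times per minute would request a new token roughly once every four and a half minutes. The sketch is not thread-safe; a lock around `get` would be needed in a multi-threaded worker.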
Session Management¶
Redis-Backed Sessions¶
Core uses Redis for persistent session storage with automatic fallback to in-memory storage.
Why Redis?
- Sessions survive Core service restarts
- Shared session storage (for multi-instance deployments)
- Automatic TTL expiration
- High performance (sub-millisecond latency)
Fallback Behavior:
- If Redis unavailable at startup → uses thread-safe in-memory dict
- If Redis fails during operation → switches to in-memory storage
- In-memory mode logs warning: "SessionManager: Using in-memory sessions"
- Sessions lost on service restart when using in-memory mode
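The fallback behavior can be sketched with a small wrapper. This is an illustration under assumptions, not Core's SessionManager: `primary` is any hypothetical object exposing `get(key)` and `set(key, value)`, standing in for a Redis client.

```python
import threading
from typing import Any, Dict, Optional


class FallbackSessionStore:
    """Try a primary store first; switch to a locked in-memory dict if it fails."""

    def __init__(self, primary: Optional[Any] = None) -> None:
        self._primary = primary          # any object with get(key) and set(key, value)
        self._memory: Dict[str, str] = {}
        self._lock = threading.Lock()

    def set(self, key: str, value: str) -> None:
        if self._primary is not None:
            try:
                self._primary.set(key, value)
                return
            except Exception:            # a real client would catch its own error type
                self._primary = None     # stop using the failed primary store
        with self._lock:
            self._memory[key] = value

    def get(self, key: str) -> Optional[str]:
        if self._primary is not None:
            try:
                return self._primary.get(key)
            except Exception:
                self._primary = None
        with self._lock:
            return self._memory.get(key)
```

One consequence of this design is that sessions written to the primary store before it failed are not visible from the in-memory dict, so affected users would need to log in again.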
Configuration:
# Redis runs on standard port
redis-server --port 6379
# Check Redis connectivity
redis-cli ping
# Expected: PONG
Session Lifecycle¶
Creation¶
When a user logs in (via /api/token/exchange):
- Session Data Created:
- Stored in Redis:
  - Key: session:{session_id}
  - Value: JSON-serialized session data
  - TTL: 3600 seconds (1 hour)
  - Session ID: 32-byte URL-safe token (e.g., a1b2c3d4...)
- Session ID Embedded in JWT:
  - jti claim contains session ID
  - Links JWT to revocable session
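A session record matching the description above might be built like this. The field names inside the record are illustrative assumptions (only `expires_at` and `revoked` are referenced elsewhere in this document), and `new_session_record` is a hypothetical helper.

```python
import json
import secrets
import time
from typing import Any, Dict, Optional, Tuple

SESSION_TTL_SECONDS = 3600  # 1 hour


def new_session_record(user: Dict[str, Any],
                       now: Optional[int] = None) -> Tuple[str, str, str, int]:
    """Return (session_id, redis_key, json_value, ttl_seconds) for a new session."""
    created_at = int(time.time()) if now is None else now
    session_id = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoding
    record = {
        "user": user,                        # field names here are illustrative
        "created_at": created_at,
        "expires_at": created_at + SESSION_TTL_SECONDS,
        "revoked": False,
    }
    return session_id, f"session:{session_id}", json.dumps(record), SESSION_TTL_SECONDS
```

The key, value, and TTL could then be written in one step with a Redis SET that carries an expiry, and the session ID placed in the JWT's jti claim.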
Validation¶
On every request (via /api/token/validate):
- Decode JWT to extract jti (session ID)
- Fetch session from Redis: GET session:{jti}
- Check if session exists
- Check if session expired: expires_at < current_time
- Check if session revoked: revoked == True
- Return user data if all checks pass
Performance:
- Redis lookup: ~1-2ms
- In-memory lookup: ~0.1ms
- Total validation time: <5ms
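The three session checks in the validation steps above reduce to a short function. This is a sketch assuming the session has already been fetched and JSON-decoded; `check_session` and its reason strings are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def check_session(session: Optional[Dict[str, Any]], now: float) -> Tuple[bool, str]:
    """Apply the existence, expiry, and revocation checks in that order."""
    if session is None:
        return False, "session not found"
    if session["expires_at"] < now:
        return False, "session expired"
    if session.get("revoked") is True:
        return False, "session revoked"
    return True, "ok"
```

Returning a reason alongside the boolean makes it easier to log why a token was rejected while still answering the caller with a generic 401.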
Expiration¶
Sessions automatically expire after 1 hour:
- Redis Mode: TTL handled by Redis (automatic deletion)
- In-Memory Mode: Probabilistic cleanup on validation (1% chance)
- Manual Cleanup: session_manager.cleanup_expired() (returns count)
Monitoring:
# Get active session count
count = session_manager.get_active_session_count()
print(f"Active sessions: {count}")
Session Revocation¶
Explicit Revocation (Logout)¶
Via /api/token/revoke:
- Extract jti from JWT
- Fetch session from Redis
- Mark revoked: True
- Update session in Redis with remaining TTL
- Session remains in Redis until TTL expires (audit trail)
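Writing the revoked session back "with remaining TTL" means the record must not outlive its original expiry. A minimal sketch of that calculation follows; `revoke_session` is a hypothetical helper, and returning None for an already-revoked session is an assumption modeled on the 404 response described for /api/token/revoke.

```python
import math
from typing import Any, Dict, Optional, Tuple


def revoke_session(session: Dict[str, Any],
                   now: float) -> Optional[Tuple[Dict[str, Any], int]]:
    """Return (updated_session, remaining_ttl_seconds), or None if nothing is left to revoke."""
    remaining = math.ceil(session["expires_at"] - now)
    if remaining <= 0 or session.get("revoked"):
        return None  # already expired or already revoked
    updated = dict(session, revoked=True)  # copy, leaving the input untouched
    return updated, remaining
```

The caller would store the updated record under the same key with `remaining` as its expiry, so the audit copy disappears at the same moment the session would have expired anyway.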
Why Keep Revoked Sessions?
- Audit logging (track logout events)
- Debug investigations (why did user get logged out?)
- Forensics (detect suspicious logout patterns)
Cleanup¶
Expired sessions are automatically removed:
- Redis: Automatic via TTL (no action needed)
- In-Memory: Manual cleanup via cleanup_expired()
  - Called probabilistically (1% of validations)
  - Can be called manually via health endpoint or cron job
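Probabilistic cleanup for the in-memory store can be sketched as below. These are standalone functions written for illustration, not the SessionManager methods themselves; the injectable `rng` parameter is an assumption added to make the behavior testable.

```python
import random
from typing import Any, Callable, Dict

CLEANUP_PROBABILITY = 0.01  # roughly 1% of validations trigger a sweep


def cleanup_expired(sessions: Dict[str, Dict[str, Any]], now: float) -> int:
    """Delete expired sessions in place and return how many were removed."""
    expired = [sid for sid, data in sessions.items() if data["expires_at"] < now]
    for sid in expired:
        del sessions[sid]
    return len(expired)


def maybe_cleanup(sessions: Dict[str, Dict[str, Any]], now: float,
                  rng: Callable[[], float] = random.random) -> int:
    """Run cleanup_expired with probability CLEANUP_PROBABILITY; return removed count."""
    if rng() < CLEANUP_PROBABILITY:
        return cleanup_expired(sessions, now)
    return 0
```

Sweeping on a small fraction of validations spreads the cost of a full scan across requests without needing a background thread, at the price of expired entries lingering for a while under low traffic.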
Permission Levels¶
HiveMatrix uses a four-tier permission model based on Keycloak group membership.
Permission Tiers¶
| Level | Keycloak Group | Description | Typical Use Cases |
|---|---|---|---|
| admin | admins or /admins | Full system access | System administrators, platform owners |
| technician | technicians or /technicians | Technical operations | MSP technicians, engineers, helpdesk staff |
| billing | billing or /billing | Financial operations | Billing department, accountants |
| client | (default) | Limited access | End-user clients, read-only access |
Permission Determination¶
During token exchange (/api/token/exchange), Core:
- Fetches user info from Keycloak (includes groups array)
- Checks group membership in priority order
- Embeds permission_level in JWT claims
- Services enforce permissions using decorators
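The group-to-level mapping can be sketched as follows. The priority order (admin, then technician, then billing, then client) is assumed from the order of the tier table, and `determine_permission_level` is a hypothetical name.

```python
from typing import Iterable

# Priority order assumed to follow the tier table: admin > technician > billing.
GROUP_TO_LEVEL = (
    ("admins", "admin"),
    ("technicians", "technician"),
    ("billing", "billing"),
)


def determine_permission_level(groups: Iterable[str]) -> str:
    """Map Keycloak group names to a permission level; default to 'client'."""
    normalized = {group.lstrip("/") for group in groups}  # accept 'admins' and '/admins'
    for group_name, level in GROUP_TO_LEVEL:
        if group_name in normalized:
            return level
    return "client"
```

A user in both /admins and /technicians resolves to admin under this ordering, since the first matching tier wins.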
Service-Level Enforcement¶
Services use the permission_level claim from JWT:
from flask import abort, g, render_template

@app.route('/admin/settings')
@token_required
def admin_settings():
    # Reject callers whose JWT does not carry the admin permission level
    if g.user.get('permission_level') != 'admin':
        abort(403, "Admin access required")
    # Admin-only logic here
    return render_template('admin/settings.html')
Built-in Decorators:
- @token_required - Any authenticated user
- @admin_required - Admin-level only (auto-checks permission)
Group Format¶
Keycloak groups may appear in two formats:
- Without leading slash: admins, technicians, billing
- With leading slash: /admins, /technicians, /billing
Core handles both formats for compatibility.
Configuration¶
Environment Variables¶
Core is configured entirely via environment variables (loaded from .flaskenv).
⚠️ Important: .flaskenv is auto-generated by config_manager.py from Helm's master_config.json. Do not edit manually.
Flask Configuration¶
# Flask application settings
FLASK_APP=run.py
FLASK_ENV=development # or production
SECRET_KEY=<generated-secret> # Flask session encryption key
SERVICE_NAME=core
Keycloak Configuration¶
# Keycloak OAuth2 settings
KEYCLOAK_SERVER_URL=http://localhost:8080
KEYCLOAK_REALM=hivematrix
KEYCLOAK_CLIENT_ID=core-client
KEYCLOAK_CLIENT_SECRET=<client-secret>
Note: KEYCLOAK_CLIENT_SECRET is generated during Keycloak setup by configure_keycloak.sh and stored in master_config.json.
JWT Configuration¶
# JWT signing configuration
JWT_PRIVATE_KEY_FILE=keys/jwt_private.pem
JWT_PUBLIC_KEY_FILE=keys/jwt_public.pem
JWT_ISSUER=hivematrix-core
JWT_ALGORITHM=RS256
Service URLs¶
# Inter-service communication
CORE_SERVICE_URL=http://localhost:5000
NEXUS_SERVICE_URL=https://localhost
HELM_SERVICE_URL=http://localhost:5004
CODEX_SERVICE_URL=http://localhost:5010 # For user home page preference
Logging Configuration¶
# Logging settings (production features)
LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR
ENABLE_JSON_LOGGING=true # Structured JSON logs with correlation IDs
Security Configuration¶
⚠️ Security Warning: VERIFY_SSL=False is common in HiveMatrix deployments using self-signed certificates. For production with valid SSL certificates, set to True.
Session Configuration¶
Session settings are hardcoded in app/__init__.py:
app.config['SESSION_COOKIE_SECURE'] = False # Set to True in production
app.config['SESSION_COOKIE_HTTPONLY'] = True # Prevents JavaScript access
app.config['SESSION_COOKIE_SAMESITE'] = 'Lax' # CSRF protection
app.config['PERMANENT_SESSION_LIFETIME'] = 3600 # 1 hour
For Production:
- Set SESSION_COOKIE_SECURE = True (requires HTTPS)
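- Keep HTTPONLY = True (keeps the cookie out of reach of JavaScript, limiting session theft via XSS)
- Keep SAMESITE = 'Lax' (mitigates CSRF by withholding the cookie on most cross-site requests)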
RSA Key Generation¶
Core uses RSA key pairs for JWT signing and verification.
Generate Keys¶
cd hivematrix-core
# Create keys directory
mkdir -p keys
# Generate private key (2048-bit RSA)
openssl genrsa -out keys/jwt_private.pem 2048
# Extract public key
openssl rsa -in keys/jwt_private.pem -pubout -out keys/jwt_public.pem
# Set proper permissions
chmod 600 keys/jwt_private.pem
chmod 644 keys/jwt_public.pem
Key Rotation¶
To rotate keys (e.g., annually or after compromise):
- Generate new key pair
- Update .flaskenv to point to new keys
- Restart Core service
- All services auto-fetch new public key from /.well-known/jwks.json
- Existing sessions remain valid until expiration (1 hour max)
Gradual Rollover: For zero-downtime rotation, Core can publish multiple keys in JWKS (future enhancement).
Key Security¶
Private Key (jwt_private.pem):
- Never commit to git (included in .gitignore)
- Permissions: 600 (owner read/write only)
- Backup: Store securely (password manager, secrets vault)
- Compromise: Rotate immediately if leaked
Public Key (jwt_public.pem):
- Safe to distribute (published via JWKS endpoint)
- Services cache this key for JWT verification
- Automatically fetched on service startup
Security¶
Rate Limiting¶
Core implements per-user rate limiting using Flask-Limiter with Redis backend.
Per-User Limits¶
Rate limits are applied per user (JWT subject) instead of per IP address:
# In app/rate_limit_key.py
import jwt
from flask import request

def get_user_id_or_ip():
    """
    Extract user ID from JWT if available, otherwise use IP address.
    Prevents shared IP abuse (e.g., corporate NAT, VPN users).
    """
    auth_header = request.headers.get('Authorization', '')
    if auth_header.startswith('Bearer '):
        token = auth_header[7:]
        try:
            # Signature is not verified here: the subject only selects a rate-limit bucket
            payload = jwt.decode(token, options={"verify_signature": False})
            return f"user:{payload.get('sub', 'unknown')}"
        except jwt.PyJWTError:
            pass  # Malformed token: fall back to IP-based limiting
    return f"ip:{request.remote_addr}"
Benefits:
- Users behind shared IPs (corporate NAT) don't affect each other
- Prevents one abusive user from blocking entire organization
- More accurate rate limiting for distributed systems
Endpoint Limits¶
| Endpoint | Limit | Reason |
|---|---|---|
| /login | 10/minute | Prevent credential stuffing |
| /auth | 20/minute | Prevent OAuth callback abuse |
| /api/token/exchange | 20/minute | Prevent token brute force |
| /service-token | Exempt | Protected by token caching |
| /api/token/validate | Exempt | Called frequently by services |
| /.well-known/jwks.json | Exempt | Public endpoint |
| /health | Exempt | Monitoring endpoint |
Storage Backend¶
- Primary: Redis (shared state for multi-instance deployments)
- Fallback: In-memory (if Redis unavailable)
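# In app/__init__.py
import redis

try:
    redis_client = redis.Redis(host='localhost', port=6379)
    redis_client.ping()
    storage_uri = "redis://localhost:6379"
except redis.exceptions.RedisError:
    # Redis unreachable: fall back to in-process rate-limit storage
    storage_uri = "memory://"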
Response Headers¶
When rate limit exceeded, Core returns:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1700003600
Retry-After: 60
{
"error": "Rate limit exceeded"
}
Error Handling¶
Core implements RFC 7807 Problem Details for standardized error responses.
Error Response Format¶
All errors follow this schema:
{
"type": "https://tools.ietf.org/html/rfc7807",
"title": "Unauthorized",
"status": 401,
"detail": "Invalid access token",
"instance": "/api/token/exchange"
}
Fields:
- type: URI reference identifying problem type
- title: Short, human-readable summary
- status: HTTP status code
- detail: Specific explanation for this occurrence
- instance: URI reference to specific occurrence
Standard Error Handlers¶
@app.errorhandler(400) # Bad Request
@app.errorhandler(401) # Unauthorized
@app.errorhandler(403) # Forbidden
@app.errorhandler(404) # Not Found
@app.errorhandler(500) # Internal Server Error
@app.errorhandler(503) # Service Unavailable
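Each of these handlers needs to produce a body in the format described above. A minimal, framework-independent sketch of such a builder follows; `problem_details` is a hypothetical helper, and deriving the title from the standard HTTP reason phrase is an assumption that happens to match the examples in this section.

```python
from http import HTTPStatus
from typing import Any, Dict

PROBLEM_TYPE = "https://tools.ietf.org/html/rfc7807"  # value used in this document's examples


def problem_details(status: int, detail: str, instance: str) -> Dict[str, Any]:
    """Build an RFC 7807 body; the title is the standard reason phrase for the status."""
    return {
        "type": PROBLEM_TYPE,
        "title": HTTPStatus(status).phrase,
        "status": status,
        "detail": detail,
        "instance": instance,
    }
```

In a Flask error handler, the dictionary would typically be passed to `jsonify` and returned with the matching status code. RFC 7807 also defines the `application/problem+json` media type for such responses.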
Example Error Responses¶
Bad Request:
{
"type": "https://tools.ietf.org/html/rfc7807",
"title": "Bad Request",
"status": 400,
"detail": "No access token provided",
"instance": "/api/token/exchange"
}
Unauthorized:
{
"type": "https://tools.ietf.org/html/rfc7807",
"title": "Unauthorized",
"status": 401,
"detail": "Session expired or revoked",
"instance": "/api/token/validate"
}
Internal Server Error:
{
"type": "https://tools.ietf.org/html/rfc7807",
"title": "Internal Server Error",
"status": 500,
"detail": "An unexpected error occurred",
"instance": "/api/token/exchange"
}
Benefits¶
- Machine-readable: Clients can parse and handle errors programmatically
- Standardized: RFC 7807 is an IETF standards-track specification
- Consistent: All services use the same error format
- Debuggable: instance field helps trace errors
SSL/TLS¶
Core supports configurable SSL verification for connecting to Keycloak.
Configuration¶
# In .flaskenv
VERIFY_SSL=False # Development with self-signed certs
VERIFY_SSL=True # Production with valid certificates
Usage¶
All requests to Keycloak respect this setting:
response = requests.get(
userinfo_url,
headers={'Authorization': f'Bearer {access_token}'},
verify=current_app.config.get('VERIFY_SSL', True),
timeout=10
)
When to Use:
- Development: VERIFY_SSL=False (self-signed certs common)
- Production: VERIFY_SSL=True (valid SSL certificates)
⚠️ Security Warning: Never use VERIFY_SSL=False in production with internet-facing services.
Monitoring & Observability¶
Health Checks¶
Core provides comprehensive health monitoring via the /health endpoint.
Health Check Components¶
1. Redis Connectivity
{
"redis": {
"status": "healthy",
"latency_ms": 2,
"connected_clients": 5,
"used_memory_mb": 2.45
}
}
Checks:
- Connection test (PING command)
- Response latency
- Active client count
- Memory usage
Status:
- healthy: Connected, low latency (<10ms)
- unhealthy: Connection failed or timeout
- null: Redis not configured (in-memory mode)
2. Disk Space
Thresholds:
- healthy: <85% used
- degraded: 85-95% used
- unhealthy: ≥95% used
Overall Status Logic¶
if disk >= 95% or database down:
return "unhealthy" (503)
elif redis down or disk >= 85%:
return "degraded" (503)
else:
return "healthy" (200)
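The pseudocode above can be written as runnable Python. This is a sketch, not the HealthChecker library: the function names are hypothetical, and treating an unconfigured Redis (`redis_ok=None`) as non-degrading is an assumption based on the null status described for in-memory mode.

```python
from typing import Optional, Tuple


def disk_status(usage_percent: float) -> str:
    """Classify disk usage with the thresholds listed in this section."""
    if usage_percent >= 95:
        return "unhealthy"
    if usage_percent >= 85:
        return "degraded"
    return "healthy"


def overall_status(redis_ok: Optional[bool], disk_usage_percent: float,
                   database_ok: bool = True) -> Tuple[str, int]:
    """Return (status, http_code); redis_ok=None means Redis is not configured."""
    disk = disk_status(disk_usage_percent)
    if disk == "unhealthy" or not database_ok:
        return "unhealthy", 503
    if redis_ok is False or disk == "degraded":
        return "degraded", 503
    return "healthy", 200
```

Note that both degraded and unhealthy map to 503, so a client that needs to tell them apart has to read the status field in the response body.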
Kubernetes Integration¶
Use /health for readiness and liveness probes:
apiVersion: v1
kind: Pod
metadata:
  name: hivematrix-core
spec:
  containers:
    - name: core
      image: hivematrix/core:latest
      ports:
        - containerPort: 5000
      livenessProbe:
        httpGet:
          path: /health
          port: 5000
        initialDelaySeconds: 30
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /health
          port: 5000
        initialDelaySeconds: 10
        periodSeconds: 5
        failureThreshold: 2
Logging¶
Core implements structured JSON logging with correlation IDs for distributed tracing.
Structured Logging¶
Format:
{
"timestamp": "2025-11-22T10:30:00.123Z",
"level": "INFO",
"service": "core",
"correlation_id": "a1b2c3d4-e5f6-7890",
"message": "User admin logged in",
"user": "admin",
"endpoint": "/auth",
"method": "GET",
"status_code": 302,
"duration_ms": 145
}
Fields:
- timestamp: ISO 8601 UTC timestamp
- level: DEBUG, INFO, WARNING, ERROR
- service: Always "core"
- correlation_id: Unique request ID (traces request across services)
- message: Log message
- Additional context fields (user, endpoint, etc.)
Correlation IDs¶
Correlation IDs enable tracing requests across multiple services:
User Request → Nexus (correlation_id: abc123)
└─> Core validates token (correlation_id: abc123)
└─> Codex fetches data (correlation_id: abc123)
└─> Ledger calculates billing (correlation_id: abc123)
Search logs by correlation ID to see entire request flow.
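Propagating a correlation ID and emitting it in JSON logs can be sketched with the standard library alone. The header name `X-Correlation-ID` is an assumption (this document does not name the header), and the formatter below is an illustration rather than Core's logging module.

```python
import json
import logging
import uuid
from datetime import datetime, timezone
from typing import Mapping

CORRELATION_HEADER = "X-Correlation-ID"  # hypothetical header name


def get_or_create_correlation_id(headers: Mapping[str, str]) -> str:
    """Reuse the caller's correlation ID if present, otherwise mint a new one."""
    return headers.get(CORRELATION_HEADER) or str(uuid.uuid4())


class JsonFormatter(logging.Formatter):
    """Render log records as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname,
            "service": "core",
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
        })
```

A handler using this formatter picks up the ID when log calls pass `extra={"correlation_id": cid}`. Forwarding the same header on outbound requests is what lets downstream services log the same ID.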
Configuration¶
# Enable/disable structured logging
ENABLE_JSON_LOGGING=true # Production (machine-readable)
ENABLE_JSON_LOGGING=false # Development (human-readable)
# Set log level
LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR
Centralized Logging¶
Core sends logs to Helm's PostgreSQL database:
from app.helm_logger import init_helm_logger
helm_logger = init_helm_logger('core', 'http://localhost:5004')
helm_logger.info("User logged in", extra={'user': username})
View Logs:
cd hivematrix-helm
source pyenv/bin/activate
# View Core logs
python logs_cli.py core --tail 50
# Filter by level
python logs_cli.py core --level ERROR --tail 100
Metrics¶
Core exposes metrics through the health endpoint and its logs:
Session Metrics¶
# Redis client connections (an activity indicator, not an exact session count)
GET /health → checks.redis.connected_clients
# Session operations
helm_logger.info("Session created", extra={'session_id': session_id})
helm_logger.info("Session validated", extra={'session_id': session_id})
helm_logger.info("Session revoked", extra={'session_id': session_id})
Performance Metrics¶
Future Enhancements¶
Planned metrics (not yet implemented):
- Prometheus endpoint (/metrics)
- Token exchange rate
- Authentication success/failure rate
- Permission level distribution
- Service token usage
Development¶
Running Locally¶
Prerequisites:
- Python 3.9+
- Redis server
- Keycloak 26.4.0+
- PostgreSQL (optional, for Helm logging)
Setup:
cd hivematrix-core
# Create virtual environment
python3 -m venv pyenv
source pyenv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Generate RSA keys (only if they do not already exist; this overwrites existing files)
mkdir -p keys
openssl genrsa -out keys/jwt_private.pem 2048
openssl rsa -in keys/jwt_private.pem -pubout -out keys/jwt_public.pem
# Configure environment
# .flaskenv is auto-generated by config_manager.py
# To regenerate:
cd ../hivematrix-helm
python config_manager.py write-dotenv core
cd ../hivematrix-core
# Start Redis
redis-server --port 6379
# Run Core
python run.py
Expected Output:
SessionManager: Using Redis for persistent sessions
Flask-Limiter: Using Redis for rate limiting
* Running on http://127.0.0.1:5000
* Debug mode: on
Access Points:
- API: http://localhost:5000
- Swagger Docs: http://localhost:5000/docs
- Health Check: http://localhost:5000/health
- JWKS: http://localhost:5000/.well-known/jwks.json
Testing¶
Manual Testing¶
1. Generate Test Token:
cd ../hivematrix-helm
source pyenv/bin/activate
# Create admin token (24-hour expiry)
TOKEN=$(python create_test_token.py 2>/dev/null)
echo "Token: $TOKEN"
2. Validate Token:
curl -s -X POST http://localhost:5000/api/token/validate \
-H "Content-Type: application/json" \
-d "{\"token\": \"$TOKEN\"}" | jq
Expected:
{
"valid": true,
"user": {
"sub": "admin",
"name": "Admin User",
"email": "admin@hivematrix.local",
"preferred_username": "admin",
"permission_level": "admin",
"groups": ["admin"]
}
}
3. Request Service Token:
SERVICE_TOKEN=$(curl -s -X POST http://localhost:5000/service-token \
-H "Content-Type: application/json" \
-d '{"calling_service": "codex", "target_service": "ledger"}' | jq -r '.token')
echo "Service Token: $SERVICE_TOKEN"
4. Check Health:
curl -s http://localhost:5000/health | jq
Integration Testing¶
Test Full OAuth Flow:
# 1. Start services
cd hivematrix-helm
./start.sh
# 2. Open a browser to Nexus (macOS: open; Linux: xdg-open)
open https://localhost
# 3. Click login → redirected to Keycloak
# 4. Login with admin/admin
# 5. Redirected back to HiveMatrix with token
# 6. Check session in Redis:
redis-cli KEYS "session:*"
redis-cli GET "session:{session_id}"
Load Testing¶
Test rate limiting and session performance:
# Install Apache Bench
sudo apt install apache2-utils
# Test login endpoint (rate limit: 10/min)
ab -n 20 -c 2 http://localhost:5000/login
# Should see:
# - First 10 requests: 302 (redirect)
# - Next 10 requests: 429 (rate limited)
# Test health endpoint (no rate limit)
ab -n 1000 -c 10 http://localhost:5000/health
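The 302/429 split expected from the login test can be reproduced with a toy fixed-window counter. This is a sketch of the general technique, not Core's limiter, which is delegated to Flask-Limiter; the class name `FixedWindowLimiter` and the choice of a fixed-window strategy are assumptions made for illustration.

```python
from typing import Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds per key."""

    def __init__(self, limit: int = 10, window_s: int = 60):
        self.limit = limit
        self.window_s = window_s
        self._counts: Dict[Tuple[str, int], int] = {}

    def allow(self, key: str, now: float) -> bool:
        # Requests are bucketed by the window their timestamp falls into.
        window = int(now // self.window_s)
        count = self._counts.get((key, window), 0)
        if count >= self.limit:
            return False  # the server would answer 429 here
        self._counts[(key, window)] = count + 1
        return True


limiter = FixedWindowLimiter(limit=10, window_s=60)
# 20 requests from one client inside the same minute.
statuses = [302 if limiter.allow("127.0.0.1", now=5.0) else 429 for _ in range(20)]
print(statuses.count(302), statuses.count(429))  # 10 10
```

A request that arrives in the next window (for example at `now=65.0`) is allowed again, which is why repeating the `ab` run a minute later starts with redirects rather than 429s.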
API Documentation¶
Core includes auto-generated OpenAPI/Swagger documentation.
Access: http://localhost:5000/docs
Features:
- Interactive API testing
- Request/response examples
- Authentication configuration
- Model schemas
- Download OpenAPI spec: /apispec.json
Example Usage:
- Open http://localhost:5000/docs
- Click the "Authorize" button
- Enter the JWT token in the format: Bearer {token}
- Click the endpoint to test (e.g., /api/token/validate)
- Click "Try it out"
- Enter parameters
- Click "Execute"
- View the response
Troubleshooting¶
"Invalid token" Errors¶
Symptoms:
- /api/token/validate returns {"valid": false}
- Services return 401 Unauthorized
Causes & Solutions:
1. Token Expired (1-Hour TTL)
# Decode token to check expiration
python3 -c "import jwt, sys; print(jwt.decode(sys.argv[1], options={'verify_signature': False}))" "$TOKEN"
# Check 'exp' field (Unix timestamp)
# If expired, request new token via /login or /api/token/exchange
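If PyJWT is not installed on the machine you are debugging from, the `exp` claim can be read with the standard library alone. The sketch below is for inspection only and never verifies the signature; the helper names `decode_payload` and `seconds_until_expiry` are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def decode_payload(token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying the signature (debug only)."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Positive while the token is still valid, negative once expired."""
    now = time.time() if now is None else now
    return decode_payload(token)["exp"] - now
```

A negative result means the token has expired and a new one must be requested, as described above.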
2. Session Revoked (Logged Out)
# Check session in Redis
redis-cli GET "session:{jti}"
# If revoked: {"revoked": true, ...}
# Solution: Re-authenticate
3. Core Service Restarted (In-Memory Sessions Lost)
# Check if Redis is being used
redis-cli KEYS "session:*"
# If no sessions, check Core logs:
tail -f logs/core.log | grep "SessionManager"
# Expected: "Using Redis for persistent sessions"
# If using in-memory: Sessions lost on restart
# Solution: Start Redis or re-authenticate
4. Invalid JWT Signature
# Check public key matches
curl -s http://localhost:5000/.well-known/jwks.json | jq
# Services cache this key
# If Core keys rotated, restart services to fetch new key
Session Revocation Not Working¶
Symptoms:
- User clicks logout but can still access pages
- /api/token/revoke returns 200 but session still valid
Troubleshooting:
1. Check Session ID in JWT
# Decode token
python3 -c "import jwt, sys; print(jwt.decode(sys.argv[1], options={'verify_signature': False}))" "$TOKEN"
# Verify 'jti' claim exists
# If missing, token was issued before session tracking was added
2. Verify Session Marked as Revoked
SESSION_ID="..." # from jti claim
redis-cli GET "session:$SESSION_ID"
# Should show: {"revoked": true, ...}
# If revoked: false, revocation failed
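The value returned by `redis-cli GET` can be interpreted programmatically. This sketch assumes the session record is a JSON object with a boolean `revoked` field, as in the example output above; other fields are not relied on, and the function name `session_state` is hypothetical.

```python
import json
from typing import Optional


def session_state(raw: Optional[str]) -> str:
    """Classify a raw session value as 'missing', 'revoked', or 'active'."""
    if raw is None:
        # Key absent: expired, never created, or lost with in-memory storage.
        return "missing"
    record = json.loads(raw)
    return "revoked" if record.get("revoked") else "active"
```

After a successful logout the expected state is `revoked`; `active` means the revocation did not reach the session store.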
3. Check Nexus Validation
# Nexus should validate token with Core on every request
# Check Nexus logs for validation calls:
cd ../hivematrix-helm
python logs_cli.py nexus --tail 50 | grep validate
4. Browser Cache
# User may see cached page after logout
# Core sets cache control headers to prevent this:
# Cache-Control: no-cache, no-store, must-revalidate
# Test in incognito/private window
# Force refresh: Ctrl+Shift+R (Chrome/Firefox)
Service Tokens Failing¶
Symptoms:
- POST /service-token returns 400 Bad Request
- Service-to-service calls fail with "Unknown calling_service"
Causes & Solutions:
1. Invalid Service Name Format
# Service names must match: ^[a-z0-9_-]{1,50}$
# ✅ Valid: codex, ledger, brain-hair
# ❌ Invalid: Codex, ledger!, service@123
curl -X POST http://localhost:5000/service-token \
-H "Content-Type: application/json" \
-d '{"calling_service": "CODEX", "target_service": "ledger"}'
# Returns: {"error": "Invalid calling_service format"}
# Solution: Use lowercase, alphanumeric + hyphens/underscores only
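Callers can run the same format check locally before requesting a token. The pattern below is the one quoted above; the function name `is_valid_service_name` is hypothetical, and passing this check says nothing about whether the service is present in the registry.

```python
import re

# Pattern quoted in this section: lowercase letters, digits, "_" and "-", 1-50 chars.
SERVICE_NAME_RE = re.compile(r"^[a-z0-9_-]{1,50}$")


def is_valid_service_name(name: str) -> bool:
    """Return True when `name` matches the documented service-name format."""
    # fullmatch also rejects a trailing newline, which "$" alone would tolerate.
    return SERVICE_NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_service_name("brain-hair")` is true while `is_valid_service_name("CODEX")` is false, matching the request shown above.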
2. Service Not in Registry
# Core validates against services.json
cat services.json | jq 'keys'
# If service missing, add to apps_registry.json and regenerate:
cd ../hivematrix-helm
python install_manager.py update-config
3. Token Expired (5-Minute TTL)
# Service tokens expire quickly
# Check token age:
python3 -c "import jwt, sys, time; \
payload = jwt.decode(sys.argv[1], options={'verify_signature': False}); \
age = time.time() - payload['iat']; \
print(f'Age: {age:.0f}s, Expires in: {payload[\"exp\"] - time.time():.0f}s')" "$SERVICE_TOKEN"
# If expired, request new token
# Note: service_client.py caches tokens automatically
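The caching idea can be illustrated with a small class that reuses a token until it is close to expiry. This is a sketch of the general pattern and is not the contents of `service_client.py`: the class `ServiceTokenCache`, the `fetch` callback, and the 30-second safety margin are all assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class ServiceTokenCache:
    """Cache one token per target service and refresh shortly before expiry."""

    def __init__(self,
                 fetch: Callable[[str], Tuple[str, float]],
                 margin_s: float = 30.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch          # returns (token, exp_unix_timestamp)
        self._margin_s = margin_s    # refresh this many seconds before expiry
        self._clock = clock
        self._tokens: Dict[str, Tuple[str, float]] = {}

    def get(self, target: str) -> str:
        cached = self._tokens.get(target)
        if cached is not None and cached[1] - self._clock() > self._margin_s:
            return cached[0]
        # Missing or about to expire: request a fresh token and remember it.
        token, exp = self._fetch(target)
        self._tokens[target] = (token, exp)
        return token
```

In practice `fetch` would wrap the POST to /service-token shown earlier and return the token together with its `exp` claim. Refreshing ahead of expiry avoids sending a token that expires while the request is in flight.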
Redis Connection Issues¶
Symptoms:
- Core starts but logs: "SessionManager: Using in-memory sessions"
- Sessions lost after Core restart
- Rate limiting inconsistent
Troubleshooting:
1. Check Redis Running
# Test Redis connectivity
redis-cli ping
# Expected: PONG
# If connection refused:
sudo systemctl status redis
sudo systemctl start redis
2. Check Redis Port
# Core connects to localhost:6379
# Verify Redis listening:
sudo netstat -tlnp | grep 6379
# Expected: tcp 127.0.0.1:6379 LISTEN
3. Check Redis Authentication
# If Redis requires password:
redis-cli -a yourpassword ping
# Update Core to use password (not currently supported)
# Workaround: Disable Redis auth for localhost
4. Redis Memory Limits
# Check Redis memory usage
redis-cli INFO memory
# If near maxmemory limit, Redis may evict sessions
# Increase limit in /etc/redis/redis.conf:
maxmemory 256mb
maxmemory-policy allkeys-lru
sudo systemctl restart redis
Keycloak Connection Errors¶
Symptoms:
- /api/token/exchange fails with "Invalid access token"
- Login redirect fails
- OAuth callback errors
Troubleshooting:
1. Verify Keycloak Running
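# Check that Keycloak is reachable via the realm discovery document
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/realms/hivematrix/.well-known/openid-configuration
# Expected: 200
# Note: on Keycloak 25+ the /health endpoints are served on the management
# port (9000 by default) and only when health checks are enabled
# If connection refused:
cd ../keycloak-26.4.0
bin/kc.sh start-dev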
2. Check Keycloak URL Configuration
# Core should use correct Keycloak URL
grep KEYCLOAK .flaskenv
# Expected:
# KEYCLOAK_SERVER_URL=http://localhost:8080
# KEYCLOAK_REALM=hivematrix
# KEYCLOAK_CLIENT_ID=core-client
3. Validate Client Secret
# Test Keycloak client credentials
KC_URL="http://localhost:8080"
REALM="hivematrix"
CLIENT_ID="core-client"
CLIENT_SECRET="..." # from .flaskenv
# Get token
curl -X POST "$KC_URL/realms/$REALM/protocol/openid-connect/token" \
-d "grant_type=client_credentials" \
-d "client_id=$CLIENT_ID" \
-d "client_secret=$CLIENT_SECRET"
# Should return access token
# If error, regenerate client secret via Keycloak admin console
4. SSL Verification Issues
# If using self-signed certificates (development only):
# Set VERIFY_SSL=False in .flaskenv
# Check current setting:
grep VERIFY_SSL .flaskenv
# Test connection (verify=False only matters for https:// URLs):
python3 -c "import requests; \
r = requests.get('http://localhost:8080/realms/hivematrix/.well-known/openid-configuration', verify=False); \
print(f'Status: {r.status_code}')"
Health Check Failing¶
Symptoms:
- /health returns 503 Service Unavailable
- Kubernetes probes failing
- Monitoring alerts
Troubleshooting:
1. Check Health Response
curl -s http://localhost:5000/health | jq
# Look for unhealthy components:
# - redis.status != "healthy"
# - disk.status == "unhealthy"
2. Redis Health
# Confirm Redis is reachable
redis-cli ping
# Expected: PONG
3. Disk Space
# Check disk usage
df -h /
# If >= 95%, clean up:
# - Old logs: rm -f logs/*.log.old
# - Docker images: docker system prune -a
# - Package cache: sudo apt clean
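Disk usage can also be checked from Python and mapped onto the documented thresholds. This is a minimal sketch and not Core's health-check code: the function names are hypothetical, and the percentage is computed as used/total, which can differ slightly from the `Use%` column of `df` on filesystems with reserved blocks.

```python
import shutil
from typing import Tuple


def classify_disk(used_pct: float) -> str:
    """Map a usage percentage to the documented status thresholds."""
    if used_pct >= 95:
        return "unhealthy"
    if used_pct >= 85:
        return "degraded"
    return "healthy"


def disk_status(path: str = "/") -> Tuple[float, str]:
    """Return (used percentage, status) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    used_pct = usage.used / usage.total * 100
    return round(used_pct, 1), classify_disk(used_pct)
```

Calling `disk_status("/")` before and after a cleanup shows whether the filesystem has dropped back below the 85% mark.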
4. Degraded vs Unhealthy
# Degraded (503) = non-critical issues
# - Redis down but in-memory working
# - Disk 85-95% full
# Unhealthy (503) = critical issues
# - Disk >= 95% full
# - Database down
# Core can still serve requests when degraded, even though /health returns 503
# Fix the non-critical issues to return to healthy (200)
See Also¶
Related Services¶
- Nexus - Gateway - Frontend proxy that validates tokens with Core
- Helm - Orchestration - Service management and centralized logging
- All Services - All services that depend on Core for authentication
Architecture & Design¶
- End-to-End Authentication Flow
- Session Management Architecture
- Service-to-Service Authentication
- Permission Model
- JWT Token Design
Configuration & Setup¶
- Installation Guide - Complete installation walkthrough
- Keycloak Setup - OAuth2 identity provider setup
- Redis Configuration - Session storage setup
- RSA Key Generation - JWT signing keys
- Security Guide - Security best practices
External Resources¶
- RFC 7519 - JWT Specification
- RFC 7517 - JWKS
- RFC 7807 - Problem Details
- OAuth 2.0 Authorization Code Flow
- Keycloak Documentation
- Flask-Limiter Documentation
- Redis Documentation
Tools & Utilities¶
- Token Testing: hivematrix-helm/create_test_token.py - Generate test JWT tokens
- Token Validation: curl -X POST /api/token/validate
- Log Viewer: hivematrix-helm/logs_cli.py core --tail 50
- Security Audit: hivematrix-helm/security_audit.py --audit
- Config Manager: hivematrix-helm/config_manager.py sync-all
- Health Check: curl http://localhost:5000/health
Last Updated: 2025-11-22 Version: 1.0 Maintained By: HiveMatrix Team