KnowledgeTree - Knowledge Base Service¶

Port: 5020
Database: Neo4j (Graph Database)
Repository: hivematrix-knowledgetree

Table of Contents¶

Overview
Architecture
Data Synchronization
Context System
API Reference
Admin Operations
User Interface
Configuration
Installation & Setup
Development
Monitoring & Logging
Troubleshooting
Security
Backup & Recovery

Overview¶

KnowledgeTree is HiveMatrix's graph-based knowledge management system that organizes institutional knowledge in a hierarchical, filesystem-like structure. Using Neo4j's graph database, it provides fast traversal, flexible relationships, and powerful context-building capabilities for both human users and AI assistants.

Unlike other HiveMatrix services, which use PostgreSQL, KnowledgeTree stores its data in Neo4j so that it can represent hierarchical knowledge structures, track relationships between topics, and gather contextual information across multiple related articles.

Primary Responsibilities¶

Hierarchical Knowledge Organization - Filesystem-like structure with sections, categories, and topics
Markdown Content Management - Rich text documentation with code blocks, tables, and formatting
External Data Integration - Syncs companies, contacts, assets, and tickets from Codex
Context Building - Gathers related knowledge for AI assistants (Brainhair)
Search & Discovery - Full-text search across all articles and folders
File Attachments - Upload and serve files linked to knowledge articles

Key Features¶

✅ Hierarchical Organization - Three-level structure (sections → categories → topics)
✅ Graph Relationships - Neo4j enables flexible connections between knowledge items
✅ Markdown Support - Full markdown rendering with syntax highlighting
✅ Data Sync - Automated sync from Codex for companies, users, assets, tickets
✅ Context System - Gathers related knowledge for AI queries
✅ Attached Folders - Special folders whose contents are automatically included in context
✅ Read-Only Nodes - Synced external data is marked read-only
✅ Multiple Views - Grid, list, and tree views for browsing
✅ Full-Text Search - Search across titles and content
✅ Export/Import - Backup and restore user-created knowledge

Architecture¶

Database: Neo4j Graph Database¶

KnowledgeTree uses Neo4j 5.14.0, a native graph database optimized for hierarchical data and relationship traversal.

Why Neo4j?¶

Natural Hierarchy - Perfect fit for section → category → topic structure
Fast Traversal - Efficiently navigate deep folder structures
Flexible Relationships - Easy to link related articles, prerequisites, dependencies
Graph Queries - Find connections and build context across multiple paths
Schema-less - Easy to evolve the data model

Graph Schema¶

Node Labels:

(:ContextItem)  // Folders and articles
(:File)         // File attachments

ContextItem Properties:
- id (string, unique) - Node identifier (UUID or deterministic path-based)
- name (string) - Display name
- content (string) - Markdown content (for articles)
- is_folder (boolean) - True for folders, false for articles
- is_attached (boolean) - True for special folders whose contents are included in context
- read_only (boolean) - True for synced data (from Codex)

File Properties:
- id (string, unique) - File identifier (UUID)
- filename (string) - Original filename

Relationships:

(:ContextItem)-[:PARENT_OF]->(:ContextItem)  // Hierarchy
(:ContextItem)-[:HAS_FILE]->(:File)           // Attachments

Schema Initialization¶

On startup, KnowledgeTree ensures the root node exists:

MERGE (r:ContextItem {id: 'root', name: 'KnowledgeTree Root'})
ON CREATE SET r.content = '# Welcome to KnowledgeTree',
              r.is_folder = true,
              r.is_attached = false,
              r.read_only = false

Hierarchical Structure¶

root (KnowledgeTree Root)
├── IT Documentation (Section)
│   ├── Network (Category)
│   │   ├── Router Configuration.md (Topic)
│   │   ├── VLAN Setup.md (Topic)
│   │   └── Firewall Rules.md (Topic)
│   ├── Server Management (Category)
│   │   ├── Windows Updates.md (Topic)
│   │   ├── Backup Procedures.md (Topic)
│   │   └── Active Directory.md (Topic)
│   └── End User Support (Category)
│       ├── Password Resets.md (Topic)
│       ├── Email Configuration.md (Topic)
│       └── VPN Setup.md (Topic)
└── Companies (Synced from Codex)
    ├── Acme Corporation
    │   ├── Users
    │   │   ├── John Doe
    │   │   │   ├── Contact.md
    │   │   │   └── Tickets (attached)
    │   │   │       └── Ticket_12345.md
    │   │   └── Jane Smith
    │   │       ├── Contact.md
    │   │       └── Tickets (attached)
    │   └── Assets
    │       ├── ACME-PC-001.md
    │       └── ACME-SRV-001.md
    └── Wayne Enterprises
        ├── Users
        └── Assets

Node Types¶

1. User-Created Nodes
- Created via UI or API
- Editable by users
- read_only: false
- Uses UUID for id

2. Synced Nodes (Read-Only)
- Created by sync_codex.py or sync_tickets.py
- Not editable via UI
- read_only: true
- Uses deterministic path-based id (e.g., root_Companies_Acme_Corporation)

3. Attached Folders
- Special folders with is_attached: true
- Automatically included when building context
- Example: User's "Tickets" folder

4. Regular Folders
- Organizational containers
- is_folder: true, is_attached: false

5. Articles (Files)
- Markdown content
- is_folder: false
- Can have file attachments via HAS_FILE relationship

Data Synchronization¶

KnowledgeTree syncs all external data from Codex, which acts as the single source of truth for companies, contacts, assets, and tickets.

Sync Architecture¶

PSA System (Freshservice) → Codex → KnowledgeTree
Datto RMM              → Codex → KnowledgeTree

Sync Scripts¶

1. sync_codex.py¶

Purpose: Syncs company structure, users, and assets from Codex

Creates:

/Companies/
  /{Company Name}/
    /Users/
      /{User Name}/
        Contact.md          (user details)
        /Tickets/           (attached folder, empty until ticket sync)
    /Assets/
      {hostname}.md         (asset details)

Run Command:

cd hivematrix-knowledgetree
source pyenv/bin/activate
python sync_codex.py

What It Does:
1. Fetches all companies from Codex API
2. For each company:
   - Creates company folder under /Companies/
   - Creates Users and Assets subfolders
   - Fetches company users from Codex
   - Creates user folder with Contact.md containing user details
   - Creates empty Tickets attached folder (populated by ticket sync)
   - Fetches company assets from Codex
   - Creates markdown file for each asset with specs

Contact.md Example:

# Contact Information for John Doe

- **Email:** john.doe@acme.com
- **Title:** IT Manager
- **Mobile Phone:** (555) 123-4567
- **Work Phone:** (555) 123-4568
- **Active:** Yes

Asset.md Example:

# Computer Information: ACME-PC-001

- **Operating System:** Windows 11 Pro
- **Hardware Type:** Desktop
- **Internal IP:** 192.168.1.100
- **External IP:** 203.0.113.45
- **Last Logged In User:** john.doe
- **Status:** ✓ Online
- **Last Seen:** 2024-11-22 15:30:00
- **Domain:** acme.local

2. sync_tickets.py¶

Purpose: Syncs support tickets from Codex (originally from PSA system)

Creates:

/Companies/{Company}/Users/{User}/Tickets/
  Ticket_12345.md
  Ticket_12346.md

Run Command:

cd hivematrix-knowledgetree
source pyenv/bin/activate
python sync_tickets.py

What It Does:
1. Fetches all companies from Codex
2. For each company:
   - Fetches tickets from Codex /api/companies/{account}/tickets
   - Fetches contacts to map ticket requesters to users
   - For each ticket:
     - Finds the user's Tickets attached folder
     - Creates Ticket_{id}.md with full ticket details, conversations, and notes
     - Marks as read_only: true

Ticket.md Example:

# Ticket #12345: Email Configuration Issues

## Ticket Information
- **Requester:** John Doe (john.doe@acme.com)
- **Status:** Closed
- **Priority:** High
- **Created:** 2024-11-20 09:15:00
- **Last Updated:** 2024-11-21 14:30:00
- **Closed:** 2024-11-21 16:00:00
- **Hours Spent:** 2.50 hours

## Description
User unable to connect Outlook to Exchange server. Error: "Cannot connect to server."

## Conversation History

### Message 1 - → Incoming
**From:** john.doe@acme.com
**Date:** 2024-11-20 09:15:00

I'm getting an error when trying to set up my email in Outlook...

---

### Message 2 - ← Outgoing
**From:** support@msp.com
**Date:** 2024-11-20 10:00:00

Thank you for contacting support. Let's try reconfiguring your profile...

---

## Internal Notes

### Note 1
**From:** tech@msp.com
**Date:** 2024-11-20 10:30:00

Checked Exchange logs - user account was locked. Reset password and unlocked.

---

*Ticket data synced from Codex/PSA*

Sync Utilities¶

sync_utils.py - Shared helper function:

def ensure_node(session, parent_id, name, is_folder=True,
                is_attached=False, content='', read_only=True):
    """
    Creates or updates a node in Neo4j.

    Intelligently handles node creation/updates:
    1. Checks if node with same name exists under parent
    2. If found, reuses existing ID (preserves manually created nodes)
    3. If not found, generates deterministic path-based ID
    4. Creates or updates node with MERGE

    Returns: node_id
    """

Why This Matters:
- Prevents duplicate folders when re-running sync
- Preserves manually created nodes with UUIDs
- Uses deterministic IDs for synced nodes (e.g., root_Companies_Acme)
- Idempotent - safe to run multiple times
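The reuse-or-generate behaviour described in the docstring can be illustrated without a database. The following is a minimal in-memory sketch, not the code in sync_utils.py: the `nodes` dictionary standing in for Neo4j, the name `ensure_node_sketch`, and the rule of replacing spaces with underscores (inferred from the example ID root_Companies_Acme_Corporation) are all assumptions.

```python
import uuid


def ensure_node_sketch(nodes, parent_id, name, is_folder=True,
                       is_attached=False, content='', read_only=True):
    """Create or update a child of parent_id in `nodes` and return its id.

    `nodes` maps node id -> property dict and stands in for Neo4j here.
    """
    # Steps 1-2: reuse the id of an existing child with the same name, if any.
    for node_id, props in nodes.items():
        if props['parent_id'] == parent_id and props['name'] == name:
            break
    else:
        # Step 3: otherwise derive a deterministic path-based id (assumed format).
        node_id = f"{parent_id}_{name.replace(' ', '_')}"

    # Step 4: upsert, roughly what MERGE followed by SET does in Cypher.
    nodes[node_id] = {
        'parent_id': parent_id, 'name': name, 'is_folder': is_folder,
        'is_attached': is_attached, 'content': content, 'read_only': read_only,
    }
    return node_id


nodes = {'root': {'parent_id': None, 'name': 'KnowledgeTree Root'}}
companies = ensure_node_sketch(nodes, 'root', 'Companies')
acme = ensure_node_sketch(nodes, companies, 'Acme Corporation')
# Repeating the call returns the same id and adds no new node.
again = ensure_node_sketch(nodes, companies, 'Acme Corporation')

# A manually created node keeps its UUID when a later sync touches it.
manual_id = str(uuid.uuid4())
nodes[manual_id] = {'parent_id': 'root', 'name': 'IT Documentation'}
reused = ensure_node_sketch(nodes, 'root', 'IT Documentation', read_only=False)
```

Because the ID is either reused or derived from the path, running the sketch any number of times leaves the same four nodes in place, which is the idempotence property the sync scripts rely on.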

Scheduling Syncs¶

Recommended Cron Jobs:

# Run company/asset sync daily at 2 AM
0 2 * * * cd /path/to/hivematrix-knowledgetree && source pyenv/bin/activate && python sync_codex.py >> logs/sync_codex.log 2>&1

# Run ticket sync every 4 hours
0 */4 * * * cd /path/to/hivematrix-knowledgetree && source pyenv/bin/activate && python sync_tickets.py >> logs/sync_tickets.log 2>&1

Order of Operations:
1. PSA/Datto → Codex sync (see Codex documentation)
2. Codex → KnowledgeTree company sync (sync_codex.py)
3. Codex → KnowledgeTree ticket sync (sync_tickets.py)

Context System¶

The context system gathers the knowledge related to a node (its ancestors, sibling articles, and attached folders) into a single markdown document for AI assistants such as Brainhair.

How Context Works¶

When you request context for a node, KnowledgeTree:

Traverses the Path - Finds all ancestors from root to the target node
Gathers Articles - Collects articles at each level of the path
Includes Attached Folders - Automatically includes content from folders marked is_attached: true
Excludes Specified Folders - Allows excluding certain attached folders
Builds Hierarchical Context - Organizes by depth with markdown headers

Example: Context for a User¶

If Brainhair asks about user "John Doe" at path:

/Companies/Acme Corporation/Users/John Doe/

Context API Call (excluded_ids is optional and lists attached folders to leave out):

POST /api/context/node_id_for_john_doe
Content-Type: application/json

{
  "excluded_ids": []
}

Context Response:

# Context: KnowledgeTree Root

## Context: Companies

### Context: Acme Corporation

File: README.md
> Company overview and important notes...

#### Context: Users

##### Context: John Doe

File: Contact.md

# Contact Information for John Doe
- **Email:** john.doe@acme.com
- **Title:** IT Manager
...

File: Ticket_12345.md (from attached folder: Tickets)

# Ticket #12345: Email Configuration Issues
...

File: Ticket_12346.md (from attached folder: Tickets)

# Ticket #12346: Password Reset Request
...

Attached Folders¶

Purpose: Special folders whose content is automatically included when building context for child nodes.

Use Cases:
- Tickets Folder - Include all user's tickets in their context
- Procedures Folder - Include standard procedures for all items in a category
- Notes Folder - Include general notes for all company assets

Creating Attached Folders:

Via API:

POST /api/node
{
  "parent_id": "user_folder_id",
  "name": "Tickets",
  "is_folder": true,
  "is_attached": true
}

Via UI:
- Attached folders created automatically by sync scripts
- Not currently creatable via UI (feature request)

Context API Usage¶

Endpoint: GET/POST /api/context/<node_id>

Parameters:
- excluded_ids (array, optional) - List of attached folder IDs to exclude

Response:

{
  "context": "# Context: Root\n\n## Context: Section1\n\n..."
}

Example Use Case - Brainhair:

When a user asks Brainhair "What tickets does John Doe have?", Brainhair:
1. Searches KnowledgeTree for "John Doe"
2. Gets the node ID
3. Calls /api/context/<node_id>
4. Receives all of John's contact info and ticket history
5. Uses this context to answer the question accurately

API Reference¶

Search & Browse¶

Search Knowledge Base¶

GET /api/search?query={query}&start_node_id={node_id}

Description: Full-text search across node names and content.

Parameters:
- query (required) - Search string (case-insensitive)
- start_node_id (optional) - Limit search to descendants of this node (default: root)

Response:

[
  {
    "id": "node-uuid-123",
    "name": "Email Configuration Guide",
    "is_folder": false,
    "folder_path": "IT Documentation / Email / Configuration",
    "url_path": "IT%20Documentation/Email/Configuration"
  }
]
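The same request can be issued from Python using only the standard library. This is a hedged sketch: `build_search_url` and `search` are hypothetical helper names, the base URL is the direct service port given in this document, and the bearer token is assumed to have been obtained separately.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5020"  # direct service port from this document


def build_search_url(query, start_node_id=None, base_url=BASE_URL):
    """Return the /api/search URL with properly encoded parameters."""
    params = {"query": query}
    if start_node_id is not None:
        params["start_node_id"] = start_node_id
    return f"{base_url}/api/search?{urllib.parse.urlencode(params)}"


def search(token, query, start_node_id=None):
    """Call the search endpoint and return the decoded JSON list of matches."""
    request = urllib.request.Request(
        build_search_url(query, start_node_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Letting `urlencode` build the query string avoids hand-escaping spaces and special characters in search terms.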

Example:

curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:5020/api/search?query=email+configuration"

Browse Folder¶

GET /api/browse?path={path}

Description: Browse the knowledge tree at a specific path. Returns categories (subfolders) and articles.

Parameters:
- path (optional) - Path to browse (default: /)

Response:

{
  "path": "/IT Documentation",
  "current_node": {
    "id": "node-it-docs"
  },
  "categories": [
    {
      "name": "Network",
      "path": "/IT Documentation/Network"
    },
    {
      "name": "Server Management",
      "path": "/IT Documentation/Server Management"
    }
  ],
  "articles": [
    {
      "id": "node-readme",
      "title": "README.md",
      "summary": "This section contains IT documentation..."
    }
  ]
}

Use Case: Service-to-service calls to browse knowledge structure.

Node Management¶

Create Node¶

POST /api/node
Content-Type: application/json

{
  "parent_id": "parent-node-id",
  "name": "New Article",
  "is_folder": false,
  "is_attached": false
}

Description: Creates a new folder or article under a parent node.

Request Body:
- parent_id (required) - Parent node ID
- name (required) - Node name
- is_folder (optional) - True for folders, false for articles (default: false)
- is_attached (optional) - True for attached folders (default: false)

Response:

{
  "success": true,
  "id": "new-node-uuid"
}

Error Response (409):

{
  "error": "A node with this name already exists in this location",
  "existing_id": "existing-node-uuid"
}

Get Node Details¶

GET /api/node/<node_id>

Description: Retrieves node details including content and attached files.

Response:

{
  "id": "node-uuid",
  "name": "Email Setup Guide",
  "content": "# Email Setup\n\nFollow these steps...",
  "content_html": "<h1>Email Setup</h1><p>Follow these steps...</p>",
  "is_folder": false,
  "is_attached": false,
  "read_only": false,
  "files": [
    {
      "id": "file-uuid",
      "filename": "screenshot.png"
    }
  ]
}

Features:
- Markdown content converted to HTML
- Supports strikethrough syntax (~~text~~ → <del>text</del>)
- Returns list of attached files
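How the service implements the strikethrough conversion is not shown here. One simple way to get the same mapping is a regular-expression pass over the text; the sketch below assumes that approach, and `convert_strikethrough` is a hypothetical name.

```python
import re

# Non-greedy match so two separate ~~spans~~ on one line stay separate.
STRIKETHROUGH = re.compile(r"~~(.+?)~~")


def convert_strikethrough(text):
    """Replace ~~text~~ with <del>text</del>."""
    return STRIKETHROUGH.sub(r"<del>\1</del>", text)
```

A plain regex pass like this would also rewrite `~~` sequences inside code blocks, so a production version would need to skip those regions.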

Update Node¶

PUT /api/node/<node_id>
Content-Type: application/json

{
  "name": "Updated Article Title",
  "content": "# Updated Content\n\nNew markdown here..."
}

Description: Updates node name and/or content.

Request Body:
- name (optional) - New node name
- content (optional) - New markdown content

Response:

{
  "success": true
}

Delete Node¶

DELETE /api/node/<node_id>

Description: Deletes a node and all its descendants.

⚠️ Warning: This is a recursive delete! All children will be removed.

Response:

{
  "success": true
}

Move Node¶

POST /api/node/<node_id>/move
Content-Type: application/json

{
  "new_parent_id": "target-parent-id"
}

Description: Moves a node to a new parent folder.

Validations:
- Target must be a folder
- Cannot move root node
- Cannot move folder into itself or descendants (prevents cycles)

Response:

{
  "success": true
}

Error Response (400):

{
  "error": "Cannot move a folder into itself or its descendants"
}
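The three move validations can be sketched over a plain dictionary of parent links. This is an illustration under stated assumptions rather than the service's code: `validate_move` and the `nodes` structure are hypothetical, and only the cycle error message is taken from the response above.

```python
def validate_move(nodes, node_id, new_parent_id):
    """Return an error message if the move is invalid, else None.

    `nodes` maps node id -> {'parent_id': ..., 'is_folder': ...}.
    """
    if node_id == 'root':
        return 'Cannot move the root node'
    target = nodes.get(new_parent_id)
    if target is None or not target['is_folder']:
        return 'Target must be a folder'
    # Walk up from the target: meeting node_id means the target is the node
    # itself or one of its descendants, so the move would create a cycle.
    current = new_parent_id
    while current is not None:
        if current == node_id:
            return 'Cannot move a folder into itself or its descendants'
        current = nodes[current]['parent_id']
    return None


nodes = {
    'root': {'parent_id': None, 'is_folder': True},
    'it-docs': {'parent_id': 'root', 'is_folder': True},
    'network': {'parent_id': 'it-docs', 'is_folder': True},
    'vlan': {'parent_id': 'network', 'is_folder': False},
}
```

Walking upward from the target touches at most one node per level of depth, so the check stays cheap even in a deep hierarchy.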

Get Node Children¶

GET /api/node/<node_id>/children

Description: Get immediate children of a node.

Response:

[
  {
    "id": "child-1",
    "name": "Subfolder",
    "is_folder": true,
    "is_attached": false,
    "read_only": false
  },
  {
    "id": "child-2",
    "name": "Article.md",
    "is_folder": false,
    "is_attached": false,
    "read_only": false
  }
]

Folder Tree¶

Get Folder Tree¶

GET /api/folders/tree

Description: Returns the entire folder hierarchy as a nested tree structure (folders only, no articles).

Response:

{
  "id": "root",
  "name": "KnowledgeTree Root",
  "is_attached": false,
  "children": [
    {
      "id": "it-docs",
      "name": "IT Documentation",
      "is_attached": false,
      "children": [
        {
          "id": "network",
          "name": "Network",
          "is_attached": false,
          "children": []
        }
      ]
    }
  ]
}

Use Case: Building folder picker UI, navigation trees, etc.

File Management¶

Upload File¶

POST /api/upload/<node_id>
Content-Type: multipart/form-data

file: <file_data>

Description: Upload a file attachment to a node.

Response:

{
  "success": true,
  "filename": "screenshot.png"
}

File Storage:
- Files saved to instance/uploads/
- Original filename preserved
- Creates HAS_FILE relationship in Neo4j

Download File¶

GET /uploads/<filename>

Description: Serves an uploaded file.

Example:

<img src="/knowledgetree/uploads/screenshot.png">

Context Management¶

Get Context Tree¶

GET /api/context/tree/<node_id>

Description: Get list of attached folders in the path from root to this node.

Response:

{
  "attached_folders": [
    {
      "id": "tickets-folder",
      "name": "Tickets"
    }
  ]
}

Use Case: Display which attached folders will be included in context.

Get Full Context¶

GET /api/context/<node_id>
POST /api/context/<node_id>
Content-Type: application/json

{
  "excluded_ids": ["folder-id-to-exclude"]
}

Description: Gathers full context for a node by traversing from root and including attached folders.

Request Body (POST only): - excluded_ids (array, optional) - List of attached folder IDs to exclude from context

Response:

{
  "context": "# Context: Root\n\n## Context: Section\n\nFile: Article.md\n\nContent here..."
}

How It Works:
1. Finds path from root to target node
2. For each node in path:
   - Collects direct child articles
   - Collects articles from attached folders (unless excluded)
3. Builds hierarchical markdown with depth-based headers
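The traversal can be sketched in memory to make the three steps concrete. This is an approximation of the described behaviour, not the service's Cypher-based implementation: `build_context`, the `node` helper, and the dictionary layout are assumptions, and the exact spacing of the real output may differ.

```python
def build_context(nodes, target_id, excluded_ids=()):
    """Return hierarchical markdown context for target_id.

    `nodes` maps id -> dict with name, parent_id, is_folder, is_attached, content.
    """
    # Step 1: path from root to target, found by following parent links upward.
    path = []
    current = target_id
    while current is not None:
        path.append(current)
        current = nodes[current]['parent_id']
    path.reverse()

    def children(parent_id):
        return [(i, n) for i, n in nodes.items() if n['parent_id'] == parent_id]

    lines = []
    for depth, node_id in enumerate(path, start=1):
        # Step 3: the header level grows with depth.
        lines.append(f"{'#' * depth} Context: {nodes[node_id]['name']}")
        for child_id, child in children(node_id):
            if not child['is_folder']:
                # Step 2a: direct child articles.
                lines.append(f"File: {child['name']}")
                lines.append(child['content'])
            elif child['is_attached'] and child_id not in excluded_ids:
                # Step 2b: articles inside attached folders, unless excluded.
                for _, article in children(child_id):
                    if not article['is_folder']:
                        lines.append(f"File: {article['name']} "
                                     f"(from attached folder: {child['name']})")
                        lines.append(article['content'])
    return "\n\n".join(lines)


def node(name, parent_id, is_folder=True, is_attached=False, content=''):
    """Build one node dict for the example tree."""
    return {'name': name, 'parent_id': parent_id, 'is_folder': is_folder,
            'is_attached': is_attached, 'content': content}


nodes = {
    'root': node('KnowledgeTree Root', None),
    'users': node('Users', 'root'),
    'john': node('John Doe', 'users'),
    'contact': node('Contact.md', 'john', is_folder=False,
                    content='Email: john.doe@acme.com'),
    'tickets': node('Tickets', 'john', is_attached=True),
    't1': node('Ticket_12345.md', 'tickets', is_folder=False,
               content='Email Configuration Issues'),
}
context = build_context(nodes, 'john')
```

Passing `excluded_ids=['tickets']` drops the ticket article while keeping Contact.md, which mirrors the POST variant of the endpoint.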

Example Usage:

# Get full context
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:5020/api/context/node-uuid

# Exclude specific attached folder
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"excluded_ids": ["tickets-folder-id"]}' \
  http://localhost:5020/api/context/node-uuid

Admin Operations¶

Admin Settings Page¶

GET /admin/settings

Description: Admin control panel for KnowledgeTree.

Permission: Requires admin permission level.

Features:
- View Neo4j configuration
- View Codex integration status
- Trigger data syncs
- View sync statistics
- Database management tools

Trigger Codex Sync¶

POST /admin/sync/codex

Description: Manually trigger sync_codex.py to sync companies, users, and assets from Codex.

Permission: Admin only.

Response:

{
  "success": true,
  "message": "Codex sync completed successfully",
  "output": "--- KnowledgeTree Codex Sync ---\n..."
}

Timeout: 5 minutes

Trigger Ticket Sync¶

POST /admin/sync/tickets
Content-Type: application/json

{
  "overwrite": false
}

Description: Manually trigger sync_tickets.py to sync tickets from Codex.

Parameters:
- overwrite (optional) - Whether to overwrite existing tickets (default: false)

Permission: Admin only.

Response:

{
  "success": true,
  "message": "Ticket sync completed successfully",
  "output": "--- KnowledgeTree Ticket Sync ---\n..."
}

Timeout: 10 minutes

Get Sync Status¶

GET /admin/sync/status

Description: Get statistics about synced data.

Permission: Admin only.

Response:

{
  "company_items": 150,
  "ticket_count": 500
}

Export Data¶

GET /admin/export

Description: Export all user-created (non-read-only) data to JSON file.

Permission: Admin only.

Response: File download knowledgetree_export.json

Export Format:

[
  {
    "path": "IT Documentation/Network/Router Config.md",
    "content": "# Router Configuration\n...",
    "is_folder": false,
    "is_attached": false
  }
]

What Gets Exported:
- User-created nodes only (read_only: false)
- Full path from root
- Content and metadata
- Does NOT export synced data from Codex

Import Data¶

POST /admin/import
Content-Type: multipart/form-data

file: <json_export_file>

Description: Import data from a previously exported JSON file.

Permission: Admin only.

Process:
1. Sorts items by path (ensures parents created before children)
2. Traverses from root to find parent node
3. Uses MERGE to create or update nodes
4. Preserves existing node IDs
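Step 1 works because a parent's path is always a prefix of its children's paths. The short sketch below shows the ordering on items shaped like the export format; `order_for_import` is a hypothetical name, and the real importer may sort differently.

```python
def order_for_import(items):
    """Sort export items so every parent path precedes its children."""
    # A parent's path is a strict prefix of its children's paths, and a
    # prefix always sorts before any longer string that starts with it.
    return sorted(items, key=lambda item: item["path"])


items = [
    {"path": "IT Documentation/Network/Router Config.md", "is_folder": False},
    {"path": "IT Documentation/Network", "is_folder": True},
    {"path": "IT Documentation", "is_folder": True},
]
ordered = [item["path"] for item in order_for_import(items)]
```

With the items in this order, each node's parent already exists by the time the importer reaches it.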

Response:

{
  "success": true,
  "message": "Import successful."
}

Use Cases:
- Restore from backup
- Migrate knowledge between environments
- Bulk import documentation

Wipe Database¶

POST /admin/wipe

Description: ⚠️ DANGER! Deletes all nodes and re-initializes the root node.

Permission: Admin only.

Process:

MATCH (n) DETACH DELETE n

Then recreates:

MERGE (r:ContextItem {id: 'root', name: 'KnowledgeTree Root'})

Response:

{
  "success": true,
  "message": "Database wiped and re-initialized."
}

⚠️ Use with extreme caution! This cannot be undone.

User Interface¶

Browse View¶

URL: /browse/ or /browse/<path>

Features:
- Breadcrumb Navigation - Shows current path with clickable links
- View Modes:
  - Grid view - Icon-based display
  - List view - Compact table
  - Tree view - Hierarchical tree
- Search:
  - Scope: Current folder or all items
  - Real-time results
  - Shows full path to results
- Context Menu:
  - Open (double-click or menu)
  - Rename
  - Move
  - Delete
  - Create folder
  - Create article
- Visual Indicators:
  - 📁 Folder icon
  - 📄 Article icon
  - 📎 Attached folder icon
  - Read-only badge for synced content

Keyboard Shortcuts:
- Enter - Open selected item
- Delete - Delete selected item
- Ctrl+N - New article
- Ctrl+Shift+N - New folder

Article Viewer¶

URL: /view/<node_id>

Features:
- Markdown Rendering:
  - Headers, lists, tables
  - Code blocks with syntax highlighting
  - Inline images
  - Links
  - Strikethrough support
- Edit Mode:
  - Live markdown editor
  - Preview toggle
  - Auto-save
- File Attachments:
  - Upload files
  - Download attachments
  - Preview images
- Breadcrumb Navigation - Return to folder view
- Metadata Display:
  - Read-only indicator for synced content
  - Last modified (if available)

Admin Dashboard¶

URL: /admin/settings

Sections:

1. Configuration
- Neo4j URI and connection status
- Codex integration URL
- Configuration file location

2. Data Sync
- Trigger company/asset sync from Codex
- Trigger ticket sync from Codex
- View sync statistics (item counts)

3. Database Management
- Export user-created data
- Import from JSON backup
- Wipe database (danger zone)

Configuration¶

Database Configuration¶

File: instance/knowledgetree.conf

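Format: INI-style configuration. Read it with Python's RawConfigParser, which performs no value interpolation, so a % character in a password is taken literally.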

[database]
neo4j_uri = bolt://localhost:7687
neo4j_user = neo4j
neo4j_password = your-secure-password

[services]
codex_url = http://localhost:5010
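Reading this file with RawConfigParser might look like the sketch below. It parses an inline sample so it runs on its own; `load_settings` is a hypothetical helper, the sample password is made up to show the % handling, and a real service would call `parser.read('instance/knowledgetree.conf')` instead of `read_string`.

```python
import configparser

SAMPLE = """
[database]
neo4j_uri = bolt://localhost:7687
neo4j_user = neo4j
neo4j_password = p%ssword

[services]
codex_url = http://localhost:5010
"""


def load_settings(text):
    """Parse knowledgetree.conf content into a flat dict."""
    # RawConfigParser performs no interpolation, so '%' is kept literally.
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    return {
        "neo4j_uri": parser.get("database", "neo4j_uri"),
        "neo4j_user": parser.get("database", "neo4j_user"),
        "neo4j_password": parser.get("database", "neo4j_password"),
        "codex_url": parser.get("services", "codex_url"),
    }


settings = load_settings(SAMPLE)
```

The default ConfigParser would raise an interpolation error on the lone % in the sample password, which is the practical reason to prefer the raw variant for credentials.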

Environment Variables¶

File: .flaskenv (auto-generated by Helm's config_manager.py)

# Core Service (for JWT validation)
CORE_SERVICE_URL=http://localhost:5000

# Service Identity
SERVICE_NAME=knowledgetree

# Logging
LOG_LEVEL=INFO
ENABLE_JSON_LOGGING=true

# Neo4j (fallback if not in config file)
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=

Upload Configuration¶

Upload Directory: instance/uploads/

Created automatically on startup if it doesn't exist.

File Serving: Files accessible at /uploads/<filename>

Installation & Setup¶

Prerequisites¶

Neo4j 5.14.0+ installed and running
Python 3.8+ with pip
Codex service running and configured (for sync)

Install Neo4j¶

Automatic (via Helm):

cd hivematrix-helm
./start.sh  # Installs Neo4j if not present

Manual:

# Ubuntu/Debian
wget -O - https://debian.neo4j.com/neotechnology.gpg.key | sudo apt-key add -
echo 'deb https://debian.neo4j.com stable latest' | sudo tee /etc/apt/sources.list.d/neo4j.list
sudo apt-get update
sudo apt-get install neo4j

# Start Neo4j
sudo systemctl enable neo4j
sudo systemctl start neo4j

# Set initial password
cypher-shell -u neo4j -p neo4j
# Change password when prompted

Verify:

sudo systemctl status neo4j
# Neo4j browser: http://localhost:7474

Install KnowledgeTree¶

Via Helm:

cd hivematrix-helm
source pyenv/bin/activate
python install_manager.py install knowledgetree
./start.sh

Manual:

cd hivematrix-knowledgetree

# Create virtual environment
python3 -m venv pyenv
source pyenv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Symlink services.json
ln -sf ../hivematrix-helm/services.json services.json

# Run interactive setup
python init_db.py

Interactive Setup (init_db.py)¶

The setup wizard prompts for:

1. Neo4j Configuration
- URI: bolt://localhost:7687
- Username: neo4j
- Password: Your Neo4j password
- Tests connection before saving

2. Codex Integration
- Codex Service URL: http://localhost:5010

3. Configuration Sync
- Saves to instance/knowledgetree.conf
- Updates Helm's master_config.json
- Syncs to .flaskenv

4. Database Initialization
- Creates root node
- Primes schema with dummy nodes (deleted immediately)

First-Time Data Sync¶

After installation, sync data from Codex:

source pyenv/bin/activate

# Sync companies, users, assets
python sync_codex.py

# Sync tickets
python sync_tickets.py

Development¶

Running Locally¶

Development Server:

cd hivematrix-knowledgetree
source pyenv/bin/activate
python run.py

Access:
- Direct: http://localhost:5020/
- Via Nexus: https://your-server/knowledgetree/

Development Mode:

# run.py
if __name__ == '__main__':
    app.run(host='127.0.0.1', port=5020, debug=True)  # Enable debug mode

Code Structure¶

hivematrix-knowledgetree/
├── app/
│   ├── __init__.py              # Flask app setup, Neo4j initialization
│   ├── auth.py                  # @token_required, @admin_required
│   ├── routes.py                # All endpoints and UI routes
│   ├── service_client.py        # call_service() for Codex integration
│   ├── rate_limit_key.py        # Per-user rate limiting
│   ├── error_responses.py       # RFC 7807 error responses
│   ├── structured_logger.py     # JSON logging with correlation IDs
│   ├── helm_logger.py           # Centralized logging to Helm
│   ├── version.py               # Git-based version generation
│   └── templates/
│       ├── index.html           # Main browse interface
│       ├── view.html            # Article viewer/editor
│       ├── error.html           # Error page
│       └── admin/
│           └── settings.html    # Admin dashboard
├── instance/
│   ├── knowledgetree.conf       # Database config
│   └── uploads/                 # File attachments
├── sync_codex.py                # Company/user/asset sync script
├── sync_tickets.py              # Ticket sync script
├── sync_utils.py                # Shared sync utilities (ensure_node)
├── init_db.py                   # Interactive setup wizard
├── run.py                       # Application entry point
├── health_check.py              # Health check library
├── requirements.txt             # Python dependencies
├── .flaskenv                    # Environment variables (auto-generated)
└── services.json                # Symlink to Helm's registry

Key Components¶

app/__init__.py:
- Initializes Flask app
- Connects to Neo4j with driver pooling
- Sets up ProxyFix middleware
- Configures rate limiter (10000/hour, 500/minute)
- Registers error handlers (RFC 7807)
- Initializes Swagger documentation
- Creates root node if missing

app/routes.py:
- Main routes: /, /browse/<path>, /view/<node_id>
- API endpoints: /api/search, /api/node, /api/context, etc.
- Admin routes: /admin/settings, /admin/sync/*
- Health check: /health

app/auth.py:
- @token_required - Validates JWT from Core
- @admin_required - Checks admin permission
- Supports both user tokens and service tokens
- Sets g.user, g.service, g.is_service_call

sync_codex.py:
- Fetches companies from Codex /api/companies
- For each company, fetches users and assets
- Creates hierarchical folder structure
- Generates Contact.md for each user
- Generates Asset.md for each device
- Uses ensure_node() for idempotent creation

sync_tickets.py:
- Fetches tickets from Codex /api/companies/{account}/tickets
- Creates full ticket markdown with conversations and notes
- Stores in user's Tickets attached folder
- Marks tickets as read-only

sync_utils.py:
- ensure_node() - Smart node creation/update
- Prevents duplicates by checking existing nodes
- Reuses UUIDs from manually created nodes
- Generates deterministic IDs for synced nodes

Adding New Features¶

Example: Add Related Articles Feature

Update Graph Schema (add relationship):

(:ContextItem)-[:RELATED_TO]->(:ContextItem)

Create API Endpoint (app/routes.py):

@app.route('/api/node/<node_id>/related', methods=['POST'])
@token_required
def add_related_article(node_id):
    # get_json(silent=True) returns None instead of raising on a bad body
    data = request.get_json(silent=True) or {}
    related_id = data.get('related_id')
    if not related_id:
        return jsonify({'error': 'related_id is required'}), 400

    driver, error = get_neo4j_driver()
    if error:
        return error

    with driver.session() as session:
        # MERGE keeps the relationship unique if the call is repeated
        session.run("""
            MATCH (source:ContextItem {id: $source_id})
            MATCH (target:ContextItem {id: $target_id})
            MERGE (source)-[:RELATED_TO]->(target)
        """, source_id=node_id, target_id=related_id)

    return jsonify({'success': True})

Query Related Articles:

@app.route('/api/node/<node_id>/related', methods=['GET'])
@token_required
def get_related_articles(node_id):
    driver, error = get_neo4j_driver()
    if error:
        return error

    with driver.session() as session:
        result = session.run("""
            MATCH (source:ContextItem {id: $node_id})-[:RELATED_TO]->(related)
            RETURN related.id as id, related.name as name
        """, node_id=node_id)

        related = [dict(record) for record in result]

    return jsonify(related)

Update UI (templates/view.html):

<div class="related-articles">
  <h3>Related Articles</h3>
  <ul id="related-list"></ul>
</div>

<script>
fetch(`/api/node/${nodeId}/related`, {credentials: 'same-origin'})
  .then(r => r.json())
  .then(related => {
    const list = document.getElementById('related-list');
    related.forEach(item => {
      const li = document.createElement('li');
      const a = document.createElement('a');
      a.href = `/view/${encodeURIComponent(item.id)}`;
      // textContent avoids injecting HTML from article names
      a.textContent = item.name;
      li.appendChild(a);
      list.appendChild(li);
    });
  });
</script>

Testing¶

Manual Testing with JWT:

cd hivematrix-helm
source pyenv/bin/activate

# Generate test token
TOKEN=$(python create_test_token.py 2>/dev/null)

# Test search
curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:5020/api/search?query=email"

# Test browse
curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:5020/api/browse?path=/IT%20Documentation"

# Test create node
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"parent_id":"root","name":"Test Folder","is_folder":true}' \
  http://localhost:5020/api/node

# Test get context
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:5020/api/context/some-node-id

Neo4j Cypher Testing:

cypher-shell -u neo4j -p your-password

// List all folders
MATCH (n:ContextItem {is_folder: true})
RETURN n.name, n.id
LIMIT 20;

// Find path to a node
MATCH p = (:ContextItem {id: 'root'})-[:PARENT_OF*]->(target {name: 'Contact.md'})
RETURN [n IN nodes(p) | n.name];

// Count articles vs folders
MATCH (n:ContextItem)
RETURN n.is_folder, count(n);

// Find all attached folders
MATCH (n:ContextItem {is_attached: true})
RETURN n.name, n.id;

Monitoring & Logging¶

Health Check¶

Endpoint: GET /health

Checks:

- Neo4j database connectivity
- Disk space availability
- Core service availability
- Codex service availability

Response (200 - Healthy):

{
  "status": "healthy",
  "timestamp": "2024-11-22T10:30:00Z",
  "service": "knowledgetree",
  "checks": {
    "neo4j": {
      "status": "healthy",
      "message": "Connected to Neo4j"
    },
    "disk": {
      "status": "healthy",
      "usage_percent": 45.2,
      "available_gb": 120.5
    },
    "dependencies": {
      "core": {
        "status": "healthy",
        "response_time_ms": 15
      },
      "codex": {
        "status": "healthy",
        "response_time_ms": 23
      }
    }
  }
}

Response (503 - Unhealthy):

{
  "status": "degraded",
  "timestamp": "2024-11-22T10:30:00Z",
  "service": "knowledgetree",
  "checks": {
    "neo4j": {
      "status": "unhealthy",
      "error": "Connection refused"
    }
  }
}
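
The two responses above suggest that the overall status is derived from the individual checks. A minimal sketch of that aggregation follows. The function name `aggregate_health` and the rule that any non-healthy check yields `degraded` with HTTP 503 are assumptions inferred from the examples, not the service's confirmed logic.

```python
from datetime import datetime, timezone


def aggregate_health(checks, service="knowledgetree"):
    """Combine per-check results into a response body and HTTP status code.

    Assumption: any check reporting a status other than "healthy" marks the
    whole service as "degraded" and maps to HTTP 503.
    """
    def statuses(node):
        # Walk nested dicts (e.g. checks["dependencies"]["core"]) and yield
        # every "status" value found along the way.
        if isinstance(node, dict):
            if "status" in node:
                yield node["status"]
            for value in node.values():
                yield from statuses(value)

    all_healthy = all(s == "healthy" for s in statuses(checks))
    body = {
        "status": "healthy" if all_healthy else "degraded",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "service": service,
        "checks": checks,
    }
    return body, 200 if all_healthy else 503
```

In a Flask route this would presumably be returned as `jsonify(body), status_code`.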

Monitoring:

# Check health
curl http://localhost:5020/health | jq

# Monitor in loop
watch -n 5 'curl -s http://localhost:5020/health | jq .status'

Structured Logging¶

Log Format: JSON with correlation IDs

Example Log Entry:

{
  "timestamp": "2024-11-22T10:30:00Z",
  "level": "INFO",
  "service": "knowledgetree",
  "correlation_id": "req-abc123",
  "user": "john.doe@example.com",
  "message": "Article created",
  "extra": {
    "node_id": "new-article-uuid",
    "parent_path": "/IT Documentation/Network"
  }
}
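
One way to produce entries in this shape with the standard `logging` module is a custom formatter. This is a sketch only: the class name `JsonFormatter` and the record attributes `correlation_id`, `user`, and `extra_fields` are illustrative assumptions, not the logger HiveMatrix actually ships.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        created = datetime.fromtimestamp(record.created, timezone.utc)
        entry = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": self.service,
            # These attributes are supplied through logging's `extra=` argument.
            "correlation_id": getattr(record, "correlation_id", None),
            "user": getattr(record, "user", None),
            "message": record.getMessage(),
            "extra": getattr(record, "extra_fields", {}),
        }
        return json.dumps(entry)
```

A caller would attach the formatter to a handler and log with, for example, `logger.info("Article created", extra={"correlation_id": "req-abc123", "extra_fields": {"node_id": "new-article-uuid"}})`.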

View Centralized Logs:

cd hivematrix-helm
source pyenv/bin/activate

# View KnowledgeTree logs
python logs_cli.py knowledgetree --tail 50

# Filter by level
python logs_cli.py knowledgetree --level ERROR --tail 100

# Real-time monitoring
watch -n 2 'python logs_cli.py knowledgetree --tail 20'

Rate Limiting¶

Configuration:

- Per-user limits: 10000 requests/hour, 500 requests/minute
- Key: JWT subject (sub claim) or IP address fallback
- Storage: In-memory (resets on restart)

Rate Limit Headers:

X-RateLimit-Limit: 500
X-RateLimit-Remaining: 498
X-RateLimit-Reset: 1700654400

Rate Limit Exceeded (429):

{
  "type": "https://httpstatuses.com/429",
  "title": "Too Many Requests",
  "status": 429,
  "detail": "Rate limit exceeded. Try again later."
}
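
A client can use the headers above to decide how long to back off after a 429. The sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which matches the example value; the function name is hypothetical.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long a client should wait before retrying.

    `headers` is a mapping of response headers. X-RateLimit-Reset is assumed
    to be a Unix timestamp in seconds.
    """
    if int(headers.get("X-RateLimit-Remaining", 1)) > 0:
        return 0.0  # Budget left in the current window; no wait needed.
    now = time.time() if now is None else now
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    # Never return a negative wait if the window has already rolled over.
    return max(0.0, reset_at - now)
```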

Metrics to Monitor¶

Application:

- Request rate and response times
- Rate limit violations
- Authentication failures
- API error rates

Neo4j:

- Database size and growth rate
- Query execution times
- Connection pool usage
- Memory usage

Business:

- Number of articles created per day
- Search query volume
- Popular search terms
- Sync job success/failure rates

Troubleshooting¶

Neo4j Connection Issues¶

Symptom: Database not configured error

Check Neo4j Status:

sudo systemctl status neo4j

Common Issues:

Neo4j not running:

sudo systemctl start neo4j
sudo systemctl enable neo4j  # Auto-start on boot

Wrong credentials in config:

cd hivematrix-knowledgetree
cat instance/knowledgetree.conf

# Test connection manually
cypher-shell -u neo4j -p your-password

Firewall blocking bolt port:

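# Only needed when Neo4j runs on a different host than KnowledgeTree;
# for a local Neo4j, keep these ports closed (see Neo4j Security)
sudo ufw allow 7687/tcp  # Bolt protocol
sudo ufw allow 7474/tcp  # Browser UI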

Re-run setup:
```
python init_db.py
```

Search Not Working¶

Symptom: Search returns no results for known content

Create Full-Text Index:

# Connect to Neo4j (shell command)
cypher-shell -u neo4j -p your-password

// Create full-text search index (IF NOT EXISTS makes this safe to re-run)
CREATE FULLTEXT INDEX article_search IF NOT EXISTS
FOR (n:ContextItem)
ON EACH [n.name, n.content];

// Verify index
SHOW INDEXES;

Query Index:

// Search using full-text index
CALL db.index.fulltext.queryNodes("article_search", "email configuration")
YIELD node, score
RETURN node.name, score;
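
Neo4j full-text indexes parse the search string with Lucene query syntax, so user input containing characters such as `:` or `(` can change the meaning of the query or fail to parse. A minimal sketch of escaping user-supplied text before passing it to `db.index.fulltext.queryNodes`; the helper name `escape_lucene` is hypothetical and not part of KnowledgeTree.

```python
import re

# Characters with special meaning in Lucene's classic query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_lucene(term):
    """Backslash-escape Lucene special characters in user-supplied text."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)
```

The escaped string would then be passed as a query parameter (for example `$q`) rather than concatenated into the Cypher text.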

Sync Scripts Failing¶

Symptom: sync_codex.py or sync_tickets.py errors

Common Issues:

Codex not running:

cd hivematrix-helm
python cli.py status codex
python cli.py start codex

Service token expired:

# Tokens expire after 5 minutes
# Sync scripts request new token for each run
# Check Codex logs for auth failures
python logs_cli.py codex --tail 50

Codex has no data:

# Ensure Codex has synced from PSA/Datto first
cd hivematrix-codex
python pull_freshservice.py  # or your PSA sync script

Neo4j constraint violations:

// Check for constraint errors
SHOW CONSTRAINTS;

// Drop problematic constraints if needed
DROP CONSTRAINT constraint_name;

Permissions issues:

# Ensure the sync scripts can read the configuration in instance/
ls -la instance/
chmod 755 instance/

Debug Sync:

# Run with verbose output
python sync_codex.py 2>&1 | tee sync.log

# Check for errors
grep -i error sync.log

Slow Performance¶

Symptom: Slow searches, browsing, or context building

Create Indexes:

// Uniqueness constraint on node IDs (also creates a backing index)
CREATE CONSTRAINT context_item_id IF NOT EXISTS
FOR (n:ContextItem) REQUIRE n.id IS UNIQUE;

// Index on folder flag for faster queries
CREATE INDEX folder_index IF NOT EXISTS
FOR (n:ContextItem) ON (n.is_folder);

// Index on read_only flag
CREATE INDEX readonly_index IF NOT EXISTS
FOR (n:ContextItem) ON (n.read_only);

Check Query Performance:

// Enable query profiling
PROFILE MATCH p = (:ContextItem {id: 'root'})-[:PARENT_OF*..]->(n)
WHERE n.name CONTAINS 'search term'
RETURN n;

// Analyze execution plan
EXPLAIN MATCH (n:ContextItem)
WHERE n.content CONTAINS 'keyword'
RETURN n;

Optimize Neo4j Configuration:

# Edit Neo4j config
sudo nano /etc/neo4j/neo4j.conf

# Increase memory (adjust based on available RAM)
# Neo4j 5.x setting names; on 4.x use the dbms.memory.* equivalents
server.memory.heap.initial_size=1G
server.memory.heap.max_size=2G
server.memory.pagecache.size=1G

# Restart Neo4j
sudo systemctl restart neo4j

Check Database Size:

// Count nodes
MATCH (n) RETURN count(n);

// Count relationships
MATCH ()-[r]->() RETURN count(r);

// Find large content nodes
MATCH (n:ContextItem)
WHERE size(n.content) > 10000
RETURN n.name, size(n.content) as size
ORDER BY size DESC;

Missing Data After Sync¶

Symptom: Companies or tickets not appearing in KnowledgeTree

Verify Codex Has Data:

TOKEN=$(python create_test_token.py 2>/dev/null)

# Check companies
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:5010/codex/api/companies | jq

# Check tickets for a company
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:5010/codex/api/companies/12345/tickets | jq

Check Neo4j for Synced Data:

// Count synced companies
MATCH (root:ContextItem {id: 'root'})-[:PARENT_OF]->(companies {name: 'Companies'})
MATCH (companies)-[:PARENT_OF]->(company)
RETURN count(company);

// List companies
MATCH (:ContextItem {id: 'root'})-[:PARENT_OF]->(:ContextItem {name: 'Companies'})-[:PARENT_OF]->(c)
RETURN c.name, c.id;

// Count tickets
MATCH (n:ContextItem)
WHERE n.name STARTS WITH 'Ticket_'
RETURN count(n);

Re-run Sync:

# Delete synced data and re-sync
cypher-shell -u neo4j -p your-password

// Delete Companies folder and all children
MATCH (root:ContextItem {id: 'root'})-[:PARENT_OF]->(companies {name: 'Companies'})
MATCH (companies)-[:PARENT_OF*0..]->(child)
DETACH DELETE companies, child;

// Leave cypher-shell before running the sync scripts
:exit

# Re-run sync
python sync_codex.py
python sync_tickets.py

Rate Limit Errors¶

Symptom: 429 Too Many Requests

Check Limits:

# app/__init__.py
limiter = Limiter(
    app=app,
    key_func=get_user_id_or_ip,
    default_limits=["10000 per hour", "500 per minute"],
    storage_uri="memory://"
)
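
The `key_func` referenced above is not shown in this document. A minimal sketch of the documented behavior (JWT `sub` claim, falling back to the client IP) follows. To stay runnable without Flask it takes the decoded claims and address as arguments; the name `rate_limit_key` and both parameters are illustrative assumptions.

```python
def rate_limit_key(claims, remote_addr):
    """Return the rate-limit bucket key for a request.

    `claims` is the decoded JWT payload (or None when unauthenticated) and
    `remote_addr` is the client IP. Both parameter names are illustrative.
    """
    subject = (claims or {}).get("sub")
    if subject:
        return f"user:{subject}"
    # No authenticated identity: fall back to the caller's address.
    return f"ip:{remote_addr}"
```

Prefixing the key keeps user and IP buckets from colliding if a subject ever looks like an address.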

Increase Limits (if needed):

# For specific endpoints
@app.route('/api/search')
@limiter.limit("1000 per minute")  # Override default
@token_required
def search_nodes():
    ...

Exempt Endpoint from Rate Limiting:

@app.route('/api/public-endpoint')
@limiter.exempt
def public_endpoint():
    ...

File Upload Failures¶

Symptom: File upload returns 500 error

Check Upload Directory:

ls -la instance/uploads/
chmod 755 instance/uploads/

Check Disk Space:

df -h

Check File Size Limits:

# app/__init__.py
app.config['MAX_CONTENT_LENGTH'] = 16 * 1024 * 1024  # 16 MB limit

# Increase if needed
app.config['MAX_CONTENT_LENGTH'] = 100 * 1024 * 1024  # 100 MB

Check File Permissions:

# Ensure Flask can write to uploads folder
ls -la instance/uploads/
sudo chown -R $USER:$USER instance/uploads/

Security¶

Authentication¶

All endpoints require JWT authentication except:

- /health - Public health check

Token Validation:

- Validates signature against Core's JWKS endpoint
- Checks expiration timestamp
- Verifies issuer (hivematrix-core)
- Extracts user/service identity
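
The claim checks that follow signature verification can be sketched with the standard library alone. Signature verification against the JWKS is deliberately left out here, and the names `check_claims` and `TokenError`, along with the use of the `sub` claim for identity, are assumptions for illustration.

```python
import time


class TokenError(Exception):
    """Raised when a decoded token fails a claim check."""


def check_claims(payload, expected_issuer="hivematrix-core", now=None):
    """Validate the expiration and issuer claims of a decoded JWT payload.

    Signature verification is out of scope; this covers only the checks
    performed after the signature has been validated.
    """
    now = time.time() if now is None else now
    if "exp" not in payload or payload["exp"] <= now:
        raise TokenError("token expired or missing exp claim")
    if payload.get("iss") != expected_issuer:
        raise TokenError("unexpected issuer")
    # Assumption: the identity is carried in the standard `sub` claim.
    return payload.get("sub")
```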

Authorization¶

Permission Levels:

- Admin - Full access including admin endpoints
- Technician - Create/edit user nodes, view all data
- Billing - View access only
- Client - View scoped to company data

Endpoint Protection:

@app.route('/admin/wipe')
@admin_required  # Only admins
def admin_wipe():
    ...

@app.route('/api/node', methods=['POST'])
@token_required  # All authenticated users
def create_node():
    # Additional checks inside route
    if g.user.get('permission_level') not in ['admin', 'technician']:
        abort(403)
    ...

Data Security¶

Read-Only Nodes:

- Synced data marked read_only: true
- UI prevents editing
- API should validate before updates (feature request)
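
Since server-side validation is listed as a feature request, the following is only a sketch of what such a guard could look like. `ensure_writable` and `ReadOnlyNodeError` are hypothetical names, and treating nodes without the flag as writable is an assumption.

```python
class ReadOnlyNodeError(Exception):
    """Raised when a write targets a node synced from an external source."""


def ensure_writable(node):
    """Reject updates to nodes flagged read_only.

    `node` is a dict of node properties as returned from Neo4j. Nodes without
    the flag are treated as writable (an assumption about user-created nodes).
    """
    if node.get("read_only", False):
        raise ReadOnlyNodeError(f"node {node.get('id')} is read-only")
    return node
```

An update route could call this after loading the node and translate the exception into a 403 response.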

Input Sanitization:

- Markdown content sanitized during rendering
- Prevents XSS via malicious markdown
- File uploads: validate file types, scan for malware (recommended)

Network Security:

- Localhost binding: Service listens on 127.0.0.1 only
- Nexus proxy: Only entry point to KnowledgeTree
- Neo4j: Should be firewalled, accessible only from localhost

Neo4j Security¶

Authentication:

- Use strong password for Neo4j user
- Change default neo4j password immediately
- Store password securely in instance/knowledgetree.conf

Network Access:

# Allow localhost first (ufw applies the first matching rule)
sudo ufw allow from 127.0.0.1 to any port 7687

# Block external access to Neo4j
sudo ufw deny 7687/tcp  # Bolt protocol
sudo ufw deny 7474/tcp  # Browser UI

Backup Encryption:

# Encrypt Neo4j dumps (Neo4j 5.x syntax; --to-path is a directory)
neo4j-admin database dump neo4j --to-path=/backup  # writes /backup/neo4j.dump
gpg --encrypt --recipient admin@example.com /backup/neo4j.dump

Security Best Practices¶

Rotate Neo4j Password Regularly
Use TLS for Neo4j (bolt+s:// instead of bolt://)
Enable Neo4j Auth (never disable authentication)
Validate File Uploads (check file types, scan for malware)
Audit Logs (track who created/modified what)
Backup Regularly (export user data, dump Neo4j database)
Monitor Access Patterns (detect unusual activity)

Backup & Recovery¶

Export User Data¶

Via Admin UI:

1. Navigate to /admin/settings
2. Click "Export Data"
3. Download JSON file

Via API:

TOKEN=$(python create_test_token.py 2>/dev/null)
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:5020/admin/export \
  -o knowledgetree_backup.json

What Gets Exported:

- User-created nodes only (read_only: false)
- Full hierarchical paths
- Markdown content
- Metadata (is_folder, is_attached)

Does NOT Export:

- Synced data from Codex (can be re-synced)
- File attachments (separate backup needed)
- Neo4j system data

Backup Neo4j Database¶

Full Database Dump:

# Stop Neo4j
sudo systemctl stop neo4j

# Create dump (Neo4j 5.x syntax; writes /backup/neo4j.dump)
sudo neo4j-admin database dump neo4j --to-path=/backup

# Start Neo4j
sudo systemctl start neo4j

# Rename with the date, then compress and encrypt that one file
DUMP=/backup/neo4j-$(date +%Y%m%d).dump
sudo mv /backup/neo4j.dump "$DUMP"
gzip "$DUMP"
gpg --encrypt --recipient admin@example.com "$DUMP.gz"

Online Backup (Enterprise Edition):

# Requires Neo4j Enterprise (5.x syntax)
neo4j-admin database backup neo4j \
  --to-path=/backup

Backup File Attachments¶

# Backup uploads folder
tar -czf uploads-$(date +%Y%m%d).tar.gz instance/uploads/

# Sync to remote backup
rsync -avz instance/uploads/ backup-server:/backups/knowledgetree/uploads/

Restore from Export¶

Via Admin UI:

1. Navigate to /admin/settings
2. Click "Import Data"
3. Select JSON export file
4. Click "Import"

Via API:

TOKEN=$(python create_test_token.py 2>/dev/null)
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@knowledgetree_backup.json" \
  http://localhost:5020/admin/import

Process:

1. Sorts items by path (parents before children)
2. Traverses from root to find parent nodes
3. Uses MERGE to create or update nodes
4. Preserves existing node IDs
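
The first step of the import process, ordering items so that parents are created before their children, can be sketched as a sort on path depth. The function name and the assumption that each exported item is a dict with a `path` key are illustrative; the actual export schema is not shown here.

```python
def sort_for_import(items):
    """Order exported items so every parent precedes its children.

    Each item is assumed to be a dict with a `path` such as
    "/IT Documentation/Network"; the key name is illustrative.
    """
    def depth_then_path(item):
        segments = [s for s in item["path"].split("/") if s]
        # Shallower paths first; the path itself breaks ties deterministically.
        return (len(segments), item["path"])

    return sorted(items, key=depth_then_path)
```

Sorting by depth guarantees that by the time a node at depth n is merged, every possible parent at depth n-1 has already been processed.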

Restore Neo4j Database¶

# Stop Neo4j
sudo systemctl stop neo4j

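# Restore from dump (Neo4j 5.x syntax: --from-path is a directory
# that must contain a file named <database>.dump)
sudo mkdir -p /backup/restore
sudo cp /backup/neo4j-20241122.dump /backup/restore/neo4j.dump
sudo neo4j-admin database load neo4j \
  --from-path=/backup/restore \
  --overwrite-destination=true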

# Start Neo4j
sudo systemctl start neo4j

# Verify data
cypher-shell -u neo4j -p your-password
MATCH (n:ContextItem) RETURN count(n);

Automated Backup Script¶

#!/bin/bash
# backup_knowledgetree.sh

BACKUP_DIR="/backups/knowledgetree"
DATE=$(date +%Y%m%d_%H%M%S)

# Export user data
cd /path/to/hivematrix-knowledgetree
source pyenv/bin/activate
TOKEN=$(python ../hivematrix-helm/create_test_token.py 2>/dev/null)
curl -s -H "Authorization: Bearer $TOKEN" \
  http://localhost:5020/admin/export \
  -o "$BACKUP_DIR/export-$DATE.json"

# Backup Neo4j (5.x syntax; the dump is written as neo4j.dump, then renamed)
sudo systemctl stop neo4j
sudo neo4j-admin database dump neo4j --to-path="$BACKUP_DIR"
sudo mv "$BACKUP_DIR/neo4j.dump" "$BACKUP_DIR/neo4j-$DATE.dump"
sudo systemctl start neo4j

# Backup uploads
tar -czf "$BACKUP_DIR/uploads-$DATE.tar.gz" instance/uploads/

# Cleanup old backups (keep last 30 days)
find "$BACKUP_DIR" -name "*.json" -mtime +30 -delete
find "$BACKUP_DIR" -name "*.dump" -mtime +30 -delete
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +30 -delete

echo "Backup completed: $DATE"

Schedule Daily Backup:

# Add to crontab
crontab -e

# Run daily at 2 AM
0 2 * * * /path/to/backup_knowledgetree.sh >> /var/log/knowledgetree_backup.log 2>&1

Integration with Other Services¶

Codex Integration¶

Purpose: KnowledgeTree syncs all external data from Codex

Data Flow:

PSA (Freshservice) → Codex → KnowledgeTree (companies, contacts, tickets)
Datto RMM          → Codex → KnowledgeTree (assets)

API Calls Made to Codex:

# Get all companies
response = call_service('codex', '/api/companies')

# Get users for a company
response = call_service('codex', f'/api/companies/{account}/users')

# Get assets for a company
response = call_service('codex', f'/api/companies/{account}/assets')

# Get tickets for a company
response = call_service('codex', f'/api/companies/{account}/tickets')

Configuration:

# instance/knowledgetree.conf
[services]
codex_url = http://localhost:5010

Brainhair Integration¶

Purpose: AI assistant searches and uses KnowledgeTree for context

API Calls Made by Brainhair:

# Search for relevant articles
response = call_service('knowledgetree', '/api/search?query=email+setup')

# Browse knowledge structure
response = call_service('knowledgetree', '/api/browse?path=/IT+Documentation')

# Get full context for a node (most important)
response = call_service('knowledgetree', f'/api/context/{node_id}')

Example Workflow:

1. User asks Brainhair: "How do I reset John Doe's password?"
2. Brainhair searches KnowledgeTree:
   - /api/search?query=John+Doe → finds user node
   - /api/search?query=password+reset → finds procedure article
3. Brainhair gets context:
   - /api/context/{john_doe_node_id} → includes user's tickets, company info
   - /api/node/{password_reset_article_id} → gets procedure details
4. Brainhair combines context and generates answer

Context Building:

Brainhair uses the context API to gather all relevant knowledge:

- User's contact information
- User's ticket history (from attached Tickets folder)
- Company-specific procedures
- General password reset documentation

This gives Brainhair complete context to answer accurately.
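
The example workflow can be sketched as a single function. Here `call_service` is injected so the sketch runs without the HiveMatrix helper, and the response shapes (search returning a list of hits with an `id` key) are assumptions rather than the documented API contract.

```python
from urllib.parse import urlencode


def gather_context(call_service, person, topic):
    """Collect context for a question about `person` and `topic`.

    `call_service(service, path)` is assumed to return parsed JSON. The
    first search hit is used in each case, which is a simplification.
    """
    def search(text):
        query = urlencode({"query": text})
        return call_service("knowledgetree", "/api/search?" + query)

    context = []
    people = search(person)
    if people:
        # Full context for the person node: tickets, company info, and so on.
        context.append(call_service("knowledgetree", f"/api/context/{people[0]['id']}"))
    articles = search(topic)
    if articles:
        # Content of the matching procedure article.
        context.append(call_service("knowledgetree", f"/api/node/{articles[0]['id']}"))
    return context
```

Calling `gather_context(call_service, "John Doe", "password reset")` issues the same four requests as the numbered workflow.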

Core Integration¶

Purpose: JWT authentication

Flow:

1. User requests KnowledgeTree page via Nexus
2. Nexus includes JWT token in cookie/header
3. KnowledgeTree validates JWT:
   - Fetches JWKS from Core: GET /.well-known/jwks.json
   - Validates signature, expiration, issuer
   - Extracts user identity and permissions
4. KnowledgeTree renders page with user context

Service-to-Service:

# KnowledgeTree (or any service) calls another service
# 1. Request service token from Core
response = requests.post(
    f"{CORE_SERVICE_URL}/service-token",
    json={'calling_service': 'knowledgetree'}
)
service_token = response.json()['token']

# 2. Use token to call target service
response = requests.get(
    f"{CODEX_URL}/api/companies",
    headers={'Authorization': f'Bearer {service_token}'}
)
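
Because service tokens are short-lived (the Troubleshooting section notes a 5-minute expiry), a long-running caller may prefer to reuse a token until shortly before it expires instead of requesting one per call. A minimal sketch, assuming a fixed 300-second lifetime; the class `ServiceTokenCache` and its parameters are hypothetical.

```python
import time


class ServiceTokenCache:
    """Reuse a service token until shortly before it expires."""

    def __init__(self, fetch_token, ttl_seconds=300, margin_seconds=30,
                 clock=time.monotonic):
        self._fetch_token = fetch_token  # Callable returning a fresh token string.
        self._ttl = ttl_seconds          # Assumed token lifetime (5 minutes).
        self._margin = margin_seconds    # Refresh early to avoid sending a stale token.
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + self._ttl
        return self._token
```

`fetch_token` would wrap the `/service-token` request shown above. If Core returns the actual expiry in the token, reading it from there would be more robust than a fixed TTL.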

Nexus Integration¶

Purpose: Frontend proxy and global CSS injection

Proxy Configuration:

Nexus proxies /knowledgetree/ to KnowledgeTree service:

# In Nexus
@app.route('/knowledgetree/', defaults={'path': ''})
@app.route('/knowledgetree/<path:path>')
def proxy_knowledgetree(path):
    return proxy_service('knowledgetree', path, inject_html=True)

CSS Injection:

Nexus injects global.css into all KnowledgeTree HTML responses:

<head>
  ...
  <link rel="stylesheet" href="/static/global.css">
</head>

URL Handling:

KnowledgeTree uses ProxyFix to handle X-Forwarded-Prefix:

# app/__init__.py
from werkzeug.middleware.proxy_fix import ProxyFix
app.wsgi_app = ProxyFix(
    app.wsgi_app,
    x_for=1, x_proto=1, x_host=1, x_prefix=1
)

This ensures url_for() generates correct URLs like /knowledgetree/browse/...

Performance Optimization¶

Neo4j Optimization¶

1. Create Indexes:

// Uniqueness constraint on node IDs (also creates a backing index)
CREATE CONSTRAINT context_item_id IF NOT EXISTS
FOR (n:ContextItem) REQUIRE n.id IS UNIQUE;

// Index for folder queries
CREATE INDEX folder_index IF NOT EXISTS
FOR (n:ContextItem) ON (n.is_folder);

// Index for read-only flag
CREATE INDEX readonly_index IF NOT EXISTS
FOR (n:ContextItem) ON (n.read_only);

// Full-text search index
CREATE FULLTEXT INDEX article_search IF NOT EXISTS
FOR (n:ContextItem) ON EACH [n.name, n.content];

2. Increase Memory:

# Edit /etc/neo4j/neo4j.conf
# Neo4j 5.x setting names; on 4.x use the dbms.memory.* equivalents
server.memory.heap.initial_size=2G
server.memory.heap.max_size=4G
server.memory.pagecache.size=2G

3. Enable Query Logging:

# Monitor slow queries (Neo4j 5.x names; on 4.x use dbms.logs.query.*)
# Valid values for enabled are OFF, INFO, and VERBOSE
db.logs.query.enabled=INFO
db.logs.query.threshold=1s

Application Optimization¶

1. Connection Pooling:

Neo4j driver uses connection pooling by default:

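from neo4j import GraphDatabase, basic_auth

driver = GraphDatabase.driver(
    uri,
    auth=basic_auth(user, password),
    max_connection_pool_size=50,       # upper bound on pooled connections
    connection_acquisition_timeout=60  # seconds to wait for a free connection
)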

2. Caching:

Add Redis caching for frequently accessed nodes:

import json
import redis
cache = redis.Redis(host='localhost', port=6379)

@app.route('/api/node/<node_id>')
@token_required
def get_node(node_id):
    # Check cache first
    cached = cache.get(f'node:{node_id}')
    if cached:
        return jsonify(json.loads(cached))

    # Query Neo4j
    with driver.session() as session:
        result = session.run(...)
        # A Result is not a mapping; read the single record before converting
        record = result.single()
    if record is None:
        abort(404)
    data = dict(record)

    # Cache result (5 minute TTL)
    cache.setex(f'node:{node_id}', 300, json.dumps(data))
    return jsonify(data)

3. Lazy Loading:

Load child nodes on-demand instead of entire tree:

// Load children when folder is expanded
folder.addEventListener('click', async () => {
  const response = await fetch(`/api/node/${folderId}/children`, {
    credentials: 'same-origin'
  });
  const children = await response.json();
  renderChildren(children);
});

4. Pagination:

Add pagination to large result sets:

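@app.route('/api/search')
@token_required
def search_nodes():
    query = request.args.get('query', '')
    page = max(request.args.get('page', 1, type=int), 1)
    # Upper bound of 100 is an illustrative cap, not a documented limit
    per_page = min(max(request.args.get('per_page', 20, type=int), 1), 100)
    skip = (page - 1) * per_page

    with driver.session() as session:
        # Parameters go in a dict: session.run()'s first argument is itself
        # named `query`, so a `query=` keyword would raise a TypeError
        result = session.run("""
            MATCH (node:ContextItem)
            WHERE toLower(node.name) CONTAINS toLower($term)
            RETURN node
            ORDER BY node.name
            SKIP $skip
            LIMIT $limit
        """, {"term": query, "skip": skip, "limit": per_page})
        # ORDER BY keeps pages stable between requests
        nodes = [dict(record["node"]) for record in result]

    return jsonify(nodes)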

Query Optimization¶

Before (Slow):

// Searches all nodes, then filters by path
MATCH (node:ContextItem)
WHERE toLower(node.name) CONTAINS 'search'
MATCH p = (:ContextItem {id: 'root'})-[:PARENT_OF*..]->(node)
RETURN node, [n IN nodes(p) | n.name];

After (Fast):

// Uses full-text index, limits results early
CALL db.index.fulltext.queryNodes("article_search", "search")
YIELD node, score
WITH node, score
LIMIT 15
MATCH p = (:ContextItem {id: 'root'})-[:PARENT_OF*..]->(node)
RETURN node, [n IN nodes(p) | n.name], score
ORDER BY score DESC;

Changelog¶

Version 2.0.0 (2024-11-22)¶

Complete graph-based knowledge management system
Neo4j integration with hierarchical node structure
Codex sync for companies, users, assets, tickets
Context system for AI assistant integration
Attached folders feature
Multiple view modes (grid, list, tree)
Full-text search across all content
Export/import functionality
Admin dashboard for database management
Markdown rendering with syntax highlighting
File attachment support
Per-user rate limiting (10000/hour, 500/minute)
Structured JSON logging with correlation IDs
RFC 7807 error responses
Health check monitoring
Swagger/OpenAPI documentation