Software that's fast and intelligent —
engineered, not assembled.
Devigenix is an independent practice building full-stack applications with real AI capability underneath — combining systems-level performance work in Rust and Go with modern LLM and RAG architectures.
Inference latency — same model, two runtimes
Benchmarked on identical hardware (M4 Air), same model weights, single inference request.
Four ways to work together.
Scoped as a single build, an ongoing engagement, or a focused consult — whichever fits the problem.
Full-Stack Development
End-to-end web applications — React/Next.js frontends, Go or Node backends, PostgreSQL data layers, and deployment pipelines. MVP to production.
AI & LLM Integration
RAG systems, document Q&A, chat interfaces, and LLM-powered features that are grounded in your actual data — not a bolted-on chatbot.
Performance Engineering
Rewriting slow services in Rust or Go, profiling to find bottlenecks, and optimizing latency-critical paths. My most differentiated skill.
Real-Time & Distributed Systems
WebSocket-based apps, job queues, and concurrent processing pipelines built to handle load, not just demos.
Two disciplines, rarely paired.
Most developers specialize in either performance engineering or AI integration, not both. Devigenix works where the two overlap: building applications where the AI layer is genuinely fast and the systems underneath are genuinely intelligent.
That means writing inference servers in Rust when Python isn't fast enough, designing RAG pipelines that actually retrieve the right context, and shipping full-stack products — frontend to database to model — as one coherent system rather than bolted-together parts.
Currently taking on freelance projects, consulting engagements, and select full-time roles.
Six systems, six proofs.
Each project was scoped to demonstrate a specific engineering capability — from full-stack fundamentals to distributed systems design.
Rust AI Inference Server ★
Served a model via Candle + Axum and benchmarked it directly against an equivalent Python/PyTorch implementation: 22 ms vs 240 ms per request, roughly an 11× speedup, with a live latency dashboard.
RAG Document Q&A
Upload a document, generate embeddings, store them in pgvector, and answer questions with streamed LLM responses grounded in the retrieved context.
Distributed Task Queue
Go orchestrator pushes jobs to Redis; a configurable pool of Rust workers pulls, processes, and reports status, visualized on a live React dashboard.
URL Shortener
High-throughput short-link service with collision detection, click analytics, and rate limiting — my first formally benchmarked project.
Real-Time Chat App
Multi-room chat with live presence and message history, plus an AI assistant streaming responses directly into the conversation over SSE.
Full-Stack Todo App
JWT-authenticated CRUD application with a typed React frontend and a one-command Dockerized deployment — the foundation the rest was built on.