Riding the Agent Wave: The Future of Agentic Software

Mar 19, 2026 · 6 min read

Sharing my experience setting up a personal agent using PicoClaw, Raspberry Pi, and Mac Studio to handle real user tasks.

Jonathan Cecil

Editor

Abstract

The "Agent Wave" is not just about chatbots; it is a fundamental shift in how we integrate distinct workflows to complete end-to-end tasks. Here is how I set up a local agentic system using PicoClaw running on a Raspberry Pi 3B+, a Mac Studio hosting the inference engine (Ollama), and a Telegram bot for interaction, without spending a single dollar on new hardware. Having a digital worker that completes a personal end-to-end task running locally isn't just a privacy win; it's a blueprint for the next decade of computing.

The Architecture Stack

One of my goals was to run my agent on local models. I ended up having to separate the agent runtime (PicoClaw) from the inference engine (Ollama). My original attempt to use LM Studio for inference hit a number of hiccups: context window persistence, Jinja template issues, and a lack of consistency across models, all of which pushed me to switch to Ollama.

The stack connects three pieces over simple APIs:

  • Ollama (inference): hosted on the Mac Studio, exposed via its HTTP API.
  • PicoClaw (agent runtime): hosted on the Raspberry Pi 3B+, calling Ollama over the network.
  • Telegram (chat): the front end, accessed via direct message.

Models in Use

Currently, my daily driver is qwen3.5:35b, which provides a massive context window and stable performance for general orchestration. For tasks requiring deeper logic, I pivot to the qwen3-5-27b-opus-thinking model, which uses a chain-of-thought scratchpad to navigate complex instructions. When I need rapid iterations during intense deep-dive sessions, I switch to cloud models like Gemini 3, leveraging its speed and multimodal capabilities to complete complex tasks on time.
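In practice this routing is just a simple policy. Here is a minimal sketch of how I think about it; the task labels and function name are my own illustration, not anything PicoClaw ships with:

```python
# Hypothetical model router: maps a task label to one of the models
# described above. The policy is illustrative, not PicoClaw's actual code.
def pick_model(task: str) -> str:
    routes = {
        "orchestration": "qwen3.5:35b",             # local daily driver
        "deep_logic": "qwen3-5-27b-opus-thinking",  # chain-of-thought scratchpad
        "deep_dive": "gemini-3",                    # cloud: fast + multimodal
    }
    # Anything unrecognized falls back to the local daily driver.
    return routes.get(task, "qwen3.5:35b")
```

The default-to-local fallback is deliberate: cloud models only get involved when I explicitly ask for them.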

Wins: Things that went Well

1. Local LLMs

Setting up local LLMs was relatively straightforward and a no-brainer considering I already have a Mac Studio with 32GB of RAM, which allowed me to run a 35B model with 100% GPU offloading. This also means I can hand my AI agent sensitive data without worrying about privacy, such as my homelab configurations, IP addresses, and personal notes.

2. Scalable Local Memory Architecture

PicoClaw uses a lightweight, filesystem-based memory system optimized for the Raspberry Pi. This approach avoids heavy databases by separating memory into distinct, easily parsed files for active knowledge (context, goals, and patterns), task management (fast objective tracking), chronological archival storage (keeping the workspace uncluttered), and secure contexts for accessing isolated, sensitive domains.
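The pattern is easy to sketch: one small file per concern, read and written directly. This is a toy model of the idea, not PicoClaw's actual layout or file names:

```python
import json
from pathlib import Path

# Toy sketch of a filesystem-based agent memory: one small, easily
# parsed JSON file per section instead of a database.
class FileMemory:
    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, section: str, data: dict) -> None:
        # e.g. section = "context", "goals", "tasks", "archive"
        (self.root / f"{section}.json").write_text(json.dumps(data))

    def load(self, section: str) -> dict:
        path = self.root / f"{section}.json"
        return json.loads(path.read_text()) if path.exists() else {}
```

Because each section is a plain file, the whole memory is inspectable with `cat` and trivially cheap on a Pi.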

3. Telegram ID Based Routing

Instead of a single, chaotic chat history, I leveraged Telegram groups with topic isolation. PicoClaw natively reads the group ID, effectively treating messages from each group as an isolated conversation. This is an important feature for me: I can now use my AI agent for multiple purposes without worrying about context switching or context contamination.
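The effect of ID-based routing boils down to keeping one history per chat ID. A toy sketch of the idea (PicoClaw's internals will differ):

```python
from collections import defaultdict

# Toy sketch of Telegram group-ID routing: each chat_id gets its own
# isolated message history, so contexts never bleed into each other.
histories = defaultdict(list)

def handle_message(chat_id: int, text: str) -> list:
    histories[chat_id].append(text)
    # Only this group's history is ever passed to the model.
    return histories[chat_id]
```

One bot token, many groups, zero cross-contamination: the group ID is the only routing key you need.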

4. Tmpfs Systemd Ramdisk

This was a critical pain point: the Raspberry Pi setup imposed strict read-only /tmp filesystem restrictions on the service, which effectively blocked all skill installations until I found this fix. I bypassed it by injecting PrivateTmp=yes and ReadWritePaths=/tmp into the systemd override file. This was a total game changer, allowing PicoClaw to download and install new skills directly without having to manually use a tmp directory inside the workspace.
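The override lives in a standard systemd drop-in (created with `sudo systemctl edit <service>`; the service name below is my assumption, adjust to match your unit):

```ini
# /etc/systemd/system/picoclaw.service.d/override.conf
[Service]
PrivateTmp=yes
ReadWritePaths=/tmp
```

After `sudo systemctl daemon-reload` and a service restart, skill installs could write to /tmp again.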

Model Utilization Distribution

  • qwen3.5:35b (local): general orchestration, 75%
  • Gemini 3 (cloud): focus sessions, 15%
  • qwen3-5-27b-opus (local): deep logic / thinking, 10%

Challenges: Things that did not Go Well

Building an edge compute instance is never without its challenges. Here are the roadblocks I hit and how I navigated them.

Retries / Unintended DDoS

When I first implemented a subagent monitoring task meant to guarantee overall task completion, I triggered an unintended "infinite retry" loop due to the heartbeat configuration. Failed subagent tasks were retried relentlessly, effectively DDoS-ing my local inference engine and blocking real tasks indefinitely. This became a multi-day troubleshooting exercise before I identified and disabled the faulty configuration in the inference runtime.
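The underlying fix is simply capping retries and backing off between attempts. A minimal sketch of what the heartbeat logic should have done (function and parameter names are mine, not the actual configuration):

```python
import time

# Bounded retry with exponential backoff: caps attempts so a failing
# subagent task can't hammer the inference engine forever.
def run_with_retries(task, max_attempts: int = 3, base_delay: float = 0.0):
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # give up instead of retrying indefinitely
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With a hard cap, a broken subagent surfaces as a loud failure after a few attempts instead of silently saturating the local model.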

The 32 Bit ARM Dependency Trap

My attempt to set up the scrapling skill to bypass bot checks failed. The underlying playwright library does not support 32-bit ARM (armv7l), and many new skills require Python 3.10+, while the Pi's Bullseye OS caps out at Python 3.9.

  • Fix: I had to abandon heavy browser automation on this hardware for now. To future-proof this 3-year-old, $50 setup, I'll likely need to either pave the Pi with a 64-bit Bookworm OS or finally bite the bullet and upgrade the agent runtime to a Mac Mini (ride the trend).
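A quick preflight check would have saved me the debugging time. This sketch encodes the two constraints above; the function name is mine:

```python
import platform
import sys

# Preflight check for the two traps above: a 32-bit ARM userland and an
# interpreter older than 3.10. Returns a list of blockers (empty = OK).
def skill_blockers(machine: str = platform.machine(),
                   version: tuple = sys.version_info[:2]) -> list:
    problems = []
    if machine == "armv7l":
        problems.append("32-bit ARM: playwright unsupported")
    if version < (3, 10):
        problems.append("Python < 3.10: many skills unsupported")
    return problems
```

On my Bullseye Pi this reports both blockers at once, which is exactly the combination that killed the scrapling skill.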

Context Limit Crashes

I kept hitting `cannot truncate prompt with n_keep >= n_ctx (4096)` errors. Because PicoClaw sends massive system prompts (including all tool instructions), it instantly blew past the default 4K context window.

  • Fix: Switched to Ollama and permanently increased the context length to 32K in the model configuration.
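In Ollama this is a one-line change in a Modelfile (`num_ctx` is Ollama's standard context-length parameter; the base model name is from my setup):

```
# Modelfile: rebuild the model with a 32K context window
FROM qwen3.5:35b
PARAMETER num_ctx 32768
```

Then bake it in with `ollama create qwen3.5:35b-32k -f Modelfile` and point the agent at the new model name.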

The "Thinking" Dilemma

The qwen3-5-27b-opus-thinking model flooded my Telegram chats with massive walls of <think> text before actually answering.

  • Fix: I injected {%- set enable_thinking = false %} into the Jinja template and stripped the hardcoded <think> tag to force the model to skip the scratchpad phase.
  • The Unintended Consequence: Stripping the model's ability to "think" out loud severely lobotomized its logic. It started failing to format JSON tool calls correctly. It would infinitely retry, hit the max_tool_iterations limit, and never reply. I ultimately had to re-enable thinking and just accept the chat clutter.
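For reference, the template injection looked roughly like this; it's a sketch of the pattern, not the full Qwen chat template:

```jinja
{#- Injected at the top of the chat template to skip the scratchpad phase.
    This is the change that backfired; the hardcoded <think> open tag further
    down was removed at the same time. -#}
{%- set enable_thinking = false %}
```

The lesson: for models trained to reason in a scratchpad, the `<think>` block is load-bearing; suppress it in the UI layer, not the template.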

Real World Use Cases

Despite the hurdles, the agent is actively handling real work:

  • Near Realtime Monitoring Stack: My primary use case involves a fleet of agents that monitor the various websites and services I own and manage. If a site goes down or a service becomes unresponsive, the agent detects the latency spike and response code and immediately pushes an alert to my Telegram Monitoring Group, allowing me to engage quickly before my users notify me.
  • Knowledge Retrieval (Local RAG): I use the bot to query Notion for my home lab documentation, network configurations, or past configuration commands so I don't have to hunt for them manually.
  • GitHub Ops: Another major use case is monitoring my GitHub repositories for CI/CD ops and deployments, updating me whenever new PRs or issues have been raised. It is also turning out to be a handy proxy for running direct gh commands in chat instead of the terminal.
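The monitoring agents' core decision is tiny: given a probe's status code and latency, should it alert? A pure-function sketch, with thresholds that are illustrative rather than my actual configuration:

```python
# Illustrative health check: classify a probe result from its HTTP status
# code and observed latency. Thresholds are made up for the example.
def needs_alert(status_code: int, latency_s: float,
                max_latency_s: float = 2.0) -> bool:
    if status_code == 0 or status_code >= 500:   # unreachable or server error
        return True
    return latency_s > max_latency_s             # latency spike
```

Keeping the classification pure makes it trivial to test offline; the agent only wires in the actual HTTP probe and the Telegram push around it.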

What’s Next?

While the current setup is a major win for privacy, there is still room to grow.

  • Multi Bot Sandboxing: I am exploring the option of running multiple distinct Telegram bots on the same Raspberry Pi by cloning the orchestration directories and spinning up isolated systemd services.
  • OpenClaw Pivot: I am also evaluating whether to pivot certain complex workflows to OpenClaw. While PicoClaw is incredibly lightweight and perfect for the Pi, OpenClaw’s richer ecosystem might be worth the extra overhead for reasoning tasks.
  • Token & Success Rate Monitoring: While anecdotally I use Qwen3.5 for 85% of my tasks, I am looking to build a monitoring dashboard that tracks my input/output tokens and other metrics on a monthly basis to help build a comprehensive data driven summary of agent performance and potential costs.

About the Author

Jonathan Cecil

Engineering & Finance Writer

Exploring the intersection of global finance, geopolitics, and technology. I write about macro trends, monetary policy, and the systems that shape our world.