Why local AI?
Cloud AI services (OpenAI, Anthropic, Google) charge per token and send your data to third-party servers. That means your unpublished drafts, notes, and ideas leave your machine.
Ollama changes this. It runs models like Llama 3, Mistral, Phi, and thousands of others directly on your hardware. Your prompts never leave your machine. Latency is often lower than cloud APIs. And the cost is zero once you have the hardware.
- Minimum: 8GB RAM (for smaller models like Phi-3, TinyLlama)
- Recommended: 16GB+ RAM with a dedicated GPU for Llama 3, Mistral 7B, and similar models
Install Ollama
Ollama supports macOS, Linux, and Windows. Download the app from ollama.com or install via command line.
```shell
curl -fsSL https://ollama.com/install.sh | sh
```

Or download the installer from ollama.com/download. After installation, Ollama runs as a background service on port 11434. It starts automatically on macOS and Linux. On Windows, launch the app from your Start menu.
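Once installed, you can confirm the background service is up by querying its version endpoint (assuming the default port 11434):

```shell
# Probe the local Ollama service; prints a JSON version string when it is up
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
curl -s "$OLLAMA_URL/api/version" || echo "Ollama is not reachable at $OLLAMA_URL"
```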
Pull a model
Once Ollama is running, pull a model from the Ollama library. Each model has different size and capability tradeoffs.
```shell
ollama pull llama3
```

Other good options for writing assistance:
- `ollama pull mistral` (~4GB) — Excellent all-rounder, great at following instructions
- `ollama pull phi3` (~2GB) — Microsoft's efficient model, runs on modest hardware
- `ollama pull codellama` (~4GB) — Great if you write about code or technical topics
- `ollama pull llama3:70b` (~40GB) — Most capable, but requires significant GPU memory
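After pulling, you can confirm what's available locally with `ollama list`, or by hitting the same REST endpoint that API clients use:

```shell
# Lists downloaded models as JSON (the CLI equivalent is `ollama list`)
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
curl -s "$OLLAMA_URL/api/tags" || echo "Ollama is not reachable at $OLLAMA_URL"
```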
Connect to EZPost
Open EZPost, go to Settings in the sidebar, then select AI Settings. Enter your Ollama URL and click Test Connection to verify.
```
http://localhost:11434
```

If Ollama is running on a different machine or port, update the URL accordingly. EZPost saves this setting per-user, so it persists across sessions.
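If EZPost and Ollama run on different machines, note that Ollama listens only on localhost by default. The `OLLAMA_HOST` environment variable changes the bind address:

```shell
# Ollama binds to 127.0.0.1 by default; set OLLAMA_HOST so it listens on
# all interfaces, then restart the server with `ollama serve`.
export OLLAMA_HOST=0.0.0.0:11434
```

With that in place, enter the machine's IP (e.g. `http://192.168.1.50:11434`, an illustrative address) in EZPost's AI Settings.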
Using the AI Assistant
Once connected, the AI Assistant appears as a floating button next to the editor toolbar. Open it and select a model from the dropdown. The assistant understands your post’s title and category for contextual help.
Quick prompts are available to get started without typing:
- Suggest title — Generate 3 catchy post titles
- Write outline — Create a structured markdown outline
- Write intro — Draft an engaging opening paragraph
- Improve content — Fix grammar and expand on existing text
After receiving a response, click Insert to add the content to your post, or Use title to apply it as your post title. Conversations are saved per post so you can pick up where you left off.
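Under the hood, assistants like this typically talk to Ollama's `/api/chat` endpoint. EZPost's exact prompts are internal to the app, but a hypothetical "Suggest title" request might look like this:

```shell
# Hypothetical sketch of a contextual chat request; the system message
# stands in for the title/category context the assistant supplies.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
curl -s "$OLLAMA_URL/api/chat" -d '{
  "model": "llama3",
  "messages": [
    {"role": "system", "content": "You are helping write a blog post titled \"Run AI locally\" in category \"Tutorials\"."},
    {"role": "user", "content": "Generate 3 catchy post titles."}
  ],
  "stream": false
}' || echo "request failed: is Ollama running at $OLLAMA_URL?"
```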
Troubleshooting
Connection failed — refused
Ollama isn't running. Open a terminal and run `ollama serve`, or launch the Ollama app from your applications.
Connection failed — timeout
Ollama is running but slow to respond. The model may still be loading. Wait a moment and try again, or restart Ollama.
Test connection succeeds but chat doesn't work
The `/api/tags` endpoint works but `/api/chat` may be blocked. Ensure you're using the same URL for both — some firewalls misidentify the streaming endpoint.
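To compare the two endpoints directly, check the HTTP status code each one returns (`llama3` below stands in for whatever model you pulled):

```shell
# The connection test hits /api/tags; chat uses /api/chat. Compare HTTP codes;
# `|| true` keeps a failed probe from aborting a script.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
curl -s -o /dev/null -w "tags: %{http_code}\n" "$OLLAMA_URL/api/tags" || true
curl -s -o /dev/null -w "chat: %{http_code}\n" "$OLLAMA_URL/api/chat" \
  -d '{"model": "llama3", "messages": [], "stream": false}' || true
```

A `000` code means the endpoint could not be reached at all, which points at a network or firewall issue rather than Ollama itself.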
Slow responses
Smaller models (Phi-3, TinyLlama) respond faster on CPU. For GPU-accelerated speed, ensure Ollama detects your GPU: run `ollama ps` while a model is loaded and check that the PROCESSOR column reports GPU rather than CPU.
Model not found
Run `ollama pull <model-name>` first. Pulling downloads the model weights to your local Ollama library.