Why local AI?
Cloud AI services (OpenAI, Anthropic, Google) charge per token and send your data to third-party servers. That means your unpublished drafts, notes, and ideas leave your machine.
Ollama changes this. It runs models like Llama 3, Mistral, Phi, and thousands of others directly on your hardware. Your prompts never leave your machine. Latency is often lower than cloud APIs. And the cost is zero once you have the hardware.
- Minimum: 8GB RAM (for smaller models like Phi-3, TinyLlama)
- Recommended: 16GB+ RAM with a dedicated GPU for Llama 3, Mistral 7B, and similar models
Install Ollama
Ollama supports macOS, Linux, and Windows. Download the app from ollama.com or install via command line.
```shell
curl -fsSL https://ollama.com/install.sh | sh
```

Or download the installer from ollama.com/download. After installation, Ollama runs as a background service on port 11434. It starts automatically on macOS and Linux. On Windows, launch the app from your Start menu.
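Once installed, you can confirm the background service is up by querying its version endpoint (assuming the default port 11434):

```shell
# Probe the local Ollama service; prints a JSON version string when it is up
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
curl -s "$OLLAMA_URL/api/version" || echo "Ollama is not reachable at $OLLAMA_URL"
```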
Pull a model
Once Ollama is running, pull a model from the Ollama library. Each model has different size and capability tradeoffs.
```shell
ollama pull llama3
```

Other good options for writing assistance:
- `ollama pull mistral` (~4GB) — Excellent all-rounder, great at following instructions
- `ollama pull phi3` (~2GB) — Microsoft's efficient model, runs on modest hardware
- `ollama pull codellama` (~4GB) — Great if you write about code or technical topics
- `ollama pull llama3:70b` (~40GB) — Most capable, but requires significant GPU memory
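After pulling, you can confirm what's available locally with `ollama list`, or by hitting the same REST endpoint that API clients use:

```shell
# Lists downloaded models as JSON (the CLI equivalent is `ollama list`)
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
curl -s "$OLLAMA_URL/api/tags" || echo "Ollama is not reachable at $OLLAMA_URL"
```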
Connect to EZPost
Open EZPost, go to Settings in the sidebar, then select AI Settings. Enter your Ollama URL and click Test Connection to verify.
```
http://localhost:11434
```

If Ollama is running on a different machine or port, update the URL accordingly. EZPost saves this setting per-user, so it persists across sessions.
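If EZPost and Ollama run on different machines, note that Ollama listens only on localhost by default. The `OLLAMA_HOST` environment variable changes the bind address:

```shell
# Ollama binds to 127.0.0.1 by default; set OLLAMA_HOST so it listens on
# all interfaces, then restart the server with `ollama serve`.
export OLLAMA_HOST=0.0.0.0:11434
```

With that in place, enter the machine's IP (e.g. `http://192.168.1.50:11434`, an illustrative address) in EZPost's AI Settings.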
Using the AI Assistant
Once connected, the AI Assistant appears as a floating button next to the editor toolbar. Open it and select a model from the dropdown. The assistant understands your post’s title and category for contextual help.
Quick prompts are available to get started without typing:
- Suggest title — Generate 3 catchy post titles
- Write outline — Create a structured markdown outline
- Write intro — Draft an engaging opening paragraph
- Improve content — Fix grammar and expand on existing text
After receiving a response, click Insert to add the content to your post, or Use title to apply it as your post title. Conversations are saved per post so you can pick up where you left off.
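Under the hood, assistants like this typically talk to Ollama's `/api/chat` endpoint. EZPost's exact prompts are internal to the app, but a hypothetical "Suggest title" request might look like this:

```shell
# Hypothetical sketch of a contextual chat request; the system message
# stands in for the title/category context the assistant supplies.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
curl -s "$OLLAMA_URL/api/chat" -d '{
  "model": "llama3",
  "messages": [
    {"role": "system", "content": "You are helping write a blog post titled \"Run AI locally\" in category \"Tutorials\"."},
    {"role": "user", "content": "Generate 3 catchy post titles."}
  ],
  "stream": false
}' || echo "request failed: is Ollama running at $OLLAMA_URL?"
```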
Troubleshooting
Connection failed — refused
Ollama isn't running. Open a terminal and run `ollama serve`, or launch the Ollama app from your applications.
Connection failed — timeout
Ollama is running but slow to respond. The model may still be loading. Wait a moment and try again, or restart Ollama.
Test connection succeeds but chat doesn't work
The `/api/tags` endpoint works but `/api/chat` may be blocked. Ensure you're using the same URL for both — some firewalls misidentify the streaming endpoint.
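To compare the two endpoints directly, check the HTTP status code each one returns (`llama3` below stands in for whatever model you pulled):

```shell
# The connection test hits /api/tags; chat uses /api/chat. Compare HTTP codes;
# `|| true` keeps a failed probe from aborting a script.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
curl -s -o /dev/null -w "tags: %{http_code}\n" "$OLLAMA_URL/api/tags" || true
curl -s -o /dev/null -w "chat: %{http_code}\n" "$OLLAMA_URL/api/chat" \
  -d '{"model": "llama3", "messages": [], "stream": false}' || true
```

A `000` code means the endpoint could not be reached at all, which points at a network or firewall issue rather than Ollama itself.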
Slow responses
Smaller models (Phi-3, TinyLlama) respond faster on CPU. For GPU-accelerated speed, ensure Ollama detects your GPU: run `ollama ps` while a model is loaded and check that the PROCESSOR column reports GPU rather than CPU.
Model not found
Run `ollama pull <model-name>` first. Pulling downloads the model weights to your local Ollama library.