```shell
ollama create my-assistant -f ./Modelfile   # build the custom model from the Modelfile
ollama run my-assistant                     # start chatting with it
ollama list                                 # confirm it appears among your local models
```
## What is an Ollama Modelfile?
A Modelfile is Ollama's version of a Dockerfile — a plain text configuration file that defines a custom AI model. It tells Ollama which base model to build from, what system prompt to use, and how to set inference parameters like temperature and context length.
Once you have a Modelfile, you create your custom model with one command: `ollama create my-model -f Modelfile`. From then on, `ollama run my-model` launches your personalized assistant immediately — no cloud, no API key, no token cost.
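A minimal Modelfile might look like the sketch below. The base model name `llama3` is just a placeholder — any model you have already pulled with `ollama pull` works:

```
# Modelfile — minimal example (base model and prompt are illustrative)
FROM llama3
SYSTEM You are a concise technical assistant. Answer in plain English.
PARAMETER temperature 0.4
```

Three lines are enough: a base model, a personality, and one sampling tweak.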
## Key Modelfile directives
- FROM — specifies the base model (required). Must be a model already pulled via `ollama pull`.
- SYSTEM — the hidden system prompt sent before every conversation.
- PARAMETER — sets inference parameters like temperature, context window, and stop tokens.
- MESSAGE — adds few-shot examples to teach the model its expected input/output format.
- TEMPLATE — overrides the default prompt template. Rarely needed unless you know the model's exact format.
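The directives above compose naturally in one file. A sketch combining SYSTEM, PARAMETER, and MESSAGE few-shot turns (the base model, prompt, and example turns are all illustrative):

```
# Modelfile — few-shot example (contents are illustrative)
FROM llama3
SYSTEM You translate user requests into JSON commands.
PARAMETER temperature 0.2

# MESSAGE pairs teach the expected input/output format
MESSAGE user turn on the kitchen light
MESSAGE assistant {"device": "kitchen_light", "action": "on"}
MESSAGE user dim the bedroom lamp to 50%
MESSAGE assistant {"device": "bedroom_lamp", "action": "dim", "level": 50}
```

Each MESSAGE line takes a role (`user` or `assistant`) followed by the message content, and the pairs are replayed as conversation history before the user's real input.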
## Ollama parameter reference
| Parameter | Default | Range | What it does |
|---|---|---|---|
| temperature | 0.8 | 0.0 – 2.0 | Higher = more random/creative. Lower = more focused/deterministic. Set to 0 for fully deterministic output. |
| top_p | 0.9 | 0.0 – 1.0 | Nucleus sampling: only the smallest set of tokens whose cumulative probability reaches P is considered. Lower = more conservative. |
| top_k | 40 | 1 – 200 | Limits the token pool to the top K candidates at each step. Lower = less variety, higher = more diverse output. |
| num_ctx | 2048 | 512 – 131072 | Context window in tokens. Larger = more conversation history, but uses significantly more VRAM. |
| num_predict | 128 | -1 – ∞ | Maximum number of tokens to generate per response. -1 means unlimited (model decides when to stop). |
| repeat_penalty | 1.1 | 0.5 – 2.0 | Penalizes repeated tokens to reduce loops and repetitive output. 1.0 disables it entirely. |
| seed | 0 | any int | Random seed. 0 = random each run. Any other value = same output for the same input, useful for reproducibility. |
| stop | (model default) | string(s) | Stop sequences — strings that cause generation to halt immediately when produced. |
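Putting the table to work: a Modelfile tuned for reproducible, bounded output might set a fixed seed with zero temperature so the same prompt always yields the same answer (all values here are illustrative, not recommendations):

```
# Modelfile — deterministic, bounded generation (values are illustrative)
FROM llama3
PARAMETER temperature 0      # greedy decoding: most likely token every step
PARAMETER seed 42            # fixed seed: same input -> same output
PARAMETER num_ctx 8192       # larger context window (costs more VRAM)
PARAMETER num_predict 512    # hard cap on response length
PARAMETER repeat_penalty 1.2 # discourage repetitive loops
PARAMETER stop "###"         # halt immediately if this string is produced
```

Each PARAMETER line maps directly to a row in the table above; repeat the directive once per parameter you want to override.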