How to increase Ollama context length
The OLLAMA_CONTEXT_LENGTH environment variable didn’t have an effect, but there’s another way
I was trying out glm-ocr and discovered that, although its performance is close to Qwen3-VL or deepseek-ocr while requiring fewer resources, it produces empty output with Ollama’s tiny default model context size of 4096 tokens.
Discussion here pointed me in the right direction.
According to Ollama’s docs, you can set the context length with the OLLAMA_CONTEXT_LENGTH environment variable.
I tried it, both by exporting the variable and restarting the Ollama service (sudo service ollama restart), and by passing it directly on the ollama run command line. No luck!
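Concretely, that looked something like this (10240 is just the value I wanted for glm-ocr):

export OLLAMA_CONTEXT_LENGTH=10240
sudo service ollama restart

and, for a single invocation:

OLLAMA_CONTEXT_LENGTH=10240 ollama run glm-ocr "Text Recognition: ./image.jpg"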
Rather than debug what was going wrong, I found a workaround.
It was simple to set the context length from the REPL that opens when you run ollama run glm-ocr with no prompt:
/set parameter num_ctx 10240
But I wasn’t running glm-ocr from the REPL; I was running it from the CLI. And /set doesn’t persist once you exit the REPL.
I found the answer I needed in a comment on the r/LocalLLaMA subreddit.
Set the context length, as above, with:
/set parameter num_ctx 10240
Then, save a copy of the model with the current parameters as the default settings:
/save glm-ocr-10k
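To double-check that the parameter stuck, ollama show can print the saved model’s Modelfile, which should now contain a PARAMETER num_ctx line:

ollama show glm-ocr-10k --modelfile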
Now I can use it from the CLI by specifying the new model name:
ollama run glm-ocr-10k "Text Recognition: ./image.jpg"
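If you’re calling Ollama over its HTTP API instead of the CLI, you can skip the saved model entirely: the /api/generate endpoint accepts num_ctx in its options object. A rough sketch (unlike the CLI, the API expects the image base64-encoded in an images array rather than as a path in the prompt; base64 -w0 is the GNU coreutils form):

curl http://localhost:11434/api/generate -d '{
  "model": "glm-ocr",
  "prompt": "Text Recognition: ",
  "images": ["'"$(base64 -w0 ./image.jpg)"'"],
  "stream": false,
  "options": { "num_ctx": 10240 }
}'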
What values work well?
Since Ollama silently truncates the context, it’s hard to know what the right value is. Set it too high, and it will max out your resources. Ollama recommends 64000 for agents and similar workloads, but that won’t run on an older laptop.
- The default (4096) produces no output with glm-ocr, just empty markdown and text code fences.
- 10240 produces output (with errors).
- I’m currently trying 20480. It produces the same errors as 10240, but the output is otherwise pretty good; I don’t know whether the errors relate to the context size or not.
- 64000 requires more than 16 GB of RAM.
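One way to see what a given context size actually costs on your machine: while the model is loaded, ollama ps reports each running model’s memory footprint (the SIZE column) and how much of it is on the GPU (the PROCESSOR column):

ollama ps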
Why not just downscale the images?
That’s what I’m going to try next.