
How to increase Ollama context length

The OLLAMA_CONTEXT_LENGTH environment variable didn’t have an effect, but there’s another way

Note: this post is part of #100DaysToOffload, a challenge to publish 100 posts in 365 days. These posts are generally shorter and less polished than our normal posts; expect typos and unfiltered thoughts! View more posts in this series.

I was trying out glm-ocr and discovered that, although its performance is close to Qwen3-VL or deepseek-ocr while requiring fewer resources, it produces empty output at Ollama’s tiny default model context size of 4096.

Discussion here pointed me in the right direction.

According to Ollama’s docs, you can set the context length with the OLLAMA_CONTEXT_LENGTH environment variable.

I tried it, both by exporting the variable and restarting the Ollama service (sudo service ollama restart) and by passing it directly to the ollama run command. No luck!
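For the record, the attempts looked roughly like this (10240 is just the value I settled on later):

export OLLAMA_CONTEXT_LENGTH=10240
sudo service ollama restart

OLLAMA_CONTEXT_LENGTH=10240 ollama run glm-ocr "Text Recognition: ./image.jpg"

Neither made a difference to the effective context.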

Rather than debug what was going wrong, I found a workaround.

It was simple to set the context length from the REPL that opens when you run ollama run glm-ocr with no prompt:

/set parameter num_ctx 10240

But I wasn’t running glm-ocr from the REPL; I was running it from the CLI. And /set doesn’t persist once you exit the REPL.

I found the answer I needed in a comment on the r/LocalLLaMA subreddit.

Set the context, as above, with:

/set parameter num_ctx 10240

Then, save a copy of the model with the current parameters as the default settings:

/save glm-ocr-10k

Now I can use it from the CLI by referencing the new model name:

ollama run glm-ocr-10k "Text Recognition: ./image.jpg"
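As a sanity check, ollama show should list the saved model’s parameters (the exact output varies by Ollama version, but num_ctx 10240 should appear under the parameters):

ollama show glm-ocr-10k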

What values work well?

Since Ollama silently truncates context, it’s hard to know what the right value is. Set it too high, and it will max out your resources. Ollama recommends 64000 for agents and similar workloads, but that won’t run on an older laptop.

  • The default (4096) produces no output with glm-ocr, just empty markdown and text code fences.
  • 10240 produces output, though with errors.
  • I’m currently trying 20480. It has the same errors as 10240 but is otherwise pretty good; I don’t know whether the errors relate to the context size.
  • 64000 requires more than 16 GB of RAM.

Why not just downscale the images?

That’s what I’m going to try next.
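The rough idea, assuming ImageMagick is installed (the 50% factor is a starting guess, not something I’ve tested yet):

magick image.jpg -resize 50% image-small.jpg
ollama run glm-ocr-10k "Text Recognition: ./image-small.jpg"

A smaller image means fewer image tokens, so it might fit within a smaller context window.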
