
Using ngrok with Ollama

Ollama is a locally deployed AI model runner that lets you download and execute large language models (LLMs) on your own machine, which makes it a perfect pairing for ngrok. By combining Ollama with ngrok, you can give your local Ollama instance an endpoint on the internet, enabling remote access and integration with other applications.

What you'll need

- An ngrok account and the ngrok agent installed on your machine
- Ollama installed locally

1. Connect the agent to your ngrok account

Sign up for an account at ngrok.com and install the ngrok agent if you haven't already.

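Then connect the agent to your account with your authtoken, replacing the placeholder below with the value from your ngrok dashboard:

```bash
# Replace <YOUR_NGROK_AUTHTOKEN> with the authtoken from your ngrok dashboard
ngrok config add-authtoken <YOUR_NGROK_AUTHTOKEN>
```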

2. Install and run Ollama on your machine

Download Ollama by following the instructions on the Ollama website and search for a model you'd like to use.

Pull the model you've chosen to your machine.

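For example, to pull gemma3, the model used in the examples below (any model from the Ollama library works the same way):

```bash
# Download the gemma3 model to your local Ollama library
ollama pull gemma3
```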

Start the Ollama server.

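With a standard install this is a single command; if Ollama is already running in the background (for example as a desktop app or system service), you can skip it:

```bash
# Start the Ollama API server on its default port
ollama serve
```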

By default, Ollama will start on http://localhost:11434.
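To confirm the server is up before exposing it, you can query the local API; for example, the /api/tags endpoint lists the models you've pulled:

```bash
curl http://localhost:11434/api/tags
```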

3. Create an endpoint for your Ollama server

In a new terminal window, start an ngrok tunnel to your local Ollama port.

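A minimal command looks like the following; the --host-header flag rewrites the Host header to localhost:11434, which Ollama may require before it will accept forwarded requests:

```bash
# Forward a public ngrok endpoint to Ollama's default local port
ngrok http 11434 --host-header="localhost:11434"
```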

ngrok will generate a public forwarding URL like https://abcd1234.ngrok.app. This URL now provides public access to your local Ollama instance.

4. Use your Ollama instance from anywhere

You can now send requests to your Ollama server from anywhere using the ngrok URL. For example, run a curl command like the one below, replacing abcd1234.ngrok.app with your forwarding URL and gemma3 with a model you've pulled, to prompt your LLM.

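Here's a sketch that uses Ollama's /api/generate endpoint with a sample prompt; setting stream to false returns the full response as a single JSON object:

```bash
curl https://abcd1234.ngrok.app/api/generate \
  -d '{
    "model": "gemma3",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
```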

One last thing: you now have a public endpoint for your Ollama instance, which means anyone on the internet could find and use it.

5. Protect your Ollama instance with basic auth

You may not want everyone to be able to access your LLM. ngrok can quickly add authentication in front of it without any changes to Ollama itself. Explore Traffic Policy to understand all the ways ngrok can protect your endpoint.

Create a new traffic-policy.yml file and paste in the policy below, which uses the basic-auth Traffic Policy action to allow only visitors with the credentials user:password1 or admin:password2 to access your app.

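A sketch of such a policy, assuming the basic-auth action's credentials list takes username:password pairs (check the Traffic Policy reference for the full schema):

```yaml
# Require HTTP Basic Auth on every request to this endpoint
on_http_request:
  - actions:
      - type: basic-auth
        config:
          credentials:
            - user:password1
            - admin:password2
```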

Start the agent again with the --traffic-policy-file flag.

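Assuming traffic-policy.yml is in your working directory:

```bash
# Same tunnel as before, now gated by the Traffic Policy file
ngrok http 11434 --host-header="localhost:11434" --traffic-policy-file traffic-policy.yml
```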

You can test your traffic policy by sending the same LLM prompt to Ollama's API with an Authorization: Basic header containing the base64-encoded credentials user:password1.

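For example, where dXNlcjpwYXNzd29yZDE= is the base64 encoding of user:password1 (you can generate it with echo -n 'user:password1' | base64):

```bash
# The Authorization header carries the base64-encoded user:password1 credentials
curl https://abcd1234.ngrok.app/api/generate \
  -H "Authorization: Basic dXNlcjpwYXNzd29yZDE=" \
  -d '{
    "model": "gemma3",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
```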

If you send the same request without the Authorization header, you should receive a 401 Unauthorized response.

Your personal LLM is now locked down to accept only authenticated requests.