
Getting Local AI Running
Here’s a list of simple steps you can follow to get local AI running on your machine.
Confirm that your graphics card and computer have enough resources to run local AI models.
In my experience, small models take about 4 GB of video memory, and larger models can take more than this. My advice is to start with a medium-size model and see if it runs fast enough for you and meets your needs. Model sizes are described by the number of parameters in the model: the smaller that number, the less memory the model needs. For reference, my video card has 12 GB of VRAM and fits about a 7B model.
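If you are not sure how much video memory your card has, and you have an NVIDIA card, you can check from the terminal with this command:

nvidia-smi

The output includes the card's total memory (for example, 12288MiB for a 12 GB card). On other cards, the dedicated GPU memory shown on Windows Task Manager's Performance tab tells you the same thing.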
Step One: Install Ollama
Install ollama from the official website (ollama.com) for your operating system.
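Once the installer finishes, you can confirm that Ollama is available by opening a terminal and running:

ollama --version

If the install worked, this prints the installed version number.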
Step Two: Choose an AI Model
Go back to the ollama website and click on the “Models” button. This will bring up a webpage that has all of the models available through the ollama program. When selecting an AI model to download, my recommendation is to start with a smaller model and test it with a simple prompt to see how fast it performs.
Step Two (a): Model Sizes
Models may be available in different sizes. The number of parameters a model has indicates its size and, therefore, how much video or system RAM the model will occupy when it runs. Video card RAM is always faster than system RAM, but both are available to ollama.
If you choose a model that is larger than your available RAM, you will not be able to load and run it. Scroll through the list of models until you find one that seems interesting. Here, I have chosen Gemma by Google. There is a drop-down menu on the model's page so that you can download different sizes of the model.
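For example, Gemma is published in more than one parameter size, and picking a size from the drop-down simply changes the tag after the model name, so the same model might appear as gemma:2b or gemma:7b. The examples below use the 2B variant; substitute whichever size you chose.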
Downloading Your Model
There is a text box on each model's page in the Ollama models section with the exact command you need to enter to install the model. It is just the command "pull" followed by the model name. The long model names can be difficult to type and must be entered exactly in order to work, so it is better to copy and paste them. Once you have copied the command, go to the terminal and enter it; the model will start downloading.
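For example, with the 2B version of Gemma selected, the copied command would look like this (your model name and tag may differ):

ollama pull gemma:2b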
Wait for your model to download, and then you can test it at the command prompt by typing "ollama run" followed by your model name. That's it! You can now chat with an AI locally at the command prompt. But if you want a fancier interface, similar to many of the popular web interfaces for AIs, you need to follow a few more steps.
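Continuing the Gemma example, the test looks like this; type /bye (or press Ctrl+D) to leave the chat when you are done:

ollama run gemma:2b

You can also run "ollama list" at any time to see which models you have already downloaded.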
Installing the Interface
To install the interface, you will need to install Docker for Windows. Once it is installed, go to the terminal and paste in this command to download the interface for ollama in a docker container:
docker run -d --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main
The interface software is called Open WebUI, and it can be difficult to configure and install by hand, so running this docker container does it for you. Once you've downloaded the container, you should be able to open the Docker for Windows program and see the container that you've just downloaded.
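If you would rather check from the terminal, you can confirm the container exists with:

docker ps -a

The open-webui container should appear in the list, along with whether it is currently running.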
To start the container, click "Run." A window will then come up asking you to set options for the container. I recommend setting one advanced option: go down to the ports field and enter 0, which assigns a random port for the interface to use. The container needs this port so that you can reach the AI from your browser.
Once the container is running, go to the Containers section of Docker for Windows, click on the port hyperlink next to the container, and you can begin using the AI model you selected.
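If you do not see a port hyperlink, you can also look up the published port from the terminal (this assumes the container was started with a published port rather than host networking):

docker port open-webui

Then open http://localhost:PORT in your browser, replacing PORT with the number shown.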
Having an interface is very helpful because it allows you to use your voice with the AI and upload your own files to it. It also keeps a history of your conversations, so you can go back and see what you have asked the AI previously.

