FYI after quite a bit of testing I settled on qwen2.5 coder 1.5b for autocomplete and llama 3.2 1b for chat. These models are tiny, but bigger models had too poor of speed performance on an M1 laptop for daily use. Iβm sure the results will pale in comparison to larger models, but it is certainly better than nothing for free!