
LLM reviews

Subjective LLM reviews. Mostly on open weight models.

Llama series

Gemma series

Qwen series

Arcee

Starling-LM-7B-alpha

This is a lesser-known, older model (released in late 2023) that I included for review because it consistently performs well on a very specific problem: given the title of a YouTube video, which is often long, spammy, clickbaity, and, for my use case, may contain Chinese characters, shorten it. (That's the high-level idea, not the actual prompt.) At the time I wrote the script, only Starling-LM-7B-alpha actually followed the instructions and did what I wanted… Llama3 (7B) just didn't like the Chinese input.

This shows how, regardless of fancy benchmarks, the best model is the one that performs well on your use case (and why having lots of open-weight models to choose from is great!)
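As a concrete illustration, here is a minimal sketch of the title-shortening task, not the actual script: it builds an OpenAI-style chat-completion payload you could send to a local inference server. The model name, word limit, and prompt wording are all my own assumptions.

```python
def build_shorten_request(title: str, max_words: int = 8) -> dict:
    """Construct a chat-completion payload asking a local model to
    shorten a spammy YouTube title. A sketch, not the author's prompt."""
    system = (
        "You shorten YouTube video titles. Reply with only the shortened "
        f"title, at most {max_words} words. Keep the original language; "
        "titles may contain Chinese characters."
    )
    return {
        "model": "starling-lm-7b-alpha",  # placeholder model name
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": title},
        ],
        "temperature": 0.2,  # low temperature: we want a faithful rewrite
    }

payload = build_shorten_request(
    "(MUST WATCH!!) 你绝对想不到 The CRAZIEST AI Trick of 2023 [not clickbait]"
)
```

The payload could then be POSTed to any OpenAI-compatible endpoint (llama.cpp, vLLM, etc.); whether the model actually follows the instruction is exactly what separated Starling from the others here.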

DeepSeek

Phi

Kimi

dots.llm1

Unfortunately, I haven't had the chance to review this thoroughly yet.

Mistral series

qwen2.5-7b-ins-v3

This isn't a Qwen model per se. It was uploaded by somebody to Hugging Face without a model card, and the only real info about it was in an r/LocalLLaMA post: https://www.reddit.com/r/LocalLLaMA/comments/1g03rdn/hidden_gem_happzy2633qwen257binsv3_is_an/

The model is nothing special these days, but at the time of its upload (I hesitate to call it a "release") around September 2024, it was basically one of the first reasoning models to exist. Reasoning models were still a new thing, and the only other offering out there was OpenAI's o1-preview (which did not give you the reasoning tokens). DeepSeek R1 had not been released yet, nor had Qwen's QwQ.

Of course, its actual output quality was pretty meh, given that it was probably fine-tuned from the Qwen2.5 7B base model (per the name), but it was definitely doing the reasoning, just like all the models we know in 2025.

It's interesting how glimpses of "real" history like this are often lost to the popular narrative. Anyways.