Llama-Factory: Unified, Efficient Fine-Tuning for 100+ Open LLMs
Posted 4 months ago · Active 4 months ago
github.com · Tech story
supportive · positive
Debate: 20/100
Key topics
LLM Fine-Tuning
AI Model Optimization
NLP
Llama-Factory is a unified and efficient platform for fine-tuning 100+ open LLMs, sparking discussion about its applications, GPU requirements, and how it compares to similar libraries.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion
First comment: 45m
Peak period: 14 comments (0-12h)
Avg / period: 4.8
Comment distribution: 19 data points
Based on 19 loaded comments
Key moments
1. Story posted: Sep 18, 2025 at 7:48 PM EDT (4 months ago)
2. First comment: Sep 18, 2025 at 8:33 PM EDT (45m after posting)
3. Peak activity: 14 comments in 0-12h, the hottest window of the conversation
4. Latest activity: Sep 23, 2025 at 4:41 AM EDT (4 months ago)
ID: 45296403 · Type: story · Last synced: 11/20/2025, 12:53:43 PM
I found this link more useful.
"LLaMA Factory is an easy-to-use and efficient platform for training and fine-tuning large language models. With LLaMA Factory, you can fine-tune hundreds of pre-trained models locally without writing any code."
Always curious to see what other AI enthusiasts are running!
On a related note, at what point are people going to get tired of waiting 20s for an LLM to answer their questions? I wish it were more common for smaller models to be used when they're sufficient.
I've been trying to actually fine-tune DeepSeek (not the distills), and there are few options.
Unsloth doesn't have an official multi-GPU story: there are hacked-together solutions, but they're finicky even for smaller models.
In general, DeepSeek has very few resources on fine-tuning, and those get further muddied by people referring to the distills when they claim to be fine-tuning it.
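To illustrate what the hacked-together route tends to look like in practice, here is a hedged sketch of naive multi-GPU sharding with device_map="auto" plus LoRA adapters, using plain transformers and peft. This is not Unsloth's or LLaMA Factory's recipe, and the placeholder is a small dense DeepSeek checkpoint, not the full MoE model the comment is about.

```python
# Hedged illustration of a naive multi-GPU workaround: shard one checkpoint
# across all visible GPUs with device_map="auto" and train LoRA adapters on top.
# This is plain layer sharding via transformers/accelerate, not an official
# recipe; the model name is a small dense placeholder, not DeepSeek-V3/R1.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/deepseek-llm-7b-base"  # placeholder dense model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # splits layers across every visible GPU
)

lora = LoraConfig(r=8, lora_alpha=16,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights will be updated
```

Sharding like this keeps per-GPU memory manageable but runs the layers sequentially rather than in parallel, which is part of why people find these setups finicky compared to a proper distributed recipe.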
On a side note, has anyone tried something similar? I have 100K messages and want to make a "dumb persona" that reflects the general Discord server vibe. I don't really care if it's accurate. What models would be most suitable for this task? My setup is not that powerful: a 4070S with 32GB of RAM for training, and a Lenovo M715q (Ryzen 5 PRO 2400GE, 16GB of memory) for inference.
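One hypothetical way to start, sketched under assumptions: convert the message dump into short context/reply pairs and fine-tune a small (roughly 1-3B parameter) model with LoRA or QLoRA so it fits on a 4070S. The export file name, its field names, and the windowing heuristic below are made up for illustration, not a known Discord export schema.

```python
# Hypothetical data-prep sketch: turn a Discord message export into
# context/reply pairs for supervised fine-tuning of a small model.
# The file name, field names, and windowing heuristic are assumptions.
import json

def build_pairs(messages, context_len=6):
    """Pair a short window of preceding messages with the next reply."""
    pairs = []
    for i in range(context_len, len(messages)):
        context = "\n".join(f"{m['author']}: {m['content']}"
                            for m in messages[i - context_len:i])
        reply = messages[i]["content"].strip()
        if reply:  # skip empty or attachment-only messages
            pairs.append({"prompt": context, "response": reply})
    return pairs

with open("discord_export.json") as f:  # assumed: a flat list of message dicts
    messages = json.load(f)

with open("persona_sft.jsonl", "w") as out:
    for pair in build_pairs(messages):
        out.write(json.dumps(pair, ensure_ascii=False) + "\n")
```

The resulting JSONL could then feed a LoRA run on any small instruction-tuned model; since accuracy matters less than tone here, even a short training run may be enough to capture the vibe.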
https://www.nvidia.com/en-us/ai/nim-for-manufacturing/
Word on the street is the project has yielded largely unimpressive results compared to its potential, but NV is still investing in an attempt to further raise the GPU saturation waterline.
P.S. The project logo stood out to me as presenting the llama releasing some "steam" with gusto. I wonder if that was intentional? Sorry for the immature take, but resisting the scatological jokes is tough.