Deploy a fine-tuned or uploaded LoRA model on serverless for inference
Organization | Base Model Name | Base Model String | Quantization |
---|---|---|---|
Meta | Llama 3.1 8B Instruct | meta-llama/Meta-Llama-3.1-8B-Instruct-Reference | BF16 |
Meta | Llama 3.1 70B Instruct | meta-llama/Meta-Llama-3.1-70B-Instruct-Reference | BF16 |
Alibaba | Qwen2.5 14B Instruct | Qwen/Qwen2.5-14B-Instruct* | FP8 |
Alibaba | Qwen2.5 72B Instruct | Qwen/Qwen2.5-72B-Instruct | FP8 |
The `--validation-file` and `--n-evals` (the number of evaluations over the entire job) parameters let you supply a validation set. `--n-evals` must be set to a number greater than 0 for your validation set to be used.
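As a sketch, a fine-tuning job that uses a validation set might be submitted like this. Only `--validation-file` and `--n-evals` come from the text above; the subcommand, file IDs, model string, and remaining flags are assumptions to be checked against the CLI reference:

```shell
# Hypothetical Together CLI invocation; the file IDs are placeholders.
# --n-evals must be greater than 0 for the validation set to be used.
together fine-tuning create \
  --training-file "file-abc123" \
  --validation-file "file-def456" \
  --n-evals 10 \
  --model "meta-llama/Meta-Llama-3.1-8B-Instruct-Reference" \
  --lora
```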
Step 2: Run LoRA Inference
Once you submit the fine-tuning job, you should be able to see the model name in the response. The fine-tuned model includes the `adapter_config.json` and `adapter_model.safetensors` files.
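Using the model name from the response, a serverless chat-completions request can be sketched as follows. The endpoint path and the `your-account/...` model string are assumptions; substitute the exact name your job returned:

```shell
# Hypothetical request; replace the model string with the name returned
# by your fine-tuning job, and set TOGETHER_API_KEY in your environment.
curl https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-account/Meta-Llama-3.1-8B-Instruct-Reference-lora",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```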
For an uploaded model, you can verify the `adapter_config.json` and `adapter_model.safetensors` files in the Files and versions tab on Hugging Face.
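Before uploading, a quick local check for the two required files can catch a broken adapter early. This is a minimal sketch assuming the adapter sits in a local directory; the helper name is ours, not part of any SDK:

```python
# Check a local LoRA adapter directory for the files serverless
# inference expects; returns the list of missing filenames.
from pathlib import Path

REQUIRED_FILES = ("adapter_config.json", "adapter_model.safetensors")

def missing_adapter_files(adapter_dir: str) -> list[str]:
    d = Path(adapter_dir)
    return [name for name in REQUIRED_FILES if not (d / name).is_file()]
```

An empty result means both files are present; anything returned should be fixed before uploading the adapter.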