  1. edumunozsala/llama-2-7B-4bit-python-coder - GitHub

    Our goal is to fine-tune the pretrained Llama 2 7B model using 4-bit quantization to produce a Python coder. We will run the training on Google Colab using an A100 to get better …

  2. 4-Bit VS 8-Bit Quantization Performance Comparison on Llama-2 and Falcon-7B

    Mar 30, 2024 · I conducted an empirical evaluation, training both Llama-2 and Falcon-7B models under 4-bit and 8-bit quantization schemes. The training process spanned 50 epochs, allowing …

  3. Deploying Llama 7B Model with Advanced Quantization

    Jan 16, 2024 · We investigated two advanced 4-bit quantization techniques to compare against the baseline fp16 model. One is activation-aware weight quantization (AWQ) and the other is … (a hedged AWQ sketch appears after this list).

  4. Fine-tuning Llama 2 7B on your own data - Google Colab

    This tutorial will use QLoRA, a fine-tuning method that combines quantization and LoRA. For more information about what those are and how they work, see this post. In this notebook, we … (a minimal setup sketch for this recipe appears after this list).

  5. Making LLMs even more accessible with bitsandbytes, 4-bit quantization ...

    May 24, 2023 · We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit …

  6. LoftQ/Llama-2-7b-hf-4bit-64rank - Hugging Face

    LoftQ (LoRA-fine-tuning-aware Quantization) provides a quantized backbone Q and LoRA adapters A and B, given a full-precision pre-trained weight W. This model, Llama-2-7b-hf-4bit …

  7. ringerH/Llama-2-7b-finetuning: LoRA fine-tuning Llama-2-7b

    Quantization: The model weights are quantized to 4-bit precision using the NormalFloat (NF4) format. This allows the model to be more memory-efficient, making it suitable for deployment in …

  8. GPTQ Quantization Benchmarking of LLaMA 2 and Mistral 7B

    This project benchmarks the memory efficiency, inference speed, and accuracy of LLaMA 2 (7B, 13B) and Mistral 7B models using GPTQ quantization with 2-bit, 3-bit, 4-bit, and 8-bit … (see the GPTQ sketch after this list).

  9. Fine-tuning LLaMA-7B on ~12GB VRAM with QLoRA, 4-bit quantization

    Jun 1, 2023 · nvidia-smi reported that this required 11181 MiB, at least to train on the prompt sequence lengths that occurred …

  10. clibrain/Llama-2-7b-ft-instruct-es-gptq-4bit - Hugging Face

    Compared to OBQ, the quantization step itself is also faster with GPTQ: it takes 2 GPU-hours to quantize a BERT model (336M) with OBQ, whereas with GPTQ, a Bloom model (176B) can be …
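
Several of these results (1, 4, 5, 7, and 9) describe the same basic QLoRA recipe: load the base model with its weights quantized to 4-bit NF4 via bitsandbytes, then attach trainable LoRA adapters with PEFT. Below is a minimal sketch of that setup; the model ID, LoRA rank/alpha, and target module names are illustrative assumptions, not values taken from any of the linked posts.

```python
# Minimal QLoRA-style setup sketch: 4-bit NF4 base model + LoRA adapters.
# Hyperparameters and target modules are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # assumed; the Hugging Face repo is gated

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, the QLoRA data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # do matmuls in bf16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters are trainable
```

Training then proceeds with an ordinary Trainer/SFTTrainer loop; because only the adapter weights are updated while the 4-bit base stays frozen, a 7B model fits on a single consumer GPU, which is what the ~12GB VRAM report in result 9 relies on.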
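For the GPTQ entries (results 8 and 10), quantization is a post-training step driven by a small calibration dataset rather than part of fine-tuning. A hedged sketch using the GPTQConfig integration in transformers (which delegates to the optimum/AutoGPTQ backend) follows; the bit width and calibration dataset are assumptions.

```python
# Sketch: post-training GPTQ quantization of a Llama-2 checkpoint.
# Requires the optimum / auto-gptq backend; bits and dataset are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-2-7b-hf"  # assumed; gated repo
tokenizer = AutoTokenizer.from_pretrained(model_id)

gptq_config = GPTQConfig(
    bits=4,          # result 8 benchmarks 2-, 3-, 4- and 8-bit variants
    dataset="c4",    # calibration data used to minimize per-layer quantization error
    tokenizer=tokenizer,
)

# Quantization happens while loading; this is GPU-hours of work for large models,
# which is the speed comparison against OBQ that result 10 refers to.
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=gptq_config, device_map="auto"
)
quantized_model.save_pretrained("llama-2-7b-gptq-4bit")
tokenizer.save_pretrained("llama-2-7b-gptq-4bit")
```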
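Result 3 mentions activation-aware weight quantization (AWQ) as the other 4-bit technique. A rough sketch with the AutoAWQ library is below; the quant_config values mirror AutoAWQ's commonly documented defaults and should be treated as assumptions rather than settings from the linked article.

```python
# Sketch: 4-bit AWQ quantization with the AutoAWQ library (assumed usage).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# AWQ collects activation statistics on a small calibration set and rescales
# weights so the most activation-salient channels lose the least precision,
# then quantizes everything to 4-bit.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("llama-2-7b-awq-4bit")
tokenizer.save_pretrained("llama-2-7b-awq-4bit")
```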
