[QUESTION] Quantizing in a different way... #1256

@0wwafa

Description

Hello!
I did some experimenting with llama.cpp and found that quantizing the input and embedding tensors to f16 and the remaining tensors to q5_k or q6_k gives excellent results, almost indistinguishable from pure f16, at roughly half the size.

Is it possible to do the same with bitsandbytes/transformers, so as to produce a model quantized in this way from a normal model?

You can find my (GGUF) quantizations at https://huggingface.co/ZeroWw for reference.

Thanks.
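For context, the closest analogue I can imagine with transformers + bitsandbytes would be something like the sketch below: quantize the linear layers to 4-bit NF4 while skipping the embedding and output head so they stay in fp16 (via `llm_int8_skip_modules`). This is not the same scheme as q5_k/q6_k, and the module names (`embed_tokens`, `lm_head`) and the model id are just assumptions that differ per architecture.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Rough analogue only (assumption, not llama.cpp's q5_k/q6_k scheme):
# 4-bit NF4 for most linear layers, embedding and output head left in fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    # Module names are assumptions; they vary by architecture.
    llm_int8_skip_modules=["embed_tokens", "lm_head"],
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.3",  # hypothetical example model id
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map="auto",
)
```

Would something along those lines be the intended way to do it, or is there a better mechanism for per-tensor precision?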
