[QUESTION] Quantizing in a different way... #1256

@0wwafa

Description

Hello!
I did some experimenting with llama.cpp and found that quantizing the input and embedding tensors to f16 and the remaining tensors to q5_k or q6_k gives excellent results, almost indistinguishable from pure f16, at roughly half the size.

Is it possible to do the same with bitsandbytes/transformers, so as to produce a model quantized in this way from a normal model?

You can find my (GGUF) quantizations at https://huggingface.co/ZeroWw for reference.

Thanks.
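For context, the closest analogue I can imagine with transformers + bitsandbytes would be something like the sketch below: quantize the linear layers to 4-bit NF4 while skipping the embedding and output head so they stay in fp16 (via `llm_int8_skip_modules`). This is not the same scheme as q5_k/q6_k, and the module names (`embed_tokens`, `lm_head`) and the model id are just assumptions that differ per architecture.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Rough analogue only (assumption, not llama.cpp's q5_k/q6_k scheme):
# 4-bit NF4 for most linear layers, embedding and output head left in fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    # Module names are assumptions; they vary by architecture.
    llm_int8_skip_modules=["embed_tokens", "lm_head"],
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.3",  # hypothetical example model id
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map="auto",
)
```

Would something along those lines be the intended way to do it, or is there a better mechanism for per-tensor precision?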
