⚡ DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration

🚀 A breakthrough approach that accelerates long-context inference in large language models by dynamically learning adaptive attention masks at the granularity of individual attention maps.
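
The idea of applying a per-attention-map mask can be illustrated with a minimal NumPy sketch. The band-plus-sink mask below is a hypothetical stand-in for a learned DAM mask, not the actual method; it only shows how a boolean mask prunes positions in an attention map before the softmax:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention(q, k, v, mask):
    """Apply a boolean mask to a single attention map before softmax.

    mask: (seq, seq) boolean array; False entries are pruned, so only
    the kept positions contribute to the weighted sum over values.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)             # (seq, seq) attention map
    scores = np.where(mask, scores, -np.inf)  # drop pruned positions
    weights = softmax(scores, axis=-1)
    return weights @ v

rng = np.random.default_rng(0)
seq, d = 6, 4
q, k, v = (rng.standard_normal((seq, d)) for _ in range(3))

# Hypothetical adaptive mask: a causal local band plus the first token
# as an attention "sink". In DAM, such a mask would instead be learned
# separately for each attention map.
i, j = np.indices((seq, seq))
mask = (j <= i) & ((i - j < 3) | (j == 0))

out = masked_attention(q, k, v, mask)
print(out.shape)  # (6, 4)
```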