AI Research Engineer (Model Compression & Quantization)

Jobgether • India

No Relocation

Posted: May 21, 2026

Job Description

This position is posted by Jobgether on behalf of a partner company. We are currently looking for an AI Research Engineer (Model Compression & Quantization) in India. This role sits at the forefront of efficient AI systems research, focusing on making large-scale multimodal models practical for real-world deployment. You will work on advancing state-of-the-art techniques in model compression, enabling LLMs and vision-language models to run efficiently on resource-constrained devices such as mobile and edge hardware. The position combines deep research with hands-on engineering, requiring you to design and optimize pipelines that reduce memory usage, latency, and compute cost without sacrificing model performance. You will explore and implement techniques such as quantization, pruning, and knowledge distillation, contributing directly to scalable AI infrastructure. Operating in a highly research-driven and experimental environment, you will collaborate with AI engineers and researchers to push the boundaries of efficient multimodal intelligence. This is a high-impact role for someone passionate about both cutting-edge AI research and real-world deployment constraints.

Accountabilities:
- Design and implement model compression techniques such as quantization, pruning, and knowledge distillation to optimize large multimodal AI models (LLMs and VLMs) for efficiency and scalability.
- Develop low-bit and mixed-precision quantization strategies to reduce model size and inference latency while preserving accuracy and output quality.
- Build and refine knowledge distillation pipelines to transfer capabilities from large teacher models to compact student models for efficient inference.
- Analyze performance trade-offs between accuracy, latency, memory usage, and throughput across different compression techniques and propose empirical improvements.
- Conduct research on emerging model compression methods and contribute to experimental validation of novel approaches for multimodal architectures.
- Document experiments, methodologies, and findings to ensure reproducibility and effective collaboration across research and engineering teams.
- Contribute to scientific publications and technical papers for leading AI conferences, advancing the field of efficient model deployment.

Requirements:
- PhD or equivalent experience in Computer Science, Machine Learning, NLP, or a related field, with a strong research track record in AI or deep learning.
- Strong hands-on experience with PyTorch or equivalent deep learning frameworks for training and optimizing large-scale models.
- Proven expertise in model quantization, including both Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT).
- Practical experience with knowledge distillation techniques for compressing large neural networks into smaller, efficient models.
- Solid understanding of model pruning methods and neural network optimization strategies for efficiency improvement.
- Deep knowledge of transformer-based architectures (LLMs, VLMs), including training dynamics, backpropagation, fine-tuning, and optimization techniques.
- Strong research mindset with the ability to evaluate trade-offs and design experiments in multimodal AI systems.
- Familiarity with C++ for low-level optimization and inference acceleration is a plus.

Benefits:
- Opportunity to work on cutting-edge AI research focused on efficient multimodal and generative model deployment.
- High-impact role contributing directly to scalable AI systems for real-world edge and mobile applications.
- Fully remote, global-first working environment with international collaboration.
- Strong focus on research freedom, experimentation, and publication in top-tier AI conferences.
- Exposure to advanced AI systems including LLMs, VLMs, and multimodal architectures at scale.
- Competitive compensation aligned with experience and technical expertise.
- Opportunity to shape next-generation AI efficiency standards and deployment techniques.
How Jobgether works: We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team. We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.