AI Research Intern, Vision-Language Model Direction

Internship, onsite
Sony Electronics Singapore
Singapore, Singapore

Salary undisclosed

On behalf of Pentas Vision, a subsidiary of Sony Semiconductor Solutions Corporation (SSS), we are seeking a highly analytical and driven AI Research Intern to enhance and optimise vision-language models in line with the company's business expertise and goals.

Pentas Vision specialises in the research and development of AI, in particular computer vision technologies. Our core expertise lies in integrating AI solutions with a wide range of image sensors, enabling seamless incorporation into embedded devices such as mobile phones, IoT cameras, and similar hardware.

Job Responsibilities:

Instruction Fine-Tuning: Assist in refining pre-trained vision-language models to better understand and follow complex instructions, enhancing their applicability across diverse tasks.
Deployment Acceleration: Collaborate with the team to optimize model inference performance, ensuring rapid deployment on various platforms.
Model Distillation: Engage in distilling knowledge from large-scale models into smaller, more efficient counterparts without significant loss of performance, facilitating easier deployment and scalability.
Quantization: Contribute to the quantization of vision-language models to reduce memory footprint and computational requirements, enabling deployment in resource-constrained environments.
Research and Development: Stay abreast of the latest advancements in vision-language modeling, propose innovative ideas, and apply cutting-edge technology to practical scenarios.

Minimum Qualifications:

Currently pursuing or recently completed a Master’s or Ph.D. in Computer Science, Electrical Engineering, or a related field with a focus on artificial intelligence, machine learning, or computer vision.
Strong understanding of deep learning architectures, particularly those related to vision-language models.
Experience with instruction fine-tuning, model distillation, quantization, and deployment optimization techniques.
Proficiency in programming languages such as Python and familiarity with deep learning frameworks such as TensorFlow or PyTorch.
Excellent problem-solving skills and the ability to work both independently and collaboratively in a team environment.
Strong communication skills and a track record of research publications are a plus.

Preferred Experience:

Experience with large-scale model training and optimization.
Familiarity with deployment of models on edge devices and understanding of hardware constraints.
Knowledge of natural language processing and computer vision integration.