Position Overview

We are seeking a motivated Research Intern to join our AI research team, focusing on Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) technologies. The intern will play a crucial role in evaluating our proprietary models against industry benchmarks, analyzing competitive voice agent platforms, and contributing to cutting-edge research in speech AI technologies.

Skills & Competencies

Technical Competencies

Strong analytical and problem-solving abilities
Ability to design and conduct rigorous experiments
Experience with statistical analysis and performance metrics
Understanding of audio signal processing fundamentals
Knowledge of distributed training and large-scale model development

Soft Skills

Excellent written and verbal communication skills
Ability to work independently and manage multiple projects
Strong attention to detail and commitment to reproducible research
Collaborative mindset and ability to work in cross-functional teams
Curiosity and passion for staying current with AI research trends

Duration and Compensation

Duration: 6 months

Compensation: Base monthly stipend of INR 8,000, with the potential to increase to INR 15,000 based on performance evaluations.

Performance-Based Pay Scale: Eligibility for monthly performance-based bonuses, rewarding exceptional project contributions and teamwork.

Additional Benefits: Access to professional development opportunities, including workshops, tech talks, and mentoring sessions.

What You'll Gain

Learning Opportunities

Hands-on experience with state-of-the-art speech AI technologies
Exposure to the full model development lifecycle, from research to deployment
Mentorship from experienced AI researchers and engineers
Opportunity to contribute to cutting-edge research projects

Professional Development

Experience with industry-standard tools and methodologies
Opportunity to present research findings to technical and business stakeholders
Potential for research publication and conference presentations
Networking opportunities within the AI research community

Key Responsibilities

Model Evaluation

Conduct comprehensive evaluation of our TTS and ASR models against existing state-of-the-art models
Design and implement evaluation metrics and frameworks for speech quality assessment
Perform comparative analysis of model performance across different datasets and use cases
Generate detailed reports on model strengths, weaknesses, and improvement opportunities

Competitive Analysis

Evaluate and compare our voice agent platform with existing solutions (Vapi, Bland AI, and other competitors)
Analyze feature sets, performance metrics, and user experience across different voice agent platforms
Conduct technical deep-dives into competitive architectures and methodologies
Provide strategic recommendations based on competitive landscape analysis

Research

Monitor and analyze emerging trends in ASR, TTS, and voice AI technologies
Research novel approaches to improve ASR and TTS model performance
Investigate new architectures, training techniques, and optimization methods
Stay current with academic literature and industry developments in speech AI

Model Training and Optimization

Assist in training TTS and ASR models on various datasets
Implement and experiment with different model architectures and configurations
Perform model fine-tuning for specific use cases and domains
Optimize models for different deployment scenarios (edge, cloud, real-time)
Conduct data preprocessing and augmentation for training datasets

Documentation and Reporting

Maintain detailed documentation of experiments, methodologies, and results
Create visualization and analysis tools for model performance tracking
Prepare technical reports and presentations for internal stakeholders
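
To give candidates a sense of the evaluation work listed above, here is a minimal sketch of word error rate (WER), a common ASR metric: the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The function name word_error_rate and the whitespace tokenization are assumptions made for illustration, not a description of our internal tooling. Real evaluations usually also normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenizes on whitespace and applies no text
    normalization (casing, punctuation), which real pipelines usually add.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One deleted word ("on") and one substitution ("lights" -> "light")
# against a four-word reference.
print(word_error_rate("turn on the lights", "turn the light"))  # 0.5
```

Because WER counts insertions as well as deletions and substitutions, it can exceed 1.0 when the hypothesis is much longer than the reference.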

Requirements

Programming: Proficiency in Python; experience with deep learning frameworks such as PyTorch and TensorFlow
Speech AI Frameworks: Experience with libraries like librosa, torchaudio, speechbrain, or similar
Machine Learning: Strong understanding of deep learning architectures, training procedures, and evaluation methods
Data Processing: Experience with audio data preprocessing, feature extraction, and dataset management
Tools & Platforms: Familiarity with Colab or Jupyter notebooks, Git, Docker, and cloud platforms (AWS/GCP/Azure)
Knowledge of speech synthesis techniques (WaveNet, Tacotron, FastSpeech, etc.)
Understanding of ASR architectures (Wav2Vec, Whisper, Conformer, etc.)
Experience with model optimization techniques (quantization, pruning, distillation)
Familiarity with MLOps tools and model deployment pipelines
Previous work with voice AI applications or conversational AI systems preferred

AI Research Intern

Job Description