Applied Research Scientist – Multimodal Modeling

Montreal, CA · CAD $160,000–$200,000/year · Posted Dec 10, 2025 · Hybrid · Engineering · Senior

About the Role

Trellion is building the intelligence core of next-generation hiring: not just parsing text, but understanding humans across video, audio, language, and behavior. Our mission is to turn messy human signals into clean, structured insight that empowers better decisions.

This role helps shape our multimodal AI backbone. We are looking for an Applied Research Scientist who can design, train, evaluate, and deploy multimodal models for real-time and offline understanding of interviews. You will drive frontier work combining computer vision, speech processing, NLP, behavioral inference, and temporal modeling into robust, high-signal systems.

Responsibilities

  • Design multimodal architectures that combine video, audio, text, and interaction signals
  • Build feature representations for gaze, posture, prosody, sentiment, and conversational dynamics
  • Train sequence and transformer models on multimodal datasets
  • Research and implement state-of-the-art models in vision, speech, and language
  • Improve robustness, fairness, and generalizability across diverse users
  • Work closely with ML engineering and product teams to integrate models into production
  • Publish internal research notes and advance Trellion’s intellectual backbone

You will operate as both a scientist and an engineer. Ideas matter, but so does execution. Your models will power production systems used by real customers.

Requirements

Core Expertise (strong in most of the following):

  • Deep learning with PyTorch or TensorFlow
  • Experience with transformer architectures
  • Strong mathematical grounding in optimization and representation learning
  • Ability to build and iterate on multimodal models (any two of: vision, audio, NLP)

Vision

  • Face and gesture analysis
  • Action recognition or temporal CNN/ViT models
  • Pose estimation, gaze tracking, or visual attention modeling

Speech & Audio

  • Prosody and paralinguistic features
  • Voice activity detection
  • ASR models and embeddings

Language

  • LLMs, text embeddings, contextual modeling

General Skills

  • Dataset curation, alignment, and augmentation
  • Experiment design and large-scale training
  • Model evaluation and interpretability methods
  • Experience with GPUs, distributed training, and model optimization

Nice to have

  • Prior published research
  • Experience with fairness, bias mitigation, or explainability
  • Experience shipping models under real-time inference constraints

What We Care About

  • You think clearly and write clearly
  • You understand the difference between novelty and usefulness
  • You believe models should earn their place in production
  • You value signal over complexity
  • You can collaborate with engineers, designers, and product teams
  • You enjoy building systems that help real humans

Compensation & Benefits

  • $160,000–$200,000 CAD base salary
  • Equity in a fast-moving AI startup
  • Hybrid work setting in Montreal
  • Ownership over a critical research track
  • A culture of rigor, speed, and autonomy

How to Apply

Send your resume, GitHub, and any research or project links to [email protected]. Strong candidates typically include a portfolio of experiments or publications demonstrating real depth.

Skills

Multimodal modeling, deep learning, PyTorch, TensorFlow, transformer architectures, computer vision, speech processing, NLP, pose estimation, gaze tracking, action recognition, temporal CNNs, ViTs, prosody analysis, voice activity detection, ASR, LLMs, text embeddings, dataset curation, distributed training
