Thesis · Jan – Apr 2025

Robotic Telemanipulation

An NLP-controlled robotic system for physical mobile-device interaction, combining natural language understanding, computer vision, and robotic actuation. MSc thesis project at NUS.

  • Pipeline: 3-stage
  • Interactions: tap, swipe, type, read
  • Thesis: MSc @ NUS
  • Foundation: 2 yrs robotics

Architecture

  • Three-stage pipeline: NLP intent parsing → computer vision UI detection → robotic motion execution
  • ROS for robot communication, OpenCV for vision processing
  • Natural language → structured action → G-code motor commands
  • Camera-based screen understanding for dynamic UI element identification

Key Decisions

Structured intermediate representation between NLP and robot

Why: Going directly from text to motor commands is too fragile

Tradeoff: Additional parsing step adds latency, but reliability is non-negotiable for physical systems
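As a sketch of what such an intermediate representation might look like (the `Action` fields and G-code values here are illustrative assumptions, not the thesis's actual format):

```python
from dataclasses import dataclass

# Hypothetical structured action sitting between the NLP and robot stages.
@dataclass
class Action:
    kind: str           # "tap", "swipe", "type", or "read"
    x: float            # target coordinates in robot-frame mm,
    y: float            # as resolved by the vision stage
    x2: float = 0.0     # swipe end point (swipe only)
    y2: float = 0.0

def to_gcode(action: Action) -> list[str]:
    """Translate one validated action into G-code motor commands."""
    if action.kind == "tap":
        return [f"G0 X{action.x:.1f} Y{action.y:.1f}",  # move above target
                "G1 Z0 F600",                           # press
                "G1 Z5 F600"]                           # release
    if action.kind == "swipe":
        return [f"G0 X{action.x:.1f} Y{action.y:.1f}",
                "G1 Z0 F600",
                f"G1 X{action.x2:.1f} Y{action.y2:.1f} F1200",  # drag
                "G1 Z5 F600"]
    raise ValueError(f"unsupported action: {action.kind}")

print(to_gcode(Action("tap", 42.0, 87.5)))
```

Because every action is validated before `to_gcode` runs, a malformed NLP parse fails loudly in software instead of producing an unsafe physical motion.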

Proof-of-concept scope (4-month thesis timeline)

Why: MSc timeline limited scope — prioritized working demo over production robustness

Tradeoff: Not production-grade, but demonstrates the full pipeline end-to-end

Technologies

Python · Computer Vision · NLP · ROS · Robotics

What I Learned

  • The gap between 'works in the lab' and 'works reliably' is enormous for physical systems — lighting, camera angles, and reflections break vision pipelines.
  • NLP-to-robot translation needs a structured intermediate representation; going directly from text to motor commands is too fragile.
  • Two years of robotics experience at Mozark (delta robots, CV pipelines) was the foundation that made a 4-month thesis feasible.