Thesis · Jan – Apr 2025
Robotic Telemanipulation
A natural-language-controlled robotic system that physically operates mobile devices, using computer vision for screen understanding. MSc thesis project at NUS.
- Pipeline: 3-stage
- Interactions: tap, swipe, type, read
- Thesis: MSc @ NUS
- Foundation: 2 yrs robotics
Architecture
- Three-stage pipeline: NLP intent parsing → computer vision UI detection → robotic motion execution
- ROS for robot communication, OpenCV for vision processing
- Natural language → structured action → G-code motor commands
- Camera-based screen understanding for dynamic UI element identification
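To make the last two stages concrete: once the vision stage locates a UI element in camera pixels, those pixels must be mapped into the robot's workspace before a G-code command can be issued. A minimal sketch, assuming a fronto-parallel camera so a simple affine calibration suffices (all names, feed rates, and calibration numbers are illustrative, not the thesis implementation):

```python
from dataclasses import dataclass

@dataclass
class Calibration:
    """Affine map from camera pixels to robot-workspace millimetres."""
    scale_x: float   # mm per pixel along x
    scale_y: float   # mm per pixel along y
    offset_x: float  # workspace x (mm) of camera pixel (0, 0)
    offset_y: float  # workspace y (mm) of camera pixel (0, 0)

    def pixel_to_mm(self, px: float, py: float) -> tuple[float, float]:
        return (self.offset_x + px * self.scale_x,
                self.offset_y + py * self.scale_y)

def tap_gcode(cal: Calibration, px: float, py: float,
              touch_z: float = -2.0) -> list[str]:
    """Emit G-code for one tap: move above the target, press, lift."""
    x, y = cal.pixel_to_mm(px, py)
    return [
        f"G0 X{x:.2f} Y{y:.2f} Z5.00",  # rapid move above the element
        f"G1 Z{touch_z:.2f} F300",      # descend until the stylus touches
        "G1 Z5.00 F300",                # lift the stylus clear of the screen
    ]

cal = Calibration(scale_x=0.08, scale_y=0.08, offset_x=12.0, offset_y=30.0)
print(tap_gcode(cal, 540, 960))
```

An angled camera would need a full homography instead of this affine map, but the pipeline shape is the same: detection returns pixels, calibration returns millimetres, and only then is motion commanded.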
Key Decisions
Structured intermediate representation between NLP and robot
Why: Going directly from text to motor commands is too fragile
Tradeoff: Additional parsing step adds latency, but reliability is non-negotiable for physical systems
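The intermediate representation can be as simple as a typed action record sitting between the language stage and the motion stage. A minimal sketch of the idea, with a toy rule-based parser standing in for the NLP model (field names and parsing rules are illustrative, not the thesis implementation):

```python
import re
from dataclasses import dataclass

@dataclass
class Action:
    """Structured intermediate representation between NLP and the robot."""
    kind: str       # one of: tap, swipe, type, read
    target: str     # UI element description, resolved to pixels by vision
    text: str = ""  # payload for 'type' actions

def parse_command(utterance: str) -> Action:
    """Tiny rule-based parser; a real system would use an NLP model."""
    u = utterance.strip().lower()
    if m := re.match(r"type ['\"](.+)['\"] into (.+)", u):
        return Action(kind="type", target=m.group(2), text=m.group(1))
    if m := re.match(r"(tap|swipe|read) (?:on |the )?(.+)", u):
        return Action(kind=m.group(1), target=m.group(2))
    raise ValueError(f"unrecognised command: {utterance!r}")

print(parse_command("tap the settings icon"))
print(parse_command('type "hello" into the search bar'))
```

Because the robot only ever consumes validated `Action` records, a misparse fails loudly at this boundary instead of producing an arbitrary physical motion.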
Proof-of-concept scope (4-month thesis timeline)
Why: MSc timeline limited scope — prioritized working demo over production robustness
Tradeoff: Not production-grade, but demonstrates the full pipeline end-to-end
Technologies
Python · Computer Vision · NLP · ROS · Robotics
What I Learned
- The gap between 'works in the lab' and 'works reliably' is enormous for physical systems — lighting, camera angles, and reflections break vision pipelines.
- NLP-to-robot translation needs a structured intermediate representation; going directly from text to motor commands is too fragile.
- Two years of robotics experience at Mozark (delta robots, CV pipelines) was the foundation that made a 4-month thesis feasible.