The Proactive Robot - Enhanced Decision-Making for Sustained Mission Autonomy with Multimodal Large Language Models

Robots today are great at doing what we tell them - clean this floor, pick up that box, move from A to B. But what if they could decide for themselves what needs doing? Imagine a robot noticing a messy desk, spotting a coffee cup left behind, and cleaning up without anyone asking.

That’s the vision behind ProPlan, a framework I developed during my time at QUT. Instead of following rigid, pre-programmed tasks, ProPlan uses Multimodal Large Language Models (MLLMs) to give robots a sense of context. By combining what they see (images of the environment) with what they know (text-based reasoning), robots can plan and execute meaningful actions on their own.
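To make that pairing of images and text reasoning a bit more concrete, here is a rough Python sketch of what one planning step could look like. Everything in it is illustrative: the `query_mllm` stub, the prompt wording, and the JSON action format are placeholders I'm assuming for the example, not ProPlan's actual prompts or API.

```python
import base64
import json

MISSION = "Keep the room tidy. Return out-of-place items to their usual spots."

PROMPT_TEMPLATE = (
    "Mission: {mission}\n"
    "You are a mobile robot. Given the attached image of the scene, "
    "reply with the next actions as a JSON array of steps, e.g. "
    '[{{"action": "pick", "object": "mug"}}, {{"action": "place", "target": "sink"}}]'
)

def encode_image(path: str) -> str:
    """Base64-encode a camera frame so it can travel alongside the text prompt."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

def query_mllm(prompt: str, image_b64: str) -> str:
    """Hypothetical stand-in for a multimodal LLM call; wire to your provider of choice."""
    raise NotImplementedError

def plan_next_actions(image_path: str) -> list[dict]:
    """Ask the model what to do next, given the standing mission and the latest image."""
    reply = query_mllm(PROMPT_TEMPLATE.format(mission=MISSION), encode_image(image_path))
    return json.loads(reply)  # the prompt asks for machine-parseable JSON steps
```

The key design point is that the mission lives in the prompt, not in the code: the same loop can pursue "keep the room tidy" or "restock the shelves" just by swapping the text.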

Why is this cool? Because it shifts robotics from task-oriented behaviour (“do this one thing”) to mission-oriented behaviour (“keep the room tidy”). That means robots can adapt, recover from mistakes, and stay useful over longer missions, whether it’s cleaning a kitchen, assisting in aged care, or organising industrial spaces.
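If you're curious what "mission-oriented" looks like structurally, here is a toy loop under the same assumptions as the sketch above (all function names are placeholders, not ProPlan's real interfaces): the robot keeps observing, asks the MLLM for next steps, acts, and re-plans when something fails instead of giving up.

```python
import time

def capture_scene() -> bytes:
    """Placeholder: grab a frame from the robot's camera."""
    raise NotImplementedError

def plan_for(mission: str, image: bytes) -> list[dict]:
    """Placeholder: ask the MLLM for the next steps (as in the sketch above)."""
    raise NotImplementedError

def execute(step: dict) -> bool:
    """Placeholder: dispatch one step to the robot's skill layer; True on success."""
    raise NotImplementedError

def run_mission(mission: str, idle_wait_s: float = 30.0) -> None:
    """Pursue a standing mission: observe, plan, act, and re-plan on failure."""
    while True:
        plan = plan_for(mission, capture_scene())
        if not plan:                   # nothing out of place right now
            time.sleep(idle_wait_s)    # check back later instead of stopping
            continue
        for step in plan:
            if not execute(step):      # failed mid-plan? loop back and re-observe
                break
```

Calling `run_mission("Keep the room tidy.")` never "finishes" in the task sense; the loop is the point. That re-observe-and-re-plan cycle is what lets the robot recover from mistakes rather than abort.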

In our experiments, a mobile robot equipped with ProPlan could map a room, spot out-of-place items, and carry out multi-step tidying tasks like placing a banana in a fruit bowl or returning mugs to the sink, all without explicit instructions.

👉 Read the full paper here
🎥 A video of ProPlan in action is below.
