
Research

AI helping build smarter machines, from pixels to perception

Computer vision is learning to see like us

By Robert Gerlsbeck

An illustration depicting a drone in a warehouse
In the illustration above, a drone is depicted with inventory in a warehouse. TMU Computer Science Professor Richard Wang says small autonomous drones can assist with tasks like inventory control – but first they need to “open their eyes” – a feature Wang and his team are working on with industry partners. Illustration by Choi Haeryung.

Imagine robots that can spot crop pests, drones that navigate warehouse aisles without GPS and machines that can see and understand the world as we do.

At TMU's Computer Vision and Intelligent Systems Laboratory, Computer Science Professor Richard Wang is turning these possibilities into reality by giving machines the power of sight.

"If you really want robot intelligence, you have to let it see," says Wang, who leads research in computer vision at TMU.

His research is already transforming agriculture and warehouse logistics through practical applications of AI that can see and interpret visual information.

"Images contain a lot of information," he says. "Enabling AI to interpret what it sees can become a very powerful tool."

Robotic pest patrol

One real-world application of Wang’s computer vision research is revolutionizing farming. A few years ago, Wang teamed up with researchers at American universities, the U.S. Department of Agriculture and the National Science Foundation to tackle one of agriculture’s biggest problems: pests.

Every year, pests destroy up to 40 per cent of global crops, worth an estimated US$70 billion.

Controlling insect populations requires around two million tonnes of pesticides each year, which contributes to pollution and can harm wildlife and people.

Wang and his fellow researchers used deep learning models and computer vision to enable machines to spot aphid clusters in sorghum fields. 

With this knowledge, robots can patrol fields and detect infestations. 

“Once an infection reaches a certain level,” he says, “a sprayer on the robot can target and apply chemicals to infected areas, rather than spraying the entire field.”
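In code terms, that detect-then-spot-spray logic might look something like the sketch below. Everything here is illustrative: the Detection class, the infestation threshold and the spray-target list are hypothetical stand-ins for the deep learning pipeline described above, not the actual system.

```python
# Illustrative sketch only: a threshold-based "detect, then spot-spray" loop.
# The detector output, threshold value and field coordinates are hypothetical
# placeholders for the deep-learning models described in the article.
from dataclasses import dataclass


@dataclass
class Detection:
    """One aphid cluster found by the vision model in a field image."""
    x: float           # centre of the cluster in field coordinates (metres)
    y: float
    confidence: float  # model confidence, 0.0 to 1.0


def clusters_per_square_metre(detections: list[Detection],
                              plot_area_m2: float,
                              min_confidence: float = 0.5) -> float:
    """Turn raw detections into a simple infestation score for one plot."""
    confident = [d for d in detections if d.confidence >= min_confidence]
    return len(confident) / plot_area_m2


def patrol_plot(detections: list[Detection],
                plot_area_m2: float,
                spray_threshold: float = 0.3) -> list[tuple[float, float]]:
    """Return spray targets only when infestation exceeds the threshold,
    so the whole field is never blanket-sprayed."""
    score = clusters_per_square_metre(detections, plot_area_m2)
    if score < spray_threshold:
        return []  # infestation below the action level: no spraying
    return [(d.x, d.y) for d in detections if d.confidence >= 0.5]


if __name__ == "__main__":
    # Pretend the on-board model just saw three clusters in a 10 m^2 plot.
    seen = [Detection(1.2, 3.4, 0.9), Detection(2.0, 3.1, 0.7),
            Detection(4.5, 0.8, 0.95)]
    print(patrol_plot(seen, plot_area_m2=10.0))  # -> coordinates to spot-spray
```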

Richard Wang poses for a photo near a window.

TMU computer science professor Richard Wang says that about 25 years ago, progress in computer vision was slow – but today, conferences on the topic attract thousands from both academia and industry. Photo by Sarah McIntyre.

Drone ranger

Wang's lab is also partnering with industry to revolutionize warehouse operations using computer vision-equipped drones.

Distribution centres are busy places, shipping thousands of different products in and out. Efficiency and accuracy are essential. 

Small autonomous drones can assist with tasks such as pallet identification and inventory control. But first, they need to “open their eyes.”

Wang and his lab are working with industry to enable drones to operate indoors. They partnered with SOTI Aerospace, a Mississauga, Ont.-based mobile technology firm, to develop computer vision algorithms that enable drones to navigate freely and safely in indoor environments.

Since drones can’t rely on GPS inside a warehouse, Wang says, “the vision system will allow them to understand their surroundings, plan optimal flight paths and avoid collisions.”
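As a rough illustration of the planning step, the sketch below finds a collision-free route through a small occupancy grid of the kind a vision system might produce. The grid, the breadth-first search and all names are assumptions made for illustration; the article does not describe SOTI's actual algorithms.

```python
# Toy sketch: once a vision system has produced an occupancy grid of an
# aisle (1 = obstacle such as a rack or pallet, 0 = free space), a drone can
# plan a collision-free path through it. Breadth-first search is used here
# purely for illustration, not as the partners' actual method.
from collections import deque


def plan_path(grid: list[list[int]],
              start: tuple[int, int],
              goal: tuple[int, int]) -> list[tuple[int, int]]:
    """Shortest collision-free path from start to goal, or [] if none exists."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    came_from = {start: None}

    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through predecessors to recover the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return []  # goal unreachable: every route is blocked


if __name__ == "__main__":
    aisle = [[0, 0, 0, 0],
             [1, 1, 0, 1],   # a rack blocks most of this row
             [0, 0, 0, 0]]
    print(plan_path(aisle, start=(0, 0), goal=(2, 0)))
```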

They’re also collaborating with Toronto-based Antek Logistics to track freight through unloading, stocking and shipping using computer vision. The system Wang and his team developed is being tested at Antek’s Toronto and Montreal warehouses.

The science behind sight

Behind Wang's applications lies a fundamental challenge: How to make machines interpret visual information as effectively and efficiently as humans do.

Human vision is stunningly complex. When we glance at a tree and a house, we instantly know which is which – not because our eyes take a perfect picture, but because our minds constantly interpret what we see. 

Vision accounts for around 80 per cent of human perception, making it the primary means by which we understand the world around us.

Replicating this ability in computers requires more than just cameras – it demands advanced algorithms capable of processing and interpreting visual data.

This is where Wang's expertise comes in. A recent study he supervised improves vision transformer models (which break images down into "patches") so they can learn from smaller datasets, addressing the data limitations of real-world applications while lowering computational cost.
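For readers curious what "breaking an image into patches" means in practice, here is a minimal, generic sketch of the first step of a vision transformer: slicing an image into a grid of patches and projecting each one into a token. It is not the model from the study; the patch size, embedding dimension and random projection are placeholder choices.

```python
# Illustrative sketch of the "image -> patches -> embeddings" step that
# vision transformers start with. This is generic ViT preprocessing, not
# the specific model from the study described above.
import numpy as np


def image_to_patch_embeddings(image: np.ndarray,
                              patch_size: int,
                              projection: np.ndarray) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patches, flatten each
    patch, and project it to an embedding vector (one "token" per patch)."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"

    # Cut the image into a grid of patch_size x patch_size blocks.
    patches = image.reshape(h // patch_size, patch_size,
                            w // patch_size, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)                   # (rows, cols, p, p, c)
    patches = patches.reshape(-1, patch_size * patch_size * c)   # (N, p*p*c)

    # A learned linear projection turns each flattened patch into a token the
    # transformer can attend over; here it is just a random placeholder matrix.
    return patches @ projection                                  # (N, embed_dim)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((224, 224, 3))                 # stand-in for a photo
    proj = rng.standard_normal((16 * 16 * 3, 128))  # 16x16 patches -> 128-dim tokens
    tokens = image_to_patch_embeddings(img, patch_size=16, projection=proj)
    print(tokens.shape)  # (196, 128): a 14 x 14 grid of patches, each embedded
```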

TMU prof on a mission

Wang’s own journey started in China, where he spent his childhood building electric motors and homemade radios.

Later, he moved to Canada to earn his PhD, then taught at the University of Kansas, in the U.S. 

In 2020, he returned to Canada to join the Faculty of Science at TMU, where he also leads the Computer Vision and Intelligent Systems Laboratory.

When Wang began working in the field some 25 years ago, progress in computer vision was still slow and drew limited interest. He remembers attending conferences that attracted only a small number of researchers from around the world. Today, those same gatherings pull in many thousands from both academia and industry.

The past decade has been a game-changer. Advances in neural networks, computational resources and the explosion of big data have made computer vision a reality, powering everything from self-driving cars to the next generation of robotics.

Necessary foresight

Next up? Researchers are turning their attention to combining computer vision with large language models (like ChatGPT). This emerging area of study, known as vision-language models, or VLMs, promises to give machines greater reasoning capabilities.

Wang predicts VLMs and other computer vision technologies will have profound effects on artificial intelligence capabilities in coming years. “We’re approaching a point,” he says, “where machines will be able to see and understand the world as well as humans.”

 

Robert Gerlsbeck.
Robert Gerlsbeck is a freelance journalist based in Kingston, Ont. His articles have run in the Globe and Mail, Toronto Star and Today’s Parent, and he’s been an editor at MoneySense, Marketing and other magazines. He started his career as a daily newspaper reporter in Oshawa, Ont.
