People have an unrivalled ability to recognise and follow people they know – from behind, in profile, and no matter how they behave. It is this ability that future collaborative robots will need, to some extent, in order to work with us. Applications are found in many fields, such as autonomous vehicles that must keep track of the motion of cars, pedestrians, cyclists and animals; collaborative drones that must be able to keep tabs on each other and on other objects; and – not least – the social robots we expect to see developed in the future.
Goutam Bhat, today a PhD student at ETH in Zürich.
Goutam Bhat arrived at LiU in the autumn of 2016 on an internship in the group of Michael Felsberg, professor in computer vision. “We usually take on one or two students on internship from an excellent university every year. Goutam already had the qualities expected of a doctoral student when he arrived”, confirms Michael Felsberg.
After that year, Goutam Bhat chose to continue to a master’s degree in computer science, specialising in computer vision. He carried out his prize-winning degree project in the Computer Vision Laboratory.
“The year I was working in the division gave me some ideas that I wanted to test, and they then became my degree project”, he says.
Visual object tracking
To put it simply, visual object tracking starts by defining a rectangular region in an image, such as a region containing a face, in what is called a “bounding box”. The colours of the pixels, together with the shapes and other characteristics inside the box, build a model that the computer learns to recognise. As the face moves, its size, and possibly also its colour and lighting, change, which means that the model must be continuously updated. This is where machine learning and deep learning come in.
The computer analyses all of the images it receives and learns how the object, and thus also the model, can change in both colour and shape. When the face the computer is to follow is part of a crowd, it may be hidden by passers-by, and the computer must learn, even under these conditions, to recognise the face and rapidly relocate the bounding box.
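The update loop described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the method in the award-winning work: the appearance model is simply the pixel patch itself, relocated by an exhaustive local search and blended with each new match so it can adapt to gradual changes.

```python
import numpy as np

def track(frames, box, alpha=0.1):
    """Track the patch in `box` (x, y, w, h) through a list of frames.

    The appearance model is the pixel patch itself, updated as a running
    average so it can adapt to gradual changes in lighting and shape.
    """
    x, y, w, h = box
    model = frames[0][y:y + h, x:x + w].astype(float)
    positions = [(x, y)]
    for frame in frames[1:]:
        best, best_err = (x, y), np.inf
        # exhaustive search in a small window around the previous position
        for dy in range(-2, 3):
            for dx in range(-2, 3):
                nx, ny = x + dx, y + dy
                if 0 <= nx <= frame.shape[1] - w and 0 <= ny <= frame.shape[0] - h:
                    patch = frame[ny:ny + h, nx:nx + w].astype(float)
                    err = np.mean((patch - model) ** 2)
                    if err < best_err:
                        best, best_err = (nx, ny), err
        x, y = best
        # continuous model update: blend in the newly matched patch
        model = (1 - alpha) * model + alpha * frame[y:y + h, x:x + w]
        positions.append((x, y))
    return positions
```

On a synthetic sequence in which a bright square drifts one pixel per frame, the returned positions follow the square exactly; real trackers replace both the raw-pixel model and the brute-force search with learned components.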
This is a difficult problem in machine learning, and one of Goutam Bhat’s ideas was to train a classifier. In this approach, the computer first learns to classify pixels, assigning them to groups, which simplifies and accelerates the otherwise extremely demanding calculations.
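One way to picture the classifier idea is a toy two-class pixel classifier: colours inside the initial bounding box define an “object” class, everything else a “background” class, and each pixel is then assigned to the nearer class centroid. This sketch is far simpler than the learned classifiers in the actual work, but it shows how a cheap per-pixel decision can replace expensive computations on raw pixels.

```python
import numpy as np

def pixel_classifier(frame, box):
    """Fit a toy two-class nearest-centroid classifier on pixel colours.

    Pixels inside `box` (x, y, w, h) define the "object" class, all
    other pixels the "background" class. Returns a function that maps
    any image to a boolean object mask.
    """
    x, y, w, h = box
    inside = np.zeros(frame.shape[:2], dtype=bool)
    inside[y:y + h, x:x + w] = True
    fg = frame[inside].mean(axis=0)    # mean object colour
    bg = frame[~inside].mean(axis=0)   # mean background colour

    def classify(img):
        # assign each pixel to the nearer of the two colour centroids
        d_fg = np.linalg.norm(img - fg, axis=-1)
        d_bg = np.linalg.norm(img - bg, axis=-1)
        return d_fg < d_bg

    return classify
```

For a red square on a blue background, the returned mask lights up exactly the square, and relocating the box then only requires searching a binary mask rather than full-colour pixels.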
Highly ranked conference
“From a technical point of view, we realised how important it was to treat the bounding-box estimation task in tracking separately from the target localisation task”, says Goutam Bhat.
The result was a combination of online and offline learning, and a visual tracking algorithm that is the best currently available.
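The online/offline split can be pictured as follows: a feature extractor trained offline on large datasets stays fixed, while a small scoring model is fitted online to the specific target in the current video. All names and details below are illustrative assumptions, not the published method – in particular, the offline-trained deep network is replaced here by a hand-crafted stand-in.

```python
import numpy as np

def offline_features(patch):
    """Stand-in for a feature extractor trained offline on large video
    datasets (kept fixed at tracking time). Here it is just normalised
    intensities plus horizontal gradients."""
    gx = np.diff(patch, axis=1, prepend=patch[:, :1])
    v = np.concatenate([patch.ravel(), gx.ravel()])
    n = np.linalg.norm(v)
    return v / n if n else v

def fit_online(samples, labels, lr=0.3, steps=200):
    """Online part: a linear scoring model fitted to the current target
    with a few gradient steps on a squared-error loss."""
    w = np.zeros_like(samples[0])
    for _ in range(steps):
        for f, y in zip(samples, labels):
            w -= lr * (f @ w - y) * f   # gradient step for one sample
    return w
```

After fitting on one target patch (label 1) and one distractor patch (label 0), the linear model scores the target near 1 and the distractor near 0; only this small online part needs updating as the video plays, which is what makes near-real-time operation possible.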
Michael Felsberg. Photo credit: Göran Billeson.
“The algorithm is extremely fast, with close to real-time performance. Goutam has made a significant contribution to methods in the field, shown that it works, and documented everything at a professional scientific level”, says Michael Felsberg.
The work has also been published in the proceedings of one of the highest-ranked computer and information science conferences in the world, CVPR 2019, and Goutam Bhat is a co-author of several scientific publications.
He is to continue his scientific career as a doctoral student at the Computer Vision Lab at ETH in Zürich. And it is his work there that prevents him from travelling to Linköping to receive the award.
“I’m at a conference in Seoul, but I hope to be able to join in over a video link. And I plan to continue working on object tracking”, he assures us, happy and grateful to receive the scholarship.
Translated by George Farrants