Anton van den Hengel
Professor, School of Computer Science, The University of Adelaide


About

My CV is available here

I am currently a Professor in the School of Computer Science at the University of Adelaide, the Director of the Australian Centre for Visual Technologies, and a Program Leader in the Data to Decisions (D2D) CRC.

I have worked in a variety of research areas within Computer Vision including Parameter Estimation, Machine Learning, Video Surveillance and Interactive Image-based Modelling. In each case my discoveries have come through the application of fundamental mathematics to problems of practical impact.

Parameter Estimation

Iconic entities such as the Fundamental Matrix, Homographies and the Tri-Focal Tensor take very specific forms as a result of the projection that lies at the heart of the image formation process. With Profs Brooks and Chojnacki I developed a statistical foundation for the estimation of such projective entities from image-based measurements, which is one of the core problems in Computer Vision. Over 6 years we carried out an in-depth mathematical analysis of the nature of the entities to be estimated and the constraints that bind the values they may reasonably take. This investigation required developing sophisticated mathematical tools, and a common framework for the competing estimation strategies (which we labelled the Fundamental Numerical Scheme), both of which have since been taken up more broadly by other researchers.

This depth of analysis was necessary to understand the subtle relationships between the noise in the data and the value of the constrained final estimate. This issue had confounded previous attempts at such an analysis, leading them to overly complex and less effective solutions. The research began from a desire to find a statistically valid means of enforcing the well-understood constraints that bind the Fundamental Matrix, but developed into the discovery of the more complex constraints that bind the Tri-Focal Tensor and, later, sets of consistent Homographies. The constraints upon the forms of these entities are extremely complicated, but must be satisfied if an estimated parameter set is to be physically valid. In each case the constraints we uncovered remain the maximal set discovered thus far, and the estimation processes we devised are the only available methods that actually enforce the full set. In the case of consistent sets of Homographies we proved that ours constituted the full set of constraints.
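
To give a concrete sense of the kind of ancillary constraint involved, the short Python sketch below shows the simplest and most familiar example: projecting an unconstrained 3x3 estimate of the Fundamental Matrix onto the rank-2 matrices via the singular value decomposition. This is a generic textbook illustration only, not the statistically grounded constrained estimation (the Fundamental Numerical Scheme) described above; the function name and the use of NumPy are illustrative assumptions.

import numpy as np

def enforce_rank_two(F):
    # A physically valid fundamental matrix has rank 2 (det F = 0).
    # Zeroing the smallest singular value gives the closest rank-2
    # matrix in the Frobenius-norm sense.
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt

# Example: an unconstrained estimate is generally full rank; the
# projection restores the rank-2 (epipolar) constraint.
F_unconstrained = np.random.randn(3, 3)
F_constrained = enforce_rank_two(F_unconstrained)
print(np.linalg.matrix_rank(F_constrained))   # prints 2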

This work delivered a new, deeper understanding of these fundamental entities, and a new approach to their estimation which enforces the inherent constraints upon their forms. It demonstrated the shortcomings of standard unconstrained estimation techniques, and the situations in which they may safely be used. Although this work was not solely responsible for the subsequent widespread adoption of statistical methods within Computer Vision, it certainly played a vital role. Some of the complex mathematical analysis carried out is accessible only to mathematically focussed researchers, but the fundamental nature of the work sees the papers continuing to be cited regularly a decade after publication. My recent work in this area has extended the approach to the estimation of a wider range of parameters; one aspect of this work received the best paper prize at the 2010 IEEE Conference on Computer Vision and Pattern Recognition, which is one of the most prestigious awards in Computer Vision.

Machine Learning

My contributions to Machine Learning have largely focussed on improving algorithm performance without increasing computational load. This work is fundamental in nature, and has involved the development and analysis of a variety of approaches to feature selection, feature sharing, and data-dependent hashing. My research in this area was initially motivated by the desire to improve the performance of real-time tracking in video surveillance.

My work in the area has developed new, efficient methods for carrying out standard tasks in Machine Learning. The scale at which Machine Learning algorithms can operate is one of their key limitations: it restricts their ability to produce practical outcomes, and to deliver the broader benefits that might come from operating at the same scale as human beings.

Much of this work has been carried out with Dr. Chunhua Shen, a former PhD student and now colleague, and Dr. Qinfeng Shi, a post-doc. My role in the group, having previously developed the research theme, is to identify the critical impediments that must be overcome to improve algorithm performance, and the techniques by which this might be achieved. This effort is particularly directed towards algorithms that are of general practical significance within the field, or which are important to key problems within my research group. Together with this group I have carried out a thorough analysis of the mathematics, and particularly the applications of constrained optimisation, that underpin much of modern Machine Learning. The depth of this analysis has uncovered methods able to increase the scale and efficiency of existing algorithms, and new algorithms offering the same benefits when applied to existing problems. Machine Learning is an extremely mathematically sophisticated area; the key to our success has been the combination of skills we have developed within the team, allowing comparisons to be made across algorithms, applications, and disciplines.

In the course of this analysis of existing algorithms we began a process of questioning the fundamental assumptions on which they rely. Two of my papers have invalidated the assumptions underpinning some of the most popular Machine Learning techniques:

The first paper analyses whether the popular data-reduction method of random projections can justifiably be applied as a pre-process to Machine Learning algorithms which rely on the margin between sets within the data. The paper showed that these methods cannot be used unaltered, and derived a modified form of margin which may be applied within them. The second paper questions whether sparse representations of the space of all faces are really justifiable. Such representations have been used widely in the literature, but are shown in our work to be unsupported by the data. This work was quite unpopular with adherents to the method, but has received praise from researchers around the globe for its insight, and the rigour with which it was carried out.
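
As a rough illustration of why the first result matters, the toy NumPy experiment below (an illustrative assumption, not the analysis or the modified margin derived in the paper) projects a separable high-dimensional dataset down to a much lower dimension with a Gaussian random projection, and measures how the margin along a fixed separating direction changes.

import numpy as np

rng = np.random.default_rng(0)

# Two linearly separable point clouds in a high-dimensional space.
d, k, n = 1000, 50, 200
X_pos = rng.normal(loc=+1.0, scale=1.0, size=(n, d))
X_neg = rng.normal(loc=-1.0, scale=1.0, size=(n, d))

# A Gaussian random projection to k dimensions, scaled so that squared
# norms are preserved in expectation (Johnson-Lindenstrauss style).
R = rng.normal(size=(d, k)) / np.sqrt(k)
P_pos, P_neg = X_pos @ R, X_neg @ R

def margin(Xp, Xn, w):
    # Margin of the two classes along a unit separating direction w.
    w = w / np.linalg.norm(w)
    return (Xp @ w).min() - (Xn @ w).max()

w_orig = X_pos.mean(axis=0) - X_neg.mean(axis=0)
w_proj = P_pos.mean(axis=0) - P_neg.mean(axis=0)

# The margin measured after projection generally differs from the
# original, which is why margin-based methods cannot simply be applied
# to randomly projected data unaltered.
print(margin(X_pos, X_neg, w_orig), margin(P_pos, P_neg, w_proj))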

The impact of this work is evident in the classification-based approach to the analysis of agar plates that my team and I have developed in collaboration with LBT Innovations, a small South Australian medical devices company. The approach we developed is currently being prepared for US FDA approval. The technology offers significant benefits to the health system through greatly reduced plate-processing costs and improved speed and accuracy. LBT are making a significant investment in developing the technology, and an even greater investment in commercialising it.

Video Surveillance

The tracking of targets through video is one of the fundamental problems in Video Surveillance, because higher-level analysis so often depends on an accurate understanding of the movement of targets through the scene. It was originally envisaged by many researchers that Video Surveillance was destined to be a very applied area of research, because it had an obvious practical application. Video Surveillance has, however, turned out to be far more difficult than most researchers expected it to be, largely because tracking is a much more obstinate problem than was envisaged.

Initially I worked on developing statistical approaches to improving the efficiency of the tracking process. Work on probabilistic cue integration, and on kernel density mode seeking, demonstrated that easily implemented changes to existing techniques could significantly improve tracking performance.

My most recent contribution in tracking has been the development of reservoir sampling methods for online appearance modelling. Online appearance modelling has received a vast amount of attention from the research community, because the offline alternative is impractical. The reservoir sampling approach we proposed is the first that has been proven to converge to its offline counterpart -- a very significant result. Reservoir sampling thus provides a statistically well-founded method for updating an appearance model over time, and across multiple viewpoints. This result was achieved because the team involved carried out the depth of mathematical analysis required to bring together results from different fields to address this important problem. The approach has not solved the problem of tracking in its entirety, but it does provide a strong statistical basis from which to proceed.

My most significant contribution to the field of Video Surveillance has been in the estimation of the relationships between the fields of view of surveillance cameras. This information is essential for cross-camera tracking, particularly over large networks of cameras, which is the most practically significant case. An estimate of the relationships between the cameras' fields of view thus provides access to information generated by such networks that would otherwise be completely incomprehensible. Previous research had addressed the problem by trying to match features between fields of view, but this approach does not scale, as the features visible in most networks are not individually distinctive. An alternative approach calculated correlations between the traffic viewed by different cameras in a network, but even after significant modification it could not be made to scale to the size of practical camera networks. After much exploration of statistical alternatives, the breakthrough was the realisation that an understanding of the relationships between the cameras' fields of view might be achieved only by identifying pairs of regions that do not overlap. This `Exclusion' method showed that it was possible to estimate the topology accurately without human input, and that this could be achieved for very large camera networks.
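
For readers unfamiliar with the underlying technique, the Python sketch below shows standard reservoir sampling (Algorithm R), the classical building block for maintaining a fixed-size uniform sample of a data stream. It is a generic illustration only, not the specific convergence-proven online appearance-model estimator described above; the function and variable names are illustrative assumptions.

import random

def reservoir_update(reservoir, item, t, capacity):
    # Standard reservoir sampling (Algorithm R): after seeing t items,
    # `reservoir` holds a uniform random sample of at most `capacity`
    # of them. `t` is the 1-based index of the incoming item.
    if len(reservoir) < capacity:
        reservoir.append(item)
    else:
        j = random.randrange(t)      # uniform in [0, t)
        if j < capacity:
            reservoir[j] = item      # keep new item with probability capacity / t
    return reservoir

# Toy usage: maintain a bounded sample of per-frame appearance features
# (here just frame indices) while streaming through a long video.
reservoir, capacity = [], 100
for t, feature in enumerate(range(1, 100001), start=1):
    reservoir_update(reservoir, feature, t, capacity)
print(len(reservoir))   # 100: a uniform sample over all frames seen so far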

I formed a (University-owned) start-up company to commercialise the technology in 2009. The company has since received over $3M in venture capital, employs 5 engineers, has released version 1.1 of the software, signed partnership agreements with Milestone and DVTel and an international distribution agreement with Pacific Communications, deployed the system at a major Australian airport, and is currently negotiating an arrangement with Honeywell, a major supplier of surveillance systems. The CTO of the company is one of the members of the team I formed to carry out the research, and I serve as a member of the board.

Interactive Image-based Modelling

My team and I pioneered the development of interactive technologies for generating 3D models that accurately reflect the shapes of objects depicted in image sets. This is a key step in a number of workflows in film post-production, visualisation and simulation, but its greatest potential lies in user-created 3D content. This was the first published work in interactive structure-from-motion, a field which has since become popular amongst researchers and in industry.

VideoTrace, one piece of work in this area, is innovative in that it allows a user to control the application of a set of sophisticated Computer Vision algorithms through a set of powerful yet intuitive interactions, informed by structure-from-motion analysis. Although not our first work in the area, VideoTrace was the first to have a significant global impact. The work was published at SIGGRAPH, the single most prestigious venue in Computer Graphics; the YouTube video for VideoTrace has been viewed over 273,000 times, and the technical web page for the project has had over 400,000 views. Prior to this research there was a view within Computer Vision that interaction was an indication of a failed attempt at automation, rather than a critical tool in generating outcomes of practical significance. Interactive image-based modelling has since become a focus of both research and commercial attention. Autodesk and Google, amongst others, have since released tools that follow the VideoTrace approach to interactive image-based modelling.