Abstract: Conventional supervised learning algorithms are "passive", because they cannot operate unless the class labels of all the objects are already available. Object labeling, however, is exceedingly expensive in many scenarios, often making the labeling cost overwhelm that of the actual learning. This has motivated active learning -- a fast-growing field in machine learning and data mining -- where learning and labeling are intermixed to minimize the amount of labeling without sacrificing the quality of learning. In this talk, we will look at a line of work that is similar to active learning in nature, but carries unique features that permit new algorithmic development with solid principles. We will introduce a series of problems that aim to classify a finite set of objects, based on a labeling oracle that charges a unit cost for every label revealed. Nothing is known about the oracle, except that it conforms to certain inference rules which can be used to deduce unknown labels from the revealed ones. The challenge is how to use these rules strategically to minimize the labeling work. We will discuss several upper bounds (algorithms) and lower bounds (hardness) on these problems. Open questions that call for non-trivial research will be raised. Bio: Yufei Tao is a Professor at the School of Information Technology and Electrical Engineering, the University of Queensland (UQ), which he joined in January 2016. He served as an associate editor of ACM Transactions on Database Systems (TODS) from 2008 to 2015, and of IEEE Transactions on Knowledge and Data Engineering (TKDE) from 2012 to 2014. He served as a PC co-chair of the International Conference on Data Engineering (ICDE) 2014, and of the International Symposium on Spatial and Temporal Databases (SSTD) 2011. He gave a keynote speech at the International Conference on Database Theory (ICDT) 2016. He received two SIGMOD best paper awards, in 2013 and 2015, respectively.
Yufei's current research aims to develop "small-and-sweet" algorithms: (i) small: easy to implement for deployment in practice, and (ii) sweet: having non-trivial theoretical guarantees. He is particularly interested in algorithms dealing with massive datasets that do not fit in memory.
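The oracle model in the abstract above can be made concrete with a toy instance. The scenario below (monotone labels on a line, recoverable by binary search) is our own illustrative assumption, not an example taken from the talk:

```python
# Hedged illustration (not from the talk): suppose the n objects lie on a
# line and the unknown labels are monotone -- all 0s followed by all 1s.
# The oracle charges one unit per label revealed, and the monotonicity
# rule lets us deduce unrevealed labels, so binary search classifies all
# n objects with O(log n) oracle calls instead of n.

def classify_monotone(n, oracle):
    """Return (labels, oracle_calls) for n objects with monotone labels."""
    calls = 0

    def ask(i):
        nonlocal calls
        calls += 1
        return oracle(i)

    lo, hi = -1, n  # invariant: label at lo is 0, label at hi is 1 (virtual ends)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if ask(mid) == 1:
            hi = mid
        else:
            lo = mid
    # the inference rule fills in every label the oracle never revealed:
    # all positions before hi are 0, all positions from hi on are 1
    return [0] * hi + [1] * (n - hi), calls
```

With 10 objects and an unknown threshold at position 6, the full labeling is recovered with at most four oracle calls; the problems in the talk generalize this idea to richer inference rules.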
Abstract: Over the past three decades, my interest in analysing data has taken me to highly applied areas and very theoretical topics of Computer Science and Applied Mathematics. My talk will draw from the lessons learnt during the past years with datasets from biotechnology, business and consumer analytics, linguistics and stylistics.
After all, if "Data is the new oil" and "Data scientist is the sexiest job of the 21st Century", aren't we witnessing a new paradigm change? What can we learn from the lessons of the past three decades? Like earthquakes, major shifts in Science come from slow and imperceptible changes over decades. In 1985 I attended a seminar that changed my life and drew me to Computer Science. Prof. Virasoro discussed, in a somewhat unified way, a number of seemingly disconnected themes. He presented a new optimisation method (Simulated Annealing), its application to Machine Learning (Neural and Boolean Networks), the progress made with Statistical Mechanics on understanding why some combinatorial optimisation problems are hard (exemplified with the Traveling Salesman Problem), and other exciting ideas in classification (Ultrametricity). Today many of these themes are part of the research area that we call "Data Science". There is an unconfirmed quote by Picasso that says: "Learn the rules like a pro, so you can break them like an artist." Two others are attributed to him as well: "Art is the elimination of the unnecessary" and "Art is a lie which makes us see the truth." My talk will advocate for the development of new methodologies that help us to see the truth in data with methods that preserve the best parts of our craft. Bio: Prof. Pablo Moscato is an Australian Research Council Future Fellow and Professor of Computer Science at The University of Newcastle. At the California Institute of Technology (1988-89), he developed a methodology called "memetic algorithms", which is now widely used around the world in artificial intelligence, data science, and business and consumer analytics. Pablo was Founding Director of both the Priority Research Centre for Bioinformatics, Biomarker Discovery and Information-based Medicine (2007-2015) and the Newcastle Bioinformatics Initiative (2002-2006).
His expertise in data science has been essential to a large number of applied projects. Prof. Moscato has been working in Applied Mathematics and Computer Science for 30 years, and in heuristic methods for Operations Research problems since 1985. Also at Caltech, he introduced the idea of a deterministic update in Simulated Annealing in 1990 (Physics Letters A, 146 (4), 204-208). He also introduced the Evolutionary Attack of Algorithms (Applied Mathematics Letters, 16 (1), 41-47), and used it against the world's best exact algorithm for the Traveling Salesman Problem (see Lecture Notes in Computer Science, vol. 6624, pp. 1-11, 2011). He has also introduced Fractal Instances of the Traveling Salesman Problem (INFORMS Journal on Computing, Vol 10, No. 2, 1998), gave tight bounds on the parameterized complexity of problems (e.g. k-Feature Set, Journal of Computer and System Sciences, 67 (2003) 686-690) and developed new algorithms for phylogenetics, linguistics, visualization, and the analysis of datasets in Breast and Prostate Cancer, Melanoma and in neurodegeneration (e.g. Alzheimer's Disease). He introduced novel methods based on Information Theory to find biomarkers that track the progression of the disease. His work and ideas have been highly influential in a large number of scientific and technological fields, and his publications have been highly cited. The journal "Memetic Computing" is largely dedicated to the methodology he championed (memetic algorithms). Nearly every day a newly published paper brings a novel application of these techniques. Due to this work and his other contributions in the areas of classification and machine learning, Pablo is now well-established around the world and has become one of Australia's most respected computer scientists.
Abstract: The NSW public sector has a number of challenging goals to achieve in the coming years, including gender equality for its senior leaders and doubling the number of Aboriginal people in senior roles by 2025. To achieve these targets, a deep understanding of the characteristics of the workforce is critical. The NSW Government has maintained an extensive dataset of its 400,000-strong workforce for nearly twenty years, covering information on topics such as occupation, diversity, remuneration, leave and location. This has been primarily used for reporting and providing an evidence base for workforce reform, but in the past two years it has been used to provide insight and direction to the senior leaders of the Public Sector in the management of their workforce. Through better understanding the emerging and changing trends within the workforce, and considering their broader relationship to organisational performance, workforce data analytics is providing the impetus for change and a more effective and diverse NSW Public Sector. Bio: Scott Johnston is the Director of the Workforce Information Branch. Scott joined the PSC in June 2014 after a long career in official statistics for both the Australian Bureau of Statistics and the Office for National Statistics (UK), where his focus was primarily economic statistics, including prices and national accounts. Scott has a Bachelor of Commerce degree and postgraduate qualifications in statistics and in finance and investment. The Workforce Information Branch compiles the annual Workforce Profile and manages the various data assets that are contained within the PSC. An important role of the branch is to provide workforce analytical support and leadership to the PSC and the Sector.
Abstract: Massive amounts of geo-textual data that contain both geospatial and textual content are being generated at an unprecedented scale from social media websites. Examples of user-generated geo-textual content include geo-tagged micro-blogs, photos with both tags and geo-locations on social photo sharing websites, as well as points of interest (POIs) and check-in information in location-based social networks. This talk presents recent results by the speaker and his colleagues on querying, exploring, and mining geo-textual data from social media feeds. The talk will cover (1) indexing and querying geo-textual data, (2) querying geospatial social media data streams, (3) exploratory search on geo-textual data, and (4) context-aware POI recommendations. Bio: Gao Cong is currently an Associate Professor in the School of Computer Engineering, Nanyang Technological University (NTU). Before joining NTU, he was an Assistant Professor at Aalborg University, Denmark. Before that, he worked as a researcher at Microsoft Research Asia. From 2004 to 2006, he worked as a postdoc at the University of Edinburgh. He received his Ph.D. in Computer Science from the National University of Singapore in 2004. His current research interests include geospatial and textual data management, data mining, context-aware recommendation, and mining social networks and social media. His work has been published in ACM SIGMOD, VLDB, ACM KDD, ICDE, WWW, ACM SIGIR, etc. He served as a PC co-chair for the E&A track of VLDB 2014. He is an associate editor for ACM Transactions on Database Systems (TODS).