Melbourne School of Engineering Computer Science & Software Engineering

Knowledge Discovery

Overview

The central theme of the Knowledge Discovery Research Group is storing, manipulating, and exploiting information held on computers. That information comes in many forms, and the various projects being carried out within the group correspond to the various forms of data that computers store. For example, when the data is in the form of structured records, data mining techniques can be used to extract knowledge. When the data is unstructured text, information retrieval techniques are applied. And when the data is spoken or written language records, human language technology comes into play.

The paragraphs below describe the current activities of the group. For more detailed information, and advice about research opportunities, contact the leader of the relevant project.


Data Compression and Coding

Compression is an important enabling technology, and is embedded in devices ranging from fax machines to digital cameras. The emphasis in this project is on lossless compression modelling for text, images, and other data sources, and on source coding techniques, including algorithms for constrained and unconstrained minimum-redundancy coding and techniques for fast arithmetic coding. One particular emphasis is the compression mechanisms used in large-scale information retrieval systems, for both the text and the inverted index. A range of novel techniques has been developed.

Project Leader: Professor Alistair Moffat.
More Information: http://www.cs.mu.oz.au/~alistair/abstracts/
Project Members: Mike Ciavarella.
Research Students: R. Isal, Kusnadi Kusnadi, Mike Liddell.
Funding: This project has received funding via ARC Discovery projects.

Human Language Technology

Human language and communication are extremely complex, naturally occurring phenomena. The analysis of language, whether in speech, text or multimodal forms, is a significant computational challenge. As the quantity of information on the Web grows, the need for automated language analysis systems becomes more pressing. Equally, as mobile and embedded computers proliferate, the need for natural linguistic interaction between humans and machines becomes more acute. Language technologies are beginning to address these needs. In this project we are pursuing open research questions in the following areas: language modelling, multimodal linguistic annotation, high performance computing for natural language processing, electronic documentation of endangered languages, structured models for linguistic data, and digital libraries for language resources.

Project Leader: Associate Professor Steven Bird.
More Information: University of Melbourne Language Technology Group
Project Members: Dr Tim Baldwin, Cathy Bow, Baden Hughes.
Research Students: Phil Blunsom, Trevor Cohn, Rebecca Dridan, Rod Farmer, Edward Ivanovic, Catherine Lai, Robert Marshall, David Penton, Patrick Ye.
Funding: This project has received funding from the Australian Research Council, the Australian Institute for Aboriginal and Torres Strait Islander Studies, the United States National Science Foundation, the Victorian Partnership for Advanced Computing, the Association for Computational Linguistics, and the Linguistic Data Consortium at the University of Pennsylvania.
Other Collaborators: Professor Robert Dale (Macquarie University), Dr Nicholas Evans (University of Melbourne), David Grayden (Bionic Ear Institute), Dr Steve Cassidy (Macquarie University), Dr Gillian Wigglesworth (University of Melbourne).

Information Discovery

Large document collections, such as those maintained by corporations, pose challenges in several ways. There is the challenge of storing the data efficiently; the challenge of indexing it so that it can be searched by content; and the challenge of implementing searching so as to return the documents that are likely to be relevant to the query posed by the searcher. In this project we consider the efficient and effective implementation of Information Retrieval systems for multi-gigabyte text collections. We are particularly interested in storage and indexing solutions that provide fast heuristic searching, and in techniques that offer the prospect of scaling to cope with very large collections. Experimental work includes involvement with the US-funded TREC investigation.

Project Leader: Professor Alistair Moffat.
More Information: http://www.cs.mu.oz.au/~alistair/abstracts/
Research Students: Vo Ngoc Anh, Li Sun, Raymond Wan.
Other Collaborators: Dr David Hawking (CSIRO), Professor Justin Zobel (RMIT).
Funding: This project has received funding via ARC Discovery projects, and from the Victorian Partnership for Advanced Computing.

Machine Learning and Data Mining

Finding patterns in large collections of data and using these patterns for reasoning is a challenging task. In this project we are interested in efficient algorithms for mining a number of fundamental types of patterns and methods for using these patterns in classification.

Project Leader: Dr James Bailey.
More Information: http://www.cs.mu.oz.au/~jbailey/
Project Members: Professor Rao Kotagiri, Dr Chris Leckie, Dr Laurence Park, Dr Tao Peng.
Research Students: Hongjian Fan, Hamad Al Hammaday, Thomas Manoukian, Benjamin Rubinstein, Roger Ting.
Funding: This project has received funding via ARC Discovery projects, and from the Victorian Partnership for Advanced Computing.

Security and Cryptography

Security is an important topic in modern computing and networking technologies. Sensitive and financially important data needs to be protected against both active and passive malicious activities, including eavesdropping and unauthorized access. Cryptography provides essential tools to achieve the necessary information security objectives, including confidentiality, integrity and authentication. In this project we are concerned with both theoretical and practical problems in security and cryptography.

Project Leader: Dr Udaya Parampalli.
More Information: http://www.cs.mu.oz.au/~udaya/
Research Students: Andrew Newlands, Shivaramakrishnan, Abdun Mahmood.
Other Collaborators: Dr Margreta Kuijper (Electrical and Electronic Engineering), Dr Xinwen Wu (Electrical and Electronic Engineering).
Funding: This project has received funding via ARC Discovery projects.

Spatial Data

Spatial data is an important component in many modern applications. Cars with global positioning systems, geographic information systems utilized in natural resources management, and even homepages of schools, hospitals, and restaurants with addresses in them are forming the pieces of this emerging wave of spatial content. In addition, recent advances in mobile and ubiquitous computing will make even more spatial content available, while making the content more decentralized and dynamic. Current querying technologies do not address the needs of accessing such spatial content. First, users cannot easily search within the data. Second, users can search only on simple attributes such as filenames, rather than on the various other attributes of the data. Queries such as searching a region in a city for a certain cuisine are impossible to perform satisfactorily with current methods. In this project we are interested in finding new methods that enable users to efficiently query spatial content in both centralized and decentralized environments.

Project Leader: Dr Egemen Tanin.
More Information: http://www.cs.mu.oz.au/~egemen/

XML and Semi-Structured Data

XML is an important standard for representing and exchanging information on the Web. In this project we are interested in the analysis and efficient implementation of languages for querying and transforming XML, such as XPath and XSLT. Another focus is on data mining methods for XML and data on the Web.

Project Leader: Dr James Bailey.
More Information: http://www.cs.mu.oz.au/~jbailey/
Project Members: Professor Rao Kotagiri.
Research Students: Ce Dong, Zhou Zhu.
Funding: This project has received funding from the European Union.

 


Selected Publications, 2001-2004

    Books

  1. A. Moffat and A. Turpin. Compression and Coding Algorithms. Kluwer Academic Publishers, Massachusetts, 2002.

    Edited Collections

  2. S. Bird and J. Harrington (eds). (2001) Speech Annotation and Corpus Tools - Special Issue. Speech Communication 33(1,2).

    Chapters in Books

  3. R. Baeza-Yates, A. Moffat and G. Navarro. Searching large text collections. In J. Abello, P. Pardalos and M. Resende (Eds.) Handbook of Massive Data Sets. pp.195-244. Kluwer Academic Publishers, Boston, 2002.
  4. P. Gruba. Computer-assisted language learning. In A. Davies and C. Elder (Eds.) The Handbook of Applied Linguistics. pp.623-648. Blackwell, London, 2004.

    Journal Articles

  5. S. Au, C. Leckie, A. Parhar and G. Wong. (2004) Efficient visualization of large routing topologies. International Journal of Network Management 14 pp.105-118.
  6. J. Bailey, A. Poulovassilis and P. Wood. (2002) Analysis and optimisation of event-condition-action rules on XML. Computer Networks 39(3) pp.239-260.
  7. J. Bailey, G. Dong and R. Kotagiri. (2004) On the decidability of the termination problem of active database systems. Theoretical Computer Science 311(1-3) pp.389-437.
  8. S. Bird and M. Liberman. (2001) A formal framework for linguistic annotation. Speech Communication 33(1-2). pp.23-60.
  9. S. Bird and G. Simons. (2003) Extending Dublin Core metadata to support the description and discovery of language resources. Computers and the Humanities 37(4) pp.375-388.
  10. S. Bird and G. Simons. (2003) Seven dimensions of portability for language documentation. Language 79(3) pp.557-582.
  11. A. Bonnecaze, P. Solé and P. Udaya. (2001) Tricolore 3-designs in type III codes. Discrete Mathematics 241(1-3) pp.129-138.
  12. O. de Kretser and A. Moffat. (2004) SEFT: A search engine for text. Software Practice and Experience 34(10) pp.1011-1023.
  13. K. Horadam and U. Parampalli. (2003) A new class of ternary cocyclic Hadamard codes. Applicable Algebra in Engineering, Communication and Computing 14(1) pp.65-73.
  14. K. Horadam and P. Udaya. (2002) A new construction of central relative (p^a, p^a, p^a, 1)-difference sets. Designs, Codes and Cryptography 27(3) pp.281-295.
  15. H. Samet, H. Alborzi, F. Brabec, C. Esperanca, G. Hjaltason, F. Morgan and E. Tanin. (2003) Use of the SAND spatial browser for digital government applications. Communications of the ACM 46(1) pp.63-66.
  16. G. Simons and S. Bird. (2003) Building an open language archives community on the OAI foundation. Library Hi Tech 21(2) pp.210-218.
  17. G. Simons and S. Bird. (2003) The open language archives community: an infrastructure for distributed archiving of language resources. Literary and Linguistic Computing 18(12) pp.117-128.
  18. A. Turpin and A. Moffat. (2001) On-line adaptive canonical prefix coding with bounded compression loss. IEEE Transactions on Information Theory 47(1) pp.88-98.
  19. X. Wu, M. Kuijper and U. Parampalli. (2003) Lee-metric decoding of BCH and Reed-Solomon Codes. Electronics Letters 39(21) pp.1522-1523.

    Conference Publications

  20. K. Bae and J. Bailey. CodeX: an approach for debugging XSLT transformations. In Proceedings of the 4th International Conference on Web Information Systems Engineering pp.309-312 Rome, Italy, December 2003.
  21. J. Bailey, T. Manoukian and R. Kotagiri. Classification using constrained emerging patterns. In Advances in Web-Age Information Management (LNCS 2762) pp.226-237 Chengdu, China, August 2003.
  22. J. Bailey, T. Manoukian and R. Kotagiri. A fast algorithm for computing hypergraph transversals and its application in mining emerging patterns. In Proceedings of the Third IEEE International Conference on Data Mining pp.485-488 Melbourne, Florida, November 2003.
  23. M. Ciavarella and A. Moffat. Lossless image compression using pixel reordering. In Proceedings of the 27th Australasian Computer Science Conference pp.125-132 Dunedin, New Zealand, January 2004.
  24. N. Craswell, F. Crimmins, D. Hawking and A. Moffat. Performance and cost tradeoffs in web search. In Proceedings of the 19th Australasian Database Conference pp.161-169 Dunedin, New Zealand, January 2004.
  25. H. Fan and R. Kotagiri. A Bayesian approach to use emerging patterns for classification. In Proceedings of the 18th Australasian Database Conference pp.39-48 Adelaide, Australia, February 2003.
  26. H. Fan and R. Kotagiri. Efficiently mining interesting emerging patterns. In Advances in Web-Age Information Management (LNCS 2762) pp.189-201 Chengdu, China, August 2003.
  27. D. Gibbon, C. Bow, S. Bird and B. Hughes. Securing interpretability: the case of Ega language documentation. In Proceedings of the 4th International Conference on Language Resources and Evaluation pp.1369-1372 Lisbon, Portugal, May 2004.
  28. B. Hughes. Metadata Quality Evaluation: Experience from the Open Language Archives Community. Proceedings of the 7th International Conference on Asian Digital Libraries. Lecture Notes in Computer Science 3334. Springer-Verlag. pp.320-329 2004.
  29. B. Hughes, S. Bird and C. Bow. Encoding and presenting interlinear text using XML technologies. In Proceedings of the Australasian Language Technology Workshop 2003 pp.105-113 Melbourne, Australia, December 2003.
  30. B. Hughes and S. Bird. Grid-enabling natural language engineering by stealth. In Proceedings of the 2003 Workshop on Software Engineering and Architecture of Language Technology Systems pp.31-38 Edmonton, Canada, May 2003.
  31. B. Hughes, D. Penton, S. Bird, C. Bow, G. Wigglesworth, P. McConvell and J. Simpson. Management of metadata in linguistic fieldwork - experience from ACLA. In Proceedings of the 4th International Conference on Language Resources and Evaluation pp.193-196 Lisbon, Portugal, May 2004.
  32. R. Kotagiri and J. Bailey. Discovery of emerging patterns and their use in classification. In AI 2003: Advances in Artificial Intelligence (LNCS 2903) pp.1-12 Perth, Australia, December 2003.
  33. C. Leckie and R. Kotagiri. Policies for sharing distributed probabilistic beliefs. In Proceedings of the 26th Australasian Computer Science Conference pp.285-290 Adelaide, Australia, February 2003.
  34. M. Liddell and A. Moffat. Hybrid prefix codes for practical use. In Proceedings of the IEEE Data Compression Conference pp.392-401 Snowbird, Utah, March 2003.
  35. L. Park and K. Ramamohanarao. (2004) Hybrid pre-query term expansion using Latent Semantic Analysis. IEEE Conference on Data Mining 2004 pp.178-185.
  36. T. Peng, C. Leckie and R. Kotagiri. Detecting distributed denial of service attacks by sharing distributed beliefs. In Information Security and Privacy (LNCS 2727) pp.214-225 Wollongong, Australia, July 2003.
  37. T. Peng, C. Leckie and R. Kotagiri. Detecting reflector attacks by sharing beliefs. In Globecom '03 - IEEE Global Telecommunications Conference pp.1358-1362 San Francisco, California, December 2003.
  38. T. Peng, C. Leckie and R. Kotagiri. An efficient filter for denial-of-service bandwidth attacks. In Globecom '03 - IEEE Global Telecommunications Conference pp.1353-1357 San Francisco, California, December 2003.
  39. T. Peng, C. Leckie and R. Kotagiri. Protection from distributed denial of service attacks using history-based IP filtering. In Frontiers in Telecommunications: 2003 IEEE International Conference on Communications pp.1-5 Anchorage, Alaska, USA, May 2003.
  40. T. Peng, C. Leckie and R. Kotagiri. Proactively detecting distributed denial of service attacks using source IP address monitoring. In Third International IFIP-TC6 Networking Conference (Networking 2004) pp.771-782 Athens, Greece, May 2004.
  41. R. Wan and A. Moffat. Evaluating statistically generated phrases. In Proceedings of the Eighth Australasian Document Computing Symposium pp.67-70 Canberra, Australia, December 2003.
  42. X. Wu, M. Kuijper and U. Parampalli. A Lee-Metric decoding algorithm for Reed-Solomon codes over GF(p)*. In Proceedings of the 7th International Symposium on Digital Signal Processing and Communications Systems and the 2nd Workshop on the Internet, Telecommunications and Signal Processing pp.26-31 Gold Coast, Australia, December 2003.
  43. A. Vo and A. Moffat. Integrated impacts for web retrieval. In Proceedings of the Eighth Australasian Document Computing Symposium pp.25-30 Canberra, Australia, December 2003.
  44. A. Vo and A. Moffat. Index compression using fixed binary codewords. In Proceedings of the 19th Australasian Database Conference pp.61-67 Dunedin, New Zealand, January 2004.

 
