Knowledge Discovery in Databases
- Lecturer (Coordinator):
  - Juan Pedro Caraça-Valente
  - jpvalente@fi.upm.es
- Lecturer:
  - Aurora Pérez
  - aurora@fi.upm.es
Semester
First semester
Credits
4 ECTS
Outline
Techniques for knowledge discovery (or data mining) in large volumes of information are widely used today in domains such as medicine, banking and industrial systems. Their applications include data analysis, fraud detection, risk analysis and mailing campaigns.
This subject will review all the stages of the knowledge discovery process and list the most important techniques for each stage. It will highlight data cleaning and preprocessing techniques, which are often overlooked.
It will then address major data mining techniques, including classification, clustering and association rules. It will also explore genetic algorithms, which have become popular in recent years and have often been applied in the field of knowledge discovery.
There is a recent trend towards building temporal information into large databases in order to preserve historical information, analyse the evolution of a variable or determine when a data item is valid. Additionally, there are domains where the information mainly takes the form of time series. Such domains require specialized treatment. This subject addresses information discovery techniques in time series, as this data type poses a major challenge to traditional data mining techniques and calls for new solutions.
Learning Goals
- Know all the stages of the knowledge discovery process and the major techniques of each stage, and know how to apply them in a particular domain
- Know how to analyse a domain (problem, data and goals) to determine its key characteristics and how they influence the choice of data mining techniques
- Be aware of data mining techniques and know how to apply them to specific problems
- Evaluate the operation and results of a knowledge discovery system
Syllabus
- Introduction
  - Data types, Time series
  - Basic concepts
- Knowledge discovery process
  - Knowledge discovery process stages
  - Data preprocessing for basic data types and time series
- KDD Tools
  - Background
  - A KDD tool: WEKA
- Data mining techniques
  - Classification
  - Advanced Methods for Data Analysis
  - Clustering
  - Time Series Techniques
- Evaluation
  - Objectives
  - Evaluation techniques
Recommended reading
- WEKA
- J. Han; M. Kamber: "Data Mining: Concepts and Techniques". Morgan Kaufmann. 2006.
- M. Kantardzic: "Data Mining: Concepts, Models, Methods, and Algorithms". John Wiley & Sons. 2003.
- U. Fayyad; G. Piatetsky-Shapiro; P. Smyth: "From Data Mining to Knowledge Discovery in Databases". AI Magazine 17(3), 1996.
Tuition language
English
Subject-Specific Competences
Code | Competence | Proficiency Level |
---|---|---|
CEM2 | Acquisition of an advanced level of knowledge in order to analyse and synthesize solutions to problems requiring innovative approaches to the definition of the computational infrastructure, processing and analysis of heterogeneous data types | S |
CEM7 | Knowledge of the theoretical foundations and training in the many available techniques for knowledge extraction and discovery from large datasets and related research topics | S |
CEM8 | Application of the theoretical and mathematical foundations of heterogeneous functions and data processing and analysis and evaluation and design of related methods for application in practical domains | S |
Learning Outcomes
Code | Learning Outcome | Associated competences | Proficiency level |
---|---|---|---|
RA-APDI-68 | Be able to analyse a domain to determine the relevance of its temporal characteristics and the knowledge discovery tasks worth undertaking | CEM2, CEM7, CEM8 | S |
RA-APDI-69 | Be able to use knowledge discovery techniques and assess their applicability in each case | CEM2, CEM7, CEM8 | S |
RA-APDI-70 | Be able to conduct a complete evaluation of the operation and usefulness of a knowledge discovery project | CEM2, CEM7, CEM8 | S |