Master in Software and Systems

Knowledge Discovery in Databases

Lecturer (Coordinator):
Juan Pedro Caraça-Valente
jpvalente@fi.upm.es
Lecturer:
Aurora Pérez
aurora@fi.upm.es

Semester

First semester

Credits

4 ECTS

Outline

Techniques for knowledge discovery (or data mining) in large volumes of information are widely used today in domains such as medicine, banking and industrial systems. Their applications include data analysis, fraud detection, risk analysis and mailing campaigns.

This subject will review all the stages of the knowledge discovery process and list the most important techniques for each stage. It will highlight data cleaning and preprocessing techniques, which are often overlooked.

It will then address the major data mining techniques, including classification, clustering and association rules. It will also explore genetic algorithms, which have become exceptionally popular in recent years and many of which have been applied in the field of knowledge discovery.

There is a recent trend towards building temporal information into large databases in order to preserve historical information, analyse the evolution of a variable or determine when a data item is valid. Additionally, there are domains where the information mainly takes the form of time series and therefore requires specialized treatment. This subject addresses information discovery techniques for time series, as this data type poses a major challenge to traditional data mining techniques and calls for new solutions.

Learning Goals

Syllabus

  1. Introduction
    1. Historical outline
    2. Basic concepts
  2. Knowledge discovery process stages
    1. Process stages
    2. Data preprocessing
  3. KDD Tools
    1. Introduction
    2. A KDD tool: WEKA
  4. Data mining techniques
    1. Classification
    2. Clustering
    3. Genetic Algorithms
    4. Temporal Data Mining
  5. Results evaluation
    1. Importance and objectives
    2. Verification and validation techniques

Recommended reading:

Assessment Method

This subject shall be graded on the basis of continuous assessment and a practical assignment.

Continuous assessment shall consider attendance, active participation and assessable exercises set in class.

The practical assignment shall be assessed according to the three phases described below and the respective weights.

Practical assignment

The practical assignment shall be completed by groups of two students or, exceptionally, individually. Students shall complete and submit the practical assignment incrementally as follows:

Students shall present the practical assignment in class. Each group shall have 15 minutes for the oral presentation, plus a 5-minute question and answer session.

Grading criteria

The subject shall be graded on a scale of 1 to 10: 3 points for continuous assessment and 7 points for the practical assignment. To pass the subject, students must attend at least 70% of classes and attain an overall grade of 5 points or more.

The three practical assignment submissions are compulsory and shall be assessed according to the weights specified below.

This table shows each assessed activity and its weight

Activity | Weight
Practical assignment. Phase 1 | 10%
Practical assignment. Phase 2 | 20%
Practical assignment. Phase 3 | 10%
Practical assignment presentation | 30%
Attendance, participation and assessable exercises | 30%
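
As an illustration only, the weights in the table above and the pass conditions in the grading criteria can be combined as in the following minimal Python sketch. It assumes that every activity is scored on the same 10-point scale and that the scores are combined linearly, which this guide does not state explicitly. The names WEIGHTS, final_grade and passes are hypothetical and do not correspond to any official grading tool.

```python
# Weights in percent, copied from the activity table.
WEIGHTS = {
    "phase_1": 10,
    "phase_2": 20,
    "phase_3": 10,
    "presentation": 30,
    "continuous": 30,
}


def final_grade(scores):
    """Return the weighted grade on a 10-point scale.

    Assumption: each entry of `scores` is already on a 10-point scale.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def passes(scores, attendance_rate):
    """Apply the two pass conditions: at least 70% attendance and a grade of 5 or more."""
    return attendance_rate >= 0.70 and final_grade(scores) >= 5.0


if __name__ == "__main__":
    # Hypothetical scores for one student, for illustration only.
    example = {
        "phase_1": 8,
        "phase_2": 6,
        "phase_3": 7,
        "presentation": 9,
        "continuous": 5,
    }
    print(final_grade(example))
    print(passes(example, attendance_rate=0.80))
```

Storing the weights as whole percentages and dividing once at the end keeps the arithmetic easy to check against the table: the four practical assignment rows add up to 70% (7 points) and continuous assessment to 30% (3 points).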

The statement for each phase of the practical assignment shall specify the submission date and the date on which grades will be posted.

Students shall have the opportunity to submit any outstanding practical assignments in the deferred/referred examinations. Continuous assessment shall not be repeated, and the subject grade shall be calculated from the practical assignment. However, deferred/referred examinations may include an examination in place of continuous assessment.

Tuition language

Spanish (documentation in English)

Subject-Specific Competences

This table shows the code, description and proficiency level for each subject-specific competence

Code | Competence | Proficiency Level
SSC2 | Acquisition of an advanced level of knowledge in order to analyse and synthesize solutions to problems requiring innovative approaches to the definition of the computational infrastructure, processing and analysis of heterogeneous data types | S
SSC7 | Knowledge of the theoretical foundations and training in the many available techniques for knowledge extraction and discovery from large datasets and related research topics | S

Learning Outcomes

This table shows the code, description and proficiency level for each subject learning outcome

Code | Learning Outcome | Associated competences | Proficiency level
RA-APDI-11 | Be able to analyse a domain to determine the relevance of its temporal characteristics and the knowledge discovery tasks worth undertaking | SSC2, SSC7 | S
RA-APDI-12 | Be able to use knowledge discovery techniques and assess their applicability in each case | SSC2, SSC7 | S
RA-APDI-13 | Be able to conduct a complete evaluation of the operation and usefulness of a knowledge discovery project | SSC2, SSC7 | S

Learning Guide

Subject learning guide for Knowledge Discovery in Databases