Data Mining

Lecturer (Coordinator):
Javier Segovia
fsegovia@fi.upm.es
Lecturer:
Ernestina Menasalvas
emenasalvas@fi.upm.es

Semester

Second semester

Credits

4 ECTS

Outline

This subject covers the techniques, development processes, models and challenges involved in developing data mining projects. As many as 60% of business intelligence projects are abandoned or fail because of inadequate planning, incomplete tasks, missed deadlines, poor project management, failure to deliver business requirements, or poor-quality deliverables.

Every business intelligence project involves developing a data mining project designed to discover "business intelligence". Many process models for developing data mining projects have been proposed. Yet, despite all the research and projects conducted, data mining projects are still developed today more as an art than as an engineering process. Data mining experts implicitly translate business requirements into data mining goals and techniques, so projects depend entirely on their developers. If the data mining expert leaves, the project will fail, because he or she will not have specified or documented the steps to be taken.

In this course, students will learn a methodology for converting business objectives into data analysis objectives, together with the basic statistical and Artificial Intelligence techniques needed to achieve those objectives, practising with a professional data mining tool on real databases.

Learning Goals

  • Know examples of real applications, as well as current trends and lines of research
  • Use software applications to perform data mining tasks
  • Understand the fundamentals of, and apply, a wide and varied repertoire of clustering, estimation, prediction and classification algorithms

Syllabus

  1. Introduction to data engineering
  2. The tool: IBM SPSS Modeler
  3. Descriptive, Diagnostic, Predictive and Prescriptive Analysis
  4. RFM analysis
  5. Clustering
  6. Linear regression
  7. Logistic regression
  8. Nearest neighbour
  9. Decision trees
  10. Neural networks
  11. Ensemble methods
  12. Association rules
  13. Dealing with time

Prerequisites

  • Artificial Intelligence
  • Statistics

Tuition language

English

Subject-Specific Competences

Code, description and proficiency level for each subject-specific competence

Code | Competence | Proficiency level
CEM2 | Analysis and synthesis of solutions to problems requiring innovative approaches to the definition of the computational infrastructure and to the processing and analysis of heterogeneous data types | C
CEM7 | Evaluation and application of diverse mathematical and statistical theories, and of available knowledge extraction and discovery processes, methods and techniques for large data volumes | P
CEM8 | Application of the theoretical and mathematical foundations of heterogeneous data processing and analysis functions, and evaluation and design of related methods for application in practical domains | P

Learning Outcomes

Code, description, associated competences and proficiency level for each subject learning outcome

Code | Learning outcome | Associated competences | Proficiency level
RA-APDI-20 | Use software applications for data mining tasks | CEM2, CEM7 | P
RA-APDI-21 | Understand the foundations of, and apply, a broad repertoire of clustering, estimation, prediction and classification algorithms | CEM2, CEM7, CEM8 | P
RA-APDI-22 | Be familiar with examples of real applications and with research trends and lines of research | CEM2, CEM7 | P

Learning Guide

Learning Guide: Data Mining