Concepts, Practice and Research Challenges in Data Mining

Lecturer (Coordinator):
Javier Segovia
fsegovia@fi.upm.es
Lecturer:
Ernestina Menasalvas
emenasalvas@fi.upm.es

Semester

Second semester

Credits

4 ECTS

Outline

This subject details the techniques, development processes, models and challenges of data mining project development. A striking 60% of business intelligence projects are abandoned or fail because of inadequate planning, incomplete tasks, missed deadlines, poor project management, failure to deliver business requirements or poor-quality deliverables.

Any business intelligence project involves the development of a data mining project designed to discover "business intelligence". Many process models for data mining project development have been proposed. Yet, despite all the research and projects conducted, the way data mining projects are developed today is still more an art than an engineering process. Data mining experts translate business requirements into data mining goals and techniques intuitively, without following an explicit method. This means that projects are fully dependent on their developers. If the data mining expert leaves, the project will fail, because he or she will not have specified or documented the steps to be taken.

The first question is: what methodology should be followed to transform business goals into data mining goals? Unfortunately, no such methodology exists to date. To answer this question, we have to address issues such as: How are business goals stated? What is a data mining goal? What types of problems can data mining solve? What do all the problems have in common? What are the requirements for successfully solving a given problem? This subject deals with the many approaches that try to solve these problems. Converting business intelligence project development from an art into a full-blown engineering discipline entails applying methodologies suited to this new type of project. Traditional development practices are inadequate because business intelligence is an evolving area in all organizations, subject to continual changes and improvements based on feedback from the business community.

Learning Goals

  • Understand the importance of data mining projects and how they differ from other types of projects developed by organizations
  • Analyse existing challenges for data mining project management

Syllabus

  1. Introduction to data engineering
  2. The tool: IBM SPSS Modeler
  3. The process: CRISP-DM
  4. Linear regression
  5. Logistic regression
  6. RFM analysis
  7. Decision trees
  8. Neural networks
  9. Clustering
  10. Nearest neighbour
  11. Association rules

Recommended reading

  • David Hand, Heikki Mannila, Padhraic Smyth: Principles of Data Mining (Adaptive Computation and Machine Learning), MIT Press, 2001
  • Jiawei Han, Micheline Kamber: Data Mining: Concepts and Techniques, Morgan Kaufmann, 2006
  • Michael J. A. Berry, Gordon Linoff: Data Mining Techniques: Marketing, Sales and Customer Support, John Wiley & Sons, 1997
  • Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Pearson Addison Wesley, 2005
  • Ian Witten, Eibe Frank, Mark Hall: Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2011

Prerequisites

  • Knowledge of database features, functionalities and structure, and ability to properly use, design, analyse and implement database applications
  • Knowledge of the key principles and basic techniques of artificial intelligence and their practical application
  • Skills to apply knowledge of statistics and optimization

Assessment Method

The evaluation is based on the assignments and the final project.

Assignments and projects will be carried out individually or in groups, depending on the size of the course.

To pass the course, it is mandatory to submit all the assignments and the final project, whatever the evaluation mode.

Participation in class gives a 10% increase in the final score.

Tuition language

English

Subject-Specific Competences

Code, description and proficiency level for each subject-specific competence

Code | Competence | Proficiency Level
CEM7 | Evaluation and application of diverse mathematical and statistical theories, and available knowledge extraction and discovery processes, methods and techniques for large data volumes | P
CEM8 | Application of the theoretical and mathematical foundations of heterogeneous functions and data processing and analysis and evaluation and design of related methods for application in practical domains | P

Learning Outcomes

Code, description and proficiency level for each subject learning outcome

Code | Learning Outcome | Associated competences | Proficiency level
RA-APDI-19 | Ability to proficiently apply a standard data mining process, including the business knowledge, data knowledge, data exploration analysis, modelling, evaluation and exploitation phases | CEM2, CEM7 | P
RA-APDI-20 | Use software applications for data mining tasks | CEM2, CEM7 | P
RA-APDI-21 | Understand the foundations and apply a broad and wide-ranging repertory of clustering, estimation, prediction and classification algorithms | CEM2, CEM7 | P
RA-APDI-22 | Be familiar with examples of real applications and research trends and lines | CEM2, CEM7 | P

Learning Guide

Learning Guide: Concepts, Practice and Research Challenges in Data Mining