Concepts, Practice and Research Challenges in Data Mining
- Lecturer (Coordinator):
- Javier Segovia
- Ernestina Menasalvas
This subject details the techniques, development processes, models and challenges involved in developing data mining projects. A striking 60% of business intelligence projects are abandoned or fail because of inadequate planning, incomplete tasks, missed deadlines, poor project management, failure to deliver on business requirements, or poor-quality deliverables.
Any business intelligence project involves developing a data mining project designed to discover "business intelligence". Many process models for developing data mining projects have been proposed. Yet despite all the research and projects conducted, data mining projects are still developed today more as an art than as an engineering process. Data mining experts translate business requirements into data mining goals and techniques intuitively, without an explicit, documented procedure. As a result, projects depend entirely on their developers: if the data mining expert leaves, the project is likely to fail, because he or she will not have specified or documented the steps to be taken.
The first question is: what methodology should be followed to transform business goals into data mining goals? Unfortunately, no such methodology exists to date. Answering this question requires addressing issues such as: How are business goals stated? What is a data mining goal? What types of problems can data mining solve? What do all these problems have in common? What is required to solve a given problem successfully? This subject covers the many approaches that attempt to answer these questions. Turning business intelligence project development from an art into a full-blown engineering discipline requires methodologies suited to this new type of project. Traditional development practices are inadequate and inappropriate, because business intelligence is an evolving area in every organization, subject to continual change and improvement based on feedback from the business community.
- Understand the importance of data mining projects and how they differ from other types of projects developed by organizations
- Analyse the existing challenges in data mining project management
- Introduction to data engineering
- The tool: IBM SPSS Modeler
- The process: CRISP-DM
- Linear regression
- Logistic regression
- RFM analysis
- Decision trees
- Neural networks
- Nearest neighbour
- Association rules
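The topic list names RFM analysis without describing it. As a hedged illustration only, the sketch below scores customers on Recency (days since last purchase), Frequency (number of purchases) and Monetary value (total spend), then maps each metric to a 1..n score by rank, so that more recent, more frequent and higher-spending customers score higher. The function names (`rfm_table`, `rank_scores`, `rfm_scores`), the `(customer_id, date, amount)` input layout, the rank-based binning and the tie-breaking by customer id are all assumptions made for this example; it is not the course's method, nor the behaviour of any SPSS Modeler node.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, today):
    """Aggregate (customer_id, date, amount) rows into (recency_days, frequency, monetary)."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        # Keep the most recent purchase date seen for each customer.
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((today - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }


def rank_scores(metric, n_bins=5, higher_is_better=True):
    """Assign a 1..n_bins score proportional to rank position (best customers highest).

    Bins have equal counts when the number of customers is a multiple of n_bins.
    Assumption: ties are broken by customer id so the result is deterministic.
    """
    sign = 1 if higher_is_better else -1
    worst_first = sorted(metric, key=lambda c: (sign * metric[c], c))
    n = len(worst_first)
    return {c: 1 + (i * n_bins) // n for i, c in enumerate(worst_first)}


def rfm_scores(transactions, today, n_bins=5):
    """Return {customer_id: (R, F, M)}; fewer days since last purchase gives a higher R."""
    table = rfm_table(transactions, today)
    r = rank_scores({c: v[0] for c, v in table.items()}, n_bins, higher_is_better=False)
    f = rank_scores({c: v[1] for c, v in table.items()}, n_bins)
    m = rank_scores({c: v[2] for c, v in table.items()}, n_bins)
    return {c: (r[c], f[c], m[c]) for c in table}


if __name__ == "__main__":
    # Hypothetical transactions: (customer_id, purchase date, amount)
    sample = [
        ("A", date(2024, 1, 30), 100.0),
        ("A", date(2024, 1, 10), 50.0),
        ("A", date(2023, 12, 1), 70.0),
        ("B", date(2023, 11, 15), 20.0),
        ("C", date(2024, 1, 20), 500.0),
        ("C", date(2024, 1, 5), 30.0),
    ]
    print(rfm_scores(sample, today=date(2024, 1, 31), n_bins=3))
    # {'A': (3, 3, 2), 'B': (1, 1, 1), 'C': (2, 2, 3)}
```

Scoring by rank rather than by fixed value thresholds keeps the scores comparable across datasets of different scale; fixed, business-defined cut-offs are an equally reasonable alternative.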
- David Hand, Heikki Mannila, Padhraic Smyth: Principles of Data Mining (Adaptive Computation and Machine Learning), MIT Press, 2001
- Jiawei Han, Micheline Kamber: Data Mining: Concepts and Techniques, Morgan Kaufmann, 2006
- Michael J. A. Berry, Gordon Linoff: Data Mining Techniques: Marketing, Sales and Customer Support, John Wiley & Sons, 1997
- Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Pearson Addison Wesley, 2005
- Ian Witten, Eibe Frank, Mark Hall: Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2011
- Knowledge of database features, functionalities and structure, and the ability to use, design, analyse and implement database applications
- Knowledge of the key principles and basic techniques of artificial intelligence and their practical application
- Ability to apply knowledge of statistics and optimization
Evaluation is based on the assignments and the final project.
Assignments and projects will be done individually or in groups, depending on the size of the class.
To pass the course, all assignments and the final project must be submitted, whatever the evaluation modality.
Class participation can raise the final score by 10%.

|Code|Competence|Proficiency level|
|---|---|---|
|CEM7|Evaluation and application of diverse mathematical and statistical theories, and available knowledge extraction and discovery processes, methods and techniques for large data volumes|P|
|CEM8|Application of the theoretical and mathematical foundations of heterogeneous functions and data processing and analysis, and evaluation and design of related methods for application in practical domains|P|


|Code|Learning outcome|Associated competences|Proficiency level|
|---|---|---|---|
|RA-APDI-19|Ability to proficiently apply a standard data mining process, including the business knowledge, data knowledge, exploratory data analysis, modelling, evaluation and exploitation phases|CEM2, CEM7|P|
|RA-APDI-20|Use software applications for data mining tasks|CEM2, CEM7|P|
|RA-APDI-21|Understand the foundations of, and apply, a broad repertoire of clustering, estimation, prediction and classification algorithms|CEM2, CEM7|P|
|RA-APDI-22|Be familiar with examples of real-world applications and with current research trends and lines of research|CEM2, CEM7|P|