- ISBN: 7115141444
- Binding: not available
- Number of volumes: not available
- Weight: not available
- Format: 24 cm
- Pages: 516
- Publication date: 2006-01-01
- Barcode: 9787115141446; 978-7-115-14144-6
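Synopsis
This book provides a comprehensive introduction to data mining, aiming to give readers the knowledge needed to apply data mining to real problems. It covers five topics: data, classification, association analysis, clustering, and anomaly detection. Except for anomaly detection, each topic has two chapters: the first covers basic concepts, representative algorithms, and evaluation techniques, while the second discusses advanced concepts and algorithms in greater depth. The goal is to give readers a thorough understanding of the foundations of data mining while also introducing the more important advanced topics. The book also includes numerous examples, figures, and exercises.
This book is suitable as a textbook for data mining courses for senior undergraduates and graduate students in related fields, and as a reference for practitioners engaged in data mining research and application development.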
Table of Contents
1 Introduction
1.1 What Is Data Mining?
1.2 Motivating Challenges
1.3 The Origins of Data Mining
1.4 Data Mining Tasks
1.5 Scope and Organization of the Book
1.6 Bibliographic Notes
1.7 Exercises
2 Data
2.1 Types of Data
2.1.1 Attributes and Measurement
2.1.2 Types of Data Sets
2.2 Data Quality
2.2.1 Measurement and Data Collection Issues
2.2.2 Issues Related to Applications
2.3 Data Preprocessing
2.3.1 Aggregation
2.3.2 Sampling
2.3.3 Dimensionality Reduction
2.3.4 Feature Subset Selection
2.3.5 Feature Creation
2.3.6 Discretization and Binarization
2.3.7 Variable Transformation
2.4 Measures of Similarity and Dissimilarity
2.4.1 Basics
2.4.2 Similarity and Dissimilarity between Simple Attributes
2.4.3 Dissimilarities between Data Objects
2.4.4 Similarities between Data Objects
2.4.5 Examples of Proximity Measures
2.4.6 Issues in Proximity Calculation
2.4.7 Selecting the Right Proximity Measure
2.5 Bibliographic Notes
2.6 Exercises
3 Exploring Data
3.1 The Iris Data Set
3.2 Summary Statistics
3.2.1 Frequencies and the Mode
3.2.2 Percentiles
3.2.3 Measures of Location: Mean and Median
3.2.4 Measures of Spread: Range and Variance
3.2.5 Multivariate Summary Statistics
3.2.6 Other Ways to Summarize the Data
3.3 Visualization
3.3.1 Motivations for Visualization
3.3.2 General Concepts
3.3.3 Techniques
3.3.4 Visualizing Higher-Dimensional Data
3.3.5 Do's and Don'ts
3.4 OLAP and Multidimensional Data Analysis
3.4.1 Representing Iris Data as a Multidimensional Array
3.4.2 Multidimensional Data: The General Case
3.4.3 Analyzing Multidimensional Data
3.4.4 Final Comments on Multidimensional Data Analysis
3.5 Bibliographic Notes
3.6 Exercises
4 Classification: Basic Concepts, Decision Trees, and Model Evaluation
4.1 Preliminaries
4.2 General Approach to Solving a Classification Problem
4.3 Decision Tree Induction
4.3.1 How a Decision Tree Works
4.3.2 How to Build a Decision Tree
4.3.3 Methods for Expressing Attribute Test Conditions
4.3.4 Measures for Selecting the Best Split
4.3.5 Algorithm for Decision Tree Induction
4.3.6 An Example: Web Robot Detection
4.3.7 Characteristics of Decision Tree Induction
4.4 Model Overfitting
4.4.1 Overfitting Due to Presence of Noise
4.4.2 Overfitting Due to Lack of Representative Samples
4.4.3 Overfitting and the Multiple Comparison Procedure
4.4.4 Estimation of Generalization Errors
4.4.5 Handling Overfitting in Decision Tree Induction
4.5 Evaluating the Performance of a Classifier
4.5.1 Holdout Method
4.5.2 Random Subsampling
4.5.3 Cross-Validation
4.5.4 Bootstrap
4.6 Methods for Comparing Classifiers
4.6.1 Estimating a Confidence Interval for Accuracy
4.6.2 Comparing the Performance of Two Models
4.6.3 Comparing the Performance of Two Classifiers
4.7 Bibliographic Notes
4.8 Exercises
5 Classification: Alternative Techniques
5.1 Rule-Based Classifier
5.1.1 How a Rule-Based Classifier Works
5.1.2 Rule-Ordering Schemes
5.1.3 How to Build a Rule-Based Classifier
5.1.4 Direct Methods for Rule Extraction
5.1.5 Indirect Methods for Rule Extraction
5.1.6 Characteristics of Rule-Based Classifiers
5.2 Nearest-Neighbor Classifiers
5.2.1 Algorithm
5.2.2 Characteristics of Nearest-Neighbor Classifiers
5.3 Bayesian Classifiers
5.3.1 Bayes Theorem
5.3.2 Using the Bayes Theorem for Classification
5.3.3 Naïve Bayes Classifier
5.3.4 Bayes Error Rate
5.3.5 Bayesian Belief Networks