Pattern Classification, 2nd Edition, by Richard O. Duda, Peter E. Hart, and David G. Stork
Product details:
ISBN 10: 0471056693
ISBN 13: 9780471056690
Authors: Richard O. Duda; Peter E. Hart; David G. Stork
The first edition, published in 1973, has become a classic reference in the field. The second edition adds coverage of key new topics, including neural networks, statistical pattern recognition, the theory of machine learning, and the theory of invariances. Also included are worked examples, comparisons between different methods, extensive graphics, and expanded exercises and computer project topics.
Pattern Classification, 2nd Edition: Table of Contents
1 INTRODUCTION
1.1 Machine Perception
1.2 An Example
1.2.1 Related Fields
1.3 Pattern Recognition Systems
1.3.1 Sensing
1.3.2 Segmentation and Grouping
1.3.3 Feature Extraction
1.3.4 Classification
1.3.5 Post Processing
1.4 The Design Cycle
1.4.1 Data Collection
1.4.2 Feature Choice
1.4.3 Model Choice
1.4.4 Training
1.4.5 Evaluation
1.4.6 Computational Complexity
1.5 Learning and Adaptation
1.5.1 Supervised Learning
1.5.2 Unsupervised Learning
1.5.3 Reinforcement Learning
1.6 Conclusion
Summary by Chapters
Bibliographical and Historical Remarks
Bibliography
2 BAYESIAN DECISION THEORY
2.1 Introduction
2.2 Bayesian Decision Theory—Continuous Features
2.2.1 Two-Category Classification
2.3 Minimum-Error-Rate Classification
2.3.1 Minimax Criterion
2.3.2 Neyman-Pearson Criterion
2.4 Classifiers, Discriminant Functions, and Decision Surfaces
2.4.1 The Multicategory Case
2.4.2 The Two-Category Case
2.5 The Normal Density
2.5.1 Univariate Density
2.5.2 Multivariate Density
2.6 Discriminant Functions for the Normal Density
2.6.1 Case 1: Σi = σ²I
2.6.2 Case 2: Σi = Σ
2.6.3 Case 3: Σi = arbitrary
Example 1 Decision Regions for Two-Dimensional Gaussian Data
2.7 Error Probabilities and Integrals
2.8 Error Bounds for Normal Densities
2.8.1 Chernoff Bound
2.8.2 Bhattacharyya Bound
Example 2 Error Bounds for Gaussian Distributions
2.8.3 Signal Detection Theory and Operating Characteristics
2.9 Bayes Decision Theory—Discrete Features
2.9.1 Independent Binary Features
Example 3 Bayesian Decisions for Three-Dimensional Binary Data
2.10 Missing and Noisy Features
2.10.1 Missing Features
2.10.2 Noisy Features
2.11 Bayesian Belief Networks
Example 4 Belief Network for Fish
2.12 Compound Bayesian Decision Theory and Context
Summary
Bibliographical and Historical Remarks
Problems
Computer exercises
Bibliography
3 MAXIMUM-LIKELIHOOD AND BAYESIAN PARAMETER ESTIMATION
3.1 Introduction
3.2 Maximum-Likelihood Estimation
3.2.1 The General Principle
3.2.2 The Gaussian Case: Unknown μ
3.2.3 The Gaussian Case: Unknown μ and Σ
3.2.4 Bias
3.3 Bayesian Estimation
3.3.1 The Class-Conditional Densities
3.3.2 The Parameter Distribution
3.4 Bayesian Parameter Estimation: Gaussian Case
3.4.1 The Univariate Case: p(μ|D)
3.4.2 The Univariate Case: p(x|D)
3.4.3 The Multivariate Case
3.5 Bayesian Parameter Estimation: General Theory
Example 1 Recursive Bayes Learning
3.5.1 When Do Maximum-Likelihood and Bayes Methods Differ?
3.5.2 Noninformative Priors and Invariance
3.5.3 Gibbs Algorithm
3.6 Sufficient Statistics
3.6.1 Sufficient Statistics and the Exponential Family
3.7 Problems of Dimensionality
3.7.1 Accuracy, Dimension, and Training Sample Size
3.7.2 Computational Complexity
3.7.3 Overfitting
3.8 Component Analysis and Discriminants
3.8.1 Principal Component Analysis (PCA)
3.8.2 Fisher Linear Discriminant
3.8.3 Multiple Discriminant Analysis
3.9 Expectation-Maximization (EM)
Example 2 Expectation-Maximization for a 2D Normal Model
3.10 Hidden Markov Models
3.10.1 First-Order Markov Models
3.10.2 First-Order Hidden Markov Models
3.10.3 Hidden Markov Model Computation
3.10.4 Evaluation
Example 3 Hidden Markov Model
3.10.5 Decoding
Example 4 HMM Decoding
3.10.6 Learning
Summary
Bibliographical and Historical Remarks
Problems
Computer exercises
Bibliography
4 NONPARAMETRIC TECHNIQUES
4.1 Introduction
4.2 Density Estimation
4.3 Parzen Windows
4.3.1 Convergence of the Mean
4.3.2 Convergence of the Variance
4.3.3 Illustrations
4.3.4 Classification Example
4.3.5 Probabilistic Neural Networks (PNNs)
4.3.6 Choosing the Window Function
4.4 kn-Nearest-Neighbor Estimation
4.4.1 kn-Nearest-Neighbor and Parzen-Window Estimation
4.4.2 Estimation of A Posteriori Probabilities
4.5 The Nearest-Neighbor Rule
4.5.1 Convergence of the Nearest Neighbor
4.5.2 Error Rate for the Nearest-Neighbor Rule
4.5.3 Error Bounds
4.5.4 The k-Nearest-Neighbor Rule
4.5.5 Computational Complexity of the k-Nearest-Neighbor Rule
4.6 Metrics and Nearest-Neighbor Classification
4.6.1 Properties of Metrics
4.6.2 Tangent Distance
4.7 Fuzzy Classification
4.8 Reduced Coulomb Energy Networks
4.9 Approximations by Series Expansions
Summary
Bibliographical and Historical Remarks
Problems
Computer exercises
Bibliography
5 LINEAR DISCRIMINANT FUNCTIONS
5.1 Introduction
5.2 Linear Discriminant Functions and Decision Surfaces
5.2.1 The Two-Category Case
5.2.2 The Multicategory Case
5.3 Generalized Linear Discriminant Functions
5.4 The Two-Category Linearly Separable Case
5.4.1 Geometry and Terminology
5.4.2 Gradient Descent Procedures
5.5 Minimizing the Perceptron Criterion Function
5.5.1 The Perceptron Criterion Function
5.5.2 Convergence Proof for Single-Sample Correction
5.5.3 Some Direct Generalizations
5.6 Relaxation Procedures
5.6.1 The Descent Algorithm
5.6.2 Convergence Proof
5.7 Nonseparable Behavior
5.8 Minimum Squared-Error Procedures
5.8.1 Minimum Squared-Error and the Pseudoinverse
Example 1 Constructing a Linear Classifier by Matrix Pseudoinverse
5.8.2 Relation to Fisher’s Linear Discriminant
5.8.3 Asymptotic Approximation to an Optimal Discriminant
5.8.4 The Widrow-Hoff or LMS Procedure
5.8.5 Stochastic Approximation Methods
5.9 The Ho-Kashyap Procedures
5.9.1 The Descent Procedure
5.9.2 Convergence Proof
5.9.3 Nonseparable Behavior
5.9.4 Some Related Procedures
5.10 Linear Programming Algorithms
5.10.1 Linear Programming
5.10.2 The Linearly Separable Case
5.10.3 Minimizing the Perceptron Criterion Function
5.11 Support Vector Machines
5.11.1 SVM Training
Example 2 SVM for the XOR Problem
5.12 Multicategory Generalizations
5.12.1 Kesler’s Construction
5.12.2 Convergence of the Fixed-Increment Rule
5.12.3 Generalizations for MSE Procedures
Summary
Bibliographical and Historical Remarks
Problems
Computer exercises
Bibliography
6 MULTILAYER NEURAL NETWORKS
6.1 Introduction
6.2 Feedforward Operation and Classification
6.2.1 General Feedforward Operation
6.2.2 Expressive Power of Multilayer Networks
6.3 Backpropagation Algorithm
6.3.1 Network Learning
6.3.2 Training Protocols
6.3.3 Learning Curves
6.4 Error Surfaces
6.4.1 Some Small Networks
6.4.2 The Exclusive-OR (XOR)
6.4.3 Larger Networks
6.4.4 How Important Are Multiple Minima?
6.5 Backpropagation as Feature Mapping
6.5.1 Representations at the Hidden Layer—Weights
6.6 Backpropagation, Bayes Theory and Probability
6.6.1 Bayes Discriminants and Neural Networks
6.6.2 Outputs as Probabilities
6.7 Related Statistical Techniques
6.8 Practical Techniques for Improving Backpropagation
6.8.1 Activation Function
6.8.2 Parameters for the Sigmoid
6.8.3 Scaling Input
6.8.4 Target Values
6.8.5 Training with Noise
6.8.6 Manufacturing Data
6.8.7 Number of Hidden Units
6.8.8 Initializing Weights
6.8.9 Learning Rates
6.8.10 Momentum
6.8.11 Weight Decay
6.8.12 Hints
6.8.13 On-Line, Stochastic or Batch Training?
6.8.14 Stopped Training
6.8.15 Number of Hidden Layers
6.8.16 Criterion Function
6.9 Second-Order Methods
6.9.1 Hessian Matrix
6.9.2 Newton’s Method
6.9.3 Quickprop
6.9.4 Conjugate Gradient Descent
Example 1 Conjugate Gradient Descent
6.10 Additional Networks and Training Methods
6.10.1 Radial Basis Function Networks (RBFs)
6.10.2 Special Bases
6.10.3 Matched Filters
6.10.4 Convolutional Networks
6.10.5 Recurrent Networks
6.10.6 Cascade-Correlation
6.11 Regularization, Complexity Adjustment and Pruning
Summary
Bibliographical and Historical Remarks
Problems
Computer exercises
Bibliography
7 STOCHASTIC METHODS
7.1 Introduction
7.2 Stochastic Search
7.2.1 Simulated Annealing
7.2.2 The Boltzmann Factor
7.2.3 Deterministic Simulated Annealing
7.3 Boltzmann Learning
7.3.1 Stochastic Boltzmann Learning of Visible States
7.3.2 Missing Features and Category Constraints
7.3.3 Deterministic Boltzmann Learning
7.3.4 Initialization and Setting Parameters
7.4 Boltzmann Networks and Graphical Models
7.4.1 Other Graphical Models
7.5 Evolutionary Methods
7.5.1 Genetic Algorithms
7.5.2 Further Heuristics
7.5.3 Why Do They Work?
7.6 Genetic Programming
Summary
Bibliographical and Historical Remarks
Problems
Computer exercises
Bibliography
8 NONMETRIC METHODS
8.1 Introduction
8.2 Decision Trees
8.3 CART
8.3.1 Number of Splits
8.3.2 Query Selection and Node Impurity
8.3.3 When to Stop Splitting
8.3.4 Pruning
8.3.5 Assignment of Leaf Node Labels
Example 1 A Simple Tree
8.3.6 Computational Complexity
8.3.7 Feature Choice
8.3.8 Multivariate Decision Trees
8.3.9 Priors and Costs
8.3.10 Missing Attributes
Example 2 Surrogate Splits and Missing Attributes
8.4 Other Tree Methods
8.4.1 ID3
8.4.2 C4.5
8.4.3 Which Tree Classifier Is Best?
8.5 Recognition with Strings
8.5.1 String Matching
8.5.2 Edit Distance
8.5.3 Computational Complexity
8.5.4 String Matching with Errors
8.5.5 String Matching with the “Don’t-Care” Symbol
8.6 Grammatical Methods
8.6.1 Grammars
8.6.2 Types of String Grammars
Example 3 A Grammar for Pronouncing Numbers
8.6.3 Recognition Using Grammars
8.7 Grammatical Inference
Example 4 Grammatical Inference
8.8 Rule-Based Methods
8.8.1 Learning Rules
Summary
Bibliographical and Historical Remarks
Problems
Computer exercises
Bibliography
9 ALGORITHM-INDEPENDENT MACHINE LEARNING
9.1 Introduction
9.2 Lack of Inherent Superiority of Any Classifier
9.2.1 No Free Lunch Theorem
Example 1 No Free Lunch for Binary Data
9.2.2 Ugly Duckling Theorem
9.2.3 Minimum Description Length (MDL)
9.2.4 Minimum Description Length Principle
9.2.5 Overfitting Avoidance and Occam’s Razor
9.3 Bias and Variance
9.3.1 Bias and Variance for Regression
9.3.2 Bias and Variance for Classification
9.4 Resampling for Estimating Statistics
9.4.1 Jackknife
Example 2 Jackknife Estimate of Bias and Variance of the Mode
9.4.2 Bootstrap
9.5 Resampling for Classifier Design
9.5.1 Bagging
9.5.2 Boosting
9.5.3 Learning with Queries
9.5.4 Arcing, Learning with Queries, Bias and Variance
9.6 Estimating and Comparing Classifiers
9.6.1 Parametric Models
9.6.2 Cross-Validation
9.6.3 Jackknife and Bootstrap Estimation of Classification Accuracy
9.6.4 Maximum-Likelihood Model Comparison
9.6.5 Bayesian Model Comparison
9.6.6 The Problem-Average Error Rate
9.6.7 Predicting Final Performance from Learning Curves
9.6.8 The Capacity of a Separating Plane
9.7 Combining Classifiers
9.7.1 Component Classifiers with Discriminant Functions
9.7.2 Component Classifiers without Discriminant Functions
Summary
Bibliographical and Historical Remarks
Problems
Computer exercises
Bibliography
10 UNSUPERVISED LEARNING AND CLUSTERING
10.1 Introduction
10.2 Mixture Densities and Identifiability
10.3 Maximum-Likelihood Estimates
10.4 Application to Normal Mixtures
10.4.1 Case 1: Unknown Mean Vectors
10.4.2 Case 2: All Parameters Unknown
10.4.3 k-Means Clustering
10.4.4 Fuzzy k-Means Clustering
10.5 Unsupervised Bayesian Learning
10.5.1 The Bayes Classifier
10.5.2 Learning the Parameter Vector
Example 1 Unsupervised Learning of Gaussian Data
10.5.3 Decision-Directed Approximation
10.6 Data Description and Clustering
10.6.1 Similarity Measures
10.7 Criterion Functions for Clustering
10.7.1 The Sum-of-Squared-Error Criterion
10.7.2 Related Minimum Variance Criteria
10.7.3 Scatter Criteria
Example 2 Clustering Criteria
10.8 Iterative Optimization
10.9 Hierarchical Clustering
10.9.1 Definitions
10.9.2 Agglomerative Hierarchical Clustering
10.9.3 Stepwise-Optimal Hierarchical Clustering
10.9.4 Hierarchical Clustering and Induced Metrics
10.10 The Problem of Validity
10.11 On-Line Clustering
10.11.1 Unknown Number of Clusters
10.11.2 Adaptive Resonance
10.11.3 Learning with a Critic
10.12 Graph-Theoretic Methods
10.13 Component Analysis
10.13.1 Principal Component Analysis (PCA)
10.13.2 Nonlinear Component Analysis (NLCA)
10.13.3 Independent Component Analysis (ICA)
10.14 Low-Dimensional Representations and Multidimensional Scaling (MDS)
10.14.1 Self-Organizing Feature Maps
10.14.2 Clustering and Dimensionality Reduction
Summary
Bibliographical and Historical Remarks
Problems
Computer exercises
Bibliography
A MATHEMATICAL FOUNDATIONS
A.1 Notation
A.2 Linear Algebra
A.2.1 Notation and Preliminaries
A.2.2 Inner Product
A.2.3 Outer Product
A.2.4 Derivatives of Matrices
A.2.5 Determinant and Trace
A.2.6 Matrix Inversion
A.2.7 Eigenvectors and Eigenvalues
A.3 Lagrange Optimization
A.4 Probability Theory
A.4.1 Discrete Random Variables
A.4.2 Expected Values
A.4.3 Pairs of Discrete Random Variables
A.4.4 Statistical Independence
A.4.5 Expected Values of Functions of Two Variables
A.4.6 Conditional Probability
A.4.7 The Law of Total Probability and Bayes Rule
A.4.8 Vector Random Variables
A.4.9 Expectations, Mean Vectors and Covariance Matrices
A.4.10 Continuous Random Variables
A.4.11 Distributions of Sums of Independent Random Variables
A.4.12 Normal Distributions
A.5 Gaussian Derivatives and Integrals
A.5.1 Multivariate Normal Densities
A.5.2 Bivariate Normal Densities
A.6 Hypothesis Testing
A.6.1 Chi-Squared Test
A.7 Information Theory
A.7.1 Entropy and Information
A.7.2 Relative Entropy
A.7.3 Mutual Information
A.8 Computational Complexity