Preface
index.html
1 Introduction
intro-intro.html
1.1 A Simple Example
a-simple-example.html
1.2 Important Concepts
important-concepts.html
1.2.1 Overfitting
important-concepts.html#overfitting
1.2.2 Supervised and Unsupervised Procedures
important-concepts.html#supervised-and-unsupervised-procedures
1.2.3 No Free Lunch
important-concepts.html#no-free-lunch
1.2.4 The Model versus the Modeling Process
important-concepts.html#the-model-versus-the-modeling-process
1.2.5 Model Bias and Variance
important-concepts.html#model-bias-and-variance
1.2.6 Experience-Driven Modeling and Empirically Driven Modeling
important-concepts.html#experience-driven-modeling-and-empirically-driven-modeling
1.2.7 Big Data
important-concepts.html#big-data
1.3 A More Complex Example
a-more-complex-example.html
1.4 Feature Selection
feature-selection.html
1.5 An Outline of the Book
an-outline-of-the-book.html
1.6 Computing
intro-computing.html
2 Illustrative Example: Predicting Risk of Ischemic Stroke
stroke-tour.html
2.1 Splitting
splitting.html
2.2 Preprocessing
stroke-preprocessing.html
2.3 Exploration
stroke-exploration.html
2.4 Predictive Modeling Across Sets
predictive-modeling-across-sets.html
2.5 Other Considerations
other-considerations.html
2.6 Computing
stoke-computing.html
3 A Review of the Predictive Modeling Process
review-predictive-modeling-process.html
3.1 Illustrative Example: OkCupid Profile Data
okc-intro.html
3.2 Measuring Performance
measuring-performance.html
3.2.1 Regression Metrics
measuring-performance.html#reg-metrics
3.2.2 Classification Metrics
measuring-performance.html#class-metrics
3.2.3 Context-Specific Metrics
measuring-performance.html#context-specific-metrics
3.3 Data Splitting
data-splitting.html
3.4 Resampling
resampling.html
3.4.1 V-Fold Cross-Validation and Its Variants
resampling.html#cv
3.4.2 Monte Carlo Cross-Validation
resampling.html#monte-carlo-cross-validation
3.4.3 The Bootstrap
resampling.html#the-bootstrap
3.4.4 Rolling Origin Forecasting
resampling.html#rolling-origin-forecasting
3.4.5 Validation Sets
resampling.html#validation-sets
3.4.6 Variance and Bias in Resampling
resampling.html#resample-var-bias
3.4.7 What Should Be Included Inside of Resampling?
resampling.html#inside-resampling
3.5 Tuning Parameters and Overfitting
tuning.html
3.6 Model Optimization and Tuning
model-optimization-and-tuning.html
3.7 Comparing Models Using the Training Set
review-model-comparisons.html
3.8 Feature Engineering Without Overfitting
feature-engineering-without-overfitting.html
3.9 Summary
summary.html
3.10 Computing
review-computing.html
4 Exploratory Visualizations
exploratory-visualizations.html
4.1 Introduction to the Chicago Train Ridership Data
chicago-intro.html
4.2 Visualizations for Numeric Data: Exploring Train Ridership Data
visualizations-for-numeric-data-exploring-train-ridership-data.html
4.2.1 Box Plots, Violin Plots, and Histograms
visualizations-for-numeric-data-exploring-train-ridership-data.html#box-plots-violin-plots-and-histograms
4.2.2 Augmenting Visualizations through Faceting, Colors, and Shapes
visualizations-for-numeric-data-exploring-train-ridership-data.html#augmenting-visualizations-through-faceting-colors-and-shapes
4.2.3 Scatter Plots
visualizations-for-numeric-data-exploring-train-ridership-data.html#scatter-plots
4.2.4 Heatmaps
visualizations-for-numeric-data-exploring-train-ridership-data.html#heatmaps
4.2.5 Correlation Matrix Plots
visualizations-for-numeric-data-exploring-train-ridership-data.html#correlation-matrix-plots
4.2.6 Line Plots
visualizations-for-numeric-data-exploring-train-ridership-data.html#line-plots
4.2.7 Principal Components Analysis
visualizations-for-numeric-data-exploring-train-ridership-data.html#eda-pca
4.3 Visualizations for Categorical Data: Exploring the OkCupid Data
visualizations-for-categorical-data-exploring-the-okcupid-data.html
4.3.1 Visualizing Relationships between Outcomes and Predictors
visualizations-for-categorical-data-exploring-the-okcupid-data.html#visualizing-relationships-between-outcomes-and-predictors
4.3.2 Exploring Relationships Between Categorical Predictors
visualizations-for-categorical-data-exploring-the-okcupid-data.html#exploring-relationships-between-categorical-predictors
4.4 Post Modeling Exploratory Visualizations
post-modeling.html
4.5 Summary
summary-1.html
4.6 Computing
eda-computing.html
5 Encoding Categorical Predictors
encoding-categorical-predictors.html
5.1 Creating Dummy Variables for Unordered Categories
creating-dummy-variables-for-unordered-categories.html
5.2 Encoding Predictors with Many Categories
encoding-predictors-with-many-categories.html
5.3 Approaches for Novel Categories
approaches-for-novel-categories.html
5.4 Supervised Encoding Methods
categorical-supervised-encoding.html
5.5 Encodings for Ordered Data
encodings-for-ordered-data.html
5.6 Creating Features from Text Data
text-data.html
5.7 Factors versus Dummy Variables in Tree-Based Models
categorical-trees.html
5.8 Summary
summary-2.html
5.9 Computing
categorical-computing.html
6 Engineering Numeric Predictors
engineering-numeric-predictors.html
6.1 1:1 Transformations
numeric-one-to-one.html
6.2 1:Many Transformations
numeric-one-to-many.html
6.2.1 Nonlinear Features via Basis Expansions and Splines
numeric-one-to-many.html#numeric-basis-functions
6.2.2 Discretize Predictors as a Last Resort
numeric-one-to-many.html#binning
6.3 Many:Many Transformations
numeric-many-to-many.html
6.3.1 Linear Projection Methods
numeric-many-to-many.html#linear-projection-methods
6.3.2 Autoencoders
numeric-many-to-many.html#autoencoders
6.3.3 Spatial Sign
numeric-many-to-many.html#spatial-sign
6.3.4 Distance and Depth Features
numeric-many-to-many.html#distance-and-depth-features
6.4 Summary
summary-3.html
6.5 Computing
numeric-computing.html
7 Detecting Interaction Effects
detecting-interaction-effects.html
7.1 Guiding Principles in the Search for Interactions
interactions-guiding-principles.html
7.2 Practical Considerations
practical-considerations.html
7.3 The Brute-Force Approach to Identifying Predictive Interactions
complete-enumeration.html
7.3.1 Simple Screening
complete-enumeration.html#complete-enumeration-simple-screening
7.3.2 Penalized Regression
complete-enumeration.html#penalized-regression
7.4 Approaches when Complete Enumeration is Practically Impossible
approaches-when-complete-enumeration-is-practically-impossible.html
7.4.1 Guiding Principles and Two-stage Modeling
approaches-when-complete-enumeration-is-practically-impossible.html#guiding-principles-and-two-stage-modeling
7.4.2 Tree-based Methods
approaches-when-complete-enumeration-is-practically-impossible.html#tree-based-methods
7.4.3 The Feasible Solution Algorithm
approaches-when-complete-enumeration-is-practically-impossible.html#the-feasible-solution-algorithm
7.5 Other Potentially Useful Tools
other-potentially-useful-tools.html
7.6 Summary
summary-4.html
7.7 Computing
interaction-computing.html
8 Handling Missing Data
handling-missing-data.html
8.1 Understanding the Nature and Severity of Missing Information
understanding-the-nature-and-severity-of-missing-information.html
8.2 Models that are Resistant to Missing Values
models-that-are-resistant-to-missing-values.html
8.3 Deletion of Data
deletion-of-data.html
8.4 Encoding Missingness
encoding-missingness.html
8.5 Imputation methods
imputation-methods.html
8.6 Special Cases
special-cases.html
8.7 Summary
summary-5.html
8.8 Computing
missing-computing.html
9 Working with Profile Data
profile-data.html
9.1 Illustrative Data: Pharmaceutical Manufacturing Monitoring
illustrative-data-pharmaceutical-manufacturing-monitoring.html
9.2 What are the Experimental Unit and the Unit of Prediction?
what-are-the-experimental-unit-and-the-unit-of-prediction.html
9.3 Reducing Background
reducing-background.html
9.4 Reducing Other Noise
reducing-other-noise.html
9.5 Exploiting Correlation
exploiting-correlation.html
9.6 Impacts of Data Processing on Modeling
impacts-of-data-processing-on-modeling.html
9.7 Summary
summary-6.html
9.8 Computing
profile-computing.html
10 Feature Selection Overview
selection.html
10.1 Goals of Feature Selection
goals-of-feature-selection.html
10.2 Classes of Feature Selection Methodologies
classes-of-feature-selection-methodologies.html
10.3 Effect of Irrelevant Features
feature-selection-simulation.html
10.4 Overfitting to Predictors and External Validation
selection-overfitting.html
10.5 A Case Study
a-case-study.html
10.6 Next Steps
next-steps.html
10.7 Computing
selection-computing.html
11 Greedy Search Methods
greedy-search.html
11.1 Illustrative Data: Predicting Parkinson’s Disease
illustrative-data-predicting-parkinsons-disease.html
11.2 Simple Filters
greedy-simple-filters.html
11.2.1 Simple Filters Applied to the Parkinson’s Disease Data
greedy-simple-filters.html#simple-filters-applied-to-the-parkinsons-disease-data
11.3 Recursive Feature Elimination
recursive-feature-elimination.html
11.4 Stepwise Selection
greedy-stepwise-selection.html
11.5 Summary
summary-7.html
11.6 Computing
greedy-computing.html
12 Global Search Methods
global.html
12.1 Naive Bayes Models
naive-bayes.html
12.2 Simulated Annealing
simulated-annealing.html
12.2.1 Selecting Features without Overfitting
simulated-annealing.html#selecting-features-without-overfitting
12.2.2 Application to Modeling the OkCupid Data
simulated-annealing.html#application-to-modeling-the-okcupid-data
12.2.3 Examining Changes in Performance
simulated-annealing.html#examining-changes-in-performance
12.2.4 Grouped Qualitative Predictors Versus Indicator Variables
simulated-annealing.html#grouped-qualitative-predictors-versus-indicator-variables
12.2.5 The Effect of the Initial Subset
simulated-annealing.html#the-effect-of-the-initial-subset
12.3 Genetic Algorithms
genetic-algorithms.html
12.3.1 External Validation
genetic-algorithms.html#external-validation
12.3.2 Coercing Sparsity
genetic-algorithms.html#coercing-sparsity
12.4 Test Set Results
test-set-results.html
12.5 Summary
summary-8.html
12.6 Computing
global-computing.html
References
references.html
Errata and Version History
errata-and-version-history.html
Feature Engineering and Selection: A Practical Approach for Predictive Models
Amazon
https://www.amazon.com/gp/product/1138079227/ref=as_li_tl?ie=UTF8&tag=apm0a-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=1138079227&linkId=c801e78acfc3bc022dbed02af4851962
Taylor & Francis
https://www.crcpress.com/Feature-Engineering-and-Selection-A-Practical-Approach-for-Predictive-Models/Kuhn-Johnson/p/book/9781138079229
2010
references.html#ref-Raimondi2010
2016
references.html#ref-jahani2016comparison
2016
references.html#ref-luo2016automatically
2015
references.html#ref-stankovic2015investment
2011
references.html#ref-thomson2011not
https://github.com/topepo/FES
https://bookdown.org/max/FES
@alexpghayes
https://github.com/alexpghayes
@AllardJM
https://github.com/AllardJM
@AndrewKostandy
https://github.com/AndrewKostandy
@bashhwu
https://github.com/bashhwu
@btlois
https://github.com/btlois
@cdr6934
https://github.com/cdr6934
@danielwo
https://github.com/danielwo
@davft
https://github.com/davft
@draben
https://github.com/draben
@eddelbuettel
https://github.com/eddelbuettel
@endore
https://github.com/endore
@feinmann
https://github.com/feinmann
@gtesei
https://github.com/gtesei
@ifellows
https://github.com/ifellows
@JohnMount
https://github.com/JohnMount
@jonimatix
https://github.com/jonimatix
@jrfiedler
https://github.com/jrfiedler
@juliasilge
https://github.com/juliasilge
@jwillage
https://github.com/jwillage
@kaliszp
https://github.com/kaliszp
@KevinBretonnelCohen
https://github.com/KevinBretonnelCohen
@kieroneil
https://github.com/kieroneil
@KnightAdz
https://github.com/KnightAdz
@kransom14
https://github.com/kransom14
@LG-1
https://github.com/LG-1
@LluisRamon
https://github.com/LluisRamon
@LoweCoryr
https://github.com/LoweCoryr
@lpatruno
https://github.com/lpatruno
@mlduarte
https://github.com/mlduarte
@monogenea
https://github.com/monogenea
@mpettis
https://github.com/mpettis
@Nathan-Furnal
https://github.com/Nathan-Furnal
@nazareno
https://github.com/nazareno
@PedramNavid
https://github.com/PedramNavid
@r0f1
https://github.com/r0f1
@Ronen4321
https://github.com/Ronen4321
@shinhongwu
https://github.com/shinhongwu
@stecaron
https://github.com/stecaron
@StefanZaaiman
https://github.com/StefanZaaiman
@treysp
https://github.com/treysp
@uwesterr
https://github.com/uwesterr
@van1991
https://github.com/van1991
https://twitter.com/hashtag/rstats?src=hash
https://stackoverflow.com/questions/tagged/r
https://community.rstudio.com/