Normalized nonconformity measures for automated valuation models

Experiments on house price
Normalization Techniques and model averaging

Conformal Prediction

Given an input feature vector $x$, a traditional regression algorithm outputs a point prediction $\hat{y}$ that represents the algorithm’s best estimate of the true label $y$. However, traditional regression algorithms typically lack a measure of confidence in their predictions: they do not indicate how far $\hat{y}$ is likely to deviate from the true label $y$.

Conformal prediction (CP) addresses this by supplementing a traditional regression algorithm with a measure of confidence. CP takes the point predictions produced by a traditional regression algorithm and outputs a prediction region that is valid at a user-defined confidence level of $(1 - \epsilon) \times 100\%$.

Settings

Training Examples: $T^* = \{(X_1, Y_1), \dots, (X_l, Y_l)\}$, where $X_i \in \mathbb{R}^d$ and $Y_i \in \mathbb{R}$ for $i = 1, \dots, l$.

Point Prediction: $\hat{Y}_{l+1} := f_{T^*}(X_{l+1})$, where $f_{T^*}: \mathbb{R}^d \rightarrow \mathbb{R}$

In conformal prediction, we make set predictions instead of point predictions. Given a significance level $\epsilon \in (0,1)$, a conformal predictor outputs a prediction set $\Gamma^\epsilon(T^*, X_{l+1}) \subseteq \mathbb{R}$ such that

$$ P \left( Y_{l+1} \in \Gamma^\epsilon(T^*, X_{l+1}) \right) \geq 1 - \epsilon $$

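This guarantee does not depend on the particular distribution of the data: the conformal prediction framework only requires the data sequence $(X_1, Y_1), \dots, (X_{l+1}, Y_{l+1})$ to be exchangeable, of which i.i.d. data are a special case.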
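In practice, the coverage probability in the inequality above is usually checked empirically on held-out examples. The following is a minimal sketch of such a check, assuming interval-shaped prediction sets; the helper name `empirical_coverage` and the toy numbers are hypothetical and only serve as illustration.

```python
import numpy as np

def empirical_coverage(y_true, lower, upper):
    """Fraction of labels that fall inside their prediction intervals.

    Hypothetical helper: y_true, lower and upper are equal-length sequences
    holding the true labels and the interval endpoints for each example.
    """
    y_true = np.asarray(y_true, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = (lower <= y_true) & (y_true <= upper)
    return float(inside.mean())

# Made-up toy values: the second label (2.0) lies outside [2.5, 3.5],
# the other three labels are covered.
coverage = empirical_coverage(
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 2.5, 2.0, 3.0],
    [1.5, 3.5, 4.0, 5.0],
)
print(coverage)
```

A predictor that is valid at significance level $\epsilon$ should give an empirical coverage close to or above $1 - \epsilon$ when this is evaluated on a large set of fresh examples.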

Inductive Conformal Prediction

Consider the residuals on the training set, $R_i = |Y_i - f_{T^*}(X_i)|$ for $i = 1, \dots, l$. Individually, each residual gives the magnitude of the prediction error on one training example when $f_{T^*}$ is used as the prediction function. Together, they form an empirical distribution of errors, from which we can estimate quantities such as empirical quantiles. Hence, it seems reasonable to construct a prediction set as

$$ \Gamma^\epsilon (T^*, X_{l+1}) = \left[ f_{T^*} (X_{l+1}) - q_\epsilon, f_{T^*} (X_{l+1}) + q_\epsilon \right] $$

where $q_\epsilon$ is the $(1-\epsilon)$-empirical quantile of values $R_i$.
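The construction above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `naive_interval`, the `predict` callable, and the toy data are assumptions, and `np.quantile` with its default linear interpolation is just one possible convention for the empirical quantile, which the text does not pin down.

```python
import numpy as np

def naive_interval(predict, X_train, y_train, X_new, epsilon):
    """Interval [f(x) - q, f(x) + q] built from training residuals.

    predict is assumed to be an already fitted regression function that maps
    an (n, d) array to n predictions. q is the (1 - epsilon) empirical
    quantile of the absolute training residuals.
    """
    residuals = np.abs(np.asarray(y_train, dtype=float) - predict(X_train))
    q = np.quantile(residuals, 1.0 - epsilon)  # default: linear interpolation
    center = predict(X_new)
    return center - q, center + q

# Toy setup with a hypothetical fitted model f(x) = 2 * x.
predict = lambda X: 2.0 * np.asarray(X, dtype=float)[:, 0]
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = np.array([0.5, 2.0, 3.0, 6.0, 10.0])  # residuals: 0.5, 0, 1, 0, 2

# With epsilon = 0.5 the quantile is the median residual, here 0.5.
lower, upper = naive_interval(predict, X_train, y_train,
                              np.array([[10.0]]), epsilon=0.5)
```

The interval has the same half-width $q_\epsilon$ for every new input, since the quantile is computed once from the pooled training residuals.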

Intuition: The empirical quantile $q_\epsilon$ estimates how wrong the prediction $f_{T^*}(X_{l+1})$ can be in $(1-\epsilon) \times 100\%$ of all cases. To make the prediction set $\Gamma^\epsilon(T^*, X_{l+1})$ cover the true label $Y_{l+1}$ with probability $1-\epsilon$, we simply use $q_\epsilon$ as the margin of error for the prediction. In general, however, this prediction set is not valid: the regression function was fitted to minimize the residuals on these same training examples, so the training residuals tend to be smaller than the errors on unseen data. The empirical quantile of the training residuals is therefore likely to be too optimistic an error bound for a new, unseen example $X_{l+1}$.
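The optimism of training residuals can be illustrated with a small simulation. The sketch below rests on assumptions chosen purely for illustration and not taken from the experiments in this document: synthetic Gaussian data, an ordinary least squares model with almost as many features as training examples, and `np.quantile` as the empirical quantile.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d, epsilon = 40, 2000, 30, 0.1

def make_data(n):
    # Synthetic data: only the first feature carries signal, plus unit noise.
    X = rng.normal(size=(n, d))
    y = X[:, 0] + rng.normal(size=n)
    return X, y

X_train, y_train = make_data(n_train)
X_test, y_test = make_data(n_test)

# Ordinary least squares fit; with d close to n_train it overfits the noise.
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Margin of error taken from the training residuals.
train_res = np.abs(y_train - X_train @ beta)
q = np.quantile(train_res, 1.0 - epsilon)

# Share of examples whose absolute error is within the margin q.
train_cov = float(np.mean(train_res <= q))
test_cov = float(np.mean(np.abs(y_test - X_test @ beta) <= q))
print(f"target {1 - epsilon:.2f}, train {train_cov:.2f}, test {test_cov:.2f}")
```

In this setup the coverage on the training examples matches the target by construction, while the coverage on fresh examples falls well short of it, which is the failure that the argument above describes.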