Recent value-at-risk (VaR) models based on historical simulation often incorporate approaches where the volatility of the historical sample is rescaled or filtered to better reflect current market conditions. These filtered historical simulation (FHS) VaR models are now widely used in the industry and, as is usually the case with VaR models, they are validated through backtesting. However, while backtesting is a natural way of testing a percentile forecast, it is not specifically designed to capture other features of the model, such as its efficiency in adapting to new volatility conditions. In this paper, we discuss the limitations of backtesting as a tool to assess the performance of FHS models and, using a Monte Carlo simulation framework, we examine whether incorporating information about the size of the breaches (through the use of score functions, for example) can improve the efficiency of these tests. The results show that, even when incorporating the size of the VaR violations, tests based solely on the breaches generally fail as a tool to discriminate between different calibrations of the decay factor; they also tend to be biased. Among the alternative tests considered, the asymmetric piecewise linear score performs best overall, followed by the dynamic quantile test. We conclude by considering some empirical examples.
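To make the comparison between breach-based tests and score-based tests concrete, the following is a minimal sketch of an asymmetric piecewise linear score (often called the tick or pinball loss) alongside a plain breach count. The abstract does not state the exact definition used, so the standard form S(q, r) = (1{r < q} - alpha)(q - r) is assumed here, with the VaR forecast q expressed as a return quantile (a negative number for a loss). The function names and the default alpha = 0.01 are illustrative assumptions.

```python
def breach_count(returns, var_forecasts):
    """Number of VaR violations: days on which the return falls below the forecast quantile."""
    return sum(1 for r, q in zip(returns, var_forecasts) if r < q)


def piecewise_linear_score(returns, var_forecasts, alpha=0.01):
    """Mean asymmetric piecewise linear (tick) score; lower values indicate a better forecast.

    Assumed form: S(q, r) = (1{r < q} - alpha) * (q - r), where q is the
    alpha-quantile forecast in return space and r is the realised return.
    """
    if len(returns) != len(var_forecasts):
        raise ValueError("returns and var_forecasts must have the same length")
    if not returns:
        raise ValueError("at least one observation is required")
    total = 0.0
    for r, q in zip(returns, var_forecasts):
        breach = 1.0 if r < q else 0.0
        # Breaches are penalised with weight (1 - alpha) times their size;
        # non-breaches are penalised with weight alpha times the unused margin.
        total += (breach - alpha) * (q - r)
    return total / len(returns)


if __name__ == "__main__":
    realised = [-0.03, 0.01]      # hypothetical daily returns
    forecasts = [-0.02, -0.02]    # hypothetical 1% VaR forecasts (as return quantiles)
    print(breach_count(realised, forecasts))
    print(piecewise_linear_score(realised, forecasts, alpha=0.01))
```

Unlike the breach count, the score changes on every day, not only on violation days, because an overly conservative forecast is also penalised, in proportion to alpha. This is one plausible reason a score of this type can separate calibrations that produce similar numbers of breaches.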