Outlier Prediction and Training Set Modification to Reduce Catastrophic Outlier Redshift Estimates in Large-Scale Surveys




We present results of using individual galaxies' probability distributions over redshift to identify potential catastrophic outliers in empirical photometric redshift estimation. In developing this approach, we also devise a method of modifying the redshift distribution of training sets that improves both the baseline accuracy of high-redshift (z > 1.5) estimation and catastrophic outlier mitigation. We demonstrate both methods using two real test data sets and one simulated test data set spanning a wide redshift range (0 < z < 4). The results inform an example "prescription" that can be applied as a realistic photometric redshift estimation scenario for a hypothetical large-scale survey. We find that, with appropriate optimization, we can identify a significant percentage (>30%) of catastrophic outlier galaxies while incorrectly flagging only a small percentage (<7%, and in many cases <3%) of non-outlier galaxies as catastrophic outliers. We also find that our training set redshift distribution modification yields a significant (>10) percentage point decrease in outlier galaxies for z > 1.5, with only a small (<3) percentage point increase in outlier galaxies for z < 1.5, compared to the unmodified training set. In addition, this modification can in some cases produce a significant (∼20) percentage point decrease in non-outlier galaxies that are incorrectly identified as outliers, while in other cases it causes only a small (<1) percentage point increase in this metric.
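
The two flagging metrics quoted above (the share of true catastrophic outliers that are flagged, and the share of non-outliers that are flagged by mistake) can be illustrated with a minimal sketch. The abstract does not state the outlier criterion, so the commonly used definition |z_phot - z_spec| / (1 + z_spec) > 0.15 is assumed here; the function names and the toy values are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def is_catastrophic(z_phot: float, z_spec: float, threshold: float = 0.15) -> bool:
    """Assumed criterion (not from the source): |z_phot - z_spec| / (1 + z_spec) > threshold."""
    return abs(z_phot - z_spec) / (1.0 + z_spec) > threshold


def flagging_rates(
    z_phot: Sequence[float],
    z_spec: Sequence[float],
    flagged: Sequence[bool],
    threshold: float = 0.15,
) -> Tuple[float, float]:
    """Return (fraction of true outliers flagged, fraction of non-outliers flagged)."""
    outlier = [is_catastrophic(p, s, threshold) for p, s in zip(z_phot, z_spec)]
    n_out = sum(outlier)
    n_non = len(outlier) - n_out
    # True outliers that the flagging step caught.
    hits = sum(1 for o, f in zip(outlier, flagged) if o and f)
    # Non-outliers that the flagging step marked by mistake.
    false_flags = sum(1 for o, f in zip(outlier, flagged) if not o and f)
    identified = hits / n_out if n_out else 0.0
    misflagged = false_flags / n_non if n_non else 0.0
    return identified, misflagged


# Toy values for illustration only (not data from the paper).
z_spec = [0.5, 1.0, 2.0, 3.0]
z_phot = [0.52, 1.05, 0.4, 2.9]
flagged = [False, True, True, False]
identified, misflagged = flagging_rates(z_phot, z_spec, flagged)
```

In this toy case only the third galaxy meets the assumed outlier criterion and it is flagged, while one of the three non-outliers is flagged by mistake. Comparing `misflagged` between a modified and an unmodified training set gives a difference in percentage points, which is the unit used for the changes reported above.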


Publisher Statement

Copyright © 2021, IOP Science.

DOI: https://doi.org/10.1088/1538-3873/abe5fb.