What criteria are, or reasonably might be, used by cosmologists to decide whether the assumption that curvature equals zero produces a better cosmological model than one with non-zero curvature?

I read (as best as I could) Section 6.2.4, "Curvature", pp. 37-39 of http://planck.caltech.edu/pub/2015results/Planck_2015_Results_XIII_Cosmological_Parameters.pdf . The following is a summary of what I interpreted Section 6.2.4 to be saying.

Several cosmological models were discussed, all of which seem to have the following parameters: H0, Ωm, ΩΛ, and Ωk. For some of the datasets there may also have been additional model parameters that I did not understand. A variety of combined datasets were used to create the various models. I did not understand the labels used to describe the combinations of datasets:

1. Planck TT+lowP posterior (Figure 25)
2. Planck TT,TE,EE+lowP (Figure 26)
3. Planck TT,TE,EE+lowP+lensing (Figure 26)
4. Planck TT,TE,EE+lowP+lensing+BAO (Figure 26)

The text then gives the following respective values for Ωk, each with a 2 sigma (95% confidence level) error range:

1. (Equation 47) -0.053 (+0.049, -0.055)
2. (Equation 48) -0.040 (+0.038, -0.041)
3. (Equation 49) +0.005 (+0.016, -0.017)
4. (Equation 50) +0.000 (+/-0.005)

The text adds, “We adopt Eq. (50) as our most reliable constraint on spatial curvature. Our universe appears to be spatially flat to an accuracy of 0.5%”.

As best as I can tell, the text seems to be saying that Equation 50 was chosen because its error range is the smallest. I would appreciate it if someone could say authoritatively whether or not this is the case.

If it is, I have some concerns about the protocol. For several years before I retired, I worked on the development of data mining software, and became aware of the phenomenon of over-fitting a model to a dataset.
Over-fitting means continuing to “improve” a model's figure of merit past the point at which the model makes its best predictions on data that was not used to build it. Although the figure of merit keeps improving, what is actually being modeled is the specifics of the training dataset, including any statistical anomalies or outliers, rather than the general characteristics of the population from which the dataset was sampled. A protocol commonly used to avoid over-fitting is to divide the dataset into two subsets. One subset is used to build the model, and the second is used to determine how good a predictor that model is. I am unable to tell whether such a protocol was used in developing the models described in the article.
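
To make the two-subset protocol concrete, here is a minimal sketch in Python. It is a toy illustration only and has nothing to do with how the Planck analysis was actually carried out: the synthetic data, the piecewise-constant "bin average" model, and the names `make_data`, `fit_bins`, and `mse` are all assumptions invented for this example. The number of bins plays the role of model complexity.

```python
import random

def make_data(n, rng):
    """Sample n noisy points from a simple linear trend (hypothetical toy data)."""
    data = []
    for _ in range(n):
        x = rng.random()                     # x uniform in [0, 1)
        y = 2.0 * x + rng.gauss(0.0, 0.5)    # assumed true trend plus Gaussian noise
        data.append((x, y))
    return data

def fit_bins(train, n_bins):
    """Fit a piecewise-constant model: the mean of y in each of n_bins equal-width bins."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in train:
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(y for _, y in train) / len(train)
    # Bins that received no training points fall back to the overall training mean.
    return [sums[b] / counts[b] if counts[b] else overall for b in range(n_bins)]

def mse(model, data):
    """Mean squared prediction error of a fitted bin model on a dataset."""
    n_bins = len(model)
    total = 0.0
    for x, y in data:
        b = min(int(x * n_bins), n_bins - 1)
        total += (y - model[b]) ** 2
    return total / len(data)

rng = random.Random(0)                       # fixed seed so the run is repeatable
data = make_data(200, rng)
train, test = data[:100], data[100:]         # one subset builds the model, the other judges it

results = []
for n_bins in (1, 2, 4, 8, 16, 32, 64):      # increasing model complexity
    model = fit_bins(train, n_bins)
    results.append((n_bins, mse(model, train), mse(model, test)))

for n_bins, train_err, test_err in results:
    print(f"bins={n_bins:3d}  train MSE={train_err:.3f}  held-out MSE={test_err:.3f}")
```

Because each bin count here subdivides the previous one, the training error can only stay the same or shrink as bins are added. The held-out error carries no such guarantee, and in a setup like this one would typically expect it to stop improving, or get worse, once the bins begin to track noise in the training points. That divergence between the two columns is the symptom of over-fitting I have in mind; the exact numbers depend on the seed and the assumed noise level.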