An interesting point on compression: we were trying to ground Vitányi and Li's claim that the most compressed (or compressible) description is the most likely to be true (this is "Minimum Description Length" theory). Taking a page from Baum, I suggested:
- Think of the phenomenon we're observing as a process with inputs and outputs, existing in an environment of finite resources.
- Two processes may exist that have identical inputs and outputs, but differ in how the output is produced.
- If one process takes longer (i.e., if the program is larger), then the faster process will produce proportionally more outputs per unit of time, until the input resources are expended.
- Therefore, there's a higher probability that the output we're observing has been produced by the simpler process.
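The argument above can be sketched numerically. This is a toy illustration of my own (the budget and per-output costs are made up, not from the discussion): two processes produce identical outputs at different costs, and under a fixed resource budget the cheaper one dominates the output pool, so a randomly observed output more likely came from it.

```python
# Toy sketch (all numbers are hypothetical): two processes with identical
# outputs but different resource cost per output, sharing a finite budget.

BUDGET = 1000        # total resource units in the environment
COST_FAST = 2        # units consumed per output by the simpler process
COST_SLOW = 5        # units consumed per output by the more complex process

outputs_fast = BUDGET // COST_FAST   # outputs the fast process can produce
outputs_slow = BUDGET // COST_SLOW   # outputs the slow process can produce

# If each output in the pool is equally likely to be the one we observe,
# the chance it came from the fast process is its share of the pool.
p_fast = outputs_fast / (outputs_fast + outputs_slow)
print(outputs_fast, outputs_slow, round(p_fast, 3))  # 500 200 0.714
```

With these numbers the simpler process accounts for over 70% of the outputs, which is the sense in which the observed output is "more probably" its product.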
Later, we were trying to figure out why a model with fewer parameters gives larger priors. I pointed out that when you add parameters, you lose degrees of freedom, which is comparable to reducing the sample size, and thus increasing the variance. This (I guess?) would have the effect of reducing the prior. I might be blowing smoke.
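One concrete way to see the degrees-of-freedom point (a minimal sketch with made-up numbers, and artificially holding the residual sum of squares fixed, which a real fit wouldn't do): the unbiased residual-variance estimate divides by n − p, so each extra parameter acts like losing an observation.

```python
# Hypothetical numbers: the unbiased residual-variance estimator is
# s^2 = RSS / (n - p), so adding parameters (raising p) shrinks the
# denominator exactly as dropping data points would.

n = 20        # sample size
rss = 36.0    # residual sum of squares (held fixed here for illustration)

for p in (1, 5, 10):          # number of fitted parameters
    s2 = rss / (n - p)        # variance estimate inflates as p grows
    print(p, round(s2, 2))    # 1.89, then 2.4, then 3.6
```

In practice adding parameters also reduces the RSS, so this isolates only one side of the trade-off, but it shows why more parameters behave like a smaller effective sample.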
Skipper was also intrigued by likelihood ratios: I explained that the "likelihood" of a model given a data set is the probability of observing that data if the model were true (the product of the probabilities the model assigns to each observation), and that a likelihood ratio compares two candidate models by dividing one likelihood by the other. That is, at least, roughly how it works.
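A small worked example of that definition (the coin data and the two candidate biases are my own invention): each model's likelihood is the product of the probabilities it assigns to the observed flips, and the ratio tells you how much better one model accounts for the data.

```python
from math import prod

# Hypothetical data: 1 = heads, 0 = tails; 7 heads out of 10 flips.
data = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]

def likelihood(p, flips):
    # Product over flips of the probability the model assigns to each one.
    return prod(p if f == 1 else 1 - p for f in flips)

# Compare a fair-coin model (p = 0.5) against a biased one (p = 0.7).
lr = likelihood(0.7, data) / likelihood(0.5, data)
print(round(lr, 2))  # 2.28 -- the biased model fits this data ~2.3x better
```

Here the ratio exceeds 1, so the biased model assigns the observed sequence a higher probability than the fair one does.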