Concentration of measure:
Talagrand's "work illustrates the idea that the interplay of many random events can, counter-intuitively, lead to outcomes that are more predictable, and gives estimates for the extent to which the uncertainty is reigned in."
Marianne Freiberger: https://plus.maths.org/content/abel-prize-2024 @data @mathematics
"Majorizing measures provide bounds for the supremum of stochastic processes. They represent the most general possible form of the chaining argument".
Michel Talagrand, 1996, https://projecteuclid.org/journals/annals-of-probability/volume-24/issue-3/Majorizing-measures-the-generic-chaining/10.1214/aop/1065725175.full
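For reference, the standard statement of the generic chaining bound (my notation, not quoted from the paper): for a centred Gaussian process (X_t)_{t∈T} with canonical distance d(s,t) = (E(X_s − X_t)²)^{1/2},

\[
\mathbb{E}\,\sup_{t\in T} X_t \;\le\; L\,\gamma_2(T,d),
\qquad
\gamma_2(T,d) \;=\; \inf_{(\mathcal{A}_n)}\,\sup_{t\in T}\,\sum_{n\ge 0} 2^{n/2}\,\Delta(A_n(t)),
\]

where the infimum runs over admissible sequences of partitions (|\mathcal{A}_n| \le 2^{2^n}) and \Delta(A_n(t)) is the diameter of the piece of \mathcal{A}_n containing t. Talagrand's majorizing measure theorem gives the matching lower bound, so the chaining bound is sharp up to a constant.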
In 2016, the American Statistical Association #ASA made a formal statement that "a p-value, or statistical significance, does not measure the size of an effect or the importance of a result".
It also stated that "p-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone".
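A minimal simulation of the first point (my own sketch, not from the ASA statement; the numbers are arbitrary): with a large enough sample, a practically negligible effect still yields a "significant" p-value.

```python
# Sketch only: with a huge sample, a trivial effect produces a tiny p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1_000_000
a = rng.normal(loc=0.0, scale=1.0, size=n)
b = rng.normal(loc=0.005, scale=1.0, size=n)  # effect: 0.005 SD

t, p = stats.ttest_ind(a, b)
d = (b.mean() - a.mean()) / np.sqrt((a.var() + b.var()) / 2)  # Cohen's d
print(f"p = {p:.2g}, Cohen's d = {d:.4f}")
# Typically p < 0.05 while d stays near 0.005: "significant" yet trivial.
```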
@maugendre P-values are abused far and wide. This has reminded me that I should add "ranting about p-values" to the list of things I rant about to high school maths and physics textbook publishers, teachers, curriculum writers and exam setters.
There is even a Wikipedia article on the "Misuse of p-values": https://en.wikipedia.org/wiki/Misuse_of_p-values
I am therefore adding to my guidelines: "Instead of telling researchers what they want to know, statisticians should teach researchers which questions they can ask. […]
Before we can improve our statistical inferences, we need to improve our statistical questions."
Excerpt from Daniël Lakens (2021) https://journals.sagepub.com/doi/10.1177/1745691620958012
"In #probability theory, a log-normal (or #lognormal) distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution."
"It is a convenient and useful model for measurements in exact and engineering sciences, as well as medicine, economics […], energies, concentrations, lengths, prices".
Surveys, coincidences, statistical significance
"What Educated Citizens Should Know About Statistics and Probability"
By Jessica Utts, in 2003: https://ics.uci.edu/~jutts/AmerStat2003.pdf via @hrefna
"In real life, we weigh the anticipated consequences of the decisions that we are about to make. That approach is much more rational than limiting the percentage of making the error of one kind in an artificial (null hypothesis) setting or using a measure of evidence for each model as the weight."
Longford (2005) http://www.stat.columbia.edu/~gelman/stuff_for_blog/longford.pdf
Redressing #Bias: "Correlation Constraints for Regression Models":
Treder et al (2021) https://doi.org/10.3389/fpsyt.2021.615754
How to assess a statistical model?
How to choose between variables?
Pearson's #correlation only measures linear association, so it is misleading if you suspect that the relationship is not a straight line (compare the three coefficients in the sketch after the reference below).
If monotonic relationship:
"#Spearman’s rho is particularly useful for small samples where weak correlations are expected, as it can detect subtle monotonic trends." It is "widespread across disciplines where the measurement precision is not guaranteed".
"#Kendall’s Tau-b is less affected [than Spearman’s rho] by outliers in the data, making it a robust option for datasets with extreme values."
Ref: https://statisticseasily.com/kendall-tau-b-vs-spearman/
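A small comparison sketch (my own, with made-up data) of the three coefficients on a monotonic but non-linear relationship, before and after adding a discordant outlier:

```python
# Sketch: Pearson vs Spearman vs Kendall tau-b on a monotonic,
# non-linear relationship, with and without a discordant outlier.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.uniform(0, 4, size=30)
y = np.exp(x) + rng.normal(scale=1.0, size=30)  # monotonic, not a line

def report(x, y, label):
    r, _ = stats.pearsonr(x, y)
    rho, _ = stats.spearmanr(x, y)
    tau, _ = stats.kendalltau(x, y)  # variant "b" is the default
    print(f"{label}: Pearson={r:.3f} Spearman={rho:.3f} Kendall={tau:.3f}")

report(x, y, "clean")

# One wildly discordant point: Pearson shifts the most,
# the rank-based coefficients far less.
report(np.append(x, 4.0), np.append(y, -1e4), "with outlier")
```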
Accuracy! One method to counter regression dilution (illustrated in the sketch at the end of this section) is to add a constraint to the statistical model.
Regression Redress restrains bias by segregating the residual values.
My article: http://data.yt/kit/regression-redress.html
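For context, a minimal sketch of regression dilution itself (this only shows the problem; it is not the Regression Redress method from the article above): noise in the predictor shrinks the fitted slope toward zero.

```python
# Sketch of regression dilution: measurement error in the predictor
# attenuates the OLS slope by lambda = var(x) / (var(x) + var(error)).
import numpy as np

rng = np.random.default_rng(3)
n, beta = 10_000, 2.0
x_true = rng.normal(size=n)
y = beta * x_true + rng.normal(scale=0.5, size=n)

x_noisy = x_true + rng.normal(scale=1.0, size=n)  # error-prone measurement

print(np.polyfit(x_true, y, 1)[0])   # ≈ 2.0 (true slope)
print(np.polyfit(x_noisy, y, 1)[0])  # ≈ 2.0 * 1/(1+1) = 1.0 (diluted)
```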