Oct 12

Recommendations for an intro to multivariate statistics?

Bill Egan replies:

Here are four excellent multivariate statistics books I have used for many years. I suggest tackling them in this order.
1. Jerrold Zar - Biostatistical Analysis, 5th ed. (this is half univariate and half multivariate)
2. Neter, Kutner, Wasserman, Nachtsheim - Applied Linear Statistical Models, 4th ed. (there is now a 5th ed., and you can find the PDF by googling)
3. Alvin Rencher - Methods of Multivariate Analysis (there is now a 3rd ed.)
4. Mardia, Kent, Bibby - Multivariate Analysis (there is now a 2nd ed.)

You need to understand linear algebra to work through these books, e.g., at the level of Strang's Introduction to Linear Algebra, 6th ed. (his lectures are on MIT's OpenCourseWare website). Rencher, Neter, and Mardia all use matrix notation extensively. You also need to understand and be able to do univariate stats at the level of:
• Snedecor and Cochran - Statistical Methods, 8th ed.
• Riffenburgh and Gillen - Statistics in Medicine, 4th ed.

You will really learn multivariate methods only if you code them. Matlab is the best (Matlab Home is cheap), and yes, I coded everything in these books and a lot more work of my own invention in Matlab.

David Lillienfeld adds:

Snedecor and Cochran is the grand old lady of texts. Neter et al. is still pretty popular on campuses.

Asindu Drileba asks:

Concerning statistical packages: I often hear some data science communities complain that there are simply too many bugs and wrong implementations in the Python space. Maybe this is why you are recommending MATLAB? What do you think of R or Julia?

Bill Egan responds:

I have used Matlab since 1993 for many things - research, papers, patents, commercial scientific software products. Matlab stands for matrix laboratory. The original data structure was scalar, vector, matrix. If you like to work in matrix/linear algebra notation, or need to, Matlab is the program to use. Other data structures have been added on, such as tables for mixed data types, but like all add-ons, this does not always work well. Quality control of the software is great. Very widely used by engineers. Very high level language, so you can see the algorithm without getting lost in the details like you do in C++.

R is not so good for linear algebra because the original data structure is a table for mixed data types. Matrix work is more difficult. Quality control of core R and major packages is good despite R being open source (although it has license restrictions) because it is used by many academic statisticians. I used R for analysis for a couple of years. Fairly high level language. Better for classical stats work where you make a table out of the data and have mixed data types.

Python is completely open source, and the people who created it and use it most have no knowledge of statistics, and that shows. We used it primarily as a scripting/control language inside one of my software products. Available packages do have bugs/errors or are missing methods for stats. We tested them and could not use them; I had my guys code any stats-related stuff from scratch. It is not as high level a language as R or Matlab, so you have to do more work. Do not recommend it.

I have no experience with Julia.

