Wyoming | A part of this has to do with some R packages being ported to Python years ago, to allow them to be easier to hack and more flexible. The statistical mathematics behind some of these data analysis packages can become fairly intense, and it is easier to copy some ideas out of R rather than write them from scratch in Python.
R is more the language/package for the really heavy-duty data crunchers than Python, because it can do more crunching with less memory. Some datasets become quite large, and you start having run-time and memory considerations when working on them.
Python is quick to write, flexible, and easier to learn, but you'll pay in terms of overall performance.
Now, the stats package that has been around the longest, and has the deepest support behind it, is SAS, from the SAS Institute:
https://www.sas.com/en_us/company-information.html
I first met SAS running on an IBM mainframe, under CMS running on VM/370. It was a powerful package even in 1984, and it has only become more powerful since. Where SAS would be preferred over R or Python is where there's "real money" riding on the results of the stats/data analysis. SAS has a large number of PhD-level mathematicians and the like who write their packages, test them, etc. SAS has a verification suite for their stats/analysis mathematics, so they know their software package is giving you mathematically supported results.
The other neat thing about SAS comes from their background of dealing with huge datasets on memory-limited systems (the IBM mainframe where I used it had only 8MB of memory installed, which was considered a huge amount of memory in that day). You can still take on huge datasets today, moving across the dataset piece-by-piece as you haul it into memory sequentially. Programs like R and Python generally load the whole dataset into memory, so they are limited by the maximum amount of virtual memory you have configured on your system. If you get to a point where your virtual memory size is very large compared to your real, physical memory, your system is going to "thrash" as it pages back and forth. SAS manages the dataset storage itself to avoid causing a demand-paged system to thrash.
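To make the piece-by-piece idea concrete, here is a rough sketch in Python of a single sequential pass that computes a mean and standard deviation while holding only one row in memory at a time. This is not how SAS does it internally; it only illustrates the general approach. The function name `streaming_mean_std` and the column name `value` are made up for the example, and the running update is Welford's method, which I picked because it needs only three numbers of state.

```python
import csv
import io
import math


def streaming_mean_std(rows, column):
    """Single pass over `rows`; only three running numbers are kept in memory."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for row in rows:
        x = float(row[column])
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count < 2:
        # Sample standard deviation is undefined for fewer than two rows.
        return count, mean, float("nan")
    return count, mean, math.sqrt(m2 / (count - 1))


# Small in-memory stand-in for a file that would be too large to load at once.
# With a real file you would pass csv.DictReader(open(path, newline="")) instead.
sample = io.StringIO("value\n2\n4\n4\n4\n5\n5\n7\n9\n")
count, mean, std = streaming_mean_std(csv.DictReader(sample), "value")
print(count, mean, std)
```

Because `csv.DictReader` hands over one row at a time, memory use stays flat no matter how long the file is. The trade-off is that anything needing random access to the data (sorting, some model fits) takes more work than a simple pass like this.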
R and the Python packages were developed as freeware in/around academia, and as such, they're favored by people who a) don't have much money to buy SAS, and b) don't have a lot of money riding on the results - so if they're sued for errors of analysis or conclusion, they don't need the PhD-level math backup that SAS can offer their customers. |