Why is Python a language of choice for data scientists?

Python is a fairly complete language with relatively concise syntax. If you were writing out an algorithm in pseudocode on a whiteboard, you could probably turn your text into runnable Python with very little work. Its string manipulation syntax is concise and natural (string handling comes up a lot in my experience), lists and dictionaries are built in and need no setup, and it has an interactive interpreter for rapid iteration. You’re going to be a lot more productive in your first hour of a Python project than you would be in a more powerful language.
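To make that concrete, here is a small sketch of the kind of whiteboard-style task that maps almost directly onto Python: counting words in a piece of text. The sample sentence and the variable names are made up for illustration; only built-in strings, lists, and dictionaries are used.

```python
# Hypothetical sample text, chosen only for illustration.
text = "the quick brown fox jumps over the lazy dog the end"

# Split the string on whitespace into a list of words.
words = text.split()

# Count occurrences with a plain dictionary.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Sort by count (highest first), breaking ties alphabetically, and keep three.
top = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))[:3]

# Join the results back into a single readable string.
summary = ", ".join(f"{word}={n}" for word, n in top)
print(summary)
```

Each line can be pasted into the interpreter one at a time, which is what makes the first hour of a project so quick.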

By comparison, the two dominant full-featured languages, Java and C++, expect you to learn a lot of additional syntax and write a lot more text to achieve the same goal. That’s fine for a full-time software engineer. Java and C++ give you a lot of speed and power in return for all the additional things you need to know. They force you to really think through exactly what you want to do, which isn’t good in the first hour but is great in the following days, weeks, and years. For example, both have static typing, which requires a lot of additional typing and thought but also greatly reduces the number of ways your program can bite your head off. The time spent learning all their intricacies isn’t worthwhile for a data scientist who is mostly writing terminal code (i.e., code that isn’t going to be referenced by other code).
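The typing trade-off can be seen from the Python side in a few lines. This is only a sketch with a made-up function name (`total_length`): Python happily accepts the call below and only complains once the bad value is actually reached at run time, whereas the compile-time type checking in Java or C++ would typically reject the equivalent code before it ever ran.

```python
def total_length(items):
    """Sum the lengths of all items (hypothetical helper for illustration)."""
    return sum(len(item) for item in items)

# Works as expected when every item is a string.
ok_total = total_length(["abc", "de"])

# Nothing stops us from passing an integer by mistake; the error only
# appears when len() is finally called on it at run time.
try:
    total_length(["abc", 42])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(ok_total)
print(error_message)
```

For a short, throwaway analysis script that cost is usually acceptable; in a large codebase it is exactly the kind of bug that compile-time checks are meant to prevent.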

Python also has a great number of scientific computing libraries provided by the huge community around it. Have a quick look at PyPI (the Python Package Index), a repository of software for Python, to see the full extent of what is being developed within the Python community. NumPy is a great example here: it’s the core library for scientific computing in Python, first released in 2006. Recently, NumPy was awarded a $645,000 grant, which will support its development.

Another good example is SciPy. This library can be used for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers, and other tasks common in science and engineering. SciPy builds on the NumPy array object and is part of the SciPy stack, which includes tools such as Matplotlib, pandas, and SymPy, along with an expanding set of scientific computing libraries.
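As a rough illustration of three of the tasks listed above (integration, optimization, and linear algebra), here is a minimal sketch that assumes NumPy and SciPy are installed. The specific problems (the integral of sin(x), a simple parabola, a 2x2 linear system) are made up for the example.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Integration: area under sin(x) between 0 and pi (the exact answer is 2).
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Optimization: find the x that minimizes a simple parabola.
def parabola(x):
    return (x - 2.0) ** 2 + 1.0

result = optimize.minimize_scalar(parabola)

# Linear algebra: solve A @ x = b, with A and b as NumPy arrays.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)

print(area)
print(result.x)
print(solution)
```

Note that the inputs and outputs are plain NumPy arrays and Python floats, which is what lets these libraries be combined freely.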

Author: MahesH
