Guide to scientific coding

Programming languages

When people with no common language need to communicate, they develop something called a pidgin—a sort of simplified combination of their native languages. At its core, a programming language is a pidgin: a way for people (who don’t speak machine code) to communicate with computers (which only speak machine code). I’ve found that this approach to programming really helps when things get frustrating. It can be very difficult to communicate complex ideas in a language with no native speakers.

There are really three ways to measure how “good” a programming language is: performance, support, and readability. Performance deals with how quickly the code runs and how much memory it uses. Support deals with how well the language is documented and maintained, as well as the range of things it can do. Readability is a measure of how easy it is to understand what the code will do just by reading it. Hundreds of languages exist, each striking a different balance among the three, but one of the most common for scientific work is Python.

There are a few reasons why this is the case. First of all, Python was designed from square one with readability as the top priority. That means it’s easy to pick up, even if you’ve never coded before. For a scientist like you, it also means less time thinking about syntax and more time thinking about science! The second big reason to use Python is its ubiquity. MATLAB is great for math, and LabVIEW is great for device interfacing, but Python is great at both, and more! You can automate an entire experiment, check on it remotely through SSH, format the collected data into a compact file structure, then analyze that data in all sorts of ways, all without ever having to switch syntactic gears. Finally, Python (and nearly all of its add-on libraries) is completely free—free as in “free lunch,” but also free as in “free speech.” It belongs to a worldwide community of developers that is open to all.
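To make the "collect, store, analyze" workflow concrete, here is a minimal sketch that uses only the standard library. The instrument is simulated: `read_sensor`, the 5 V signal, the noise level, and the column names are all made up for illustration, and a real experiment would replace `read_sensor` with a call to whatever library drives your hardware.

```python
import csv
import os
import random
import statistics
import tempfile


def read_sensor(rng):
    """Stand-in for a real instrument read (hypothetical): a noisy 5 V signal."""
    return 5.0 + rng.gauss(0.0, 0.1)


def run_experiment(path, n_samples=100, seed=0):
    """Collect n_samples readings and write them to a CSV file at path."""
    rng = random.Random(seed)  # fixed seed so the fake data is repeatable
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sample", "voltage_V"])
        for i in range(n_samples):
            writer.writerow([i, read_sensor(rng)])


def analyze(path):
    """Return the mean and sample standard deviation of the recorded voltages."""
    with open(path, newline="") as f:
        voltages = [float(row["voltage_V"]) for row in csv.DictReader(f)]
    return statistics.mean(voltages), statistics.stdev(voltages)


if __name__ == "__main__":
    # Write the data to a scratch directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as scratch:
        data_file = os.path.join(scratch, "run_001.csv")
        run_experiment(data_file)
        mean, spread = analyze(data_file)
        print(f"mean = {mean:.3f} V, std = {spread:.3f} V")
```

Collection and analysis are separate functions that only share a file, so you can rerun the analysis later without repeating the measurement.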

That said, Python isn’t perfect. Its biggest drawbacks are in the performance department, speed in particular. But generally the hours gained in coding time are worth the seconds lost in run time.
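If you want to see the trade-off for yourself, the standard library's `timeit` module can time small pieces of code. The sketch below compares an explicit Python loop with the built-in `sum`, which does the same work inside the interpreter's compiled code. The function names and problem size are arbitrary choices for illustration, and the actual timings will depend on your machine.

```python
import timeit


def sum_with_loop(n):
    """Add up 0 + 1 + ... + (n - 1) with an explicit Python loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_with_builtin(n):
    """Same result, but the looping happens inside the built-in sum()."""
    return sum(range(n))


if __name__ == "__main__":
    n = 100_000
    # Run each version 20 times and report the total elapsed time.
    loop_time = timeit.timeit(lambda: sum_with_loop(n), number=20)
    builtin_time = timeit.timeit(lambda: sum_with_builtin(n), number=20)
    print(f"explicit loop: {loop_time:.4f} s")
    print(f"built-in sum:  {builtin_time:.4f} s")
```

The built-in version is typically the faster of the two. Measuring like this before optimizing tells you whether the seconds at stake are worth any extra coding time.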

How to code

Due to the large number of coding tutorials out there, it would be a waste of time for me to sit here and write my own. LearnPython.org is a great place to start if you’ve never programmed before. If you’re a little more experienced, the official Python tutorial is much more comprehensive. Even after you’ve mastered the basics, it’s probably a good idea to have the documentation for Python, NumPy, SciPy, and Matplotlib all bookmarked in your web browser, along with anything else you find yourself searching for a lot.

You can write code using any text editor, but there are certain apps (called integrated development environments or IDEs) that are designed to make it way easier. Python comes with its own basic IDE (called IDLE), but there are many others out there, such as PyCharm, Spyder, and Thonny. Each has its own benefits and drawbacks, so try a few before you settle on one.

Finally, physics majors are required to take CS 142, which covers the basics of computer programming in C++. Even though you probably won’t use C++ in the lab, it is really helpful to learn. Python hides a lot of detail behind its simple syntax, while C++ exposes more of what the computer is actually doing, such as explicit types and memory management.

How to code WELL

Once you've learned the basics, take some time and really read "Good enough practices in scientific computing" (Wilson et al., 2017). Many of the principles and guidelines will seem excessive. They are not. If every physicist read and applied this one paper, we would have a fusion-powered settlement on Mars by now.[citation needed] Yes, that's probably an exaggeration. It's not an exaggeration to say that sloppy code wastes far more of scientists' time than should be allowed.

Now, you might occasionally think, "This particular piece of code is too small and unimportant for any of this to matter." And that's probably true, to some extent. However, if you don't build these habits while it doesn't matter, you won't have them when it does. Then you'll be faced with a choice:

1. Put your work (and anyone else's work that depends on it) on hold for a few hours/days while you frantically try to get the code to work, or
2. Put your work (and anyone else's work that depends on it) on hold for a few weeks/months while you rewrite everything the right way.

Isn't it just so much easier to do it right from the beginning?
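As a small illustration of what "doing it right" can look like even in a three-line function, here is the same calculation written twice. This is only a sketch of the kind of habits meant here (meaningful names, a short docstring, a quick check against a hand-computed value); the function and variable names are made up for the example.

```python
# Hard to revisit in six months: cryptic names and no explanation.
def f(a, b):
    return 0.5 * a * b ** 2


# The same calculation with good habits applied.
def kinetic_energy(mass_kg, speed_m_per_s):
    """Return the kinetic energy in joules of a mass moving at a given speed."""
    if mass_kg < 0:
        raise ValueError("mass must be non-negative")
    return 0.5 * mass_kg * speed_m_per_s ** 2


# A tiny check with a value that is easy to verify by hand:
# 2 kg moving at 3 m/s has 0.5 * 2 * 3**2 = 9 J of kinetic energy.
assert kinetic_energy(2.0, 3.0) == 9.0
```

Both versions return the same numbers. The difference is how long it takes the next reader (probably you) to trust them.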

Git

If and when you start writing more complicated and collaborative code, you’ll want to use Git to keep it organized. At first glance, Git might just look like an overcomplicated Google Drive, but it’s much more than that.[1] It’s a powerful version control system designed entirely for programmers.

Git can look complicated and unapproachable if you’ve never used it before, but I promise you it is worth learning. Start by downloading a Git client. It’s free, and comes with its own command-line interface as well as a GUI. Search online to learn the basic commands (clone, stage, commit, pull, push, branch, merge) and you’ll be set about 98% of the time. Also, many IDEs have built-in Git support, which makes managing code a lot easier.
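To show how a few of those commands fit together, here is a rough sketch that drives the `git` command line from Python in a scratch directory. It assumes Git is installed and on your PATH. The helper names (`git`, `demo_git_workflow`), the file names, the commit messages, and the demo identity are all invented for illustration. In day-to-day use you would type the same commands (`git add`, `git commit`, and so on) directly into a terminal.

```python
import subprocess
from pathlib import Path


def git(*args, cwd):
    """Run one git command inside cwd and return whatever it printed."""
    result = subprocess.run(
        # The -c options set a throwaway identity for this demo only.
        ["git", "-c", "user.name=Demo", "-c", "user.email=demo@example.com", *args],
        cwd=cwd, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def demo_git_workflow(workdir):
    """Stage, commit, branch, and merge in a fresh repository at workdir."""
    workdir = Path(workdir)
    git("init", cwd=workdir)

    # Stage a new file, then commit it.
    (workdir / "analysis.py").write_text("print('hello')\n")
    git("add", "analysis.py", cwd=workdir)
    git("commit", "-m", "Add analysis script", cwd=workdir)
    main_branch = git("rev-parse", "--abbrev-ref", "HEAD", cwd=workdir)

    # Do new work on its own branch so the main line stays untouched.
    git("checkout", "-b", "plotting", cwd=workdir)
    (workdir / "plot.py").write_text("print('plot')\n")
    git("add", "plot.py", cwd=workdir)
    git("commit", "-m", "Add plotting script", cwd=workdir)

    # Switch back and merge the finished branch.
    git("checkout", main_branch, cwd=workdir)
    git("merge", "plotting", cwd=workdir)

    # One commit subject per line, newest first.
    return git("log", "--format=%s", cwd=workdir).splitlines()
```

Calling `demo_git_workflow` on an empty scratch directory returns the commit subjects, newest first. The commands clone, pull, and push follow the same pattern but need a remote repository to talk to, so they are left out of this local sketch.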

GitHub has some really good learning resources for getting started.

Additional resources

Python tutorials

Notes

[1] One of the key differences is the concept of decentralization. When you’re typing a Google Doc, you’re working in the cloud, which means that every keystroke of every collaborator has to be updated in real time. This requires powerful, expensive central servers (which Google has, and enjoys making money from). Git, on the other hand, does all of the hard work on the user’s end, and only sends the "committed" update, which requires the server to be slightly more powerful than a potato.