Python is an enormously powerful language with a large ecosystem that includes packages to solve nearly every conceivable type of problem. In addition to libraries, though, various "runtimes" have also evolved to solve specific types of problems. In this post, we'll dive into the Anaconda runtime and look at how it compares to the standard implementation (CPython).
Python's Reference Implementation: CPython
Before getting into too much detail about Anaconda, it is probably good to first introduce Python's reference implementation, CPython, and its associated ecosystem.
CPython is the version of Python developed and maintained by Guido van Rossum (Python's creator) and the Python core team, the same people responsible for all top-level decisions about the Python programming language. As the name implies, it is written in C.
CPython is intended to be broadly usable and suitable for essentially any use case. Nearly all Python packages (unless specifically tailored for alternative runtimes, like PyPy or IronPython) will run inside of CPython.
Because it is the reference environment, CPython is also the most conservative in terms of extensions and optimizations. By default, it doesn't include features such as the native JIT (just-in-time) compiler of PyPy, or integration into the Java (Jython) or .NET (IronPython) ecosystems. While many extensions can be added, they are not bundled by default and require additional work.
CPython is traditionally packaged into Linux and Unix-like operating systems by default. It uses pip as its default package manager and the Python Package Index (PyPI) as the source of its packages.
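To check which implementation a given interpreter actually is, the standard library reports it directly. The following is a minimal sketch using only `platform` and `sys`; the helper name `describe_runtime` is made up for illustration.

```python
import platform
import sys


def describe_runtime():
    """Return basic facts about the interpreter running this script."""
    return {
        # For example "CPython", "PyPy", "Jython", or "IronPython"
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
        # Full path of the interpreter binary that is executing this code
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```

Running this with different interpreters on the same machine is a quick way to see which one a given shell or editor is picking up.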
The Anaconda Distribution
The Anaconda Python distribution is a free and open-source software package that contains distributions of the Python and R programming languages, aimed at data science, machine learning applications, predictive analytics, and large-scale data processing.
Anaconda eases the development of scientific computing applications by providing a seamless way of managing virtual environments, packages, and other software deployments that are pre-configured and can be installed in a uniform, cross-platform manner. The Anaconda distribution comes with more than 1,500 packages, the conda package and virtual environment manager, and Anaconda Navigator, a GUI that provides a graphical alternative to the command-line interface.
Put succinctly, Anaconda is an "all-in-one" cross-platform scientific computing software package. Because of its goal to be cross-platform and broadly stable, however, it provides its own set of utilities that are outside of CPython.
Conda Package Manager
The most important of these tools is the conda package manager.
One of the largest challenges that developers face is providing a stable and predictable development environment. As applications evolve and become increasingly complex, challenges associated with dependency conflicts, outdated packages, duplicated software, and path errors become more prevalent.
Conda attempts to provide consistent sets of software that are more stable than what pip alone provides. While managing packages with pip is often sufficient, pip has historically installed the packages you request without checking for dependency conflicts with packages already installed (releases since pip 20.3 use a stricter dependency resolver by default). If versions are not pinned carefully, updating one package can introduce errors in others.
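The kind of breakage described above can be illustrated with a toy consistency check. This is only a sketch of the idea, not how pip or conda resolve dependencies; the package names, version ranges, and the helpers `parse_version` and `find_conflicts` are all hypothetical.

```python
def parse_version(text):
    """Turn '1.21.0' into (1, 21, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_conflicts(installed, requirements):
    """Return (package, dependency, installed_version) for every unmet range.

    installed:    {"libfoo": "1.4.0", ...}
    requirements: {"app-a": {"libfoo": ("1.2.0", "2.0.0")}, ...}
                  where each range means low <= version < high
    """
    conflicts = []
    for package, deps in requirements.items():
        for dep, (low, high) in deps.items():
            have = installed.get(dep)
            ok = (
                have is not None
                and parse_version(low) <= parse_version(have) < parse_version(high)
            )
            if not ok:
                conflicts.append((package, dep, have))
    return conflicts


# Hypothetical data, made up for illustration.
installed = {"libfoo": "1.4.0"}
requirements = {
    "app-a": {"libfoo": ("1.2.0", "2.0.0")},  # needs 1.2.0 <= libfoo < 2.0.0
    "app-b": {"libfoo": ("1.0.0", "3.0.0")},  # needs 1.0.0 <= libfoo < 3.0.0
}

print(find_conflicts(installed, requirements))  # []

# Upgrading libfoo for an unrelated reason quietly breaks app-a.
installed["libfoo"] = "2.1.0"
print(find_conflicts(installed, requirements))  # [('app-a', 'libfoo', '2.1.0')]
```

An installer that runs a check like this before changing anything can refuse the upgrade or report the mismatch, instead of leaving the environment in a broken state.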
Conda addresses these issues by providing "channels" of Python packages that are known to work together (much in the way that Linux distributions work). The Anaconda company and broader community create named software stacks with sets of packages that have been tested and robustly validated. When installing from one of these stacks, it is possible to apply further version limitations. The stacks include a description of how aggressively new software versions are adopted. If stability is required, then a "stable" stack might be used, which includes older versions of software (with important stability and security fixes backported). If newer features are required, a "beta" or "development" channel might be used.
If a dependency conflict should arise within a stack, conda is often able to provide information about the mismatches and provide guidance on what upgrade path might be used to return the application to sanity. In comparison, pip will just install all packages and dependencies, even if breakages will occur.
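Choosing a channel and limiting versions both happen on the conda command line. The sketch below assembles a `conda create` invocation from Python without running it. The environment name `analysis`, the `conda-forge` channel, and the package pins are example values, and the helper `conda_create_command` is hypothetical.

```python
import shlex


def conda_create_command(env_name, packages, channel=None):
    """Build the argument list for `conda create`.

    packages maps a package name to a version pin, or None for "any version".
    """
    command = ["conda", "create", "--name", env_name, "--yes"]
    if channel:
        command += ["--channel", channel]
    for name, version in packages.items():
        # conda accepts "name=version" to pin a package to a version
        command.append(f"{name}={version}" if version else name)
    return command


command = conda_create_command(
    "analysis",
    {"python": "3.11", "numpy": "1.26", "pandas": None},
    channel="conda-forge",
)
print(shlex.join(command))
# conda create --name analysis --yes --channel conda-forge python=3.11 numpy=1.26 pandas
```

The resulting list can be passed to `subprocess.run` on a machine where conda is installed. Keeping the command as a list rather than a single string avoids shell quoting problems.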
Even though Anaconda provides its own package management utility, pip is still available and can be used to download unverified sources from PyPI.
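When pip and conda coexist, it helps to know which environment a pip install will land in. The following sketch relies on two conda conventions: an environment keeps its package records in a `conda-meta` directory under its prefix, and activating an environment sets the `CONDA_DEFAULT_ENV` variable. The helper names are made up for illustration, and the directory check is a heuristic rather than an official API.

```python
import os
import sys


def in_conda_environment(prefix=sys.prefix):
    """Heuristic: conda environments contain a `conda-meta` directory."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


def pip_install_command(package):
    """Build a pip command bound to the current interpreter, so the package
    lands in this environment rather than in another Python on the PATH."""
    return [sys.executable, "-m", "pip", "install", package]


print("conda environment:", in_conda_environment())
print("active env name:", os.environ.get("CONDA_DEFAULT_ENV", "(none)"))
print("pip command:", pip_install_command("requests"))
```

Invoking pip as `python -m pip` through `sys.executable` ties the install to the interpreter that is running the script, which is usually what you want inside a conda environment.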
Underlying the conda package manager is Anaconda Cloud. This service lets users store and share packages, as well as public and private notebooks and environments, and is where most Anaconda channels are hosted.
Anaconda Navigator
Anaconda Navigator is a desktop application, included in the Anaconda distribution, that provides a graphical user interface (GUI) for managing Anaconda environments. It lets users launch applications and manage conda packages, environments, and channels without using the traditional command-line interface (CLI). Navigator can search for packages and their dependencies in Anaconda Cloud or in a local repository, then install them into the Anaconda virtual environment of your choice. You can select which packages are installed in an environment and have Navigator keep them updated. Navigator is available for Windows, macOS, and Linux.
Additional applications are available in Navigator and can be used with your Anaconda environments and packages. The default applications available in Navigator include:
- JupyterLab: A next generation interface for Jupyter which includes extensions to bring it more into line with integrated development environments.
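- Jupyter Notebook: Web-based interactive environment for creating documents that combine live code, text, and visualizations.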
- QtConsole: A desktop based alternative to Jupyter which runs on top of the Qt desktop development framework.
- Spyder: Integrated development environment aimed at scientific computing problems.
- Glueviz: Visualization library that can be used to explore relationships within and between related datasets.
- Orange: Open source machine learning and data visualization library and user-interface.
- RStudio: Integrated development environment focused on the R programming language.