Macbook m1 and python libraries

wizarpy_vm picture wizarpy_vm · Dec 1, 2020 · Viewed 30.1k times · Source

Is new macbook m1 suitable for Data Science?

Do Data Science python libraries such as pandas, numpy, sklearn etc work on the macbook m1 (Apple Silicon) chip and how fast compared to the previous generation intel based macbooks?

Answer

jakub picture jakub · Dec 1, 2020

This GitHub repository has lots of useful information about the Apple M1 chip and data science in Python https://github.com/neurolabusc/AppleSiliconForNeuroimaging. I have included representative quotes below. This repository focuses on software for brain imaging analysis, but the takeaways are broad.

Updated on 27 September 2021.

TL;DR

Unless you are a developer, I would strongly discourage scientists from purchasing an Apple Silicon computer in the short term. Productive work will require core tools to be ported. In the longer term, this architecture could have a profound impact on science. In particular if Apple develops servers that exploit the remarkable power efficiency of their CPUs (competing with AWS Graviton) and leverage the Metal language and GPUs for compute tasks (competing with NVidia's Tesla products and CUDA language).

Limitations facing Apple Silicon

The infrastructure scientists depend on is not yet available for this architecture. Here are some of the short term limitations:

  • Native R can use the unstable R-devel 4.1. However, RStudio will require gdb.
  • Julia does not yet natively support Apple Silicon.
  • Python natively supports Apple Silicon. However, some modules have issues or are slow. See the NiBabel section below.
  • Scientific modules of Python, R, and Julia require a Fortran compiler, which is currently only available in experimental form.
  • While Apple's C Clang compiler generates fast native code, many scientific tools will need to wait until gcc and gFortran compilers are available.
  • Tools like VirtualBox, VMware Fusion, Boot Camp and Parallels do not yet support Apple Silicon. Many users rely on these tools for using Windows and Linux programs on their macOS computers.
  • Docker can support Apple Silicon. However, attempts to run Intel-based containers on Apple Silicon machines can crash as QEMU sometimes fails to run the container. These containers are popular with many neuroimaging tools.
  • Homebrew 3.0 supports Apple Silicon. However, many homebrew components do not support Apple Silicon. MATLAB is used by many scientific tools, including SPM. While Matlab works in translation, it is not yet available natively (and mex files will need to be recompiled).
  • FSL and AFNI do not yet natively support this architecture. While code may work in translation, creating some native tools must wait for compilers and libraries to be updated. This will likely require months.
  • The current generation M1 only has four high performance cores. Most neuroimaging pipelines combine sequential tasks that only require a single core (where the M1 excels) as well as parallel tasks. Those parallel tasks could exploit a CPU with more cores (as shown in the pigz and niimath tests below). Bear in mind that this mixture of serial and parallel code faces Amdahls law, with diminishing returns for extra cores.
  • The current generation M1 has a maximum of 16 Gb of RAM. Neuroimaging datasets often have large memory demands (especially multi-band accelerated functional, resting-state and diffusion datasets).
  • In general, the M1 and Intel-based Macs have identical OpenGL compatibility, with the M1 providing better performance than previous integrated solutions. However, there are corner cases that may break OpenGL tools. Here I describe four limitations. First, OpenGL geometry shaders are not supported (there is no Metal equivalent). Second, the new retina displays support wide color with 16 bitsPerSample that can cause issues for code that assumes 32-bit RGBA textures (such as the text in this Apple example code). Third, textures can be handled differently. Fourth, use of the GL_INT_2_10_10_10_REV data type will cripple performance (tested on macOS 11.2). This is unfortunate, as Apple advocated for this datatype once upon a time. In this case, code msut be changed to use the less compact GL_HALF_FLOAT which is natively supported by the M1 GPU. This impacts neuroimaging scientists visualizing DTI tractography where GPU resources can be overwhelmed.