Star-P for Python and NumPy

Python is growing in popularity, as is the set of Python extensions contained in the NumPy module. Interactive Supercomputing offers a Python client as part of the Star-P suite of parallelization tools. This Python client works in a way that will feel comfortable to users of NumPy. It enables NumPy users to run programs on a parallel supercomputer with minor syntax changes.

If you are a Python user doing numerical computing on large matrix and array objects, the Star-P Python client supports code development on the desktop. When larger datasets demand more performance, Star-P lets you export your calculations to a server, cluster, or off-site resource while still controlling the computation from the interactive session running on the desktop.

Star-P allows scientists and engineers to write number-crunching programs in a comfortable high-level language, and then immediately and seamlessly run their code on a parallel computer. Star-P makes it unnecessary to re-code an application in C, C++ or FORTRAN in order to scale up its processing power. Star-P handles the details of data distribution, task scheduling and low-level, MPI-based communications. Users remain focused on their application and algorithm and are not distracted by the complicated low-level details of parallel computing. In particular, Star-P delivers interactive performance by automatically:

  • Sending computations to the parallel server
  • Providing access to built-in world-class parallel processing libraries
  • Managing inter-processor communication on the parallel server
  • Managing memory allocation for large datasets
  • Bundling and returning data from the server supercomputer to the desktop for further analysis and visualization

Data Parallel and Task Parallel Computing

Leveraging both data- and task-parallel computing is necessary in many scientific and technical simulations. Star-P enables users to work in both modes and to seamlessly interoperate between the two.

  • Star-P's data-parallel mode supports algorithms that need large-scale memory access to distributed arrays and inter-processor communication, a style often called "global array computing." Examples include matrix manipulation and signal processing applications.
  • Star-P's task-parallel mode is suited to algorithms often called "embarrassingly parallel," where the computation breaks up naturally into largely independent processes, such as Monte Carlo simulation or the parallelization of for loops.
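
The two styles described above can be contrasted in ordinary serial NumPy. The following is only an illustrative sketch: it does not use Star-P, and the names signal, spectrum and trial_mean are hypothetical, chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data-parallel style: a single operation over one large array.
# Every output value of the FFT depends on every input sample, so a
# parallel version must distribute the array and communicate.
signal = rng.standard_normal(1024)
spectrum = np.fft.fft(signal)

# Task-parallel style: independent iterations of a loop.
# Each trial uses its own seed and shares no data with the others,
# so the iterations could run in any order or at the same time.
def trial_mean(seed, n=1000):
    return np.random.default_rng(seed).standard_normal(n).mean()

means = [trial_mean(seed) for seed in range(8)]
```

In the first case the parallelism lives inside the array operation; in the second it lives in the loop around otherwise serial work.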

Star-P for Python: Invoking Data-Parallel Functionality

Star-P's Python client builds on functionality in the NumPy module. Code written for NumPy can be ported to the Star-P platform by replacing numpy calls with starp calls, like this:

    # NumPy version
    import numpy
    a = numpy.random.rand(5, 5)
    b = numpy.linalg.inv(a)
    c = a*b

    # Star-P version
    import starp
    starp.defaultConnect('hostname', '/path/to/starp/installation')  # Establish connection to server
    a = starp.numpy.random.rand(5, 5)
    b = starp.numpy.linalg.inv(a)
    c = a*b

NumPy functions have Star-P equivalents that are called in the same way. Therefore, apart from the call to "starp.defaultConnect()", a NumPy program can become a Star-P program simply by replacing the word "numpy" with "starp.numpy". Alternatively, one could simply "import starp as numpy". The difference is that Star-P operates on parallelized arrays held on the parallel server!
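
One way to keep a program easy to port in this manner is to pass the array module in as a parameter, so the same function body runs against whichever module is supplied. The sketch below is run with plain NumPy only; the function name inverse_and_product and the parameter xp are hypothetical, and passing a Star-P module in place of numpy is an assumption based on the description above, not something verified here.

```python
import numpy

def inverse_and_product(xp, n=5):
    """Run the small listing above against whichever array module is passed in."""
    a = xp.random.rand(n, n)
    b = xp.linalg.inv(a)
    c = a * b  # elementwise product, exactly as written in the listing
    return a, b, c

numpy.random.seed(0)  # make the random matrix reproducible
a, b, c = inverse_and_product(numpy)
```

Note that in NumPy the expression a*b on arrays multiplies element by element; the matrix product of a and its inverse would be written a @ b.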

Star-P for Python: Task Parallel Operation

Star-P also supports task-parallel computation: performing longer, non-communicating computations in parallel and then gathering the results at the end. Task-parallel operations could sensibly run on separate threads (or separate processes) on a serial computer. Parallelized Monte Carlo simulation is a classic example of task parallelism. Star-P's Python client supports task-parallel computations explicitly through a new function, "starp.ppeval()", which invokes parallel execution of any desired function. It splits up the data, oversees execution of the function in parallel, and gathers the returned results together.
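
The split, execute and gather pattern described above can be sketched with Python's standard library alone. This is not "starp.ppeval()", whose signature is not given here; it is a stand-in that uses threads on one machine, as the text suggests is sensible for task-parallel work. The names count_hits and estimate_pi are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def count_hits(task):
    """One independent task: count random points that land inside the unit circle."""
    seed, n_samples = task
    rng = random.Random(seed)  # private generator, so tasks share no state
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(n_tasks=4, samples_per_task=50_000):
    # Split: one (seed, sample count) pair per task.
    tasks = [(seed, samples_per_task) for seed in range(n_tasks)]
    # Execute: run the same function on every task in a pool of workers.
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        hits = list(pool.map(count_hits, tasks))
    # Gather: combine the per-task results into a single estimate.
    return 4.0 * sum(hits) / (n_tasks * samples_per_task)

pi_estimate = estimate_pi()
```

In standard CPython, threads will not speed up this CPU-bound loop; the sketch shows only the structure of a task-parallel computation, in which no task communicates with another until the results are gathered.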