GA divides logically shared data structures into "local" and "remote" portions. It recognizes that the cost of accessing data varies with the proximity of that data to the requesting process. The local portion of the shared memory is assumed to be faster to access, and the remainder (the remote portion) is considered slower to access. These differences do not hinder ease of use, since the library provides uniform access mechanisms for all shared data regardless of where the referenced data is located. In addition, any process can access its local portion of the shared data directly (in place), like any other data in process-local memory. Access to other portions of the shared data must go through GA library calls.
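The difference between in-place access to the local portion and library-mediated access to everything else can be sketched with a toy model. This is only an illustration in plain Python: the class `ToySharedArray` and its methods `access_local` and `get` are hypothetical names invented here, not GA library calls, and the "processes" are simulated inside a single interpreter.

```python
class ToySharedArray:
    """Shared 1-D array; each simulated process owns one contiguous block.

    Hypothetical model for illustration only, not the GA API.
    """

    def __init__(self, blocks):
        # blocks[p] holds the elements that simulated process p stores locally.
        self.blocks = [list(b) for b in blocks]
        # starts[p] is the global index of the first element owned by process p.
        self.starts = []
        pos = 0
        for b in self.blocks:
            self.starts.append(pos)
            pos += len(b)

    def access_local(self, me):
        """Return process `me`'s own block itself (no copy), for in-place use."""
        return self.blocks[me]

    def get(self, i):
        """Return the value at global index i, wherever it is stored."""
        for start, block in zip(self.starts, self.blocks):
            if start <= i < start + len(block):
                return block[i - start]
        raise IndexError(i)


arr = ToySharedArray([[1, 2], [3, 4, 5]])
mine = arr.access_local(1)   # process 1 works directly on its own block
mine[0] = 30                 # in-place update, no library transfer involved
print(arr.get(2), arr.get(0))
```

The caller of `get` uses the same call for local and remote elements, which is the uniform access the paragraph describes; only `access_local` exposes the storage directly.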
GA was designed to complement rather than replace the message-passing model, and it allows the user to combine shared-memory and message-passing styles of programming in the same program. GA inherits its execution environment (processes, file descriptors, etc.) from the message-passing library that started the parallel program.
GA is implemented as a library with C and Fortran-77 bindings; Python and C++ interfaces have also been developed (included starting with release 3.2). Therefore, explicit library calls are required to use the GA model in a parallel C/Fortran program.
A disk extension of the Global Arrays library is provided by its companion library, Disk Resident Arrays (DRA). DRA maintains array objects in secondary storage and allows data to be transferred to and from global arrays.
The programmer of a GA program has full control over the distribution of global arrays. Both regular and irregular distributions are supported; see Section 3 for details.
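As a rough illustration of the two kinds of distribution, the sketch below splits a one-dimensional index range over a number of processes in two ways: near-equal contiguous blocks, and blocks whose boundaries the user chooses. The helper names `regular_blocks` and `irregular_blocks` are hypothetical, and describing an irregular distribution by the starting index of each block is an assumption made for this example; Section 3 remains the reference for how GA actually specifies distributions.

```python
# Illustrative model only: these helpers are hypothetical, not GA library calls.

def regular_blocks(n, nproc):
    """Split indices 0..n-1 into nproc contiguous blocks of near-equal size.

    Returns one inclusive (lo, hi) range per process; a process that
    receives no elements gets None.
    """
    base, extra = divmod(n, nproc)
    blocks, lo = [], 0
    for p in range(nproc):
        size = base + (1 if p < extra else 0)  # first `extra` blocks get one more
        blocks.append((lo, lo + size - 1) if size > 0 else None)
        lo += size
    return blocks


def irregular_blocks(n, starts):
    """Build inclusive (lo, hi) blocks from user-chosen starting indices.

    starts[p] is the first index owned by process p; the list must begin
    at 0, be strictly increasing, and stay below n.
    """
    if not starts or starts[0] != 0:
        raise ValueError("starts must begin at 0")
    if any(a >= b for a, b in zip(starts, starts[1:])):
        raise ValueError("starts must be strictly increasing")
    if starts[-1] >= n:
        raise ValueError("last block would be empty")
    ends = [s - 1 for s in starts[1:]] + [n - 1]
    return list(zip(starts, ends))


print(regular_blocks(10, 4))
print(irregular_blocks(10, [0, 2, 3, 8]))
```

In the irregular case the block sizes (2, 1, 5 and 2 elements here) are entirely up to the caller, which is the kind of control the paragraph above refers to.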
The GA data transfer operations use an array index-based interface rather than addresses of the shared data. Unlike other systems based on a global address space that support remote memory (put/get) operations, GA does not require the user to specify the target process(es) where the referenced shared data resides; it simply provides a global view of the data structures. The higher-level, array-oriented API (application programming interface) makes GA easier to use without compromising control over data locality. The library internally performs global array index-to-address translation and then transfers data between the appropriate processes. If necessary, the programmer is always able to inquire:
The supported array dimensions range from one to seven. This limit follows the Fortran convention. The library can be reconfigured to support more than seven dimensions, but only through the C interface.
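The index-based interface and the internal index-to-address translation described above can be sketched for a one-dimensional array. Everything in this example is an assumption made for illustration: the class `ToyGlobalArray`, its methods `locate`, `locate_region` and `get`, and the layout of one contiguous block per simulated process are invented here and are not the GA API.

```python
from bisect import bisect_right


class ToyGlobalArray:
    """1-D array split into contiguous blocks, one per simulated process.

    Hypothetical model for illustration only, not the GA API.
    """

    def __init__(self, data, starts):
        # starts[p] is the first global index owned by process p (assumed layout).
        self.starts = list(starts)
        self.n = len(data)
        ends = self.starts[1:] + [self.n]
        # Each simulated process stores only its own block.
        self.local = [list(data[s:e]) for s, e in zip(self.starts, ends)]

    def locate(self, i):
        """Translate a global index into (owner process, local offset)."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        p = bisect_right(self.starts, i) - 1
        return p, i - self.starts[p]

    def locate_region(self, lo, hi):
        """List (owner, lo, hi) pieces covering the inclusive section lo..hi."""
        p, _ = self.locate(lo)
        self.locate(hi)  # bounds check only
        pieces = []
        while lo <= hi:
            last = len(self.starts) - 1
            block_end = self.starts[p + 1] - 1 if p < last else self.n - 1
            pieces.append((p, lo, min(hi, block_end)))
            lo = block_end + 1
            p += 1
        return pieces

    def get(self, lo, hi):
        """Fetch section lo..hi by global index; the caller names no owner."""
        out = []
        for p, a, b in self.locate_region(lo, hi):
            off = a - self.starts[p]
            out.extend(self.local[p][off:off + (b - a + 1)])
        return out


ga = ToyGlobalArray(list(range(100, 110)), starts=[0, 3, 6, 8])
print(ga.locate(7))
print(ga.locate_region(2, 7))
print(ga.get(2, 7))
```

The call `ga.get(2, 7)` spans three owners, yet the caller supplies only array indices; the locality queries remain available for code that wants to exploit where the data lives.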
The data distribution and locality control are provided to the programmer. The data locality information for the shared data is also available. The library offers a set of operations for management of its data structures, one-sided data transfer operations, and supporting operations for data locality control and queries.
The GA shared memory consistency model is the result of a compromise between ease of use and portable performance. The load and store operations are guaranteed to be ordered with respect to each other only if they target overlapping memory locations. The store operations (put, scatter) and accumulate complete locally before returning, i.e., the data in the user's local buffer has been copied out, but the operation has not necessarily completed at the remote side. The memory consistency is only guaranteed for:
The data-parallel model is supported by a set of collective functions that operate on global arrays or their portions. Underneath, if any interprocessor communication is required, the library uses remote memory copy (most often) or collective message-passing operations.
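A collective, data-parallel operation of this kind can be pictured as every process working on its own block and the partial results then being combined. The sketch below models that pattern for a dot product in plain Python. The names `block_ranges` and `collective_dot` are hypothetical, the processes are simulated by a loop, and the final `sum` stands in for whatever communication a real library would perform; none of this is the GA implementation.

```python
# Illustrative model only: hypothetical helpers, not GA library calls.

def block_ranges(n, nproc):
    """Half-open (lo, hi) index ranges of near-equal size, one per process."""
    base, extra = divmod(n, nproc)
    ranges, lo = [], 0
    for p in range(nproc):
        size = base + (1 if p < extra else 0)
        ranges.append((lo, lo + size))
        lo += size
    return ranges


def collective_dot(x, y, nproc):
    """Dot product computed block by block, then reduced.

    Returns (total, partials), where partials[p] is the contribution of
    simulated process p from its own block.
    """
    if len(x) != len(y):
        raise ValueError("arrays must have the same length")
    partials = []
    for lo, hi in block_ranges(len(x), nproc):
        # Each simulated process touches only its own block of x and y.
        partials.append(sum(a * b for a, b in zip(x[lo:hi], y[lo:hi])))
    # Stand-in for the reduction step that would combine the partial results.
    return sum(partials), partials


total, partials = collective_dot([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], nproc=2)
print(total, partials)
```

When both arrays share the same distribution, as assumed here, each partial result needs only local data, and communication is limited to the final reduction.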
When to use GA:
Algorithmic Considerations
- applications with dynamic and irregular communication patterns
- for calculations driven by dynamic load balancing
- need 1-sided access to shared data structures
- need high-level operations on distributed arrays and/or for out-of-core array-based algorithms (GA + DRA)
Usability Considerations
- data locality must be explicitly available
- when coding in message passing becomes too complicated
- when portable performance is important
- need object orientation without the overhead of C++
When not to use GA:
Algorithmic Considerations
- for systolic or nearest-neighbor communications with regular communication patterns
- when synchronization associated with cooperative point-to-point message passing is needed (e.g., Cholesky factorization in ScaLAPACK)
Usability Considerations
- when interprocedural analysis and compiler parallelization are more effective
- when parallel language support is sufficient and robust compilers are available