GA programs require message-passing and Memory Allocator (MA) libraries to work. Global Arrays is an extension to the message-passing interface. GA internally does not allocate local memory from the operating system - all dynamically allocated local memory comes from MA. We will describe the details of memory allocation later in this section.
The GA toolkit needs the following functionality from any message-passing library it runs with:
GA provides two functions, ga_nnodes and ga_nodeid, that return the number of processes and the calling process id in a parallel program. Starting with release 3.0, these functions return the same values as their message-passing counterparts. In earlier releases of GA on clusters of workstations, the mapping between GA and message-passing process ids was nontrivial. In these cases, the ga_list_nodeid function (now obsolete) was used to describe the actual mapping.
Although message-passing libraries offer their own barrier (global synchronization) function, this operation does not wait for completion of the outstanding GA communication operations. The GA toolkit offers a ga_sync operation that can be used for synchronization, and it has the desired effect of waiting for all the outstanding GA operations to complete.
There are two flavors of dynamically allocated memory in GA: shared memory and local memory. Shared memory is a special type of memory allocated from the operating system (UNIX and Windows) that can be shared between different user processes (MPI tasks). A process that attaches to a shared memory segment can access it as if it were local memory. All the data in shared memory is directly visible to every process that attaches to that segment. On shared memory systems and clusters of SMP (symmetric multiprocessor) nodes, shared memory is used to store global array data and is allocated by the Global Arrays run-time system called ARMCI. ARMCI uses shared memory to optimize performance and avoid explicit interprocessor communication within a single shared memory system or an SMP node. ARMCI allocates shared memory from the operating system in large segments and then manages memory in each segment in response to the GA allocation and deallocation calls. Each segment can hold data in many small global arrays. ARMCI does not return shared memory segments to the operating system until the program terminates (calls ga_terminate).
On systems that do not offer shared-memory capabilities or when a program is executed in a serial mode, GA uses local memory to store data in global arrays.
All of the dynamically allocated local memory in GA comes from its companion library, the Memory Allocator (MA) library. MA allocates and manages local memory using stack and heap disciplines. Any buffer allocated and deallocated by a GA operation that needs temporary buffer space comes from the MA stack. Memory to store data in global arrays comes from the heap. MA has additional features useful for program debugging such as:
Based on this information, a programmer who cares about efficient memory usage has to consider the amount of memory per process (MPI task) needed to store data in global arrays when setting the heap size argument in ma_init. The amount of stack space depends on the GA operations used by the program (for example, ga_mulmat_patch or ga_dgemm need several MB of buffer space to deliver good performance), but it probably should not be less than 4 MB. The stack space is only used while a GA operation is executing, and it is returned to MA when the operation completes.
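The sizing advice above can be turned into a rough back-of-the-envelope calculation. The following is a minimal sketch, not part of GA or MA: the helper names are hypothetical, it assumes the array data is spread as evenly as possible over the processes, and it works in bytes, so the results would still have to be converted to whatever units the ma_init arguments expect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a GA/MA function): number of elements one
 * process holds if total_elems are spread as evenly as possible over
 * nproc processes, rounded up. */
size_t elems_per_process(size_t total_elems, size_t nproc)
{
    assert(nproc > 0);
    return (total_elems + nproc - 1) / nproc;
}

/* Hypothetical helper: bytes of heap one process needs for its share
 * of a global array whose elements are elem_size bytes each. */
size_t heap_bytes_per_process(size_t total_elems, size_t elem_size,
                              size_t nproc)
{
    return elems_per_process(total_elems, nproc) * elem_size;
}

/* Hypothetical helper: stack request in bytes, never below the 4 MB
 * floor suggested in the text. */
size_t stack_bytes(size_t wanted_bytes)
{
    const size_t floor_bytes = (size_t)4 * 1024 * 1024;
    return wanted_bytes < floor_bytes ? floor_bytes : wanted_bytes;
}
```

For instance, a 1000 x 1000 array of 8-byte values on 16 processes gives heap_bytes_per_process(1000000, 8, 16), which is 500000 bytes per process under the even-distribution assumption. A real program should add headroom for every other array it keeps alive at the same time.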
Fortran  subroutine ga_initialize()
C        void GA_Initialize()
C++      void GA::Initialize(int argc, char **argv)
and
Fortran  subroutine ga_initialize_ltd(limit)
C        void GA_Initialize_ltd(size_t limit)
C++      void GA::Initialize(int argc, char **argv, size_t limit)
The first interface allows GA to consume as much memory as the application needs to allocate new arrays. The latter call allows the programmer to establish and enforce a limit within GA on the memory usage.
Note: In GA++, there is additional functionality as follows:
C++      void GA::Initialize(int argc, char *argv[], unsigned long heapSize, unsigned long stackSize, int type, size_t limit=0)
Users are encouraged to choose the first option, although it is also possible to initialize GA normally and set the memory limit later.
Example: Initialization of MA and setting GA memory limits
Fortran  subroutine ga_terminate()
C        void GA_Terminate()
C++      void GA::Terminate()
The programmer can also abort a running program, for example as part of handling a programmatically detected error condition, by calling the function
Fortran  subroutine ga_error(message, code)
C        void GA_Error(char *message, int code)
C++      void GA::GAServices::error(char *message, int code)
1. From scratch, for regular distribution, using
n-d Fortran  logical function nga_create(type, ndim, dims, array_name, chunk, g_a)
2-d Fortran  logical function ga_create(type, dim1, dim2, array_name, chunk1, chunk2, g_a)
C            int NGA_Create(int type, int ndim, int dims[], char *array_name, int chunk[])
C++          GA::GlobalArray* GA::GAServices::createGA(int type, int ndim, int dims[], char *array_name, int chunk[])
or for irregular distribution, using
n-d Fortran  logical function nga_create_irreg(type, ndim, dims, array_name, map, nblock, g_a)
2-d Fortran  logical function ga_create_irreg(type, dim1, dim2, array_name, map1, nblock1, map2, nblock2, g_a)
C            int NGA_Create_irreg(int type, int ndim, int dims[],
C++          GA::GlobalArray* GA::GAServices::createGA(int type, int ndim, int dims[], char *array_name, int map[], int block[])
2. Based on a template (an existing array) with the function
Fortran  logical function ga_duplicate(g_a, g_b, array_name)
C        int GA_Duplicate(int g_a, char *array_name)
C++      int GA::GAServices::duplicate(int g_a, char *array_name)
- or -
C++      GA::GlobalArray* GA::GAServices::createGA(int g_a, char *array_name)
In this case, the new array inherits all the properties such as distribution, datatype and dimensions from the existing array.
3. Refer to the "Creating arrays - II" section.
With the regular distribution, the programmer can specify the block size for none, some, or all of the dimensions. If the block size is not specified, the library will create a distribution that attempts to assign the same number of elements to each processor (for static load balancing purposes). The actual algorithm used is based on heuristics.

With the irregular distribution, the programmer specifies distribution points for every dimension using the map array argument. The library creates an array with the overall distribution that is a Cartesian product of distributions for each dimension. A specific example is given in the documentation.
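The Cartesian-product rule can be illustrated with a short, self-contained sketch. It does not call GA; the helper names are hypothetical, the indices are assumed to be 0-based as in C, and the map array is assumed to hold the first index of every block, dimension by dimension. The values used in the note after the code (an 8 x 10 array split 3 x 2) are chosen purely for illustration.

```c
#include <assert.h>

/* Hypothetical helper (not a GA function): total number of blocks is the
 * product of the per-dimension block counts. */
int total_blocks(int ndim, const int nblock[])
{
    int n = 1;
    for (int d = 0; d < ndim; d++)
        n *= nblock[d];
    return n;
}

/* Hypothetical helper: inclusive bounds lo[d]..hi[d] of the block at
 * block coordinates coords[]. map[] stores, for each dimension in turn,
 * the first index of each block along that dimension. */
void block_bounds(int ndim, const int dims[], const int nblock[],
                  const int map[], const int coords[],
                  int lo[], int hi[])
{
    int offset = 0;                      /* start of dimension d in map[] */
    for (int d = 0; d < ndim; d++) {
        int b = coords[d];
        assert(b >= 0 && b < nblock[d]);
        lo[d] = map[offset + b];
        /* A block ends just before the next one starts, or at the array edge. */
        hi[d] = (b + 1 < nblock[d]) ? map[offset + b + 1] - 1 : dims[d] - 1;
        offset += nblock[d];
    }
}
```

With dims = {8, 10}, nblock = {3, 2} and map = {0, 2, 6, 0, 5}, there are six blocks, and the block at coordinates (1, 1) covers rows 2 through 5 and columns 5 through 9.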

If an array cannot be created, for example due to memory shortages or an enforced memory consumption limit, these calls return failure status. Otherwise an integer handle is returned. This handle represents a global array object in all operations involving that array. This is the only piece of information the programmer needs to store for that array. All the properties of the object (data type, distribution data, name, number of dimensions and values for each dimension) can be obtained from the library based on the handle at any time, see Section 7.4. It is not necessary to keep track of this information explicitly in the application code.
Note that, regardless of the distribution type, at most one block can be owned by or assigned to a processor.
n-d Fortran  logical function nga_create_ghosts(type, dims, width, array_name, chunk, g_a)
C            int NGA_Create_ghosts(int type, int ndim, int dims[], int width[], char *array_name, int chunk[])
C++          int GA::GAServices::createGA_Ghosts(int type, int ndim, int dims[], int width[], char *array_name, int chunk[])
n-d Fortran  logical function nga_create_ghosts_irreg(type, dims, width, array_name, map, block, g_a)
C            int NGA_Create_ghosts_irreg(int type, int ndim, int dims[], int width[], char *array_name, int map[], int block[])
C++          int GA::GAServices::createGA_Ghosts(int type, int ndim, int dims[], int width[], char *array_name, int map[], int block[])

For a global array with ghost cells, the data distribution can be visualized as follows:

[Figure: data distribution of a global array with ghost cells]
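The effect of ghost cells on per-process storage can also be sketched numerically. The helpers below are hypothetical and do not call GA. They assume that the locally held block in dimension d spans the inclusive range lo[d]..hi[d] and that a ghost layer of width[d] cells is added on both sides of that dimension.

```c
#include <assert.h>

/* Hypothetical helper (not a GA function): extent of the local block in
 * each dimension once a ghost layer of width[d] cells is added on both
 * sides. lo[] and hi[] are inclusive bounds of the locally owned block. */
void ghost_padded_dims(int ndim, const int lo[], const int hi[],
                       const int width[], int padded[])
{
    for (int d = 0; d < ndim; d++) {
        assert(hi[d] >= lo[d] && width[d] >= 0);
        padded[d] = (hi[d] - lo[d] + 1) + 2 * width[d];
    }
}

/* Hypothetical helper: number of elements in the padded local block. */
long padded_elems(int ndim, const int padded[])
{
    long n = 1;
    for (int d = 0; d < ndim; d++)
        n *= padded[d];
    return n;
}
```

For a local block covering rows 2 through 5 and columns 5 through 9 with ghost widths {1, 2}, the padded extents are 6 x 9, so the process stores 54 elements instead of the 20 it owns.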

As mentioned in the previous section ("Creating arrays - I"), there are three ways to create arrays. This section describes the third method. Because of the increasingly varied ways that global arrays can be configured, a set of new interfaces for creating global arrays has been created. This interface supports all the configurations that were accessible via the old ga_create_XXX calls, as well as new options that can only be accessed using the new interface. Creating a global array using the new interface starts with a call to ga_create_handle, which returns a new global array handle. The user then makes several ga_set_XXX calls to assign properties to this handle. These properties include the dimension of the array, the data type, the size of the array, and any other properties that may be relevant. At present, the available ga_set_XXX calls largely reflect properties that are accessible via the nga_create_XXX calls; however, it is anticipated that the range of properties that can be set using these calls will expand considerably in the future. After all the properties have been set, the user calls ga_allocate on the array handle and memory is allocated for the array. The array can now be used in exactly the same way as arrays created using the traditional ga_create_XXX calls. The calls for obtaining a new global array handle are
n-d Fortran  integer function ga_create_handle()
C            int GA_Create_handle()
Properties of the global arrays can be set using the ga_set_XXX calls. Note that the only required call is to ga_set_data. The others are all optional.
n-d Fortran  subroutine ga_set_data(g_a, ndim, dims, type)
C            void GA_Set_data(int g_a, int ndim, int *dims, int type)
The argument g_a is the global array handle, ndim is the dimension of the array, dims is an array of ndim numbers containing the dimensions of the array, and type is the data type as defined in either the macdecls.h or mafdecls.h files. Other options that can be set using these subroutines are:
n-d Fortran  subroutine ga_set_array_name(g_a, array_name)
C            void GA_Set_array_name(int g_a, char *array_name)
This subroutine assigns a character string as an array name to the global array.
The ga_set_irreg_distr subroutine can be used to specify the distribution of data among processors. The block array contains the processor grid used to lay out the global array, and the map array contains a list of the first indices of each block along each of the array axes. If the first value in the block array is M, then the first M values in the map array are the first indices of each data block along the first axis in the processor grid. Similarly, if the second value in the block array is N, then the values in the map array from M+1 to M+N are the first indices of each data block along the second axis, and so on through the D dimensions of the global array.
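The layout of the map array described above lends itself to a quick consistency check before the arrays are handed to the library. The sketch below is illustrative only: the helper names are hypothetical, no GA call is made, and positions and indices are 0-based as in C, so the entries for the second axis occupy positions M through M+N-1 rather than M+1 through M+N.

```c
#include <assert.h>

/* Hypothetical helper (not a GA function): number of entries the map
 * array must contain, i.e. M + N + ... over all dimensions. */
int map_length(int ndim, const int block[])
{
    int n = 0;
    for (int d = 0; d < ndim; d++)
        n += block[d];
    return n;
}

/* Hypothetical helper: 0-based position in map[] where the entries for
 * the given axis begin (0 for the first axis, M for the second, ...). */
int map_axis_start(int axis, const int block[])
{
    assert(axis >= 0);
    int start = 0;
    for (int d = 0; d < axis; d++)
        start += block[d];
    return start;
}

/* Hypothetical helper: returns 1 if, along every axis, the first indices
 * are strictly increasing and lie inside the array, and 0 otherwise. */
int map_is_consistent(int ndim, const int dims[], const int block[],
                      const int map[])
{
    int pos = 0;
    for (int d = 0; d < ndim; d++) {
        for (int b = 0; b < block[d]; b++, pos++) {
            if (map[pos] < 0 || map[pos] >= dims[d])
                return 0;
            if (b > 0 && map[pos] <= map[pos - 1])
                return 0;
        }
    }
    return 1;
}
```

With block = {3, 2}, the map array needs five entries and the second axis starts at position 3. A map of {0, 2, 6, 0, 5} passes the check for an 8 x 10 array, whereas {0, 6, 2, 0, 5} fails because the first-axis indices are not increasing.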
n-d Fortran  subroutine ga_set_ghosts(g_a, width)
C            void GA_Set_ghosts(int g_a, int *width)
This call can be used to set the ghost cell width along each of the array dimensions.
n-d Fortran  subroutine ga_set_pgroup(g_a, p_group)
C            void GA_Set_pgroup(int g_a, int p_group)
This call assigns a processor group to the global array. If no processor group is assigned to the global array, it is assumed that the global array is created on the default processor group.
n-d Fortran  logical function ga_allocate(g_a)
C            int GA_Allocate(int g_a)
This function returns a logical variable that is true if the global array was successfully allocated and false otherwise.
Fortran  logical function ga_destroy(g_a)
C        void GA_Destroy(int g_a)
C++      void GA::GlobalArray::destroy()
that takes as its argument a handle representing a valid global array. It is a fatal error to call ga_destroy with a handle pointing to an invalid array.
All active global arrays are destroyed implicitly when the user calls ga_terminate.