subroutine ga_initialize()Allocate and initialize internal data structures for Global Arrays.
subroutine ga_initialize_ltd(limit)
integer limit - amount of memory in bytes perprocess [input]
Allocate and initialize internal data structures for global arrays
and set limit for memory used in global arrays.
limit < 0 means "allow unlimited memory usage"
This is a collective operation.
integer function ga_pgroup_create(list, size)This command is used to create a processor group. At present, it must be invoked by all processors in the current default processor group. The list of processors use the indexing scheme of the default processor group. If the default processor group is the world group, then these indices are the usual processor indices. This function returns a process group handle that can be used to reference this group by other functions.
integer size - number of processors in group [input]
integer list(size) - list of processors in processor
group [input]
This is a collective operation on the
default
processor group.
logical function ga_pgroup_destroy(p_handle)This command is used to free up a processor group handle. It returns false if the processor group handle was not previously active.
integer p_handle - processor group handle [input]
subroutine ga_pgroup_set_default(p_handle)This subroutine can be used to reset the default processor group on a collection of processors. All processors in the group referenced by p_handle must make a call to this subroutine. Any standard global array call that is made resetting the default processor group will be restricted to processors in that group. Global arrays that are created after resetting the default processor group will only be defined on that group and global operations such as ga_sync or ga_igop will be restricted to processors in that group. The ga_pgroup_set_default call can be used to rapidly convert large applications, written with GA, into routines that run on processor groups.
integer p_handle - processor group handle [input]
The default processor group can be overridden by using GA calls that require an explicit group handle as one of the arguments.
This is a collective operation on the
group
represented by the handle p_handle.
logical function ga_create(type, dim1, dim2, array_name, chunk1, chunk2, g_a)
character*(*) array_name - a unique character string [input]
integer type - MA type [input]
integer dim1/2 - array(dim1,dim2) as in FORTRAN [input]
integer chunk1/2 - minimum size that dimensions should
be chunked up into [input]
integer g_a - handle for future references [output]
Creates a 2-dimensional global array.
Setting chunk1=dim1 gives distribution by vertical strips (chunk2*columns); setting chunk2=dim2 gives distribution by horizontal strips (chunk1*rows). Actual chunks will be modified so that they are at least the size of the minimum and each process has either zero or one chunk. Specifying chunk1/2 as < 1 will cause that dimension to be distributed evenly.
Return value: .true. means the call was succesful.This is a collective
operation.
logical function nga_create(type, ndim, dims, array_name, chunk, g_a)
character*(*) array_name - a unique character string [input]
integer type - data type (MT_DBL,MT_INT,MT_DCPL) [input]
integer ndim - number of array dimensions [input]
integer dims(ndim) - array of dimensions [input]
integer chunk(ndim) - array of chunks, each element specifies
minimum size that given dimensions should be
chunked up into [input]
integer g_a - integer handle for future references [output]
Creates an ndim-dimensional array using the regular distribution
model and returns an integer handle representing the array.
The array can be distributed evenly or not evenly. The control over the distribution is accomplished by specifying chunk (block) size for all or some of array dimensions. For example, for a 2-dimensional array, setting chunk(1)=dim(1) gives distribution by vertical strips (chunk(1)*dims(1)); setting chunk(2)=dim(2) gives distribution by horizontal strips (chunk(2)*dims(2)). Actual chunks will be modified so that they are at least the size of the minimum and each process has either zero or one chunk. Specifying chunk(i) as <1 will cause that dimension (i-th) to be distributed evenly.
Return value: .true. means the call was succesful.This is a collective
operation.
logical function nga_create_config(type, ndim, dims, array_name, chunk,
p_handle, g_a)
character*(*) array_name - a unique character string [input]
integer type - data type (MT_DBL,MT_INT,MT_DCPL) [input]
integer ndim - number of array dimensions [input]
integer dims(ndim) - array of dimensions [input]
integer chunk(ndim) - array of chunks, each element specifies
minimum size that given dimensions should be
chunked up into [input]
integer p_handle - processor group handle [input]
integer g_a - integer handle for future references [output]
Creates an ndim-dimensional array using the regular distribution
model but with an explicitly specified processor group handle and
returns
an integer handle representing the array.
This call is essentially the same as the nga_create call except for the processor group handle p_handle. It can be used to create mirrored arrays.
Return value: .true. means the call was succesful.This is a collective
operation.
logical function nga_create_ghosts(type, ndim, dims, width, array_name, chunk, g_a)
character*(*) array_name - a unique character string [input]
integer type - data type (MT_DBL,MT_INT,MT_DCPL) [input]
integer ndim - number of array dimensions [input]
integer dims(ndim) - array of dimensions [input]
integer width(ndim) - array of ghost cell widths [input]
integer chunk(ndim) - array of chunks, each element specifies
minimum size that given dimensions should be
chunked up into [input]
integer g_a - integer handle for future references [output]
Creates an ndim-dimensional array with a layer of ghost cells around
the visible data on each processor using the regular distribution model
and returns an integer handle representing the array.
The array can be distributed evenly or not evenly. The control over the distribution is accomplished by specifying chunk (block) size for all or some of the array dimensions. For example, for a 2-dimensional array, setting chunk(1)=dim(1) gives distribution by vertical strips (chunk(1)*dims(1)); setting chunk(2)=dim(2) gives distribution by horizontal strips (chunk(2)*dims(2)). Actual chunks will be modified so that they are at least the size of the minimum and each process has either zero or one chunk. Specifying chunk(i) as <1 will cause that dimension (i-th) to be distributed evenly. The width of the ghost cell layer in each dimension is specified using the array width(). The local data of the global array residing on each processor will have a layer width(n) ghosts cells wide on either side of the visible data along the dimension n.
Return value: .true. means the call was succesful.This is a collective
operation.
logical function nga_create_ghosts_config(type, ndim, dims, width, array_name,
chunk, p_handle, g_a)
character*(*) array_name - a unique character string [input]
integer type - data type (MT_DBL,MT_INT,MT_DCPL) [input]
integer ndim - number of array dimensions [input]
integer dims(ndim) - array of dimensions [input]
integer width(ndim) - array of ghost cell widths [input]
integer chunk(ndim) - array of chunks, each element specifies
minimum size that given dimensions should be
chunked up into [input]
integer p_handle - processor group handle [input]
integer g_a - integer handle for future references [output]
Creates an ndim-dimensional array with a layer of ghost cells around
the visible data on each processor using the regular distribution model
but with an explicitly specified processor group handle and returns an
integer handle representing the array.
This call is essentially the same as the nga_create_ghosts call except for the processor group handle p_handle. It can be used to create mirrored arrays.
Return value: .true. means the call was succesful.This is a collective
operation.
logical function ga_create_irreg(type, dim1, dim2, array_name, map1, nblock1, map2, nblock2, g_a)Creates an array by following the user-specified distribution.
character*(*) array_name - a unique character string [input]
integer type - MA type [input]
integer dim1/2 - array(dim1,dim2) as in FORTRAN [input]
integer nblock1 - no. of blocks first dimension is divided into [input]
integer nblock2 - no. of blocks second dimension is divided into[input]
integer map1(*) - ilo for each block [input]
integer map2(*) - jlo for each block [input]
integer g_a - integer handle for future references [output]
Return value: .true. means the call was succesful.This is a collective
operation.
logical function nga_create_irreg(type, ndim, dims, array_name, map, nblock, g_a)Creates an array by following the user-specified distribution and returns integer handle representing the array.
character*(*) array_name - a unique character string [input]
integer type - data type (MT_DBL,MT_INT,MT_DCPL) [input]
integer ndim - number of array dimensions [input]
integer dims(ndim) - array of dimensions [input]
integer nblock(ndim) - no. of blocks each dimension is divided into[input]
integer map(s) - starting index for for each block; the size
s is a sum of all elements of nblock array [input]
integer g_a - integer handle for future references [output]
The distribution is specified as a Cartesian product of
distributions
for each dimension. For example, the following figure demonstrates
distribution
of a 2-dimensional array 8x10 on 6
(or more) processors. nblock(2)={3,2}, the size of map
array
is s=5 and array map contains the following elements map={1,3,7,
1,
6}. The distribution is nonuniform because, P1 and P4 get 20
elements
each and processors P0,P2,P3, and P5 only 10 elements each.
|
|
|
|
|
|
|
2 |
|
|
|
4 |
|
|
|
2 |
Return value: .true. means the call was succesful.
This is a collective operation.
logical function nga_create_irreg_config(type, ndim, dims, array_name, map,Creates an array by following the user-specified distribution but with an explicitly specified processor group and returns an integer handle representing the array.
nblock, p_handle, g_a)
character*(*) array_name - a unique character string [input]
integer type - data type (MT_DBL,MT_INT,MT_DCPL) [input]
integer ndim - number of array dimensions [input]
integer dims(ndim) - array of dimensions [input]
integer nblock(ndim) - no. of blocks each dimension is divided into[input]
integer map(s) - starting index for for each block; the size
s is a sum of all elements of nblock array [input]
integer p_handle - processor group handle [input]
integer g_a - integer handle for future references [output]
This call is essentially the same as the nga_create_irreg call except for the processor group handle p_handle. It can be used to create mirrored arrays.
This is a collective operation.
logical function nga_create_ghosts_irreg(type, ndim, dims, width, array_name,Creates an array with ghost cells by following the user-specified distribution and returns integer handle representing the array.
map, nblock, g_a)
character*(*) array_name - a unique character string [input]
integer type - data type (MT_DBL,MT_INT,MT_DCPL) [input]
integer ndim - number of array dimensions [input]
integer dims(ndim) - array of dimensions [input]
integer width(ndim) - array of ghost cell widths [input]
integer nblock(ndim) - no. of blocks each dimension is divided into[input]
integer map(s) - starting index for for each block; the size
s is a sum of all elements of nblock array [input]
integer g_a - integer handle for future references [output]
The distribution is specified as a Cartesian product of
distributions
for each dimension. For example, the following figure demonstrates
distribution
of a 2-dimensional array 8x10 on 6
(or more) processors. nblock(2)={3,2}, the size of map
array
is s=5 and array map contains the following elements map={1,3,7,
1,
6}. The distribution is nonuniform because, P1 and P4 get 20
elements
each and processors P0,P2,P3, and P5 only 10 elements each.
|
|
|
|
|
|
|
2 |
|
|
|
4 |
|
|
|
2 |
The array width() is used to control the width of the ghost cell boundary around the visible data on each processor. The local data of the global array residing on each processor will have a layer width(n) ghosts cells wide on either side of the visible data along the dimension n.
Return value: .true. means the call was succesful.
This is a collective operation.
logical function nga_create_ghosts_irreg_config(type, ndim, dims, width,Creates an array with ghost cells by following the user-specified distribution and an explicitly specified processor group and returns an integer handle representing the array.
array_name,map, nblock, p_handle, g_a)
character*(*) array_name - a unique character string [input]
integer type - data type (MT_DBL,MT_INT,MT_DCPL) [input]
integer ndim - number of array dimensions [input]
integer dims(ndim) - array of dimensions [input]
integer width(ndim) - array of ghost cell widths [input]
integer nblock(ndim) - no. of blocks each dimension is divided into[input]
integer map(s) - starting index for for each block; the size
s is a sum of all elements of nblock array [input]
integer p_handle - processor group handle [input]
integer g_a - integer handle for future references [output]
This call is essentially the same as the nga_create_ghosts_irreg call except for the processor group handle p_handle. It can be used to create mirrored arrays.
This is a collective operation.
integer function ga_create_handle()This function returns a global array handle that can then be used to create a new global array. This is part of a new API for creating global arrays that is designed to replace the old interface built around the ga_create_xxx calls. The sequence of operations is to begin with a call to ga_create_handle to get a new array handle. The attributes of the array, such as dimension, size, type, etc. can then be set using successive calls to the nga_set_xxx subroutines. When all array attributes have been set, the ga_allocate subroutine is called and the global array is actually created and memory for it is allocated.
This is a collective operation.
subroutine ga_set_array_name(g_a, name)This subroutine can be used to assign a unique character string name to a global array handle that was obtained using the nga_create_handle function.
integer g_a - global array handle [input]
character*(*) name - a unique character string [input]
This is a collective operation.
subroutine ga_set_data(g_a, ndim, dims, type)This subroutine can be used to set the array dimension, coordinate dimenstions, and data type assigned to a global array handle that was obtained using the nga_create_handlefunction.
integer g_a - global array handle [input]
integer ndim - dimension of array [input]
integer dims(ndim) - array dimensions [input]
integer type - data type (MT_DBL,MT_INT,etc.) [input]
This is a collectiveoperation.
subroutine ga_set_irreg_distr(g_a, mapc, nblock)This subroutine can be used to partition the array data among the individual processors for a global array handle obtained using
integer g_a - global array handle [input]
integer map(s) - starting index for for each
block; the size s is a sum of
all elements of nblock array [input]
integer nblock(ndim) - no. of blocks each dimension is
divided into [input]
The distribution is specified as a Cartesian product of
distributions
for each dimension. For example, the following figure demonstrates
distribution
of a 2-dimensional array 8x10 on 6
(or more) processors. nblock(2)={3,2}, the size of map
array
is s=5 and array map contains the following elements map={1,3,7,
1,
6}. The distribution is nonuniform because, P1 and P4 get 20
elements
each and processors P0,P2,P3, and P5 only 10 elements each.
|
|
|
|
|
|
|
2 |
|
|
|
4 |
|
|
|
2 |
The array width() is used to control the width of the ghost cell boundary around the visible data on each processor. The local data of the global array residing on each processor will have a layer width(n) ghosts cells wide on either side of the visible data along the dimension n.
This is a collective operation.
subroutine ga_set_pgroup(g_a, p_handle)This subroutine can be used to set the processor configuration assigned to a global array handle that was obtained using the ga_create_handle function. It can also be used to create mirrored arrays by using the mirrored array processor configuration in this function call. It can also be used to create an array on a processor group by using a processor group handle in this call.
integer g_a - global array handle [input]
integer p_handle - processor group handle [input]
This is a collective operation.
subroutine ga_set_restricted(g_a, list, nproc)This subroutine restricts data in the global array g_a to only the nproc processors listed in the array list. The value of nproc must be less than or equal to the number of available processors. If this call is used in conjunction with ga_set_irreg_distr, then the decomposition in the ga_set_irreg_distr call must be done assuming the number of processors is equal to nproc. The data that ordinarily will get mapped to process 0 will get mapped to the processor in list(1), the data that would be mapped to process 1 will get mapped to the processor in list(2), etc. This can be used to remap the data distribution to different processors, even if nproc equals the number of available processors.
integer g_a - global array handle [input]
integer list(nproc) - list of processor IDs that
contain data [input]
integer nproc - number of processors that
contain data [input]
This is a collective operation.
subroutine ga_set_restricted_range(g_a, lo_proc, hi_proc)This subroutine restricts data in the global array g_a to the range of processors beginning with lo_proc and ending with hi_proc. Both lo_proc and hi_proc must be less than or equal to the total number of processors minus one (e.g. in the range [0,N-1], where N is the total number of processors) and lo_proc must be less than or equal to hi_proc. If lo_proc=0 and hi_proc=N-1 then this call has no effect on the data distribution.
integer g_a - global array handle [input]
integer lo_proc, hi_proc - range of processors
(inclusive) that contain
data [input]
subroutine ga_set_ghosts(g_a, width)This subroutine can be used to set the ghost cell widths for a global array handle that was obtained using the ga_create_handle function. The ghost cell widths indicate how many ghost cells are used to pad the locally held global array data along each dimension. The padding can be set independently for coordinate direction.
integer g_a - global array handle [input]
integer width(ndim)- array of ghost cell widths [input]
This is a collective operation.
subroutine ga_set_chunk(g_a, chunk)This subroutine is used to set the chunk array for a global array handle obtained using the ga_create_handle function. The chunk array is used to determine the minimum number of array elements that are assigned to each processor along each coordinate direction.
integer g_a - global array handle [input]
integer chunk(ndim)- array of chunk widths [input]
This is a collective operation
subroutine ga_set_block_cyclic(g_a, dims)This subroutine is used to create a global array with a simple block-cyclic data distribution. The array is broken up into blocks of size dims and each block is numbered sequentially using a column major indexing scheme. The blocks are then assigned in a simple round-robin fashion to processors. This is illustrated in the figure below for an array containing 25 blocks distributed on 4 processors. Blocks at the edge of the array may be smaller than the block size specified in dims. In the example below, blocks 4,9,14,19,20,21,22,23, and 24 might be smaller thatn the remaining blocks. Most global array operations are insensitive to whether or not a block-cyclic data distribution is used, although performance may be slower in some cases if the global array is using a block-cyclic data distribution. Individual data blocks can be accessesed using the block-cyclic access functions.
integer g_a - global array handle [input]
integer dims(ndim)- array of block dimensions [input]
| 0 P0 |
5 P1 |
10 P2 |
15 P3 |
20 P0 |
| 1 P1 |
6 P2 |
11 P3 |
16 P0 |
21 P1 |
| 2 P2 |
7 P3 |
12 P0 |
17 P1 |
22 P2 |
| 3 P3 |
8 P0 |
13 P1 |
18 P2 |
23 P3 |
| 4 P0 |
9 P1 |
14 P2 |
19 P3 |
24 P4 |
subroutine ga_set_block_cyclic_proc_grid(g_a, dims, proc_grid)This subroutine is used to create a global array with a SCALAPACK-type block cyclic data distribution. The user specifies the dimensions of the processor grid in the array proc_grid. The product of the processor grid dimensions must equal the number of total number of processors and the number of dimensions in the processor grid must be the same as the number of dimensions in the global array. The data blocks are mapped onto the processor grid in a cyclic manner along each of the processor grid axes. This is illustrated below for an array consisting of 25 data blocks disributed on 6 processors. The 6 processors are configured in a 3 by 2 processor grid. Blocks at the edge of the array may be smaller than the block size specified in dims. Most global array operations are insensitive to whether or not a block-cyclic data distribution is used, although performance may be slower in some cases if the global array is using a block-cyclic data distribution. Individual data blocks can be accessesed using the block-cyclic access functions.
integer g_a - global array handle [input]
integer dims(ndim) - array of block dimensions [input]
integer proc_grid(ndim)- processor grid dimensions [input]
| (0,0) P0 |
(0,1) P3 |
(0,0) P0 |
(0,1) P3 |
(0,0) P0 |
| (1,0) P1 |
(1,1) P4 |
(1,0) P1 |
(1,1) P4 |
(1,0) P1 |
| (2,0) P2 |
(2,1) P5 |
(2,0) P2 |
(2,1) P5 |
(2,0) P2 |
| (0,0) P0 |
(0,1) P3 |
(0,0) P0 |
(0,1) P3 |
(0,0) P0 |
| (1,0) P1 |
(1,1) P4 |
(1,0) P1 |
(1,1) P4 |
(1.0) P1 |
logical function ga_allocate(g_a)This function allocates memory for the global array handle originally obtained using the ga_create_handle function and returns true if the allocation was successful. At a minimum, the nga_set_data subroutine must be called before the memory is allocated. Other nga_set_xxx functions can also be called before invoking this function.
integer g_a - global array handle [input]
This is a collective operation.
subroutine ga_update_ghosts(g_a)This call updates the ghost cell regions on each processor with the corresponding neighbor data from other processors. The operation assumes that all data is wrapped around using periodic boundary data so that ghost cell data that goes beyound an array boundary is wrapped around to the other end of the array.
integer g_a [input]
This is a collective operation.
logical function nga_update_ghost_dir(g_a,dimension,idir,cflag)This function can be used to update the ghost cells along individual directions. It is designed for algorithms that can overlap updates with computation. The variable dimension indicates which coordinate direction is to be updated (e.g. dimension = 2 would correspond to the y axis in a two or three dimensional system), the variable idir can take the values +/-1 and indicates whether the side that is to be updated lies in the positive or negative direction, and cflag indicates whether or not the corners on the side being updated are to be included in the update. The following calls would be equivalent to a call to ga_update_ghosts for a 2-dimensional system:
integer g_a [input]
integer dimension - array dimension that is to be
updated [input]
integer idir - direction of update (+/- 1) [input]
logical cflag - flag to include corners in update
[input]
status = nga_update_ghost_dir(g_a,1,-1,.true.)The variable cflag is set equal to .true. in the first two calls so that the corner ghost cells are update, it is set equal to .false. in the second two calls to avoid redundant updates of the corners. Note that updating the ghosts cells using several independent calls to the nga_update_ghost_dir functions is generally not as efficient as using ga_update_ghosts unless the individual calls can be effectively overlapped with computation.
status = nga_update_ghost_dir(g_a,1,1,.true.)
status = nga_update_ghost_dir(g_a,2,-1,.false.)
status = nga_update_ghost_dir(g_a,2,1,.false.)
This is a collective operation.
logical function ga_has_ghosts(g_a)This function returns true if the global array has some dimensions for which the ghost cell width is greater than zero, it returns false otherwise.
integer g_a [input]
subroutine nga_access_ghosts(g_a, dims, index, ld)
integer g_a [input]
integer dims(ndim) - array of dimensions of local
patch, including ghost
cells [output]
integer index - returns an index corresponding
to the origin the global array
patch held locally on the
processor [output]
integer ld(ndim) - physical dimenstions of the
local array patch, including
ghost cells [output]
Provides access to the local patch of the global array. Returns
leading dimension ld and and MA-like index for the data. This
routine
will provide access to the ghost cell data residing on each processor.
Calls to
nga_access_ghosts should normally follow a call to ga_distribution
that returns coordinates of the visible data patch associated with a
processor.
You need to make sure that the coordinates of the patch are valid (test
values returned from nga_distribution).
Your code should include a MA include file, mafdecls.h. The addressing convention:
refers the first element of the patch. However, you can only pass that reference to another subroutine where it could be used like a normal array, see the following example. This constraint caused by the HP fortran compiler inability to reference shared memory data properly. The C interface has no such restrictions.
dbl_mb(index) - for double precision data
int_mb(index) - for integer data
dcpl_mb(index) - for double complex data
Example
For a given global array in 2 dimensions with a local data patch whose characteristics are
number of rows of visible data : irow
number of columns of visible data : jcol
width of ghost cell data in first dimension: iwidth
width of ghost cell data in second dimension: jwidth
a call to ga_access_ghosts returns
dims(1) = irow + 2*iwidthAfter choosing the variables
dims(2) = jrow + 2*jwidth
ld(1) = irow + 2*iwidth
nrows = dims(1)the ghost cell data can be accessed creating a dummy routine
ncols = dims(2)
lda = ld(1)
subroutine foo(A, nrows, ncols lda)you can reference A(1:irow+2*iwidth,1:jrow+2*jwidth) in the following way:
double precision A(lda,*)
integer nrows, ncols
....
end
call foo(dbl_mb(index), nrows, ncols, lda)It will be up to the programmer to keep track of the fact that the visible portion of the global array data corresponds to the ranges (iwidth+1:irow+iwidth,jwidth+1:jcol+jwidth). Correctly accessing this data may require additional arguments in the call to foo.
NOTE: You have to worry about mutual exclusion in simultaneous overlapping access to the data by multiple processors.
Each call to ga_access_ghosts has to be followed by a call
to
either ga_release or ga_release_update.
You can access only local data.
This is a local operation.
subroutine nga_access_ghost_element(g_a, index, subscript, ld)This function can be used to return an index to any data element in the locally held portion of the global array and can be used to directly access ghost cell data. The array subscript refers to the local index of the element relative to the origin of the local patch (which is assumed to be indexed by (1,1,...)).
integer g_a [input]
integer index - index pointing to location of element
indexed by subscript() [output]
integer subscript(ndim) - array of integers that index desired
element [input]
integer ld(ndim-1) - array of strides for local data patch.
These include ghost cell widths.
[output]
integer function ga_total_blocks(g_a)This function returns the total number of blocks contained in a global array with a block-cyclic data distribution. This is a local operation.
integer g_a [input]
subroutine ga_get_block_info(g_a, num_blocks, block_dims)This subroutine returns information about the block-cyclic distribution associated with global array g_a. The number of blocks along each of the array axes are returned in the array num_blocks and the dimensions of the individual blocks, specified in the ga_set_block_cyclic or ga_set_block_cyclic_proc_grid subroutine, are returned in block_dims. This is a local function.
integer g_a [input]
integer num_blocks(ndim) - number of blocks along each axis [output]
integer block_dims(ndim) - dimensions of block [output]
logical function ga_duplicate(g_a, g_b, array_name)Creates a new array by applying all the properties of another existing array.
integer g_a, g_b;
character*(*) array_name;
array_name - a character string [input]
g_a - Integer handle for reference array [input]
g_b - Integer handle for new array [output]
logical function ga_destroy(g_a)Deallocates the array and frees any associated resources.
integer g_a [input]
subroutine ga_terminate()Delete all active arrays and destroy internal data structures.
subroutine ga_sync()Synchronize processes (a barrier) and ensure that all global array operations are complete (or at least appear complete).
subroutine ga_mask_sync(first,last)
logical first - mask for prior internal synchronization [input]This subroutine can be used to remove synchronization calls from around collective operations. Setting the parameter first = .false. removes the synchronization prior to the collective operation, setting last = .false. removes the synchronization call after the collective operation. This call is applicable to all collective operations. It most be invoked before each collective operation.
logical last - mask for post internal synchronization [input]
subroutine ga_zero(g_a)Sets value of all elements in the array to zero.
integer g_a [input]
subroutine ga_fill(g_a, s)Assign a single value to all elements in the array.
integer g_a [input]
double precision/complex/integer s [input]
double precision function ga_ddot(g_a, g_b)Computes the element-wise dot product of the two arrays which must be double precision, the same shape and identically aligned:
integer g_a, g_b [input]
ga_ddot = SUM_ij a(i,j)*b(i,j)This is a collective operation.
double complex function ga_zdot(g_a, g_b)Computes the element-wise dot product of the two arrays which must be double complex, the same shape and identically aligned:
integer g_a, g_b [input]
ga_zdot = SUM_ij a(i,j)*b(i,j)This is a collective operation.
double precision function ga_ddot_patch(g_a, ta, ailo, aihi, ajlo, ajhi,Computes the element-wise dot product of the two (possibly transposed) patches which must be double precision and have the same number of elements.
g_b, tb, bilo, bihi, bjlo, bjhi)
integer g_a, g_b [input]
integer ailo, aihi, ajlo, ajhi [input] g_a patch coordinates
integer bilo, bihi, bjlo, bjhi [input] g_b patch coordinates
character*1 ta, tb [input] transpose flags
double precision function nga_ddot_patch(g_a, ta, alo, ahi,Computes the element-wise dot product of the two (possibly transposed) patches which must be double precision and have the same number of elements.
g_b, tb, bio, bhi)
integer g_a, g_b [input]
integer ndim number of dimensions
integer alo(ndim), ahi(ndim) [input] g_a patch coordinates
integer blo(ndim), bhi(ndim) [input] g_b patch coordinates
character*1 ta, tb [input] transpose flags
double complex function ga_zdot_patch(g_a, ta, ailo, aihi, ajlo, ajhi,Computes the element-wise dot product of the two (possibly transposed) patches which must be double complex and have the same number of elements.
g_b, tb, bilo, bihi, bjlo, bjhi)
integer g_a, g_b [input]
integer ailo, aihi, ajlo, ajhi [input] g_a patch coordinates
integer bilo, bihi, bjlo, bjhi [input] g_b patch coordinates
character*1 ta, tb [input] transpose flags
double complex function nga_zdot_patch(g_a, ta, alo, ahi,Computes the element-wise dot product of the two (possibly transposed) patches which must be double complex and have the same number of elements.
g_b, tb, blo, bhi)
integer g_a, g_b [input]
integer ndim number of dimensions
integer alo(ndim), ahi(ndim) [input] g_a patch coordinates
integer blo(ndim), bhi(ndim) [input] g_b patch coordinates
character*1 ta, tb [input] transpose flags
subroutine ga_matmul_patch(transa, transb, alpha, beta,ga_matmul_patch is a patch version of ga_dgemm:
g_a, ailo, aihi, ajlo, ajhi,
g_b, bilo, bihi, bjlo, bjhi,
g_c, cilo, cihi, cjlo, cjhi)
integer g_a, ailo, aihi, ajlo, ajhi [input] patch of g_a
integer g_b, bilo, bihi, bjlo, bjhi [input] patch of g_b
integer g_c, cilo, cihi, cjlo, cjhi [input] patch of g_c
double precision/complex alpha, beta [input]
character*1 transa, transb [input]
C[cilo:cihi,cjlo:cjhi] := alpha* AA[ailo:aihi,ajlo:ajhi] *where AA = op(A), BB = op(B), and op( X ) is one of
BB[bilo:bihi,bjlo:bjhi] ) + beta*C[cilo:cihi,cjlo:cjhi],
op( X ) = X or op( X ) = X',Valid values for transpose arguments: 'n', 'N', 't', 'T'. It works for both double precision and double complex data tape.
subroutine nga_matmul_patch(transa, transb, alpha, beta,nga_matmul_patch is a n-dimensional patch version of ga_dgemm:
g_a, alo, ahi,
g_b, blo, bhi,
g_c, clo, chi)
integer g_a, alo, ahi [input] patch of g_a
integer g_b, blo, bhi [input] patch of g_b
integer g_c, clo, chi [input] patch of g_c
double precision/complex alpha, beta [input]
character*1 transa, transb [input]
C[clo[]:chi[]] := alpha* AA[alo[]:ahi[]] *where AA = op(A), BB = op(B), and op( X ) is one of
BB[blo[]:bhi[]] ) + beta*C[clo[]:chi[]],
op( X ) = X or op( X ) = X',Valid values for transpose arguments: 'n', 'N', 't', 'T'. It works for both double precision and double complex data tape.
subroutine ga_scale(g_a, s)Scales an array by the constant s.
integer g_a [input]
double precision/complex/integer s [input]
subroutine nga_zero_patch(g_a, alo, ahi)Set all the elements in the patch to zero.
integer g_a [input]
integer ndim number of dimensions
integer alo(ndim), ahi(ndim) [input] g_a patch coordinates
subroutine ga_scale_patch(g_a, ailo, aihi, ajlo, ajhi, s)Scales a patch of an array by the constant s.
integer g_a [input]
double precision/complex/integer s [input]
integer ailo, aihi, ajlo, ajhi [input] g_a patch coordinates
subroutine nga_scale_patch(g_a, alo, ahi, s)Scales a patch of an array by the constant s.
integer g_a [input]
double precision/complex/integer s [input]
integer ndim number of dimensions
integer alo(ndim), ahi(ndim) [input] g_a patch coordinates
subroutine ga_add(alpha, g_a, beta, g_b, g_c)The arrays (which must be the same shape and identically aligned) are added together element-wise
integer g_a, g_b, g_c [input]
double precision/complex/integer alpha, beta [input]
c = alpha * a + beta * b.The result (c) may replace one of the input arrays (a/b).
subroutine ga_add_patch (alpha, g_a, ailo, aihi, ajlo, ajhi,Patches of arrays (which must have the same number of elements) are added together element-wise.
beta, g_b, bilo, bihi, bjlo, bjhi,
g_c, cilo, cihi, cjlo, cjhi)
integer g_a, g_b, g_c [input]
double precision/complex/integer alpha, beta [input]
integer ailo, aihi, ajlo, ajhi [input] g_a patch coordinates
integer bilo, bihi, bjlo, bjhi [input] g_b patch coordinates
integer cilo, cihi, cjlo, cjhi [input] g_c patch coordinates
subroutine nga_add_patch (alpha, g_a, alo, ahi, beta, g_b, blo, bhiPatches of arrays (which must have the same number of elements) are added together element-wise.
g_c, clo, chi)
integer g_a, g_b, g_c [input]
double precision/complex/integer alpha, beta [input]
integer ndim number of dimensions
integer alo(ndim), ahi(ndim) [input] g_a patch coordinates
integer blo(ndim), bhi(ndim) [input] g_b patch coordinates
integer clo(ndim), chi(ndim) [input] g_c patch coordinates
This is a collective operation.
subroutine ga_fill_patch(g_a, ailo, aihi, ajlo, ajhi, s)Fills the patch of an array with value s. Type of s and array must match.
integer g_a [input]
double precision/complex/integer s [input]
integer ailo, aihi, ajlo, ajhi [input] g_a patch coordinates
This is a collective operation.
subroutine nga_fill_patch(g_a, alo, ahi, s)
integer g_a [input]
double precision/complex/integer s [input]
integer ndim number of dimensions
integer alo(ndim), ahi(ndim) [input] g_a patch coordinates
Fills the patch of an array with value s. Type of s
and array must match.
This is a collective operation.
subroutine nga_select_elem(g_a, op, val, index)
integer g_a - array handle Control [input]Returns the value and index for an element that is selected by the specified operator in a global array corresponding to g_a handle.
character* op - operator {'min','max'} [input]
val - address where selected value should be stored [output]
index[ndim] - array index for the selected element [output]
subroutine ga_summarize(verbose)Prints info about allocated arrays.
logical verbose [input]! If true print
distribution info
subroutine ga_symmetrize(g_a)Symmetrizes matrix A represented with handle g_a: A:= .5 * (A+A').
integer g_a [input/output]
subroutine ga_transpose(g_a, g_b)Transposes a matrix: B = A', where A and B are represented by handles g_a and g_b.
integer g_a [input] ! remains unchanged
integer g_b [output]
subroutine ga_diag(g_a, g_s, g_v, eval)Solve the generalized eigen-value problem returning all eigen-vectors and values in ascending order. The input matrices are not overwritten or destroyed.
integer g_a [input] ! Matrix to diagonalize
integer g_s [input] ! Metric
integer g_v [output] ! Global matrix to return evecs
double precision eval(*) [output] ! Local array to return evals
subroutine ga_diag_reuse(control, g_a, g_s, g_v, eval)Solve the generalized eigen-value problem returning all eigen-vectors and values in ascending order. Recommended for REPEATED calls if g_s is unchanged. Values of the control flag:
integer control [input] ! Control flag
integer g_a [input] ! Matrix to diagonalize
integer g_s [input] ! Metric
integer g_v [output] ! Global matrix to return evecs
double precision eval(*) [output] ! Local array to return evals
value action/purposeThe input matrices are not destroyed.
0 indicates first call to the eigensolver
>0 consecutive calls (reuses factored g_s)
<0 only erases factorized g_s; g_v and eval unchanged
(should be called after previous use if another
eigenproblem, i.e., different g_a and g_s, is to
be solved)
subroutine ga_diag_std(g_a, g_v, eval)Solve the standard (non-generalized) eigenvalue problem returning all eigenvectors and values in the ascending order. The input matrix is neither overwritten nor destroyed.
integer g_a [input] ! Matrix to diagonalize
integer g_v [output] ! Global matrix to return evecs
double precision eval(*) [output] ! Local array to return evals
This is a collective operation.
subroutine ga_lu_solve(trans, g_a, g_b)Solve the system of linear equations op(A)X = B based on the LU factorization.
character trans [input] ! transpose or not transpose
integer g_a [input] ! coefficient matrix
integer g_b [output/output] ! rhs/solution matrix
op(A) = A or A' depending on the parameter trans:
This is a collective operation.
integer function ga_spd_invert(g_a)It computes the inverse of a double precision using the Cholesky factorization of a NxN double precision symmetric positive definite matrix A stored in the global array represented by g_a. On successful exit, A will contain the inverse.
integer g_a [input/output] ! matrix
It returns
= 0 : successful exitThis is a collective operation.
> 0 : the leading minor of this order is not positive
definite and the factorization could not be completed
< 0 : it returns the index i of the (i,i)
element of the factor L/U that is zero and
the inverse could not be computed
integer function ga_llt_solve(g_a, g_b)Solves a system of linear equations
integer g_a [input] ! coefficient matrix
integer g_b [output/output] ! rhs/solution matrix
A * X = Busing the Cholesky factorization of an NxN double precision symmetric positive definite matrix A (epresented by handle g_a). On successful exit B will contain the solution X. It returns:
= 0 : successful exitThis is a collective operation.
> 0 : the leading minor of this order is not positive
definite and the factorization could
not be completed
integer function ga_solve(g_a, g_b)Solves a system of linear equations
integer g_a [input] ! coefficient matrix
integer g_b [output/output] ! rhs/solution matrix
A * X = BIt first will call the Cholesky factorization routine and, if sucessfully, will solve the system with the Cholesky solver. If Cholesky will be not be able to factorize A, then it will call the LU factorization routine and will solve the system with forward/backward substitution. On exit B will contain the solution X.
It returns
= 0 : Cholesky factoriztion was succesfulThis is a collective operation.
> 0 : the leading minor of this order
is not positive definite, Cholesky factorization
could not be completed and LU factoriztion was used
Performs one of the matrix-matrix operations:
subroutine ga_dgemm(transa, transb, m, n, k, alpha, g_a, g_b, beta, g_c )
subroutine ga_sgemm(transa, transb, m, n, k, alpha, g_a, g_b, beta, g_c )
subroutine ga_zgemm(transa, transb, m, n, k, alpha, g_a, g_b, beta, g_c )
Character*1 transa, transb [input]
Integer m, n, k [input]
Double Precision alpha, beta [input] (DGEMM)
Double Complex alpha, beta [input] (ZGEMM)
Integer g_a, g_b, [input]
Integer g_c [output]
C := alpha*op( A )*op( B ) + beta*C,where op( X ) is one of
op( X ) = X or op( X ) = X',alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
On entry, transa specifies the form of op( A ) to be used in the matrix multiplication as follows:
transa = 'N' or 'n', op( A ) = A.This is a collective operation.
transa = 'T' or 't', op( A ) = A'.
m - On entry, m specifies the number of rows of the matrix
op( A ) and of the matrix C. m must be at least zero.
n - On entry, n specifies the number of columns of the matrix
op( B ) and the number of columns of the matrix C. n must be
at least zero.
k - On entry, k specifies the number of columns of the matrix
op( A ) and the number of rows of the matrix op( B ). K must
be at least zero.
subroutine ga_copy(g_a, g_b)Copies elements in array represented by g_a into the array represented by g_b. The arrays must be the same type, shape, and identically aligned.
integer g_a, g_b [input]
This is a collective operation.
subroutine ga_copy_patch(trans, g_a, ailo, aihi, ajlo, ajhi,Copies elements in a patch of one array into another one. The patches of arrays should not overloap (when g_a=g_b) and must have the same number of elements.
g_b, bilo, bihi, bjlo, bjhi)
character trans [input] transpose operator
integer g_a, g_b [input]
integer ailo, aihi, ajlo, ajhi [input] g_a patch coordinates
integer bilo, bihi, bjlo, bjhi [input] g_b patch coordinates
This is a collective operation.
subroutine nga_copy_patch(trans, g_a, alo, ahi, g_b, blo, bhi)Copies elements in a patch of one array into another one. The patches of arrays should not overloap (when g_a=g_b) and must have the same number of elements.
character trans [input] transpose operator
integer g_a, g_b [input]
integer ndim number of dimensions
integer alo(ndim), ahi(ndim) [input] g_a patch coordinates
integer blo(ndim), bhi(ndim) [input] g_b patch coordinates
subroutine ga_get(g_a, ilo, ihi, jlo, jhi, buf, ld)Performs the equivalent of the following Fortran-90.
integer g_a [input]
integer ilo, ihi, jlo, jhi [input]
double precision/complex/integer buf [output]
integer ld [input]
dimension A(1:dim1,1:dim2)where g_a represents a handle to the array A. Array bounds are always checked.
dimension buf(1:ld, 1:*)
buf(1:ihi-ilo+1, 1:jhi-jlo+1) = A(ilo:ihi, jlo:jhi)
subroutine nga_get(g_a, lo, hi, buf, ld)
integer g_a - global array handle [input]
integer ndim - number of dimensions of the global array
integer lo(ndim) - array of starting indices for global array section [input]
integer hi(ndim) - array of ending indices for global array section [input]
<type> buf - local buffer array where the data goes to [output]
integer ld(ndim-1) - array specifying leading dimensions for buffer array [input]
Copies data from global array section to the local array buffer.
The
local array is assumed to be have the same number of dimensions
as the global array. Any detected inconsitencies/errors in the input
arguments
are fatal.
Example:
For ga_get operation transfering data from the (11:15,1:5) section
of 2-dimensional 15x10 global array into local buffer 5x10 array we
have:
lo={11,1}, hi={15,5}, ld={10}
| 15 | ||||
|
5
|
||||
| 10 | 10 | |||
This is a one-sided/independent operation.
Same as nga_get except the indices can extend beyond the array
boundary/dimensions
in which case the library wraps them around.
This is a one-sided/independent operation.
subroutine nga_strided_get(g_a, lo, hi, skip, buf, ld)This operation is the same as nga_get, except that the values corresponding to dimension n in buf correspond to every skip(n) values of the global array g_a. This is a onesided/independent operation.
integer g_a - global array handle [input]
integer ndim - number of dimensions of the global array
integer lo(ndim) - array of starting indices for global array section [input]
integer hi(ndim) - array of ending indices for global array section [input]
integer skip(ndim) - array of strides for each dimension [input]
<type> buf - local buffer array where the data comes from [output]
integer ld(ndim-1) - array specifying leading dimensions for buffer array [input]
subroutine ga_put(g_a, ilo, ihi, jlo, jhi, buf, ld)where g_a represents a handle to the array A. Array bounds are always checked.
integer g_a [input]
integer ilo, ihi, jlo, jhi [input]
double precision/complex/integer buf [output]
integer ld [input]
Performs the equivalent of the following Fortran-90:
dimension g_a(1:dim1,1:dim2)
dimension buf(1:ld, 1:*)
g_a(ilo:ihi, jlo:jhi) = buf(1:ihi-ilo+1, 1:jhi-jlo+1)
Copies data from local array buffer to the global array section . The local array is assumed to be have the same number of dimensions as the global array. Any detected inconsitencies/errors in input arguments are fatal.
subroutine nga_put(g_a, lo, hi, buf, ld)
integer g_a - global array handle [input]
integer ndim - number of dimensions of the global array
integer lo(ndim) - array of starting indices for global array section [input]
integer hi(ndim) - array of ending indices for global array section [input]
<type> buf - local buffer array where the data comes from [output]
integer ld(ndim-1) - array specifying leading dimensions for buffer array [input]
subroutine nga_periodic_put(g_a, lo, hi, buf, ld)Same as nga_put except the indices can extend beyond the array boundary/dimensions in which case the library wraps them around.
integer g_a - global array handle [input]
integer ndim - number of dimensions of the global array
integer lo(ndim) - array of starting indices for global array section [input]
integer hi(ndim) - array of ending indices for global array section [input]
<type> buf - local buffer array where the data comes from [output]
integer ld(ndim-1) - array specifying leading dimensions for buffer array [input]
subroutine nga_strided_put(g_a, lo, hi, skip, buf, ld)This operation is the same as nga_put, except that the values corresponding to dimension n in buf are copied to every skip(n) value of the global array g_a. This is a onesided/independent operation.
integer g_a - global array handle [input]
integer ndim - number of dimensions of the global array
integer lo(ndim) - array of starting indices for global array section [input]
integer hi(ndim) - array of ending indices for global array section [input]
integer skip(ndim) - array of strides for each dimension [input]
<type> buf - local buffer array where the data comes from [output]
integer ld(ndim-1) - array specifying leading dimensions for buffer array [input]
subroutine ga_acc(g_a, ilo, ihi, jlo, jhi, buf, ld, alpha)Performs the equivalent of the following Fortran-90. Array bounds are always checked. Double precision/complex types are supported. The operation is atomic.
integer g_a [input]
integer ilo, ihi, jlo, jhi [input]
double precision/complex buf [input]
integer ld [input]
double precision/complex alpha [input]
dimension g_a(1:dim1,1:dim2)where g_a represents a handle to the array A.
dimension buf(1:ld, 1:*)
A(ilo:ihi, jlo:jhi) = A(ilo:ihi, jlo:jhi) +
alpha*buf(1:ihi-ilo+1, 1:jhi-jlo+1)
This is a one-sided/independent and atomic operation.
Combines data from local array buffer with data in the global array section. The local array is assumed to be have the same number of dimensions as the global array.
subroutine nga_acc(g_a, lo, hi, buf, ld, alpha)
integer g_a - global array handle [input]
integer ndim - number of dimensions of the global array
integer lo(ndim) - array of starting indices for global array section [input]
integer hi(ndim) - array of ending indices for global array section [input]
<type> buf - local buffer array where the local data is [output]
integer ld(ndim-1) - array specifying leading dimensions for buffer array [input]
<type> alpha - scale argument for accumulate [input]
global array section (lo(),hi()) += alpha * buffer()This is a one-sided/independent and atomic operation.
subroutine nga_periodic_acc(g_a, lo, hi, buf, ld, alpha)Same as nga_acc except the indices can extend beyond the array boundary/dimensions in which case the library wraps them around.
integer g_a - global array handle [input]
integer ndim - number of dimensions of the global array
integer lo(ndim) - array of starting indices for global array section [input]
integer hi(ndim) - array of ending indices for global array section [input]
<type> buf - local buffer array where the local data is [output]
integer ld(ndim-1) - array specifying leading dimensions for buffer array [input]
<type> alpha - scale argument for accumulate [input]
subroutine nga_strided_acc(g_a, lo, hi, skip, buf, ld, alpha)This operation is the same as nga_acc, except that the values corresponding to dimension n in buf are accumulated to every skip(n) value of the global array g_a. This is a onesided/independent operation.
integer g_a - global array handle [input]
integer ndim - number of dimensions of the global array
integer lo(ndim) - array of starting indices for global array section [input]
integer hi(ndim) - array of ending indices for global array section [input]
integer skip(ndim) - array of strides for each dimension [input]
<type> buf - local buffer array where the data comes from [output]
integer ld(ndim-1) - array specifying leading dimensions for buffer array [input]
<type> alpha - scale argument for accumulate [input]
subroutine ga_distribution(g_a, iproc, ilo, ihi, jlo, jhi)Returns the range of a global array held by a specified process. If no array elements are owned by process iproc, the range is returned as [0, -1] for the i and [0,-1] for thej dimensions.
integer g_a - array handle [input]
integer iproc - process number [input]
integer ilo,ihi,jlo,jhi - range held by process iproc [output]
subroutine nga_distribution(g_a, iproc, lo, hi)Returns the range of a global array held by a specified process. If no array elements are owned by process iproc, the range is returned as lo(i) =0 and hi(i)= -1 for i=1,ndim dimensions. This is the n-dimensional version of ga_distribution.
integer g_a - array handle [input]
integer iproc - process number [input]
integer ndim - number of dimensions
integer lo(ndim),hi(ndim) - range held by process iproc [output]
This is a local operation.
logical function ga_compare_distr(g_a, g_b)Compares distributions of two global arrays. Returns .TRUE. if distributions are identical.
integer g_a, g_b [input]
subroutine ga_access(g_a, ilo, ihi, jlo, jhi, index, ld)Provides access to the specified patch of array. Returns leading dimension ld and and MA-like index for the data. This routine is intended for writing new GA operations. Call to ga_access should normally follow a call to ga_distribution that returns coordinates of the patch associated with a processor. You need to make sure that the coordinates of the patch are valid (test values returned from ga_distribution).
integer g_a [input]
integer ilo, ihi, jlo, jhi [input]
integer index [output]
integer ld [output]
Your code should include a MA include file, mafdecls.h. The addressing convention:
refers the first element (ilo,jlo) of the patch. However, you can only pass that reference to another subroutine where it could be used like a normal array, see the following example. This constraint caused by the HP fortran compiler inability to reference shared memory data properly. The C interface has no such restrictions.
dbl_mb(index) - for double precision data
int_mb(index) - for integer data
dcpl_mb(index) - for double complex data
Example
For a given subroutine:
subroutine foo(A, nrows, ncols lda)you can reference A(ilo:ihi,jlo:jhi) in the following way:
double precision A(lda,*)
integer nrows, ncols
....
end
call foo(dbl_mb(index), ihi-ilo+1, jhi-jlo+1, lda)NOTE: You have to worry about mutual exclusion in simultaneous overlapping access to the data by multiple processors.
Each call to ga_access has to be followed by a call to
either
ga_release
or ga_release_update. You can access
only
local
data.
This is a local operation.
subroutine nga_access(g_a, lo, hi, index, ld)An n-dimensional version of ga_access. This is a local operation.
integer g_a array handle [input]
integer ndim number of array dimensions
integer lo(ndim),hi(ndim) patch specification [input]
integer index reference to local data [output]
integer ld(ndim-1) array of leading dimensions [output]
subroutine nga_access_block_segment(g_a, proc, index, len)This function can be used to gain access to the all the locally held data on a particular processor that is associated with a block-cyclic distributed array. Once the index has been returned, local data can be accessed as described in the documentation for ga_access. The parameter len is the number of data elements that are held locally. The data inside this segment has a lot of additional structure so this function is not generally useful to developers. It is primarily used inside the GA library to implement other GA routines. Each call to ga_access_block_segment should be followed by a call to either nga_release_block_segment or nga_release_update_block_segment.
integer g_a array handle [input]
integer proc processor ID [input]
integer index reference to local data [output]
integer len length of data on processor [output]
subroutine nga_access_block(g_a, idx, index, ld)This function can be used to gain direct access to the data represented by a single block in a global array with a block-cyclic data distribution. The index idx is the index of the block in the array assuming that blocks are numbered sequentially in a column-major order. A quick way of determining whether a block with index idx is held locally on a processor is to calculate whether mod(idx,nproc) equals the processor ID, where nproc is the total number of processors. Once the index has been returned, local data can be accessed as described in the documentation for ga_access. Each call to ga_access_block should be followed by a call to either nga_release_block or nga_release_update_block.
integer g_a array handle [input]
integer ndim number of array dimensions
integer idx block index [input]
integer index reference to local data [output]
integer ld(ndim-1) array of leading dimensions [output]
subroutine nga_access_block_grid(g_a, subscript, index, ld)This function can be used to gain direct access to the data represented by a single block in a global array with a SCALAPACK block-cyclic data distribution that is based on an underlying processor grid. The subscript array contains the subscript of the block in the array of blocks. This subscript is based on the location of the block in a grid, each of whose dimensions is equal to the number of blocks that fit along that dimension. Once the index has been returned, local data can be accessed as described in the documentation for ga_access. Each call to ga_access_block_grid should be followed by a call to either nga_release_block_grid or nga_release_update_block_grid.
integer g_a array handle [input]
integer ndim number of array dimensions
integer subscript(ndim) subscript of block in array [input]
integer index reference to local data [output]
integer ld(ndim-1) array of leading dimensions [output]
subroutine ga_release(g_a, ilo, ihi, jlo, jhi)Releases access to a global array when the data was read only.
integer g_a [input]
integer ilo, ihi, jlo, jhi [input]
Your code should look like:
call ga_distribution(g_a, myproc, ilo,ihi,jlo,jhi)NOTE: see restrictions specified for ga_access. This is a local operation.
call ga_access(g_a, ilo, ihi, jlo, jhi, index, ld)
< operate on the data >
call ga_release(g_a, ilo, ihi, jlo, jhi)
subroutine nga_release(g_a, lo, hi)Releases access to local elements of global array when the data was accessed as read only.This is a local operation.
integer g_a array handle [input]
integer ndim number of array dimensions
integer lo(ndim),hi(ndim) patch specification [input]
subroutine ga_release_update(g_a, ilo, ihi, jlo, jhi)Releases access to the data. It must be used if the data was accessed for writing. NOTE: see restrictions specified for ga_access.
integer g_a [input]
integer ilo, ihi, jlo, jhi [input]
subroutine nga_release_update(g_a, lo, hi)Releases access to local elements of global array when the data was accessed in read -write mode.
integer g_a array handle [input]
integer ndim number of array dimensions
integer lo(ndim),hi(ndim) patch specification [input]
subroutine nga_release_block(g_a, index)Releases access to the block of data specified by the integer index when data was accessed as read only. This is only applicable to block-cyclic data distributions created using the simple block-cyclic distribution. This is a local operation.
integer g_a array handle [input]
integer index block index [input]
subroutine nga_release_update_block(g_a, index)Releases access to the block of data specified by the integer index when data was accessed in read-write mode. This is only applicable to block-cyclic data distributions created using the simple block-cyclic distribution. This is a local operation.
integer g_a array handle [input]
integer index block index [input]
subroutine nga_release_block_grid(g_a, subscript)Releases access to the block of data specified by the subscript array when data was accessed as read only. This is only applicable to block-cyclic data distributions created using the SCALAPACK data distribution. This is a local operation.
integer g_a array handle [input]
integer subscript(ndim) indices of block in array [input]
subroutine nga_release_update_block_grid(g_a, subscript)Releases access to the block of data specified by the subscript array when data was accessed in read-write mode. This is only applicable to block-cyclic data distributions created using the SCALAPACK data distribution. This is a local operation.
integer g_a array handle [input]
integer subscript(ndim) indices of block in array [input]
subroutine nga_release_block_segment(g_a, iproc)Releases access to the block of locally held data for a block-cyclic array, when data was accessed as read-only. This is a local operation.
integer g_a array handle [input]
integer iproc processor ID [input]
subroutine nga_release_update_block_segment(g_a, iproc)Releases access to the block of locally held data for a block-cyclic array, when data was accessed in read-write mode. This is a local operation.
integer g_a array handle [input]
integer iproc processor ID [input]
subroutine nga_release_ghost_element(g_a, subscript)Releases access to the locally held data for an array with ghost elements, when data was accessed as read-only. This is a local operation.
integer g_a array handle [input]
integer subscript(ndim) element subscript [input]
subroutine nga_release_update_ghost_element(g_a, subscript)Releases access to the locally held data for an array with ghost elements, when data was accessed in read-write mode. This is a local operation.
integer g_a array handle [input]
integer subscript(ndim) element subscript [input]
subroutine nga_release_ghosts(g_a)Releases access to the locally held block of data containing ghost elements, when data was accessed as read-only. This is a local operation.
integer g_a array handle [input]
subroutine nga_release_update_ghosts(g_a)Releases access to the locally held block of data containing ghost elements, when data was accessed in read-write mode. This is a local operation.
integer g_a array handle [input]
integer function ga_read_inc(g_a, i, j, inc)Atomically read and increment an element in an integer array.
integer g_a [input]
integer i, j, inc [input]
*BEGIN CRITICAL SECTION*
return value = a(i,j)
a(i,j) += inc
*END CRITICAL SECTION*
This is a one-sided/independent operation.
integer function nga_read_inc(g_a, subscript, inc)Atomically read and increment an element in an integer array. An n-dimensional version of ga_read_inc.
integer g_a [input]
integer i, j, inc [input]
This is a one-sided/independent operation.
subroutine ga_scatter(g_a, v, i, j, n)Scatters the array elements into the array. The contents of the input arrays (v, i, j) are preserved, but their contents might be (consistently) shuffled on return.
integer g_a [input]
double precision v(n) [input]
integer i(n), j(n), n [input]
do k = 1, nvThis is a one-sided/independent operation.
a(i(k),j(k)) = v(k)
enddo
subroutine nga_scatter(g_a, v, subsArray, n)
integer g_a - global array handle [input]
integer n - number of elements [input]
<type> v(n) - array containing values [input]
integer ndim - number of array dimensions
subsArray (ndim,n) - array of subscripts for each element [input]
Scatters array elements into a global array. The contents of the
input arrays (v,subscrArray) are preserved, but their contents
might
be (consistently) shuffled on return.
do k = 1, nThis is a one-sided/independent operation.
a(subsArray(1,k),subsArray(2,k), ...) = v(k)
enddo
subroutine ga_gather(g_a, v, i, j, n)Gathers specified elements (i,j) from the global array into a local single-dimesional array. The contents of the arrays (v, i, j) might be (consistently) shuffled on return.
integer g_a [input]
double precision v(n) [output]
integer i(n), j(n), n [input]
do k = 1, nvThis is a one-sided/independent operation.
v(k) = a(i(k),j(k))
enddo
Gathers array elements from a global array into a local array. The contents of the input arrays (v, subscrArray) are preserved, but their contents might be (consistently) shuffled on return.
subroutine nga_gather(g_a, v, subsArray, n)
integer g_a - global array handle [input] [output]
integer n - number of elements [input]
<type> v(n) - array containing values [output]
integer ndim - number of array dimensions
subsArray (ndim,n) - array of subscripts for each element [input]
do k = 1, nThis is a one-sided/independent operation.
v(k) = a(subsArray(1,k),subsArray(2,k), ...)
enddo
subroutine ga_scatter_acc(g_a, v, i, j, n, alpha)Scatters array elements from a local array into a global array. Adds values from the local array to existing values in the global array after multiplying by alpha. The contents of the arrays v, i, j may be (consistently) shuffled on return.
integer g_a - global array handle [input]
integer n - number of elements [input]
<type> v(n) - array containing value [input]
integer i(n),j(n) - arrays of indices [input]
double precision/complex alpha - multiplicative value [input]
subroutine nga_scatter_acc(g_a, v, subsArray, n, alpha)Scatters array elements from a local array into a global array. Adds values from the local array to existing values in the global array after multiplying by alpha. The contents of v and subsArray may be (consistently) shuffled on return.
integer g_a - global array handle [input]
integer n - number of elements [input]
<type> v(n) - array containing value [input]
integer ndim - number of array dimensions
subsArray(ndim,n) - array of subscripts [input]
double precision/complex alpha - multiplicative value [input]
subroutine ga_error(message, code)To be called in case of an error. Print an error message and an integer value that is intended to be the error code. Releases some system resources. This is the recomended way of aborting the program execution.
character*1 message(*) [input]
integer code [input]
logical function ga_locate(g_a, i, j, owner)Return in owner the GA compute process id that 'owns' the data. If i/j are out of bounds .FALSE. is returned, otherwise .TRUE..
integer g_a array handle [input]
integer i, j element subscript [input]
integer owner process id [output]
logical function nga_locate(g_a, subscript, owner)Return in owner the GA compute process id that 'owns' the array element. If subscript contains values out of bounds .FALSE. is returned, otherwise .TRUE.
integer g_a array handle [input]
integer subscript element subscript [input]
integer owner process id [output]
logical function ga_locate_region(g_a, ilo, ihi, jlo, jhi, map, np )Parts of the specified patch might be actually 'owned' by several (precisely np) processes. Return the list of the GA processes id that 'own' the data. If i/j are out of bounds .FALSE. is returned, otherwise .TRUE..
integer g_a, ilo, ihi, jlo, jhi [input]
integer map(5,*), np [output]
map(1,*) - ilospecify coordinates of a subpatch 'held' by the process which number is stored in map(5,*)
map(2,*) - ihi
map(3,*) - jlo
map(4,*) - jhi
logical function nga_locate_region(g_a, lo, hi, map, proclist, np )An n-dimensional version of ga_locate_region. The list of processes that contain parts for the region are in proclist. Array map is defined as
integer g_a array handle [input]
integer ndim number of dimensions
integer lo(ndim),hi(ndim) region(patch) specifications [input]
integer map(2*ndim,*) patch ownership array [output]
integer proclist(np) list of processes [output]
integer np number of processes [output]
map(1:ndim,i)
-
contains lower bound dimensions for part owned by process proclist(i)
map(ndim+1:2*ndim,i)
-
contains upper bound dimensions for part owned by process proclist(i)
This is a local operation.
subroutine ga_inquire(g_a, type, dim1, dim2)Returns type and dimensions of array. This is a local operation.
integer g_a [input]
integer type [output]
integer dim1 [output]
integer dim2 [output]
subroutine nga_inquire(g_a, type, ndim, dims)Returns type and dimensions of an n-dimensional array. This is a local operation.
integer g_a array handle [input]
integer type data type id [output]
integer ndim number of dimensions [output]
integer dims(ndim) array of dimensions [output]
integer function ga_inquire_memory()Returns amount of memory (in bytes) used in the allocated global arrays on the calling processor. This is a local operation.
subroutine ga_inquire_name(g_a, array_name)Returns the name of an array represented by the handle g_a. This is a local operation.
integer g_a [input]
character*(*) array_name [output]
integer function ga_ndim(g_a)Returns the number ot dimensions in an array represented by the handle g_a. This is a local operation.
integer g_a [input]
subroutine ga_nblock(g_a, nblock)Given a distribution of an array represented by the handle g_a, returns the number of partitions of each array dimension.
integer g_a - array handle [input]
integer nblock[ndim] - number of partitions for each dimension [output]
integer function ga_memory_avail()Returns amount of memory (in bytes) left for allocation of new global arrays on the calling processor.
Note: If ga_uses_ma returns true, then ga_memory_avail returns the lesser of the amount available under the GA limit and the amount available from MA (according to ma_inquire_avail operation). If no GA limit has been set, it returns what MA says is available.
If ( ! ga_uses_ma() && ! ga_memory_limited()
) returns < 0, indicating that the bound on currently available
memory
cannot be determined.
This is a local operation.
logical function ga_uses_ma()Returns .true. if memory in global arrays comes from the Memory Allocator (MA). .false. means that memory comes from another source, for example System V shared memory is used.
logical function ga_memory_limited()Indicates if limit is set on memory usage in Global Arrays on the calling processor.
subroutine ga_set_memory_limit(limit)
integer limit [input]Sets the amount of memory (in bytes) to be used per process.
subroutine ga_proc_topology(g_a, proc, prow, pcol)Based on the distribution of the array associated with handle g_a, determines block row and column coordinates for the array section 'owned' by specified processor. The numbering starts from 0. The value of -1 means that the processor doesn't 'own' any section of array represented by g_a.
integer g_a [input]
integer proc [input]
integer prow, pcol [output]
subroutine ga_print_patch(g_a,ilo,ihi,jlo,jhi,pretty)Prints a patch of g_a array to the standard output. If pretty has the value 0 then output is printed in a dense fashion. If pretty has the value 1 then output is formatted and rows/columns labeled.
integer g_a [input]
integer ilo,ihi,jlo,jhi [input] coordinates of the patch
integer pretty [input]
subroutine ga_print(g_a)Prints an entire array to the standard output.
integer g_a [input]
subroutine ga_print_stats()This non-collective (MIMD) operation prints information about:
subroutine ga_print_distribution(g_a)Prints array distribution.
integer g_a [input]
This is a collective operation.
subroutine ga_check_handle(g_a, string)Check that the global array handle g_a is valid ... if not call ga_error with the string provided and some more info.
integer g_a [input]
character *(*) string [input]
subroutine ga_init_fence()Initializes tracing of completion status of data movement operations. This is a local operation.
subroutine ga_fence()Blocks the calling process until all the data transfers corresponding to GA operations called after ga_init_fence complete. For example, since ga_put might return before the data reaches the final destination ga_init_fence and ga_fence allow process to wait until the data is actually moved:
call ga_init_fence()ga_fence must be called after ga_init_fence. A barrier, ga_sync, assures completion of all data transfers and implicitly cancels outstandingga_init_fence. ga_init_fence and ga_fence must be used in pairs, multiple calls to ga_fence require the same number of corresponding ga_init_fence calls. ga_init_fence/ga_fence pairs can be nested.
call ga_put(g_a, ...)
call ga_fence()
ga_fence works for multiple GA operations. For example:
call ga_init_fence()The calling process will be blocked until data movements initiated by two calls to ga_put and one ga_scatter complete.
call ga_put(g_a, ...)
call ga_scatter(g_a, ...)
call ga_put(g_b, ...)
call ga_fence()
logical function ga_create_mutexes(number)Creates a set containing the number of mutexes. Returns .true. if the opereation succeeded or .false. when failed. Mutex is simple synchronization object used to protect Critical Section. Only one set of mutexes can exist at a time. Mutexes can be created and destroyed as many times as needed.
integer number [input]
Mutexes are numbered: 0, ..., number -1.
This is a collective operation.
logical function ga_destroy_mutexes()Destroys the set of mutexes created with ga_create_mutexes. Returns .true. if the operation succeeded or .false. when failed.
This is a collective operation.
subroutine ga_lock(mutex)
integer mutex [input] ! mutex idLocks a mutex object identified by the mutex number. It is a fatal error for a process to attempt to lock a mutex which was already locked by this process.
subroutine ga_unlock(mutex)Unlocks a mutex object identified by the mutex number. It is a fatal error for a process to attempt to unlock a mutex which has not been locked by this process.
integer mutex [input] ! mutex id
integer function ga_nodeid()Returns the GA process id (0, ..., ga_nnodes()-1) of the requesting compute process. This is a local operation.
NOTE: the GA process id might not be the same as
message-passing
node id.
integer function ga_nnodes()Returns the number of the GA compute (user) processes.
NOTE: the number of the GA processes might not be the same as the
number
of the message-passing nodes in releases <3.0 of the GA package.
This is a local operation.
subroutine ga_brdcst(type, buf, lenbuf, root)Broadcast from process root to all other processes a message of length lenbuf.
integer type [input] ! message type for broadcast
byte buf(lenbuf) [input/output]
integer lenbuf [input]
integer root [input]
This is a convenience operation available regardless of the message-passing library that GA is running with.
This is a collective operation.
subroutine ga_dgop(type, x, n, op)Double Global OPeration.
integer type [input]
double precision x(n) [input/output]
character*(*) op [input]
X(1:N) is a vector present on each process. DGOP 'sums' elements of X accross all nodes using the commutative operator OP. The result is broadcast to all nodes. Supported operations include '+', '*', 'max', 'min', 'absmax', 'absmin'. The use of lowerecase for operators is necessary.
This is a convenience operation, available regardless of the message-passing library that GA is running with.
This is a collective operation.
subroutine ga_igop(type, x, n, op)Integer Global OPeration. The integer version of ga_dgop described above, also include the bitwise OR operation.
integer type [input]
integer x(n) [input/output]
character*(*) OP [input]
This is a convenience operation available regardless of the message-passing library that GA is running with.
This is a collective operation.
integer function ga_cluster_nnodes()
This functions returns the total number of nodes that the program
is running on. On SMP architectures, this will be less than or equal to
the total number of processors.
This is a local
operation.
integer function ga_cluster_nodeid()This function returns the node ID of the process. On SMP architectures with more than one processor per node, several processes may return the same node id.
This is a local
operation.
integer function ga_cluster_proc_nodeid(proc)This function returns the node ID of the specified process proc. On SMP architectures with more than one processor per node, several processes may return the same node id.
integer proc [input]
This is a local
operation.
integer function ga_cluster_nprocs(inode)This function returns the number of processors available on node inode.
integer inode [input]
This is a local
operation.
integer function ga_cluster_procid(inode,iproc)This function returns the processor id associated with node inode and the local processor id iproc. If node inode has N processors, then the value of iproc lies between 0 and N-1.
integer inode,iproc [input]
This is a local
operation.
subroutine ga_list_nodeid(list, n)Returns message-passing process ID (rank in MPI) for the GA processes in the range 0 .. n-1. For MPI, the ranks are in MPI_COMM_WORLD.
integer n [input]
integer list(n) [output]
void ga_mpi_communicator(GA_COMM)Available in C only if MPI is used. Returns communicator handle for GA processes (group). Obsolete as of release 3.0.
MPI_Comm *GA_COMM; [output]
subroutine ga_abs_value(g_a)
integer g_a - array handle [input]
Take element-wise absolute value of the array.
This is a collective
operation.
subroutine ga_abs_value_patch(g_a, lo, hi)
integer g_a - array handle [input]
integer lo(ndim), hi(ndim) - g_a patch coordinates [input]
Take element-wise absolute value of the patch.
This is a collective
operation.
subroutine ga_add_constant(g_a, alpha)
integer g_a - array handle [input]
double/complex/integer/float alpha [input]
Add the contant pointed by alpha to each element of the array.
This is a collective
operation.
subroutine ga_add_constant_patch(g_a, lo, hi, alpha)
integer g_a - array handle [input]
integer ndim - number of dimensions [input]
integer lo(ndim), hi(ndim) - patch coordinates [input]
double/complex/integer/float alpha [input]
Add the contant pointed by alpha to each element of the patch.
This is a collective
operation.
subroutine ga_recip(g_a)
integer g_a - array handle [input]
Take element-wise reciprocal of the array.
This is a collective
operation.
subroutine ga_recip_patch(g_a, lo, hi)
integer g_a - array handle [input]
integer ndim - number of dimensions [input]
integer lo(ndim), hi(ndim) - patch coordinates [input]
Take element-wise reciprocal of the patch.
This is a collective operation.
subroutine ga_elem_multiply(g_a, g_b, g_c)
integer g_a, g_b - array handles [input]
integer g_c - array handle [output]
Computes the element-wise product of the two arrays
which must be of the same types and same number of
elements. For two-dimensional arrays,
c(i, j) = a(i,j)*b(i,j)
The result (c) may replace one of the input arrays (a/b).
This is a collective
operation.
subroutine ga_elem_multiply_patch(g_a, alo, ahi, g_b, blo, bhi, g_c, clo, chi)
integer g_a, g_b - array handles [input]
integer g_c - array handle [output]
integer ndim - number of dimensions [input]
integer alo(ndim), ahi(ndim) - g_a patch dimensions [input]
integer blo(ndim), bhi(ndim) - g_b patch dimensions [input]
integer clo(ndim), chi(ndim) - g_c patch dimensions [input]
Computes the element-wise product of the two patches
which must be of the same types and same number of
elements. For two-dimensional arrays,
c(i, j) = a(i,j)*b(i,j)
The result (c) may replace one of the input arrays (a/b).
This is a collective
operation.
subroutine ga_elem_divide(g_a, g_b, g_c)
integer g_a, g_b - array handles [input]
integer g_c - array handle [output]
Computes the element-wise quotient of the two arrays
which must be of the same types and same number of
elements. For two-dimensional arrays,
c(i, j) = a(i,j)/b(i,j)
The result (c) may replace one of the input arrays (a/b). If one of the elements of array g_b is zero, the quotient for the element of g_c will be set to GA_NEGATIVE_INFINITY.
This is a collectiveoperation.
subroutine ga_elem_divide_patch(g_a, alo, ahi, g_b, blo, bhi, g_c, clo, chi)
integer g_a, g_b - array handles [input]
integer g_c - array handle [output]
integer ndim - number of dimensions [input]
integer alo(ndim), ahi(ndim) - g_a patch dimensions [input]
integer blo(ndim), bhi(ndim) - g_b patch dimensions [input]
integer clo(ndim), chi(ndim) - g_c patch dimensions [input]
Computes the element-wise quotient of the two patches
which must be of the same types and same number of
elements. For two-dimensional arrays,
c(i, j) = a(i,j)/b(i,j)
The result (c) may replace one of the input arrays (a/b).
This is a collective
operation.
subroutine ga_elem_maximum(g_a, g_b, g_c)
integer g_a, g_b - array handles [input]
integer g_c - array handle [output]
Computes the element-wise maximum of the two arrays
which must be of the same types and same number of
elements. For two dimensional arrays,
c(i, j) = max{a(i,j), b(i,j)}
The result (c) may replace one of the input arrays (a/b).
This is a collective
operation.
subroutine ga_elem_maximum_patch(g_a, alo, ahi, g_b, blo, bhi, g_c, clo, chi)
integer g_a, g_b - array handles [input]
integer g_c - array handle [output]
integer ndim - number of dimensions [input]
integer alo(ndim), ahi(ndim) - g_a patch dimensions [input]
integer blo(ndim), bhi(ndim) - g_b patch dimensions [input]
integer clo(ndim), chi(ndim) - g_c patch dimensions [input]
Computes the element-wise maximum of the two patches
which must be of the same types and same number of
elements. For two-dimensional of noncomplex arrays,
c(i, j) = max{a(i,j), b(i,j)}
If the data type is complex, then
c(i, j).real = max{ |a(i,j)|, |b(i,j)|} while c(i,j).image = 0.
The result (c) may replace one of the input arrays (a/b).
This is a collective
operation.
subroutine ga_elem_minimum(g_a, g_b, g_c)
integer g_a, g_b - array handles [input]
integer g_c - array handle [output]
Computes the element-wise minimum of the two arrays
which must be of the same types and same number of
elements. For two dimensional arrays,
c(i, j) = min{a(i,j), b(i,j)}
The result (c) may replace one of the input arrays (a/b).
This is a collective
operation.
subroutine ga_elem_minimum_patch(g_a, alo, ahi, g_b, blo, bhi, g_c, clo, chi)
integer g_a, g_b - array handles [input]
integer g_c - array handle [output]
integer ndim - number of dimensions [input]
integer alo(ndim), ahi(ndim) - g_a patch dimensions [input]
integer blo(ndim), bhi(ndim) - g_b patch dimensions [input]
integer clo(ndim), chi(ndim) - g_c patch dimensions [input]
Computes the element-wise minimum of the two patches
which must be of the same types and same number of
elements. For two-dimensional of noncomplex arrays,
c(i, j) = min{a(i,j), b(i,j)}
If the data type is complex, then
c(i, j).real = min{ |a(i,j)|, |b(i,j)|} while c(i,j).image = 0.
The result (c) may replace one of the input arrays (a/b).
This is a collective
operation.
subroutine ga_shift_diagonal(g_a, c)
integer g_a - array handle [input]
double/complex/integer/float [input]
Adds this constant to the diagonal elements of the matrix.
This is a collective
operation.
subroutine ga_set_diagonal(g_a, g_v)
integer g_a, g_v - array handles [input]
Sets the diagonal elements of this matrix g_a with the elements of
the
vector g_v.
This is a collective
operation.
subroutine ga_zero_diagonal(g_a)
integer g_a - array handle [input]
Sets the diagonal elements of this matrix g_a with zeros.
This is a collective
operation.
subroutine ga_add_diagonal(g_a, g_v)
integer g_a, g_v - array handles [input]
Adds the elements of the vector g_v to the diagonal of this matrix
g_a.
This is a collective
operation.
subroutine ga_get_diag(g_a, g_v)
integer g_a - array handle [input]
integer g_v - array handle [input]
Inserts the diagonal elements of this matrix g_a into the vector
g_v.
This is a collective
operation.
subroutine ga_scale_rows(g_a, g_v)
integer g_a, g_v - array handles [input]
Scales the rows of this matrix g_a using the vector g_v.
This is a collective
operation.
subroutine ga_scale_cols(g_a, g_v)
integer g_a, g_v - array handles [input]
Scales the columns of this matrix g_a using the vector g_v.
This is a collective
operation.
subroutine ga_norm1(g_a, nm)
integer g_a - array handle [input]
double precision nm - matrix/vector 1-norm value [output]
Computes the 1-norm of the matrix or vector g_a.
This is a collective
operation.
subroutine ga_norm_infinity(g_a, nm)
integer g_a - array handle [input]
double precision nm - matrix/vector infinity-norm value [output]
Computes the infinity-norm of the matrix or vector g_a.
This is a collective
operation.
subroutine ga_median(g_a, g_b, g_c, g_m)
integer g_a, g_b, g_c - array handles [input]
integer g_m - array handle [output]
Computes the componentwise median of three arrays g_a, g_b, and g_c,
and
stores the result in this array g_m. The result (m) may replace
one of the input arrays (a/b/c).
This is a collective
operation.
subroutine ga_median_patch(g_a,alo,ahi,g_b,blo,bhi,g_c,clo,chi,g_m,mlo,mhi)
integer g_a, g_b, g_c - array handles [input]
integer g_m - array handle [output]
integer ndim - number of dimensions [input]
integer alo(ndim), ahi(ndim) - g_a patch dimensions [input]
integer blo(ndim), bhi(ndim) - g_b patch dimensions [input]
integer clo(ndim), chi(ndim) - g_c patch dimensions [input]
integer mlo(ndim), mhi(ndim) - g_m patch dimensions [input]
Computes the componentwise median of three patches g_a, g_b, and
g_c,
and
stores the result in this patch g_m. The result (m) may replace
one of the input patches (a/b/c).
This is a collective
operation.
subroutine ga_step_max(g_a, g_b, step)
integer g_a, g_b - array handles [input]
double precision step - the maximum step [output]
Calculates the largest multiple of a vector g_b that can be added
to this vector g_a while keeping each element of this vector
nonnegative.
This is a collective
operation.
subroutine ga_step_max2(g_xx, g_vv, g_xxll, g_xxuu, step2)where
integer g_xx, g_vv, g_xxll, g_xxuu - array handles [input]
double precision step2 - the maximum step size [output]
g_vv - the step direction
g_xxll - lower bounds
g_xxuu - upper bounds
Calculates the largest step size that should be used in a projected
bound line search.
This is a collective
operation.
subroutine ga_step_max_patch(g_a, alo, ahi, g_b, blo, bhi, step)
integer g_a, g_b - array handles where g_b is step direction [input]
integer alo, ahi, blo, bhi - patch coordinates of g_a and g_b [input]
double precision step - the maximum step [output]
Calculates the largest multiple of a vector g_b that can be added
to this vector g_a while keeping each element of this vector
nonnegative.
This is a collective
operation.
subroutine ga_step_max2_patch(g_xx, xxlo, xxhi, g_vv, vvlo, vvhi, g_xxll,
xxlllo, xxllhi, g_xxuu, xxuulo, xxuuhi, step2)
integer g_xx, g_vv, g_xxll, g_xxuu - array handles [input]
integer xxlo, xxhi, vvlo, vvhi - patch coordinates [input]
integer xxlllo, xxllhi, xxuulo, xxuuhi - patch coordinates [input]
double precision step2 - the maximum step size [output]
where
g_vv - the step direction
g_xxll - lower bounds
g_xxuu - upper bounds
Calculates the largest step size that should be used in a projected
bound line search.
This is a collective
operation.
integer function ga_pgroup_get_default()This function will return a handle to the default processor group which can then be used to create a global array using one of the nga_create_*_config or ga_set_pgroup calls.
This is a local operation.
integer function ga_pgroup_get_mirror()This function will return a handle to the mirrored processor group, which can then be used to create a mirrored global array using one of the nga_create_*_config or ga_set_pgroup calls.
This is a local
operation.
integer function ga_pgroup_get_world()This function will return a handle to the world group, i.e. the group containing all processors that the code is running on. This function can be used to get access to the world group if the default group has been set to a subset of processors.
This is a local operation.
subroutine ga_pgroup_sync(p_handle)This operation executes a synchronization group across the processors in the processor group specified by p_handle. Nodes outside this group are unaffected.
integer p_handle - processor group handle [input]
This is a collective operation on the processor
group
specified by p_handle.
subroutine ga_pgroup_brdcst(p_handle, type, buf, lenbuf, root)Broadcast data from processor specified by root to all other processors in the processor group specified by p_handle. The length of the message in bytes is specified by lenbuf. Type is an arbitrary index assigned to each call to distinguish if from other broadcast calls.
integer p_handle - processor group handle [input]
integer type - message index [input]
byte buf(lenbuf) - local message buffer [input/output]
integer lenbuf - length of message [input]
integer root - processor sending message [input]
This is a collective operation on the processor
group
specified by p_handle.
subroutine ga_pgroup_dgop(p_handle, type, buf, n, op)buf(n) is a double precision array present on each processor in the processor group p_handle. The ga_pgroup_dgop 'sums' all elements in buf(n) across all processors in the group specified by p_handle using the commutative operation specified by the character string op. The result is broadcast to all processor in p_handle. Allowed strings are '+', '*', 'max', 'min', 'absmax', 'absmin'. The use of lowerecase for operators is necessary.
integer p_handle - processor group handle [input]
integer type - message index [input]
double precision buf(n) - double precision array [input/output]
integer n - number elements in array [input]
character*(*) op - operation on data [input]
This is a collective operation on the processor
group
specifed by p_handle.
subroutine ga_pgroup_igop(p_handle, type, buf, n, op)buf(n) is an integer array present on each processor in the processor group p_handle. The ga_pgroup_igop 'sums' all elements in buf(n) across all processors in the group specified by p_handle using the commutative operation specified by the character string op. The result is broadcast to all processor in p_handle. Allowed strings are '+', '*', 'max', 'min', 'absmax', 'absmin'. The use of lowerecase for operators is necessary.
integer p_handle - processor group handle [input]
integer type - message index [input]
integer buf(n) - integer array [input/output]
integer n - number elements in array [input]
character*(*) op - operation on data [input]
This is a collective operation on the processor
group
specified by p_handle.
subroutine ga_pgroup_sgop(p_handle, type, buf, n, op)buf(n) is a single precsion array present on each processor in the processor group p_handle. The ga_pgroup_sgop 'sums' all elements in buf(n) across all processors in the group specified by p_handle using the commutative operation specified by the character string op. The result is broadcast to all processor in p_handle. Allowed strings are '+', '*', 'max', 'min', 'absmax', 'absmin'. The use of lowerecase for operators is necessary.
integer p_handle - processor group handle [input]
integer type - message index [input]
real buf(n) - single precision array [input/output]
integer n - number elements in array [input]
character*(*) op - operation on data [input]
This is a collective operation on the processor
group
specified by p_handle.
integer function ga_pgroup_nnodes(p_handle)This function returns the number of processors contained in the group specifed by p_handle.
integer p_handle - processor group handle [input]
This is a local operation.
integer function ga_pgroup_nodeid(p_handle)This function returns the relative index of the processor in the processor group specified by p_handle. This index will generally differ from the absolute processor index returned by ga_nodeid if the processor group is not the world group.
integer p_handle - processor group handle [input]
This is a local operation.
subroutine ga_merge_mirrored(g_a)This subroutine merges mirrored arrays by adding the contents of each array across nodes. The result is that the each mirrored copy of the array represented by g_a is the sum of the individual arrays before the merge operation. After the merge, all mirrored arrays are equal.
integer g_a [input]
This is a collective
operation.
integer ga_is_mirrored(g_a)This subroutine checks if the array is mirrored array or not. Returns 1 if it is a mirrored array, else returns 0.
integer g_a array handle [input]
This is a local
operation.
integer nga_merge_distr_patch(g_a, alo, ahi, g_b, blo, bhi)This function merges all copies of a patch of a mirrored array (g_a) into a patch in a distributed array (g_b).
integer g_a, g_b - array handles [input]
integer ndim - number of dimensions of the global array
integer alo(ndim), ahi(ndim) - g_a patch coordinates [input]
integer blo(ndim), bhi(ndim) - g_b patch coordinates [input]
This is a collective
operation.
subroutine ga_nbget(g_a, ilo, ihi, jlo, jhi, buf, ld, nbhandle)Non-blocking version of the blocking get operation. The get operation can be completed locally by making a call to the wait (e.g.NGA_NbWait) routine.
integer g_a - global array handle [input]
integer ndim - number of dimensions of the global array
integer ilo, ihi - starting indices for global array section [input]
integer jlo, jhi - ending indices for global array section [input]
<type> buf - local buffer array where the data goes [output]
integer ld - leading dimension/stride/extent for buffer array [input]
integer nbhandle - non-blocking request handle [input]
This is a non-blocking one-sided
operation.
subroutine nga_nbget(g_a, lo, hi, buf, ld, nbhandle)Non-blocking version of the blocking get operation. The get operation can be completed locally by making a call to the wait (e.g.NGA_NbWait) routine.
integer g_a - global array handle [input]
integer ndim - number of dimensions of the global array
integer lo[ndim] - array of starting indices for global array section [input]
integer hi[ndim] - array of ending indices for global array section [input]
<type> buf - local buffer array where the data goes [output]
integer ld[ndim-1] - array specifying leading dimensions/strides/extents for buffer array [input]
integer nbhandle - non-blocking request handle [input]
This is a non-blocking one-sided
operation.
subroutine ga_nbput(g_a, ilo, ihi, jlo, jhi, buf, ld, nbhandle)Non-blocking version of the blocking put operation. The put operation can be completed locally by making a call to the wait (e.g.NGA_NbWait) routine.
integer g_a - global array handle [input]
integer ilo, ihi - starting indices for global array section [input]
integer jlo, jhi - ending indices for global array section [input]
<type>buf - local buffer array where the data is [input]
integer ld - leading dimension/stride/extent for buffer array [input]
integer nbhandle - non-blocking request handle [input]
This is a non-blocking one-sided
operation.
subroutine nga_nbput(g_a, lo, hi, buf, ld, nbhandle)Non-blocking version of the blocking put operation. The put operation can be completed locally by making a call to the wait (e.g.NGA_NbWait) routine.
integer g_a - global array handle [input]
integer ndim - number of dimensions of the global array
integer lo[ndim] - array of starting indices for global array section [input]
integer hi[ndim] - array of ending indices for global array section [input]
<type>buf - local buffer array where the data is [input]
integer ld[ndim-1] - array specifying leading dimensions/strides/extents for buffer array [input]
integer nbhandle - non-blocking request handle [input]
This is a non-blocking one-sided
operation.
subroutine ga_nbacc(g_a, ilo, ihi, jlo, jhi, buf, ld, alpha, nbhandle)Non-blocking version of the blocking accumulate operation. The accumulate operation can be completed locally by making a call to the wait (e.g.NGA_NbWait) routine.
integer g_a - global array handle [input]
integer ilo, ihi - starting indices for global array section [input]
integer jlo, jhi - ending indices for global array section [input]
<type>buf - local buffer array where the data is [input]
integer ld - leading dimension/stride/extent for buffer array [input]
double precision/complex alpha - scale factor [input]
integer nbhandle - non-blocking request handle [input]
This is a non-blocking one-sided
operation.
subroutine nga_nbacc(g_a, lo, hi, buf, ld, alpha, nbhandle)Non-blocking version of the blocking accumulate operation. The accumulate operation can be completed locally by making a call to the wait (e.g.NGA_NbWait) routine.
integer g_a - global array handle [input]
integer ndim - number of dimensions of the global array
integer lo[ndim] - array of starting indices for global array section [input]
integer hi[ndim] - array of ending indices for global array section [input]
<type>buf - local buffer array where the data is [input]
integer ld[ndim-1] - array specifying leading dimensions/strides/extents for buffer array [input]
double precision/complex alpha - scale factor [input]
integer nbhandle - non-blocking request handle [input]
This is a non-blocking one-sided
operation.
subroutine ga_nbwait(nbhandle)This function completes a non-blocking one-sided operation locally. Waiting on a nonblocking put or an accumulate operation assures that data was injected into the network and the user buffer can be now be reused. Completing a get operation assures data has arrived into the user memory and is ready for use. Wait operation ensures only local completion. Unlike their blocking counterparts, the nonblocking operations are not ordered with respect to the destination. Performance being one reason, the other reason is that by ensuring ordering we incur additional and possibly unnecessary overhead on applications that do not require their operations to be ordered. For cases where ordering is necessary, it can be done by calling a fence operation. The fence operation is provided to the user to confirm remote completion if needed.
integer nbhandle - non-blocking request handle [input]
subroutine nga_nbwait(nbhandle)This function completes a non-blocking one-sided operation locally. Waiting on a nonblocking put or an accumulate operation assures that data was injected into the network and the user buffer can be now be reused. Completing a get operation assures data has arrived into the user memory and is ready for use. Wait operation ensures only local completion. Unlike their blocking counterparts, the nonblocking operations are not ordered with respect to the destination. Performance being one reason, the other reason is that by ensuring ordering we incur additional and possibly unnecessary overhead on applications that do not require their operations to be ordered. For cases where ordering is necessary, it can be done by calling a fence operation. The fence operation is provided to the user to confirm remote completion if needed.
integer nbhandle - non-blocking request handle [input]
double precision function ga_wtime()This function return a wall (or elapsed) time on the calling processor. Returns time in seconds representing elapsed wall-clock time since an arbitrary time in the past. Example:
subroutine ga_set_debug(flag)This function sets an internal flag in the GA library to either true or false. The value of this flag can be recovered at any time using the ga_get_debug function. The flag is set to false when the the GA library is initialized. This can be useful in a number of debugging situations, especially when examining the behavior of routines that are called in multiple locations in a code.
logical flag - value to set internal debug flag [input]
logical function ga_get_debug()This function returns the value of an internal flag in the GA library whose value can be set using the ga_set_debug subroutine.
subroutine ga_patch_enum(g_a, lo, hi, start, inc)This subroutine enumerates the values of an array between elements lo and hi starting with the value istart and incrementing each subsequent value by inc. This operation is only applicable to 1-dimensional arrays. An example of its use is shown below:
integer g_a - array handle [input/output]
integer lo, hi - low and high values of array patch [input]
integer/double precision/complex start - starting value of enumeration [input]
integer/double precision/complex inc - increment value [input]
call ga_patch_enum(g_a, 1, n, 7, 2)This is a collective operation.
g_a: 7 9 11 13 15 17 19 21 23 ...
subroutine ga_scan_add(g_src, g_dest, g_mask, lo, hi, excl)This operation will add successive elements in a source vector g_src and put the results in a destination vector g_dest. The addition will restart based on the values of the integer mask vector g_mask. The scan is performed within the range specified by the integer values lo and hi. Note that this operation can only be applied to 1-dimensional arrays. The excl flag determines whether the sum starts with the value in the source vector corresponding to the location of a 1 in the mask vector (excl=0) or whether the first value is set equal to 0 (excl=1). Some examples of this operation are given below.
integer g_src - handle for source array [input]
integer g_dest - handle for destination array [output]
integer g_mask - handle for integer array representing a bit-mask [input]
integer lo, hi - low and high values of range on which operation
is performed [input]
integer excl - value to signify if masked values are included
in add [input]
call ga_scan_add(g_src, g_dest, g_mask, 1, n, 0)This is a collective operation.
g_mask: 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0
g_src: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
g_dest: 1 3 6 10 16 21 7 15 9 19 30 12 25 39 15 16 33
call ga_scan_add(g_src, g_dest, g_mask, 1, n, 1)
g_mask: 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0
g_src: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
g_dest: 0 1 3 6 10 15 0 7 0 9 19 0 12 25 0 0 16
subroutine ga_scan_copy(g_src, g_dest, g_mask, lo, hi)This subroutine does a segmented scan-copy of values in the source array g_src into a destination array g_dest with segments defined by values in the integer mask array g_mask. The scan-copy operation is only applied to the range between the lo and hi indices. This operation is restriced to 1-dimensional arrays. The resulting destination array will consist of segments of consecutive elements with the same value. An example is shown below
integer g_src - handle for source array [input]
integer g_dest - handle for destination array [output]
integer g_mask - handle for integer array representing a bit-mask [input]
integer lo, hi - low and high values of range on which operation
is performed [input]
call ga_scan_copy(g_src, g_dest, g_mask, 1, n)
g_mask: 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0
g_src: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
g_dest: 1 1 1 1 1 1 7 7 9 9 9 12 12 12 15 16 16
This is a collective operation.
subroutine ga_pack(g_src, g_dest, g_mask, lo, hi, icount)The pack subroutine is designed to compress the values in the source vector g_src into a smaller destination array g_dest based on the values in an integer mask array g_mask. The values lo and hi denote the range of elements that should be compressed and icount is a variable that on output lists the number of values placed in the compressed array. This operation is the complement of the ga_unpack operation. An example is shown below
integer g_src - handle for source array [input]
integer g_dest - handle for destination array [output]
integer g_mask - handle for integer array representing a bit-mask [input]
integer lo, hi - low and high values of range on which operation
is performed [input]
integer icount - number of values in compressed array [output]
call ga_pack(g_src, g_dest, g_mask, 1, n, icount)This is a collective operation.
g_mask: 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0
g_src: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
g_dest: 1 7 9 12 15 16
icount: 6
subroutine ga_unpack(g_src, g_dest, g_mask, lo, hi, icount)The unpack subroutine is designed to expand the values in the source vector g_src into a larger destination array g_dest based on the values in an integer mask array g_mask. The values lo and hi denote the range of elements that should be compressed and icount is a variable that on output lists the number of values placed in the uncompressed array. This operation is the complement of the ga_pack operation. An example is shown below
integer g_src - handle for source array [input]
integer g_dest - handle for destination array [output]
integer g_mask - handle for integer array representing a bit-mask [input]
integer lo, hi - low and high values of range on which operation
is performed [input]
integer icount - number of values in uncompressed array [output]
call ga_unpack(g_src, g_dest, g_mask, 1, n, icount)This is a collective operation.
g_src: 1 7 9 12 15 16
g_mask: 1 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0
g_dest: 1 0 0 0 0 0 7 0 9 0 0 12 0 0 15 16 0
icount: 6