The one-sided operations that Global Arrays provides can be summarized in three categories:
| Remote blockwise write/read | ga_put, ga_get |
| Remote atomic update | ga_acc, ga_read_inc, ga_scatter_acc |
| Remote elementwise write/read | ga_scatter, ga_gather |
Put copies data from the local array to the global array section. Its interfaces are:
| n-D Fortran | subroutine nga_put(g_a, lo, hi, buf, ld) |
| 2-D Fortran | subroutine ga_put(g_a, ilo, ihi, jlo, jhi, buf, ld) |
| C | void NGA_Put(int g_a, int lo[], int hi[], void *buf, int ld[]) |
| C++ | void GA::GlobalArray::put(int lo[], int hi[], void *buf, int ld[]) |
All the arguments are provided in one call: lo and hi specify where the data should go in the global array; ld specifies the stride information of the local array buf. The local array should have the same number of dimensions as the global array; strictly speaking, what is required is an n-dimensional view of the local memory buffer, which may itself be one-dimensional.
The operation is transparent to the user, which means the user doesn't have to worry about where the region defined by lo and hi is located. It can be in the memory of one or many remote processes, owned by the local process, or even mixed (part of it belongs to remote processes and part of it belongs to a local process).
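The roles of lo, hi, and ld can be illustrated with a minimal sketch in plain C that copies a local buffer into a patch of an ordinary in-memory array. It makes no GA calls and is not how GA is implemented: sketch_put_2d is a hypothetical helper, the "global" array is assumed to be a row-major 2-D array held by a single process, and indices are taken to be 0-based and inclusive.

```c
#include <stddef.h>

/* Hypothetical helper, not part of GA: copy a patch out of buf into the
   inclusive region [lo[0]:hi[0], lo[1]:hi[1]] of a row-major 2-D array
   that has gcols columns. ld[0] is the physical row length of buf, which
   may be larger than the width of the patch. */
void sketch_put_2d(double *global, int gcols,
                   const int lo[2], const int hi[2],
                   const double *buf, const int ld[1])
{
    for (int i = lo[0]; i <= hi[0]; i++) {
        for (int j = lo[1]; j <= hi[1]; j++) {
            /* Position inside the local buffer, using its stride ld[0]. */
            size_t src = (size_t)(i - lo[0]) * (size_t)ld[0]
                       + (size_t)(j - lo[1]);
            /* Position inside the stand-in for the global array. */
            size_t dst = (size_t)i * (size_t)gcols + (size_t)j;
            global[dst] = buf[src];
        }
    }
}
```

With ld[0] = 8 and a 2 x 2 patch, consecutive patch rows are read 8 elements apart in buf, which is how a small patch can be taken out of a larger local array.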
Get is the reverse operation of put. It copies data from a global array section to the local array. Its interfaces are:
| n-D Fortran | subroutine nga_get(g_a, lo, hi, buf, ld) |
| 2-D Fortran | subroutine ga_get(g_a, ilo, ihi, jlo, jhi, buf, ld) |
| C | void NGA_Get(int g_a, int lo[], int hi[], void *buf, int ld[]) |
| C++ | void GA::GlobalArray::get(int lo[], int hi[], void *buf, int ld[]) |
Similar to put, lo and hi specify where the data should come from in the global array, and ld specifies the stride information of the local array buf. The local array is assumed to have the same number of dimensions as the global array. Users don't need to worry about where the region defined by lo and hi is physically located.
Example:
For a ga_get operation transferring data from the (11:15,1:5) section of a 2-dimensional 15 x 10 global array into a local 5 x 10 buffer array, we have (in Fortran notation):
lo={11,1}, hi={15,5}, ld={10}
[Figure: the 15 x 10 global array and the local buffer, with dimension labels 15, 10, and 10]
Accumulate combines the data from the local array with data in the global array section. Its interfaces are:
| n-D Fortran | subroutine nga_acc(g_a, lo, hi, buf, ld, alpha) |
| 2-D Fortran | subroutine ga_acc(g_a, ilo, ihi, jlo, jhi, buf, ld, alpha) |
| C | void NGA_Acc(int g_a, int lo[], int hi[], void *buf, int ld[], void *alpha) |
| C++ | void GA::GlobalArray::acc(int lo[], int hi[], void *buf, int ld[], void *alpha) |
The local array is assumed to have the same number of dimensions as the global array. Users don't need to worry about where the region defined by lo and hi is physically located. The function performs
global array section (lo[], hi[]) += alpha * buf
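The update rule above can be written out as a short sketch in plain C. This is only an illustration of the arithmetic on an ordinary in-memory array, not the GA implementation: sketch_acc_2d is a hypothetical helper, the array is assumed to be row-major with 0-based inclusive indices, and alpha is passed by value for brevity whereas the interfaces above take it as a pointer.

```c
#include <stddef.h>

/* Hypothetical helper, not part of GA: apply
       section(lo, hi) += alpha * buf
   to a row-major 2-D array with gcols columns. ld[0] is the physical
   row length of the local buffer buf. */
void sketch_acc_2d(double *global, int gcols,
                   const int lo[2], const int hi[2],
                   const double *buf, const int ld[1], double alpha)
{
    for (int i = lo[0]; i <= hi[0]; i++) {
        for (int j = lo[1]; j <= hi[1]; j++) {
            size_t src = (size_t)(i - lo[0]) * (size_t)ld[0]
                       + (size_t)(j - lo[1]);
            size_t dst = (size_t)i * (size_t)gcols + (size_t)j;
            /* Existing contents are kept and the scaled patch is added. */
            global[dst] += alpha * buf[src];
        }
    }
}
```

Unlike put, the previous contents of the section contribute to the result, so calling the helper twice with the same arguments adds the scaled buffer twice.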
Read_inc remotely updates a particular element in the global array. Its interfaces are:
| n-D Fortran | subroutine nga_read_inc(g_a, subscript, inc) |
| 2-D Fortran | subroutine ga_read_inc(g_a, i, j, inc) |
| C | long NGA_Read_inc(int g_a, int subscript[], long inc) |
| C++ | long GA::GlobalArray::readInc(int subscript[], long inc) |
This function applies to integer arrays only. It atomically reads and increments an element in an integer array. It performs
a(subscript) += inc
and returns the original value (before the update) of a(subscript).
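The read-and-increment contract (add inc, return the value seen before the update) is the same as a fetch-and-add. A minimal single-process sketch using C11 atomics is given below; sketch_read_inc is a hypothetical name, and an atomic_long stands in for one element of an integer global array. It says nothing about how GA performs the update remotely.

```c
#include <stdatomic.h>

/* Hypothetical helper, not part of GA: atomically add inc to *elem and
   return the value that *elem held before the update. */
long sketch_read_inc(atomic_long *elem, long inc)
{
    return atomic_fetch_add(elem, inc);
}
```

One possible use of such a primitive is a shared task counter: each caller receives a distinct starting value, so no two callers claim the same task index.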
Scatter puts array elements into a global array. Its interfaces are:
| n-D Fortran | subroutine nga_scatter(g_a, v, subsarray, n) |
| 2-D Fortran | subroutine ga_scatter(g_a, v, i, j, n) |
| C | void NGA_Scatter(int g_a, void *v, int *subsarray[], int n) |
| C++ | void GA::GlobalArray::scatter(void *v, int *subsarray[], int n) |
It performs (in C notation)
for (k = 0; k < n; k++) {
    a[subsArray[k][0]][subsArray[k][1]][subsArray[k][2]]... = v[k];
}
Example:
Scatter 5 elements into a 10 x 10 global array:
| Element 1 | v[0] = 5 | subsArray[0][0] = 2 | subsArray[0][1] = 3 |
| Element 2 | v[1] = 3 | subsArray[1][0] = 3 | subsArray[1][1] = 4 |
| Element 3 | v[2] = 8 | subsArray[2][0] = 8 | subsArray[2][1] = 5 |
| Element 4 | v[3] = 7 | subsArray[3][0] = 3 | subsArray[3][1] = 7 |
| Element 5 | v[4] = 2 | subsArray[4][0] = 6 | subsArray[4][1] = 3 |
After the scatter operation, the five elements would be scattered into the global array as shown in the following figure.
|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 0 |   |   |   |   |   |   |   |   |   |   |
| 1 |   |   |   |   |   |   |   |   |   |   |
| 2 |   |   |   | 5 |   |   |   |   |   |   |
| 3 |   |   |   |   | 3 |   |   | 7 |   |   |
| 4 |   |   |   |   |   |   |   |   |   |   |
| 5 |   |   |   |   |   |   |   |   |   |   |
| 6 |   |   |   | 2 |   |   |   |   |   |   |
| 7 |   |   |   |   |   |   |   |   |   |   |
| 8 |   |   |   |   |   | 8 |   |   |   |   |
| 9 |   |   |   |   |   |   |   |   |   |   |
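The scatter example can be reproduced with a small sketch in plain C that operates on an ordinary 10 x 10 local array. sketch_scatter_2d is a hypothetical helper, not a GA routine, and for brevity it takes the subscripts as a fixed-width array of pairs rather than the array of pointers used by the C interface.

```c
/* Hypothetical helper, not part of GA: store v[k] at the position given
   by the k-th subscript pair, for k = 0 .. n-1. */
void sketch_scatter_2d(int a[10][10], const int *v, int subs[][2], int n)
{
    for (int k = 0; k < n; k++) {
        a[subs[k][0]][subs[k][1]] = v[k];
    }
}
```

Called with v = {5, 3, 8, 7, 2} and the five subscript pairs of the example, it fills positions (2,3), (3,4), (8,5), (3,7), and (6,3) and leaves every other element untouched.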
Gather is the reverse operation of scatter. It gets the array elements from a global array into a local array.
| n-D Fortran | subroutine nga_gather(g_a, v, subsarray, n) |
| 2-D Fortran | subroutine ga_gather(g_a, v, i, j, n) |
| C | void NGA_Gather(int g_a, void *v, int *subsarray[], int n) |
| C++ | void GA::GlobalArray::gather(void *v, int *subsarray[], int n) |
It performs (in C notation)
for (k = 0; k < n; k++) {
    v[k] = a[subsArray[k][0]][subsArray[k][1]][subsArray[k][2]]...;
}
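As a counterpart to the gather loop, the following plain C sketch reads selected elements of an ordinary 10 x 10 local array into a vector. sketch_gather_2d is a hypothetical helper, not a GA routine, and the subscripts are passed as a fixed-width array of pairs for brevity.

```c
/* Hypothetical helper, not part of GA: read the element addressed by the
   k-th subscript pair into v[k], for k = 0 .. n-1. */
void sketch_gather_2d(int a[10][10], int *v, int subs[][2], int n)
{
    for (int k = 0; k < n; k++) {
        v[k] = a[subs[k][0]][subs[k][1]];
    }
}
```

The elements are returned in the order of the subscript list, not in the order in which they are laid out in the array.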
Example:
Assume a two-dimensional global array g_a with dimensions 5 x 5.

To access a patch [2:4,-1:3], one can assume that the array is wrapped over in the second dimension, as shown in the following figure:

17 22  2  7 12
18 23  3  8 13
19 24  4  9 14
Periodic operations extend the boundary of each dimension in two directions, toward the lower bound and toward the upper bound. For any dimension i with bounds lo(i) to hi(i), where 1 <= i <= ndim, the range is extended from
[lo(i) : hi(i)]
to
[(lo(i)-(hi(i)-lo(i)+1)) : (lo(i)-1)], [lo(i) : hi(i)],
and [(hi(i)+1) : (hi(i)+(hi(i)-lo(i)+1))], or
[(lo(i)-(hi(i)-lo(i)+1)) : (hi(i)+(hi(i)-lo(i)+1))].
Even though a patch can span this much larger range, its length in each dimension must always be less than or equal to (hi(i)-lo(i)+1).
Example:
For a 2 x 2 array as shown in the following figure, where the dimensions are [1:2, 1:2], periodic operations would treat the range of each dimension as [-1:4, -1:4].

The current version of GA supports three periodic operations: periodic get, periodic put, and periodic accumulate. The interfaces for periodic get are:
| Fortran | subroutine nga_periodic_get(g_a, lo, hi, buf, ld) |
| C | void NGA_Periodic_get(int g_a, int lo[], int hi[], void *buf, int ld[]) |
| C++ | void GA::GlobalArray::periodicGet(int lo[], int hi[], void *buf, int ld[]) |
Similar to regular get, lo and hi specify where the data should come from in the global array, and ld specifies the stride information of the local array buf.
Example:
Let us look at the first example in this section. It is a 5 x 5 two-dimensional global array. Assume that the local buffer is a 4 x 3 array.

After the periodic get, the local buffer buf would be
19 24 4
20 25 5
16 21 1
17 22 2
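The buffer contents above can be checked with a minimal sketch in plain C. Several things here are assumptions rather than statements from the GA sources: the patch is taken to be lo = (-1, 4), hi = (2, 6), which are the bounds used in the patch decomposition later in this section; the global array is assumed to hold the values 1 to 25 numbered column by column, which matches the figures; and the sketch wraps each index individually instead of splitting the request into patches. All function names are hypothetical.

```c
/* Map an index from the extended range onto the valid range [1:n]. */
static int sketch_wrap_index(int idx, int n)
{
    int r = (idx - 1) % n;   /* may be negative in C when idx < 1 */
    if (r < 0) {
        r += n;
    }
    return r + 1;
}

/* Assumed contents of the n x n global array: element (i,j), 1-based,
   numbered column by column starting at 1. */
static int sketch_global_value(int i, int j, int n)
{
    return i + n * (j - 1);
}

/* Hypothetical helper, not part of GA: fill buf (row length ld) with the
   patch [lo[0]:hi[0], lo[1]:hi[1]], wrapping out-of-range indices. */
void sketch_periodic_get_2d(int n, const int lo[2], const int hi[2],
                            int *buf, int ld)
{
    for (int i = lo[0]; i <= hi[0]; i++) {
        for (int j = lo[1]; j <= hi[1]; j++) {
            int wi = sketch_wrap_index(i, n);
            int wj = sketch_wrap_index(j, n);
            buf[(i - lo[0]) * ld + (j - lo[1])] =
                sketch_global_value(wi, wj, n);
        }
    }
}
```

Under these assumptions, rows -1 and 0 wrap to rows 4 and 5, and column 6 wraps to column 1, which reproduces the 4 x 3 buffer shown above.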
Periodic Put is the reverse operation of Periodic Get. It copies data from the local array to the global array section. Its interfaces are:
| Fortran | subroutine nga_periodic_put(g_a, lo, hi, buf, ld) |
| C | void NGA_Periodic_put(int g_a, int lo[], int hi[], void *buf, int ld[]) |
| C++ | void GA::GlobalArray::periodicPut(int lo[], int hi[], void *buf, int ld[]) |
Similar to regular put, lo and hi specify where the data should go in the global array; ld specifies the stride information of the local array buf.
Periodic Put/Get (and also Accumulate, which is discussed later in this section) divide the patch into several smaller patches. For those smaller patches that fall outside the global array, the indices are adjusted so that they wrap back into the original array. The regular Put/Get/Accumulate is then called for each patch to complete the operation.
Example:
Look at the example for periodic get. Because it is a 5 x 5 global array, the valid indices for each dimension are
dimension 0: [1 : 5]
dimension 1: [1 : 5]
The specified lo and hi are out of the range of each dimension:
dimension 0: [-1 : 2] --> [-1 : 0] -- wrap back --> [4 : 5]
                          [ 1 : 2] ok
dimension 1: [ 4 : 6] --> [ 4 : 5] ok
                          [ 6 : 6] -- wrap back --> [1 : 1]
Hence, there will be four smaller patches after the adjustment. They are
patch 0: [4 : 5, 4 : 5]
patch 1: [4 : 5, 1 : 1]
patch 2: [1 : 2, 4 : 5]
patch 3: [1 : 2, 1 : 1]
as shown in the following figure

Of course, the destination address of each smaller patch in the local buffer also needs to be calculated.
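The per-dimension splitting step can be sketched in plain C as follows. This is an illustration of the index adjustment only, under the assumption of 1-based valid indices [1:n]; sketch_range, sketch_wrap, and sketch_split_periodic are hypothetical names and do not come from the GA sources.

```c
/* An inclusive index range inside the valid range [1:n]. */
typedef struct {
    int lo;
    int hi;
} sketch_range;

/* Map an index from the extended range onto the valid range [1:n]. */
static int sketch_wrap(int idx, int n)
{
    int r = (idx - 1) % n;   /* may be negative in C when idx < 1 */
    if (r < 0) {
        r += n;
    }
    return r + 1;
}

/* Split the inclusive range [lo:hi] of one dimension into pieces that
   each lie within a single period, with indices wrapped back into [1:n].
   Writes at most max_out pieces to out and returns how many were
   written. */
int sketch_split_periodic(int lo, int hi, int n,
                          sketch_range *out, int max_out)
{
    int count = 0;
    int start = lo;
    while (start <= hi && count < max_out) {
        int wlo = sketch_wrap(start, n);
        int end = start + (n - wlo);   /* last index of this period */
        if (end > hi) {
            end = hi;
        }
        out[count].lo = wlo;
        out[count].hi = wlo + (end - start);
        count++;
        start = end + 1;
    }
    return count;
}
```

For the example, dimension 0 with [-1 : 2] yields [4 : 5] and [1 : 2], and dimension 1 with [4 : 6] yields [4 : 5] and [1 : 1]. Combining every piece of one dimension with every piece of the other gives the four patches listed above.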
Similar to regular Accumulate, Periodic Accumulate combines the data from the local array with data in the global array section. Its interfaces are:
| Fortran | subroutine nga_periodic_acc(g_a, lo, hi, buf, ld, alpha) |
| C | void NGA_Periodic_acc(int g_a, int lo[], int hi[], void *buf, int ld[], void *alpha) |
| C++ | void GA::GlobalArray::periodicAcc(int lo[], int hi[], void *buf, int ld[], void *alpha) |
The local array is assumed to have the same number of dimensions as the global array. Users don't need to worry about where the region defined by lo and hi is physically located. The function performs
global array section (lo[], hi[]) += alpha * buf
Example:
Let us look at the same example as above. There is a 5 x 5 two-dimensional global array. Assume that the local buffer is a 4 x 3 array.

The local buffer buf is
1 5 9
4 6 5
3 2 1
7 8 2
and the alpha = 2.
After the Periodic Accumulate operation, the global array will be

The non-blocking operations (get/put/accumulate) are derived from the blocking interface by adding a handle argument that identifies an instance of the non-blocking request. Non-blocking operations initiate a communication call and then return control to the application. A return from a non-blocking operation call indicates only that the data transfer has been initiated; the operation is completed locally by calling the wait routine (e.g. nga_nbwait).
The wait function completes a non-blocking one-sided operation locally. Waiting on a non-blocking put or accumulate operation assures that the data was injected into the network and that the user buffer can now be reused. Completing a get operation assures that the data has arrived in user memory and is ready for use. The wait operation ensures only local completion. Unlike their blocking counterparts, the non-blocking operations are not ordered with respect to the destination. One reason is performance; another is that enforcing ordering would impose additional and possibly unnecessary overhead on applications that do not require their operations to be ordered. Where ordering is necessary, it can be obtained by calling a fence operation. The fence operation is provided to the user to confirm remote completion if needed.
Example: Let us take a simple case for illustration. Say there are two global arrays, i.e. one array stores pressure and the other stores temperature. If there are two computation phases (the first phase computes pressure and the second phase computes temperature), then we can overlap communication with computation, thus hiding latency.
. . .
nga_get(get_pressure_array)
nga_nbget(initiates data transfer to get temperature_array, and returns immediately)
compute_pressure() /* hiding latency - communication is overlapped with computation */
nga_nbwait(temperature_array - completes data transfer)
compute_temperature()
. . .
| n-D Fortran | subroutine nga_nbput(g_a, lo, hi, buf, ld, nbhandle) |
| n-D Fortran | subroutine nga_nbget(g_a, lo, hi, buf, ld, nbhandle) |
| n-D Fortran | subroutine nga_nbacc(g_a, lo, hi, buf, ld, alpha, nbhandle) |
| n-D Fortran | subroutine nga_nbwait(nbhandle) |
| 2-D Fortran | subroutine ga_nbput(g_a, ilo, ihi, jlo, jhi, buf, ld, nbhandle) |
| 2-D Fortran | subroutine ga_nbget(g_a, ilo, ihi, jlo, jhi, buf, ld, nbhandle) |
| 2-D Fortran | subroutine ga_nbacc(g_a, ilo, ihi, jlo, jhi, buf, ld, alpha, nbhandle) |
| 2-D Fortran | subroutine ga_nbwait(nbhandle) |
| C | void NGA_NbPut(int g_a, int lo[], int hi[], void *buf, int ld[], ga_nbhdl_t* nbhandle) |
| C | void NGA_NbGet(int g_a, int lo[], int hi[], void *buf, int ld[], ga_nbhdl_t* nbhandle) |
| C | void NGA_NbAcc(int g_a, int lo[], int hi[], void *buf, int ld[], void *alpha, ga_nbhdl_t* nbhandle) |
| C | int NGA_NbWait(ga_nbhdl_t* nbhandle) |
| C++ | void GA::GlobalArray::nbPut(int lo[], int hi[], void *buf, int ld[], ga_nbhdl_t* nbhandle) |
| C++ | void GA::GlobalArray::nbGet(int lo[], int hi[], void *buf, int ld[], ga_nbhdl_t* nbhandle) |
| C++ | void GA::GlobalArray::nbAcc(int lo[], int hi[], void *buf, int ld[], void *alpha, ga_nbhdl_t* nbhandle) |
| C++ | int GA::GlobalArray::NbWait(ga_nbhdl_t* nbhandle) |