趣味で計算流砂水理

趣味で計算流砂水理

Computational Sediment Hydraulics for Fun

MENU

numbaによる並列化の検証:水深平均二次元流計算を例に

スポンサーリンク

以前の記事:ALB測量を使って水深平均二次元流計算をすると凄い結果がでた - 趣味で計算流砂水理の計算速度についてのメモです。

numbaによる並列化の話です。numbaについては以下を参照下さい。


numba並列化の有無による速度比較

ちゃんと計測したわけではなく、タイムスタンプからの概算です。

  • 並列化 6コア12スレッド:31時間
  • シングル:64時間

並列化効率 206%

メッシュ数は430万とそれなりの数ですが、二次元なのでスレッド数の割には全然高速化しないですね。多分4スレッドくらいで十分だと思います。

numbaによる並列化の基礎

詳しくは、公式Automatic parallelization with @jit — Numba 0.50.1 documentationに書いていますが、 要点だけをまとめておきます。

numbaの高速化は、

  • 自動並列化
  • 手動並列化

の2種類に分けられます。

自動並列化について

自動並列化は、 numbaによる高速化の対象となる関数のオプションにnopython=True、parallel=Trueを追記することによって、有効になります。

これによって、numpy関数が並列化されたり、複数のループを結合したり、冗長的な書き方が最適化されたりと、上手く高速化してくれます。

最適化の結果は、parallel_diagnostics(level)メソッドで確認できます。

今回の計算例より、numba並列化を行った1つ関数について結果を確認してみます。

まず、numbaのオプションを設定

@numba.jit(nopython=True, parallel=True)
def conEq(dep, qx, qy, dzb, dt, dx, dy, ibx, hmin, hbuf, hdown, periodic=True):
......................

一度プログラムを実行した後に、parallel_diagnosticsメソッドで確認します。なお、levelは出力する情報のレベルで1が最小、4が最大になります。今回は4にしています。

conEq.parallel_diagnostics(level=4)
================================================================================
 Parallel Accelerator Optimizing:  Function conEq, <ipython-
input-5-cf997de63147> (1)  
================================================================================


Parallel loop listing for  Function conEq, <ipython-input-5-cf997de63147> (1) 
------------------------------------------------------------------------------------------------------------------|loop #ID
@numba.jit(nopython=True, parallel=True)                                                                          | 
def conEq(dep, qx, qy, dzb, dt, dx, dy, ibx, hmin, hbuf, hdown, periodic=True):                                   | 
#     Qind = range(154,159)                                                                                       | 
                                                                                                                  | 
    imax, jmax = len(dep), len(dep[0])                                                                            | 
    depn = np.zeros_like(dep, dtype=np.float64)                                                                   | 
    fluxx = np.zeros((imax+1, jmax), dtype=np.float64)------------------------------------------------------------| #3
    fluxy = np.zeros((imax, jmax+1), dtype=np.float64)------------------------------------------------------------| #4
    modflux = np.full( (imax, jmax), False)                                                                       | 
                                                                                                                  | 
    gravity = float( 9.8 )                                                                                        | 
                                                                                                                  | 
    f = lambda Qp, Qm : Qm if Qp >= 0.0 and Qm >= 0.0 else (Qp if Qp <= 0.0 and Qm <= 0.0 else 0.5*Qp+0.5*Qm )    | 
                                                                                                                  | 
    def flux(Qp, Qm, depp, depm, zbp, zbm, ib, delta) :                                                           | 
        r = f(Qp, Qm)                                                                                             | 
#         if ( (depm + zbm) < zbp - ib*delta) and (depp <= hbuf) : r = 0.0                                        | 
#         if ( (depp + zbp) < zbm + ib*delta) and (depm <= hbuf) : r = 0.0                                        | 
        if ( (depm + zbm) <= zbp + hbuf - ib*delta) and (depp <= hbuf) : r = 0.0                                  | 
        if ( (depp + zbp) <= zbm + hbuf + ib*delta) and (depm <= hbuf) : r = 0.0                                  | 
                                                                                                                  | 
        return r                                                                                                  | 
                                                                                                                  | 
    for i in numba.prange( imax ):--------------------------------------------------------------------------------| #8
        for j in range( jmax ):                                                                                   | 
            c, xm = (i,j), (i-1,j)                                                                                | 
            fluxx[c] = flux(qx[c], qx[xm], dep[c], dep[xm], dzb[c], dzb[xm], ibx, dx)                             | 
                                                                                                                  | 
    if periodic :                                                                                                 | 
# boundary : periodic                                                                                             | 
        fluxx[-1,:] = fluxx[0,:] ---------------------------------------------------------------------------------| #0
    else:                                                                                                         | 
        for j in numba.prange( jmax ): fluxx[-1,j] = fluxx[-2,j] # qx[-1,j] # if qx[-1,j] > 0.0 else qx[,j]-------| #5
# normal                                                                                                          | 
#         for j in numba.prange( jmax ): fluxx[-1,j] = qx[-1,j] if qx[-1,j] > 0.0 else 0.0                        | 
                                                                                                                  | 
    for i in numba.prange( imax ):--------------------------------------------------------------------------------| #7
        for j in range( 1, jmax ):                                                                                | 
            c, ym = (i,j), (i,j-1)                                                                                | 
            fluxy[c] = flux(qy[c], qy[ym], dep[c], dep[ym], dzb[c], dzb[ym], 0.0, dy)                             | 
                                                                                                                  | 
# wall boundary                                                                                                   | 
#     fluxy[:,0] = 0.0                                                                                            | 
#     fluxy[:,-1] = 0.0                                                                                           | 
                                                                                                                  | 
    for i in numba.prange( imax ):--------------------------------------------------------------------------------| #6
        fluxy[i,-1] = qy[i,-1] if qy[i,-1] > 0.0 else 0.0                                                         | 
        fluxy[i, 0] = qy[i, 0] if qy[i, 0] < 0.0 else 0.0                                                         | 
                                                                                                                  | 
    nis = 0 if periodic else 1                                                                                    | 
# limiter --------------------------------------------------------------                                          | 
# 水深が負になる際に質量保存を満たすためにフラックスを修正する                                                                                  | 
    for i in range(nis, imax):                                                                                    | 
        for j in range(jmax):                                                                                     | 
            if dep[c] > hmin :                                                                                    | 
                c, xp, yp = (i, j), (i+1, j), (i, j+1)                                                            | 
                fxp = fluxx[xp] if fluxx[xp] > 0.0 else 0.0                                                       | 
                fxm = -fluxx[c] if fluxx[c] < 0.0 else 0.0                                                        | 
                fyp = fluxy[yp] if fluxy[yp] > 0.0 else 0.0                                                       | 
                fym = -fluxy[c] if fluxy[c] < 0.0 else 0.0                                                        | 
                V =  dep[c]*dx*dy - hmin*dx*dy                                                                    | 
                Vq = ( fxp*dy + fxm*dy + fyp*dx + fym*dx )*dt                                                     | 
                if V <  Vq:                                                                                       | 
                    alfa = V / Vq - 0.001                                                                         | 
                    if fluxx[xp] > 0.0 : fluxx[xp] *= alfa                                                        | 
                    if fluxx[c]  < 0.0 : fluxx[c]  *= alfa                                                        | 
                    if fluxy[yp] > 0.0 : fluxy[yp] *= alfa                                                        | 
                    if fluxy[c]  < 0.0 : fluxy[c]  *= alfa                                                        | 
                                                                                                                  | 
                    modflux[c] = True                                                                             | 
# ------------------------------------------------------------------------                                        | 
    n = 0                                                                                                         | 
    for i in numba.prange(nis, imax):-----------------------------------------------------------------------------| #9
        for j in range(jmax):                                                                                     | 
            c, xp, yp = (i, j), (i+1, j), (i, j+1)                                                                | 
            depn[c] = dep[c] - dt*(fluxx[xp] - fluxx[c])/dx - dt*(fluxy[yp] - fluxy[c])/dy                        | 
            if depn[c] < hmin :                                                                                   | 
                n += 1                                                                                            | 
#                 print('dep-error')                                                                              | 
#                 print( modflux[c], depn[c], fluxx[xp], fluxx[c], fluxy[yp], fluxy[c] )                          | 
                fxp = fluxx[xp] if fluxx[xp] > 0.0 else 0.0                                                       | 
                fxm = -fluxx[c] if fluxx[c] < 0.0 else 0.0                                                        | 
                fyp = fluxy[yp] if fluxy[yp] > 0.0 else 0.0                                                       | 
                fym = -fluxy[c] if fluxy[c] < 0.0 else 0.0                                                        | 
                V =  dep[c]*dx*dy - hmin*dx*dy                                                                    | 
                Vq = ( fxp*dy + fxm*dy + fyp*dx + fym*dx )*dt                                                     | 
#                 print(V,Vq)                                                                                     | 
                                                                                                                  | 
                depn[c] = hmin                                                                                    | 
                                                                                                                  | 
# upstream boundary                                                                                               | 
#     if periodic == False: depn[0][:] = depn[1][:]                                                               | 
    if periodic == False:                                                                                         | 
#         depn[0][:] = dep[0][:]                                                                                  | 
#         for j in Qind : depn[0,j] = depn[1,j]                                                                   | 
                                                                                                                  | 
        depn[0,:] = depn[1,:]  -----------------------------------------------------------------------------------| #1
                                                                                                                  | 
# downstream boundary                                                                                             | 
#     depn[-1][:] = hdown                                                                                         | 
    depn[-1][:] = depn[-2][:]-------------------------------------------------------------------------------------| #2
                                                                                                                  | 
    return depn, n                                                                                                | 
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
  Trying to fuse loops #7 and #6:
    - fusion succeeded: parallel for-loop #6 is fused into for-loop #7.
  Trying to fuse loops #3 and #4:
    - fusion failed: loop dimension mismatched in axis 0. slice(0, 
$48binary_add.19, 1) != slice(0, dep_size0.183, 1)
----------------------------- Before Optimisation ------------------------------
Parallel region 0:
+--7 (parallel)
+--6 (parallel)


--------------------------------------------------------------------------------
------------------------------ After Optimisation ------------------------------
Parallel region 0:
+--7 (parallel, fused with loop(s): 6)


 
Parallel region 0 (loop #7) had 1 loop(s) fused.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
 
---------------------------Loop invariant code motion---------------------------
Allocation hoisting:
No allocation hoisting found

Instruction hoisting:
loop #3:
  Has the following hoisted:
    $expr_out_var.287 = const(float64, 0.0)
  Failed to hoist the following:
    dependency: $parfor_index_tuple_var.288 = build_tuple(items=[Var($parfor__index_281.743, <string>:2), Var($parfor__index_282.745, <string>:3)])
loop #4:
  Has the following hoisted:
    $expr_out_var.295 = const(float64, 0.0)
  Failed to hoist the following:
    dependency: $parfor_index_tuple_var.296 = build_tuple(items=[Var($parfor__index_289.789, <string>:2), Var($parfor__index_290.791, <string>:3)])
loop #8:
  Has the following hoisted:
    $28binary_multiply.12.112 = ibx * dx
    bool34.115 = global(bool: <class 'bool'>)
    bool42.120 = global(bool: <class 'bool'>)
    $const172.7 = const(int, 1)
    $const4.1.151 = const(float, 0.0)
    bool8.153 = global(bool: <class 'bool'>)
    r.2.145 = const(float, 0.0)
    $const42.0.174 = const(float, 0.5)
    $const48.3.177 = const(float, 0.5)
    r.1.144 = const(float, 0.0)
    $64binary_multiply.8.131 = ibx * dx
    bool70.134 = global(bool: <class 'bool'>)
    $const12.1.156 = const(float, 0.0)
    bool16.158 = global(bool: <class 'bool'>)
    $152load_global.1 = global(range: <class 'range'>)
    $const32.1.168 = const(float, 0.0)
    bool36.170 = global(bool: <class 'bool'>)
    $const24.1.163 = const(float, 0.0)
    bool28.165 = global(bool: <class 'bool'>)
    bool78.139 = global(bool: <class 'bool'>)
  Failed to hoist the following:
    dependency: $160for_iter.2 = iternext(value=$158get_iter.4)
    dependency: $j.328 = pair_first(value=$160for_iter.2)
    dependency: $160for_iter.4 = pair_second(value=$160for_iter.2)
    dependency: $16binary_add.6.106 = $depm.94.327 + $zbm.96.326
    dependency: $22binary_add.9.109 = $zbp.95.322 + hbuf
    dependency: $30binary_subtract.13.113 = $22binary_add.9.109 - $28binary_multiply.12.112
    dependency: $32compare_op.14.114 = $16binary_add.6.106 <= $30binary_subtract.13.113
    dependency: $34pred.116 = call $push_global_to_block.837($32compare_op.14.114, func=$push_global_to_block.837, args=(Var($32compare_op.14.114, <ipython-input-5-cf997de63147>:19),), kws=(), vararg=None)
    dependency: $40compare_op.2.119 = $depp.93.319 <= hbuf
    dependency: $42pred.121 = call $push_global_to_block.838($40compare_op.2.119, func=$push_global_to_block.838, args=(Var($40compare_op.2.119, <ipython-input-5-cf997de63147>:19),), kws=(), vararg=None)
    dependency: $c.9.656 = build_tuple(items=[Var($parfor__index_303.828, <string>:3), Var($j.328, <ipython-input-5-cf997de63147>:25)])
    dependency: $174binary_subtract.8 = $parfor__index_303.828 - $const172.7
    dependency: $xm.323 = build_tuple(items=[Var($174binary_subtract.8, <ipython-input-5-cf997de63147>:26), Var($j.328, <ipython-input-5-cf997de63147>:25)])
    dependency: $r.103.3.2.848 = getitem(value=qx, index=$c.9.656, fn=<built-in function getitem>)
    dependency: $r.103.3.2.850 = getitem(value=qx, index=$xm.323, fn=<built-in function getitem>)
    dependency: $depp.93.319 = getitem(value=dep, index=$c.9.656, fn=<built-in function getitem>)
    dependency: $depm.94.327 = getitem(value=dep, index=$xm.323, fn=<built-in function getitem>)
    dependency: $zbp.95.322 = getitem(value=dzb, index=$c.9.656, fn=<built-in function getitem>)
    dependency: $zbm.96.326 = getitem(value=dzb, index=$xm.323, fn=<built-in function getitem>)
    dependency: $6compare_op.2.152 = $r.103.3.2.848 >= $const4.1.151
    dependency: $8pred.154 = call $push_global_to_block.839($6compare_op.2.152, func=$push_global_to_block.839, args=(Var($6compare_op.2.152, <ipython-input-5-cf997de63147>:13),), kws=(), vararg=None)
    dependency: $46binary_multiply.2.176 = $const42.0.174 * $r.103.3.2.848
    dependency: $52binary_multiply.5.179 = $const48.3.177 * $r.103.3.2.850
    dependency: $r.103.3.2.849 = $46binary_multiply.2.176 + $52binary_multiply.5.179
    dependency: $52binary_add.2.125 = $depp.93.319 + $zbp.95.322
    dependency: $58binary_add.5.128 = $zbm.96.326 + hbuf
    dependency: $66binary_add.9.132 = $58binary_add.5.128 + $64binary_multiply.8.131
    dependency: $68compare_op.10.133 = $52binary_add.2.125 <= $66binary_add.9.132
    dependency: $70pred.135 = call $push_global_to_block.840($68compare_op.10.133, func=$push_global_to_block.840, args=(Var($68compare_op.10.133, <ipython-input-5-cf997de63147>:20),), kws=(), vararg=None)
    dependency: $14compare_op.2.157 = $r.103.3.2.850 >= $const12.1.156
    dependency: $16pred.159 = call $push_global_to_block.841($14compare_op.2.157, func=$push_global_to_block.841, args=(Var($14compare_op.2.157, <ipython-input-5-cf997de63147>:13),), kws=(), vararg=None)
    dependency: c_8 = c.9
    dependency: i = $parfor__index_303.828
    not pure: $156call_function.3 = call $push_global_to_block.836(_jmax_316, func=$push_global_to_block.836, args=[Var(_jmax_316, <ipython-input-5-cf997de63147>:5)], kws=(), vararg=None)
    dependency: $158get_iter.4 = getiter(value=$156call_function.3)
    dependency: $34compare_op.2.169 = $r.103.3.2.850 <= $const32.1.168
    dependency: $36pred.171 = call $push_global_to_block.842($34compare_op.2.169, func=$push_global_to_block.842, args=(Var($34compare_op.2.169, <ipython-input-5-cf997de63147>:13),), kws=(), vararg=None)
    dependency: $26compare_op.2.164 = $r.103.3.2.848 <= $const24.1.163
    dependency: $28pred.166 = call $push_global_to_block.843($26compare_op.2.164, func=$push_global_to_block.843, args=(Var($26compare_op.2.164, <ipython-input-5-cf997de63147>:13),), kws=(), vararg=None)
    dependency: $76compare_op.2.138 = $depm.94.327 <= hbuf
    dependency: $78pred.140 = call $push_global_to_block.844($76compare_op.2.138, func=$push_global_to_block.844, args=(Var($76compare_op.2.138, <ipython-input-5-cf997de63147>:20),), kws=(), vararg=None)
loop #0:
  Failed to hoist the following:
    dependency: $value_var.274 = getitem(value=_262binary__subscr_7, index=$parfor__index_273.877, fn=<built-in function getitem>)
loop #5:
  Has the following hoisted:
    $const298.3 = const(int, -2)
    $const308.8 = const(int, -1)
  Failed to hoist the following:
    dependency: $302build_tuple.5 = build_tuple(items=[Var($const298.3, <ipython-input-5-cf997de63147>:33), Var($parfor__index_297.903, <string>:2)])
    dependency: $304binary_subscr.6 = getitem(value=fluxx, index=$302build_tuple.5, fn=<built-in function getitem>)
    dependency: $312build_tuple.10 = build_tuple(items=[Var($const308.8, <ipython-input-5-cf997de63147>:33), Var($parfor__index_297.903, <string>:2)])
loop #7:
  Has the following hoisted:
    $const24.1.72 = const(float, 0.0)
    bool28.74 = global(bool: <class 'bool'>)
    $const42.0.83 = const(float, 0.5)
    $const48.3.86 = const(float, 0.5)
    r.1.53 = const(float, 0.0)
    $const12.1.65 = const(float, 0.0)
    bool16.67 = global(bool: <class 'bool'>)
    $const364.8 = const(int, 1)
    $ib.6.334 = const(float, 0.0)
    $const4.1.60 = const(float, 0.0)
    bool8.62 = global(bool: <class 'bool'>)
    $64binary_multiply.8.40 = $ib.6.334 * dy
    bool70.43 = global(bool: <class 'bool'>)
    $const32.1.77 = const(float, 0.0)
    bool36.79 = global(bool: <class 'bool'>)
    $const458.4 = const(int, -1)
    $const464.7 = const(float, 0.0)
    bool468 = global(bool: <class 'bool'>)
    $28binary_multiply.12.21 = $ib.6.334 * dy
    bool34.24 = global(bool: <class 'bool'>)
    bool78.48 = global(bool: <class 'bool'>)
    r.2.54 = const(float, 0.0)
    bool42.29 = global(bool: <class 'bool'>)
    $340load_global.1 = global(range: <class 'range'>)
    $const342.2 = const(int, 1)
    $const532.4 = const(int, 0)
    $const518.3 = const(int, 0)
    $const476.3 = const(int, -1)
    $const526.1 = const(float, 0.0)
    $const490.4 = const(int, -1)
    $const500.8 = const(int, 0)
    $const506.11 = const(float, 0.0)
    bool510 = global(bool: <class 'bool'>)
    $const484.1 = const(float, 0.0)
  Failed to hoist the following:
    dependency: $26compare_op.2.73 = $r.12.3.2.962 <= $const24.1.72
    dependency: $28pred.75 = call $push_global_to_block.948($26compare_op.2.73, func=$push_global_to_block.948, args=(Var($26compare_op.2.73, <ipython-input-5-cf997de63147>:13),), kws=(), vararg=None)
    dependency: $46binary_multiply.2.85 = $const42.0.83 * $r.12.3.2.962
    dependency: $52binary_multiply.5.88 = $const48.3.86 * $r.12.3.2.963
    dependency: $r.12.3.2.961 = $46binary_multiply.2.85 + $52binary_multiply.5.88
    dependency: $350for_iter.2 = iternext(value=$348get_iter.5)
    dependency: $j.2.341 = pair_first(value=$350for_iter.2)
    dependency: $350for_iter.4 = pair_second(value=$350for_iter.2)
    dependency: $14compare_op.2.66 = $r.12.3.2.963 >= $const12.1.65
    dependency: $16pred.68 = call $push_global_to_block.949($14compare_op.2.66, func=$push_global_to_block.949, args=(Var($14compare_op.2.66, <ipython-input-5-cf997de63147>:13),), kws=(), vararg=None)
    dependency: $c.7.659 = build_tuple(items=[Var($parfor__index_301.939, <string>:3), Var($j.2.341, <ipython-input-5-cf997de63147>:38)])
    dependency: $366binary_subtract.9 = $j.2.341 - $const364.8
    dependency: $ym.330 = build_tuple(items=[Var($parfor__index_301.939, <string>:3), Var($366binary_subtract.9, <ipython-input-5-cf997de63147>:39)])
    dependency: $r.12.3.2.962 = getitem(value=qy, index=$c.7.659, fn=<built-in function getitem>)
    dependency: $r.12.3.2.963 = getitem(value=qy, index=$ym.330, fn=<built-in function getitem>)
    dependency: $depp.2.337 = getitem(value=dep, index=$c.7.659, fn=<built-in function getitem>)
    dependency: $depm.3.335 = getitem(value=dep, index=$ym.330, fn=<built-in function getitem>)
    dependency: $zbp.4.331 = getitem(value=dzb, index=$c.7.659, fn=<built-in function getitem>)
    dependency: $zbm.5.336 = getitem(value=dzb, index=$ym.330, fn=<built-in function getitem>)
    dependency: $6compare_op.2.61 = $r.12.3.2.962 >= $const4.1.60
    dependency: $8pred.63 = call $push_global_to_block.950($6compare_op.2.61, func=$push_global_to_block.950, args=(Var($6compare_op.2.61, <ipython-input-5-cf997de63147>:13),), kws=(), vararg=None)
    dependency: $52binary_add.2.34 = $depp.2.337 + $zbp.4.331
    dependency: $58binary_add.5.37 = $zbm.5.336 + hbuf
    dependency: $66binary_add.9.41 = $58binary_add.5.37 + $64binary_multiply.8.40
    dependency: $68compare_op.10.42 = $52binary_add.2.34 <= $66binary_add.9.41
    dependency: $70pred.44 = call $push_global_to_block.951($68compare_op.10.42, func=$push_global_to_block.951, args=(Var($68compare_op.10.42, <ipython-input-5-cf997de63147>:20),), kws=(), vararg=None)
    dependency: $34compare_op.2.78 = $r.12.3.2.963 <= $const32.1.77
    dependency: $36pred.80 = call $push_global_to_block.952($34compare_op.2.78, func=$push_global_to_block.952, args=(Var($34compare_op.2.78, <ipython-input-5-cf997de63147>:13),), kws=(), vararg=None)
    dependency: c_6 = c.7
    dependency: $460build_tuple.5 = build_tuple(items=[Var($parfor__index_301.939, <string>:3), Var($const458.4, <ipython-input-5-cf997de63147>:47)])
    dependency: $462binary_subscr.6 = getitem(value=qy, index=$460build_tuple.5, fn=<built-in function getitem>)
    dependency: $466compare_op.8 = $462binary_subscr.6 > $const464.7
    dependency: $468pred = call $push_global_to_block.953($466compare_op.8, func=$push_global_to_block.953, args=(Var($466compare_op.8, <ipython-input-5-cf997de63147>:47),), kws=(), vararg=None)
    dependency: $16binary_add.6.15 = $depm.3.335 + $zbm.5.336
    dependency: $22binary_add.9.18 = $zbp.4.331 + hbuf
    dependency: $30binary_subtract.13.22 = $22binary_add.9.18 - $28binary_multiply.12.21
    dependency: $32compare_op.14.23 = $16binary_add.6.15 <= $30binary_subtract.13.22
    dependency: $34pred.25 = call $push_global_to_block.954($32compare_op.14.23, func=$push_global_to_block.954, args=(Var($32compare_op.14.23, <ipython-input-5-cf997de63147>:19),), kws=(), vararg=None)
    dependency: $76compare_op.2.47 = $depm.3.335 <= hbuf
    dependency: $78pred.49 = call $push_global_to_block.955($76compare_op.2.47, func=$push_global_to_block.955, args=(Var($76compare_op.2.47, <ipython-input-5-cf997de63147>:20),), kws=(), vararg=None)
    dependency: $40compare_op.2.28 = $depp.2.337 <= hbuf
    dependency: $42pred.30 = call $push_global_to_block.956($40compare_op.2.28, func=$push_global_to_block.956, args=(Var($40compare_op.2.28, <ipython-input-5-cf997de63147>:19),), kws=(), vararg=None)
    not pure: $346call_function.4 = call $push_global_to_block.947($const342.2, _jmax_316, func=$push_global_to_block.947, args=[Var($const342.2, <ipython-input-5-cf997de63147>:38), Var(_jmax_316, <ipython-input-5-cf997de63147>:5)], kws=(), vararg=None)
    dependency: $348get_iter.5 = getiter(value=$346call_function.4)
    dependency: $534build_tuple.5 = build_tuple(items=[Var($parfor__index_301.939, <string>:3), Var($const532.4, <ipython-input-5-cf997de63147>:48)])
    dependency: $520build_tuple.4 = build_tuple(items=[Var($parfor__index_301.939, <string>:3), Var($const518.3, <ipython-input-5-cf997de63147>:48)])
    dependency: $522binary_subscr.5 = getitem(value=qy, index=$520build_tuple.4, fn=<built-in function getitem>)
    dependency: $478build_tuple.4 = build_tuple(items=[Var($parfor__index_301.939, <string>:3), Var($const476.3, <ipython-input-5-cf997de63147>:47)])
    dependency: $480binary_subscr.5 = getitem(value=qy, index=$478build_tuple.4, fn=<built-in function getitem>)
    dependency: $492build_tuple.5 = build_tuple(items=[Var($parfor__index_301.939, <string>:3), Var($const490.4, <ipython-input-5-cf997de63147>:47)])
    dependency: $502build_tuple.9 = build_tuple(items=[Var($parfor__index_301.939, <string>:3), Var($const500.8, <ipython-input-5-cf997de63147>:48)])
    dependency: $504binary_subscr.10 = getitem(value=qy, index=$502build_tuple.9, fn=<built-in function getitem>)
    dependency: $508compare_op.12 = $504binary_subscr.10 < $const506.11
    dependency: $510pred = call $push_global_to_block.957($508compare_op.12, func=$push_global_to_block.957, args=(Var($508compare_op.12, <ipython-input-5-cf997de63147>:48),), kws=(), vararg=None)
loop #9:
  Has the following hoisted:
    $const1202.6 = const(float, 0.0)
    bool1206 = global(bool: <class 'bool'>)
    $const1228.6 = const(float, 0.0)
    bool1232 = global(bool: <class 'bool'>)
    $1010load_global.1 = global(range: <class 'range'>)
    $const1032.7 = const(int, 1)
    $const1044.13 = const(int, 1)
    bool1130 = global(bool: <class 'bool'>)
    $const1136.3 = const(int, 1)
    $const1148.8 = const(float, 0.0)
    bool1152 = global(bool: <class 'bool'>)
    $const1174.6 = const(float, 0.0)
    bool1178 = global(bool: <class 'bool'>)
  Failed to hoist the following:
    dependency: $1200binary_subscr.5 = getitem(value=fluxy, index=$yp.1.355, fn=<built-in function getitem>)
    dependency: $1204compare_op.7 = $1200binary_subscr.5 > $const1202.6
    dependency: $1206pred = call $push_global_to_block.1022($1204compare_op.7, func=$push_global_to_block.1022, args=(Var($1204compare_op.7, <ipython-input-5-cf997de63147>:83),), kws=(), vararg=None)
    dependency: $1226binary_subscr.5 = getitem(value=fluxy, index=$c.3.356, fn=<built-in function getitem>)
    dependency: $1230compare_op.7 = $1226binary_subscr.5 < $const1228.6
    dependency: $1232pred = call $push_global_to_block.1023($1230compare_op.7, func=$push_global_to_block.1023, args=(Var($1230compare_op.7, <ipython-input-5-cf997de63147>:84),), kws=(), vararg=None)
    not pure: $1014call_function.3 = call $push_global_to_block.1021(_jmax_316, func=$push_global_to_block.1021, args=[Var(_jmax_316, <ipython-input-5-cf997de63147>:5)], kws=(), vararg=None)
    dependency: $1016get_iter.4 = getiter(value=$1014call_function.3)
    dependency: $1018for_iter.2 = iternext(value=$1016get_iter.4)
    dependency: $j.4.359 = pair_first(value=$1018for_iter.2)
    dependency: $1018for_iter.4 = pair_second(value=$1018for_iter.2)
    dependency: $c.3.356 = build_tuple(items=[Var($parfor__index_305.1017, <string>:3), Var($j.4.359, <ipython-input-5-cf997de63147>:74)])
    dependency: $1034binary_add.8 = $parfor__index_305.1017 + $const1032.7
    dependency: $xp.1.357 = build_tuple(items=[Var($1034binary_add.8, <ipython-input-5-cf997de63147>:75), Var($j.4.359, <ipython-input-5-cf997de63147>:74)])
    dependency: $1046binary_add.14 = $j.4.359 + $const1044.13
    dependency: $yp.1.355 = build_tuple(items=[Var($parfor__index_305.1017, <string>:3), Var($1046binary_add.14, <ipython-input-5-cf997de63147>:75)])
    dependency: $1064binary_subscr.18 = getitem(value=dep, index=$c.3.356, fn=<built-in function getitem>)
    dependency: $1072binary_subscr.22 = getitem(value=fluxx, index=$xp.1.357, fn=<built-in function getitem>)
    dependency: $1078binary_subscr.25 = getitem(value=fluxx, index=$c.3.356, fn=<built-in function getitem>)
    dependency: $1080binary_subtract.26 = $1072binary_subscr.22 - $1078binary_subscr.25
    dependency: $1082binary_multiply.27 = dt * $1080binary_subtract.26
    dependency: $1086binary_true_divide.29 = $1082binary_multiply.27 / dx
    dependency: $1088binary_subtract.30 = $1064binary_subscr.18 - $1086binary_true_divide.29
    dependency: $1096binary_subscr.34 = getitem(value=fluxy, index=$yp.1.355, fn=<built-in function getitem>)
    dependency: $1102binary_subscr.37 = getitem(value=fluxy, index=$c.3.356, fn=<built-in function getitem>)
    dependency: $1104binary_subtract.38 = $1096binary_subscr.34 - $1102binary_subscr.37
    dependency: $1106binary_multiply.39 = dt * $1104binary_subtract.38
    dependency: $1110binary_true_divide.41 = $1106binary_multiply.39 / dy
    dependency: $1112binary_subtract.42 = $1088binary_subtract.30 - $1110binary_true_divide.41
    dependency: $1124binary_subscr.47 = getitem(value=depn, index=$c.3.356, fn=<built-in function getitem>)
    dependency: $1128compare_op.49 = $1124binary_subscr.47 < hmin
    dependency: $1130pred = call $push_global_to_block.1024($1128compare_op.49, func=$push_global_to_block.1024, args=(Var($1128compare_op.49, <ipython-input-5-cf997de63147>:77),), kws=(), vararg=None)
    dependency: $n.4.663 = inplace_binop(fn=<built-in function iadd>, immutable_fn=<built-in function add>, lhs=n.2, rhs=$const1136.3, static_lhs=Undefined, static_rhs=Undefined)
    dependency: $1146binary_subscr.7 = getitem(value=fluxx, index=$xp.1.357, fn=<built-in function getitem>)
    dependency: $1150compare_op.9 = $1146binary_subscr.7 > $const1148.8
    dependency: $1152pred = call $push_global_to_block.1025($1150compare_op.9, func=$push_global_to_block.1025, args=(Var($1150compare_op.9, <ipython-input-5-cf997de63147>:81),), kws=(), vararg=None)
    dependency: n_3 = n.2
    dependency: $1172binary_subscr.5 = getitem(value=fluxx, index=$c.3.356, fn=<built-in function getitem>)
    dependency: $1176compare_op.7 = $1172binary_subscr.5 < $const1174.6
    dependency: $1178pred = call $push_global_to_block.1026($1176compare_op.7, func=$push_global_to_block.1026, args=(Var($1176compare_op.7, <ipython-input-5-cf997de63147>:82),), kws=(), vararg=None)
loop #1:
  Failed to hoist the following:
    dependency: $value_var.277 = getitem(value=_1356binary__subscr_7, index=$parfor__index_276.1071, fn=<built-in function getitem>)
loop #2:
  Failed to hoist the following:
    dependency: $value_var.280 = getitem(value=_1384binary__subscr_7, index=$parfor__index_279.1098, fn=<built-in function getitem>)
--------------------------------------------------------------------------------

かなり長い出力ですが、 After Optimisationをみるとわかるとおり、自動で最適された場所は、#7と#6のループがfusedされたのみです。

私は並列化を意識してソースを書いているので、あまり最適化する場所が無かったみたいです。

手動並列化について

次に手動並列化ですが、先程と同様の

@numba.jit(nopython=True, parallel=True)
def conEq(dep, qx, qy, dzb, dt, dx, dy, ibx, hmin, hbuf, hdown, periodic=True):
......................

に加えて、並列化の対象とするループの回数をrangeからnumba.prangeに変更します。

for i in numba.prange( imax ):
......................

これによって、指定したループが並列化されます。 通常の並列化のため、共有変数等の扱いには十分に注意する必要があります。

まとめ

以上でnumba並列化の基礎をまとめました。気軽に使えますが、並列化効率を上げるためにはコーディングの段階で十分に工夫する必要があるのは、他の並列化手法と同様な感じです。


pythonの高速化はもうちょっと勉強したいですね。 以下の書籍が詳しそうですがまだ読めてないです。