The fast Gauss transform with all the proofs. VIKAS C. RAYKAR
by user
Comments
Transcript
The fast Gauss transform with all the proofs. VIKAS C. RAYKAR
The fast Gauss transform with all the proofs. VIKAS C. RAYKAR University of Maryland, CollegePark [email protected] In this report we discuss some technical aspects of the fast Gauss transform (FGT). The FGT is a special case of the single level FMM for the Gaussian potential. We give detailed derivation of the correct error bounds which is rather terse in the original paper. We also briefly discuss about the data structures to implement the space subdivision in higher dimensional spaces. [ FGT ]: April 8, 2006 Contents 1 Fast Gauss Transform 1 2 Hermite polynomials 2 3 Multi-index notation 3 4 Taylor expansion of the Hermite function 4 5 Factorization [S- and R-expansion] 4 6 Fast Gauss Transform 6.1 Direct Evaluation . . . . . . . . . . . . . . . . . . . . . 6.2 S-expansion (Hermite) at the center of each source box 6.3 R-expansion (Taylor) at the center of each target box 6.4 S|R translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 6 6 7 8 7 Algorithm 9 8 Data Structures 9 9 Appendices 12 9.1 Appendix 1 [Hermite series truncation error] . . . . . . . . . . . . . . 12 9.2 Appendix 2 [Taylor series truncation error] . . . . . . . . . . . . . . 13 9.3 Appendix 3 [S|R truncation error] . . . . . . . . . . . . . . . . . . . 14 1. FAST GAUSS TRANSFORM Here we give a detailed analysis of the fast Gauss transform (FGT) based on the techniques and the notation used in the FMM primer [Raykar 2005]. The FGT is a special case of the single level FMM for the Gaussian potential. For each target Fast Gauss Transform April 8, 2006 2 · Vikas Chandrakant Raykar point {yj ∈ Rd }j=1,...,M the discrete Gauss transform is defined as, Φ(yj ) = N X qi e−kyj −xi k 2 /h2 . (1) i=1 where {qi ∈ R+ }i=1,...,N are the source weights, {xi ∈ Rd }i=1,...,N are the source points, i.e., the center of the Gaussians, and h ∈ R+ is the source scale or bandwidth. Φ(yj ) is the total contribution at yj of N Gaussians centered at xi each with bandwidth h. Each Gaussian is weighted by the term qi . The computational complexity to evaluate the discrete Gauss transform at M target points is O(M N ). The Fast Gauss Transform [Greengard and Strain 1991] is an approximation algorithm that reduces the computational complexity to O(M + N ). The constant depends on the desired precision. Given any ² > 0, it computes an approximation Φ̂(yj ) to Φ(yj ) such that the maximum absolute error relative to the total weight PN Q = i=1 qi , " # |Φ̂(yj ) − Φ(yj )| max ≤ ². (2) yj Q The algorithm is based on the single level FMM described above. The Gaussian is factorized via Hermite and Taylor series. The error bound derived in the original paper was shown to be incorrect and a new bound was derived in [Baxter and Roussos 2002]. The space is subdivided into a number of boxes and the sources and targets are assigned to different boxes. For all sources belonging to a particular box, the Sexpansion (Hermite series) coefficients are consolidated at the center of each source box. For each target box the S-expansion (Hermite series) coefficients at each source box are translated via the S|R translation operator as R-expansion (Taylor series) coefficients at the center of the target box. Since the Gaussian decays very rapidly only a few neighboring source boxes will have influence on the target box. 2. HERMITE POLYNOMIALS The Hermite polynomial Hn (y) is defined by the Rodrigues formula ´ n ³ 2 d −y 2 Hn (y) = (−1)n ey e , y ∈ R. dy n (3) Following are the first few Hermite polynomials. H0 (y) = 1 H1 (y) = 2y H2 (y) = 4y 2 − 2 H3 (y) = 8y 3 − 12y (4) The generating function for the Hermite polynomials is 2 e2yx−x = FMM tutorial April 8, 2006 ∞ X xn Hn (y). n! n=0 (5) Writing TRs is a character building process! · 3 2 Multiplying both sides of the preceding equation by e−y yields 2 e−(y−x) = ∞ X xn hn (y), n! n=0 (6) where the Hermite functions hn (y) are defined as, 2 hn (y) = e−y Hn (y). (7) We need a shifted and the scaled version of this formula. 2 e−(y−x) /h2 2 2 = e−[(y−x0 )−(x−x0 )] /h µ ¶n µ ¶ ∞ X 1 x − x0 y − x0 = hn . n! h h n=0 (8) This formula tells us how to evaluate the Gaussian field at the target y due to a source at x, as a Hermite expansion centered at a nearby source x0 . This series converges rapidly and for a given precision, only a certain number of p terms need be retained. Interchanging x and y we have the following expansion. µ ¶n µ ¶ ∞ X 2 2 1 y − y0 x − y0 e−(y−x) /h = hn . (9) n! h h n=0 This formula tells us how to evaluate the Gaussian field at the target y due to a source at x, as a Taylor expansion centered at a nearby target y0 . The following recurrence relation will be useful in computation of the Hermite functions. hn+1 (y) = 2yhn (y) − 2nhn−1 (y), y ∈ R. (10) The Cramer’s inequality [Hille 1926] for Hermite polynomials will be useful in deriving the error bounds. √ 2 |Hn (y)| ≤ K2n/2 n!ey /2 , (11) where K < 1.09. Based on the Cramer’s inequality we have the following useful bound for Hermite functions. 2 1 1 |hn (y)| ≤ K2n/2 √ e−y /2 . (12) n! n! [Baxter and Roussos 2002] use a slightly tighter version of this 2 1 1 |hn (y)| ≤ 2n/2 √ e−y /2 . n! n! (13) 3. MULTI-INDEX NOTATION In order to generalize the above relations to higher dimensions we use the multiindex notation. A multi-index α = (α1 , α2 . . . , αd ) is a d-tuple of nonnegative integers. The length of the multi-index α is defined as |α| = α1 + α2 + . . . + αd . If p is an integer, we say α ≥ p of αi ≥ p for 1 ≤ i ≤ d. The factorial of α is defined as α! = α1 !α2 ! . . . αd !. For any multi-index α ∈ Nd and x = (x1 , x2 , . . . , xd ) ∈ Rd αd α 1 α2 the d-variate monomial xα is defined as xα = xα 1 x2 . . . xd . x is of degree n if FMM tutorial April 8, 2006 4 · Vikas Chandrakant Raykar |α| = n. Let x, y ∈ Rd and v = x·y = x1 y1 +. . .+xd yd . Then using the multi-index notation v n can be written as, X n! vn = xα y α . (14) α! |α|=n The αth derivative with respect to x is defined as, ∂ αd dα ∂ α1 ∂ α2 . = α1 α2 . . . d α dx ∂x1 ∂x2 ∂xα d (15) The multidimensional Hermite function is defined as 2 hα (y) = e−kyk Hα (y) = hα1 (y1 )hα2 (y2 ) . . . hαd (yd ). The Hermite expansion of the multivariate Gaussian can be written as, µ ¶ X 1 µ x − x0 ¶α y − x0 −(y−x)2 /h2 e = hα . α! h h (16) (17) α≥0 The Taylor expansion of the multivariate Gaussian can be written as, µ ¶µ ¶β X 1 2 2 x − y0 y − y0 e−(y−x) /h = hβ . β! h h (18) β≥0 4. TAYLOR EXPANSION OF THE HERMITE FUNCTION For the translation operators we will also need the Taylor expansion of the Hermite function hα (y) about an arbitrary point y0 ∈ Rd . X (y − y0 )β hα (y) = Dβ hα (y0 ). (19) β! β≥0 2 hα (y) = (−1)α Dα e−kyk . (20) Dβ hα (y) = (−1)β hα+β (y). (21) Hence, hα (y) = X (y − y0 )β (−1)β hα+β (y). β! (22) β≥0 5. FACTORIZATION [S- AND R-EXPANSION] The potential or field at y due to a source at xi is given by Φ(y, xi ) = e−ky−xi k 2 /h2 . (23) Note that Φ(y, xi ) is a regular potential. All the sources and targets are assumed d to lie in the smaller boxes with sides √ unit box B0 = [0, 1] . B0 is subdivided into √ of length 2rh, parallel to the axes, with a fixed r ≤ 1/ 2. The spatial domain enclosed by the lth box is denoted as I1 (l) and the center of the box is denoted by xlc . I1c (l) is the complement of I1 (l). Since the potential is regular we need not worry about singular points. FMM tutorial April 8, 2006 Writing TRs is a character building process! · 5 For the lth box ∀xi ∈ I1 (l) and ∀y ∈ I1c (l) we need an S-expansion (far-field) of the form X aα (xi , xlc )Sα (y − xlc ). (24) Φ(y, xi ) = α≥0 The FGT uses the Hermite expansion of the Gaussian as the S-expansion. µ ¶ X 1 µ xi − xl ¶α y − xlc c Φ(y, xi ) = hα . α! h h (25) α≥0 aα (xi , xlc ) 1 = α! µ xi − xlc h ¶α µ , Sα (y − xlc ) = hα y − xlc h ¶ . (26) For the lth box ∀xi ∈ I1c (l) and ∀y ∈ I1 (l) we need an R-expansion (local) of the form X Φ(y, xi ) = bβ (xi , xlc )Rβ (y − xlc ). (27) β≥0 The FGT uses the Taylor expansion of the Gaussian (which is obtained by interchanging y and xi in the Hermite expansion) as the R-expansion. µ ¶µ ¶β X 1 xi − xlc y − xlc Φ(y, xi ) = hβ . (28) β! h h β≥0 bβ (xi , xlc ) = 1 hβ β! µ xi − xlc h ¶ µ , Rβ (y − xlc ) = y − xlc h ¶β . (29) 6. FAST GAUSS TRANSFORM Let the target point y belong to the nth box, i.e., y ∈ I1 (n). We need to evaluate the total field at y due sources in all boxes. Since the Gaussian decays very rapidly only a few boxes close to the target box can contribute more than Q² to the field at y. If we include only (2n + 1)d nearest boxes, then the error q due to ignoring 2 2 ln 1/² all other boxes is bounded by Qe−2r n . This implies n ≥ 2r 2 . Suppose we want compute the field at all yj ∈ I1 (n) due to all sources xi ∈ I1 (l). There are four different ways of doing this. Let there be NB sources in I1 (l) and MC targets in I1 (n). Given the sources in one box and the targets in a neighboring box, the computation is performed using one of the following four methods depending on the number of sources and targets in these boxes: Direct evaluation is used if the number of sources and targets are small. If the sources are clustered in a box then they can be transformed into Hermite expansion about the center of the box. This expansion is directly evaluated at each target in the target box if the number of the targets is small. If the targets are clustered then the sources or their expansion are converted to a local Taylor series which is then evaluated at each target in the box. In practice a cutoff NF = O(pd−1 ) and ML = O(pd−1 ) is introduced. —If NB < NF then the source box I1 (l) sends out NB Gaussians. —If NB ≥ NF then the source box I1 (l) sends out a Hermite expansion. FMM tutorial April 8, 2006 6 · Vikas Chandrakant Raykar —If MC < ML then the target box I1 (n) evaluates all the fields sent to it immediately. —If MC ≥ ML then the target box I1 (n) transforms all fields sent to it into Taylor series, accumulates the coefficients, and then evaluates the Taylor series. 6.1 Direct Evaluation For the lth box let Φl (y) be the potential at y ∈ I1c (l) due to all sources xi ∈ I1 (l). For all y ∈ I1 (n) X X 2 2 Φl (y) = (30) qi Φ(y, xi ) = qi e−ky−xi k /h . xi ∈I1 (l) xi ∈I1 (l) The computational cost is O(NB MC ). 6.2 S-expansion (Hermite) at the center of each source box We form S-expansion (Hermite) of the potential about the source box center xlc . For each box we consolidate the S-expansion coefficients due to all sources in that box into a single rapidly converging Hermite expansion about the center of the box. For all y ∈ I1 (n) X Φl (y) = qi Φ(y, xi ) xi ∈I1 (l) X = qi = α≥0 = X aα (xi , xlc )Sα (y − xlc ) α≥0 xi ∈I1 (l) X X X qi aα (xi , xlc ) Sα (y − xlc ) xi ∈I1 (l) Alα Sα (y − xlc ) = α≥0 X µ Alα hα α≥0 where Alα X = qi aα (xi , xlc ) xi ∈I1 (l) 1 = α! X xi ∈I1 (l) µ qi y − xlc h xi − xlc h ¶ (31) ¶α . (32) We retain only the first pd coefficients. The computational cost is O(pd NB ) + O(pd MC ). The error due to truncation of the Hermite series is bounded by ¯ ¯ ¯ µ ¶¯ µ p ¶d−k d−1 µ ¶ n ¯ X ¯X l y − x Ql d r c ¯ p k ¯ √ EH (p) = ¯ (33) Aα hα ≤ (1 − r ) ¯ d h p! ¯α≥p ¯ (1 − r) k=0 k P where Ql = xi ∈I1 (l) qi and r < 1. See Appendix 1 for the derivation. The total potential at y due to all sources is µ ¶ XX X y − xlc b l (y) = b Alα hα Φ . (34) Φ(y) = h α<p ∀l FMM tutorial April 8, 2006 ∀l Writing TRs is a character building process! · 7 Since there are at most (2n + 1)d source boxes within range the total computational complexity will be O(pd N ) + O((2n + 1)d pd M ). The error µ p ¶d−k d−1 µ ¶ X b |Φ(y) − Φ(y)| 1 d r p k √ ≤ (1 − r ) d Q (1 − r) k p! (35) k=0 where r < 1. 6.3 R-expansion (Taylor) at the center of each target box We form R-expansion (Taylor) of the potential about the target box center xnc . For each source box I1 (l) we consolidate the R-expansion coefficients due to all sources in that box into a single Taylor expansion about the center of the target box I1 (n). For all y ∈ I1 (n) X X X Φl (y) = qi Φ(y, xi ) = qi bβ (xi , xnc )Rβ (y − xnc ) xi ∈I1 (l) = X β≥0 = X β≥0 xi ∈I1 (l) X qi bβ (xi , xnc ) Rβ (y − xnc ) xi ∈I1 (l) Bβln Rβ (y − xnc ) = X µ Bβln β≥0 β≥0 y − xnc h ¶β where, Bβln = X qi bβ (xi , xnc ) = xi ∈I1 (l) 1 β! X (36) µ qi hβ xi ∈I1 (l) xi − xnc h ¶ . (37) d We retain only the first p coefficients. The computational cost is O(pd NB ) + O(pd MC ). The error due to truncation of the Taylor series is bounded by ¯ ¯ ¯ µ ¶ ¯ µ p ¶d−k d−1 µ ¶ X ¯ X ln y − xnc β ¯ d r Ql p k ¯ ¯ √ ET (p) = ¯ (38) Bβ (1 − r ) ≤ ¯ d h p! ¯β≥p ¯ (1 − r) k=0 k P where Ql = xi ∈I1 (l) qi and r < 1. See Appendix 2 for the derivation. The total potential at y ∈ I1 (n) due to all sources is #µ " µ ¶ ¶β n β X XX X X y − x y − xnc c l ln ln b b (y) = Φ(y) = Φ Bβ = Bβ . (39) h h ∀l ∀l β<p β<p ∀l Since there are at most (2n + 1)d source boxes within range the total computational complexity will be O((2n + 1)d pd N ) + O(pd M ). The error µ p ¶d−k d−1 µ ¶ X b 1 |Φ(y) − Φ(y)| d r p k √ ≤ (1 − r ) Q (1 − r)d k p! (40) k=0 where r < 1. FMM tutorial April 8, 2006 8 · Vikas Chandrakant Raykar 6.4 S|R translation We accumulate all sources in a box via a truncated Hermite expansion and then translate the S-expansion coefficients into R-expansion Taylor coefficients at the center of the target box containing the evaluation point. Let y ∈ I1 (n), i.e., the target point belongs to the nth box with center xnc . Now we want to write the potential at y due to all sources in the lth box Φl (y) (expanded using an S expansion around xlc ) as an R expansion around xnc . ¶β µ X X y − xnc Φl (y) = Cβln Rβ (y − xnc ) = Cβln (41) h β≥0 β≥0 P where Cβln = α≥0 (S|R)βα (xnc − xlc )Alα and Alα are the S-expansion coefficients around xlc . The S|R-translation operator can be derived using the Taylor expansion of the Hermite function [See Section 4]. µ ¶ X µ ¶β µ n ¶ (−1)|β| y − xnc y − xlc xc − xlc hα = hα+β . (42) h β! h h β≥0 The S-expansion is given by Φl (y) = = = = µ ¶ ¶ µ n |β| n β l X (−1) y − xc y− xc − xc Alα hα = Alα hα+β h β! h h α≥0 α≥0 β≥0 µ n ¶ µ ¶β X (−1)|β| X xc − xlc y − xnc Alα hα+β β! h h α≥0 β≥0 µ l ¶ µ ¶ n n β X (−1)|β| X x − x c c y − xc Alα (−1)|α|+|β| hα+β β! h h α≥0 β≥0 µ l ¶ µ ¶β X 1 X xc − xnc y − xnc Alα (−1)|α| hα+β β! h h µ X X β≥0 ¶ X α≥0 β≥0 = xlc µ Cβln y − xnc h where, Cβln ¶β (43) µ l ¶ xc − xnc 1 X l |α| Aα (−1) hα+β = . β! h (44) α≥0 There are two sources of truncation. First we truncate the Hermite series coefficients Alα and then we truncate the coefficients Cβln . The error due to truncation of both the Taylor series and the Hermite series is bounded by à √ !d−k 2 d−1 µ ¶ ³ X √ p ´k ( 2r)p d Ql (45) √ √ 1 − ( 2r) ET H (p) ≤ ET (p) + p! (1 − 2r)2d k=0 k FMM tutorial April 8, 2006 Writing TRs is a character building process! √ P where Ql = xi ∈I1 (l) qi and r < 1/ 2. See Appendix 3 for the derivation. The total potential at y ∈ I1 (n) due to all sources is " #µ µ ¶ ¶β n β X XX X X y − x y − xnc c l ln ln b b (y) = bβ bβ Φ(y) = = . Φ C C h h ∀l where, ∀l β<p β<p · 9 (46) ∀l ¶ µ l 1 X l xc − xnc ln |α| b Cβ = A (−1) hα+β . β! α<p α h (47) The total computational complexity will be O((2n + 1)d dpd+1 Nbox ) + O(pd M ) + O(pd N ). 7. ALGORITHM A formal description of the algorithm and detailed analysis of the computational cost can be seen in the original paper [Greengard and Strain 1991]. 8. DATA STRUCTURES The first step of the√ algorithm is the spatial subdivision of the unit hypercube into d d Nside boxes of side 2rh were r < 1/2. The boxes are numbered from 1 to Nside in the order of increasing dimensionality d. Fig. 1 shows an example of spatial ordering in two dimensions. 12 13 14 15 8 9 10 11 4 5 6 7 0 1 2 3 (a) Fig. 1. Example of spatial ordering in two dimensions. We need the following operations to implement the FGT. (1) Given x = (x1 , x2 , . . . , xd ) ∈ Rd find the corresponding box number to which x belongs. d X j−1 BoxN umber = bxj Nside cNside . (48) j=1 (2) Given the box number find the center of the box. First write BoxN umber in base Nside . (BoxN umber)(10) = (ad ad−1 . . . a2 a1 )(Nside ) . The j th coordinate of the box center is given by √ 1 BoxCenterj = ( 2rh)(aj + ). 2 (49) (50) FMM tutorial April 8, 2006 10 · Vikas Chandrakant Raykar (3) List the (2n + 1)d neighbors of a given box. First write BoxN umber − 1 in base Nside . (BoxN umber)(10) = (ad ad−1 . . . a2 a1 )(Nside ) . (51) Increment or decrement each aj as follows bji = aj + i for i = 0, ±1, ±2, . . . , ±n. (52) The base Nside representation of the (2n + 1)d neighbors is as follows (bdid . . . b2i2 b1i1 )(Nside ) . (53) d where (id , . . . , i2 , i1 ) ∈ {0, ±1, ±2, . . . , ±n} . The box number is given by converting it into base 10 representation. (bdid . . . b2i2 b1i1 )(Nside ) = (BoxN umber)(10) (54) REFERENCES Baxter, B. J. C. and Roussos, G. 2002. A new error estimate of the fast gauss transform. SIAM Journal of Scientific and Statistical Computing 24, 1, 257–259. 2, 3, 13 Greengard, L. and Strain, J. 1991. The fast Gauss transform. SIAM Journal of Scientific and Statistical Computing 12, 1, 79–94. 2, 9, 13, 14 Hille, E. 1926. A class of reciprocal functions. The Annals of Mathematics, 2nd Ser. 27, 4 (Jun.), 427–464. 3 Raykar, V. C. 2005. A short primer on the fast multipole method. Available online on the author’s website. 1 FMM tutorial April 8, 2006 Writing TRs is a character building process! · 11 FMM tutorial April 8, 2006 12 · Vikas Chandrakant Raykar 9. APPENDICES 9.1 Appendix 1 [Hermite series truncation error] The error EH (p) due to truncating the series after pd terms is ¯ ¯ ¯X ¯ µ µ ¶¯ X ¶¯ l ¯ ¯ ¯ l ¯¯ y − x y − xlc ¯¯ c ¯ l ¯ ¯ ¯ ¯ EH (p) = ¯ Aα hα Aα ¯hα ¯≤ ¯ h h ¯α≥p ¯ α≥p µ ¶¯ d ¯ l X¯ ¯Y ¯ ¯ ¯hαj (y)j − (xc )j ¯ [(y)j is the j th coordinate of y.] ¯Alα ¯ = ¯ ¯ h j=1 α≥p ¯ ¯ µ ¶ ¯ d ¯ µ ¶¯ l α¯ Y X ¯¯ 1 X ¯ x − x (y)j − (xlc )j ¯¯ i c ¯ ¯ ¯ = qi ¯ α! ¯ ¯hαj ¯ h h ¯ j=1 α≥p ¯ xi ∈I1 (l) ¯ ¯ ¯µ ¯ ¶ d µ ¶¯ X ¯¯ X √ ¯ r αY 1 ¯¯ (y)j − (xlc )j ¯¯ ¯ ¯ ≤ qi ¯ √ hα j [Since the box side is 2rh] ¯ ¯ ¯ α ! h 2 j=1 j α≥p ¯xi ∈I1 (l) ¯ ¯ ¶¯ µ d Y X X ¯ (y)j − (xlc )j ¯¯ αj −αj /2 1 ¯ r 2 ≤ |qi | hα ¯ αj ! ¯ j h α≥p = Ql j=1 xi ∈I1 (l) X d Y rαj 2−αj /2 α≥p j=1 ¯ ¶¯ µ 1 ¯¯ (y)j − (xlc )j ¯¯ h α ¯ [Define Ql = αj ! ¯ j h X |qi |] xi ∈I1 (l) d XY l 2 2 1 p rαj 2−αj /2 2−αj /2 e−((y)j −(xc )j ) /2h [Cramer’s inequality] α ! j α≥p j=1 d d d XY XY XY 1 1 1 p rαj = Ql p r αj − p r αj ≤ Ql α ! α ! αj ! j j α<p j=1 α≥p j=1 α≥0 j=1 d X d X Y Y 1 1 p r αj − p rαj = Ql αj ! αj ! αj <p j=1 j=1 αj ≥0 d d X 1 X X 1 1 p r αj + p r αj p r αj = Ql − α <p αj ! αj ! αj ! αj <p α ≥p j ≤ Ql j k d−k X 1 1 p r αj p rαj = Ql [Binomial theorem] k α ! αj ! j αj <p αj ≥p k=0 k d−k d−1 µ ¶ X d X αj r p X αj √ r ≤ Ql r k p! α ≥0 α <p k=0 d−1 µ ¶ X d X j j ¶d−k rp 1 1−r √ [Geometric series r < 1.] 1−r p! 1 − r k=0 µ p ¶d−k d−1 µ ¶ X FMM tutorial April Ql 8, 2006 d r p k √ = (1 − r ) d (1 − r) k p! ≤ Ql d−1 µ X d k ¶µ k=0 ¶ µ p k (55) Writing TRs is a character building process! · 13 The error bound in the original FGT paper [Greengard and Strain 1991] was shown to be incorrect and a new bound was proposed by [Baxter and Roussos 2002]. This is the new error estimate as in [Baxter and Roussos 2002] but derived in a slightly different way. 9.2 Appendix 2 [Taylor series truncation error] The error ET (p) due to truncating the series after pd terms is (derivation very similar to EH (p)) ET (p) = ≤ = ≤ ≤ ¯ ¯ ¯ ¶ ¯ µ ¯ X ln y − xnc β ¯ ¯ ¯ Bβ ¯ ¯ h ¯β≥p ¯ ¯µ ¶ ¯ X¯ ¯ ¯¯ y − xnc β ¯¯ ln ¯Bβ ¯ ¯ ¯ ¯ ¯ h β≥p ¯ ¯ ¯ ¶¯ ¯µ ¶ ¯ µ n ¯¯ n β¯ X 1 ¯ X x − x y − x ¯ ¯ i c ¯ c ¯ q i hβ ¯ [Eq. 37.] ¯ ¯ ¯ ¯ ¯ β! ¯ h h ¯ β≥p xi ∈I1 (l) ½ ¯ µ ¶¯¾ ¯¯µ ¶ ¯ n ¯ n β¯ X X ¯ 1 x − x y − x ¯ ¯ i c ¯ c ¯hβ |qi | ¯ ¯ ¯ ¯ ¯ β! ¯ h h β≥p xi ∈I1 (l) ¶¯¾ ½ ¯ µ n ¯ X X ¯ x − x 1 i c ¯ β −β/2 ¯hβ r 2 |qi | ¯ β! ¯ h β≥p xi ∈I1 (l) ¯ µ ¶¯ d Y n ¯ ¯ 1 (x ) − (x ) i j j c ¯h β j ¯ rβ 2−β/2 = |qi | ¯ ¯ β ! h j j=1 β≥p xi ∈I1 (l) d Y X X 1 p 2βj /2 rβ 2−β/2 ≤ |qi | βj ! j=1 β≥p x ∈I (l) X X i X 1 d Y 1 p rβj βj ! β≥p j=1 µ p ¶d−k d−1 µ ¶ X Ql d r p k √ ≤ (1 − r ) (1 − r)d k p! = Ql (56) k=0 FMM tutorial April 8, 2006 14 · Vikas Chandrakant Raykar 9.3 Appendix 3 [S|R truncation error] Using the expression for Alα , Cβln can be simplified as follows. ¶ µ l 1 X l xc − xnc Aα (−1)|α| hα+β β! h α≥0 µ ¶α µ l ¶ 1 X 1 X xi − xlc xc − xnc |α| qi (−1) hα+β = β! α! h h α≥0 xi ∈I1 (l) µ l ¶ n X 1 µ xi − xl ¶α 1 X x − x c c c = qi (−1)|α| hα+β β! α! h h α≥0 xi ∈I1 (l) ¶ µ X 1 xi − xnc . = qi hβ β! h Cβln = (57) xi ∈I1 (l) Note that this is exactly the same as Bβln the R-expansion coefficients, which should be the case. There is a error in the original paper [Greengard and Strain 1991] where there is an extra (−1)|β| term. The error ET H (p) due to truncating both the series after pd terms is ET H (p) = = = ≤ = ¯ ¯ ¯ µ ¶ ¯ ¯ X ln y − xnc β ¯ bβ ¯ ¯ C ¯ ¯ h ¯β≥p ¯ ¯ ¯ ¯X ³ ´ µ y − xn ¶β ¯¯ ¯ c bβln − Cβln ) ¯ ¯ Cβln + (C ¯ ¯ h ¯ ¯β≥p ¯ ¯ ¯ µ ¶ ¶ ¯ µ n β¯ ¯ X ln y − xnc β X ln y − x c bβ − Cβln ) ¯ ¯ + (C Cβ ¯ ¯ h h ¯ ¯β≥p β≥p ¯ ¯ ¯ ¯ ¯ ¶ ¯ ¯ ¶ ¯ µ µ n β¯ ¯ X ln y − xnc β ¯ ¯ X ln y − x c b − C ln ) ¯+¯ ¯ [Triangle inequality.] ¯ Cβ (C β β ¯ ¯ ¯ ¯ h h ¯ ¯β≥p ¯ ¯β≥p ¯ ¯ ¯ ¶β ¯ µ ¯ ¯ X ln y − xnc ln b ¯ ¯ ET (p) + ¯ (58) ( Cβ − Cβ ) ¯ h ¯ ¯β≥p FMM tutorial April 8, 2006 Writing TRs is a character building process! · 15 b ln can be simplified as follows C β ¶ µ l 1 X l xc − xnc ln |α| b Cβ = A (−1) hα+β β! α<p α h µ µ l ¶α ¶ 1 X 1 X xi − xlc xc − xnc = qi (−1)|α| hα+β β! α<p α! h h xi ∈I1 (l) " ¶# µ l X 1 µ xi − xl ¶α 1 X xc − xnc c |α| = qi (−1) hα+β β! α! h h α<p xi ∈I1 (l) µ ¶ ¶ µ l l α n X X X 1 xc − xc 1 xi − xc = qi − (−1)|α| hα+β β! α! h h α≥0 α≥p xi ∈I1 (l) µ ¶ ¶ µ l l α n X X 1 1 xi − xc xc − xc = Cβln − qi (−1)|α| hα+β β! α! h h xi ∈I1 (l) = Cβln + bβln (C − Cβln ) α≥p (59) FMM tutorial April 8, 2006 16 = ≤ ≤ ≤ · Vikas Chandrakant Raykar ¯ ¯ ¯ ¯X ¶ ¯ X¯ ¶ ¯ µ ¯ ¯µ n β¯ n β¯ ¯ y − x ¯ b ln ¯ c ln ¯ ¯ y − xc ln ln b ¯ ¯ (Cβ − Cβ ) ¯(Cβ − Cβ )¯ ¯ ¯ ¯ ¯≤ ¯ ¯ h h ¯β≥p ¯ β≥p ¯ ¯ µ l ¶ ¯ ¯¯µ ¶ ¯ n n β¯ X 1 ¯¯ X X 1 µ xi − xl ¶α ¯ x − x ¯ c c c ¯ ¯ y − xc |α| ¯ qi (−1) hα+β ¯ ¯ ¯¯ ¯ β! ¯¯ α! h h h ¯ α≥p β≥p xi ∈I1 (l) ¯ ¯ ¯ ¯X µ ¶α ¶¯ µ ¶β ¯¯ µ l X X 1 ¯ 1 xi − xlc xc − xnc ¯¯ ¯¯ y − xnc ¯ |α| ¯ (−1) hα+β |qi | ¯ ¯ ¯ ¯¯ ¯ β! α! h h h ¯ ¯ α≥p β≥p xi ∈I1 (l) ¯ ¯ µ l ¶¯ µ ¶β ¯¯ X 1 X X 1 ¯¯µ xi − xl ¶α ¯¯ ¯ xc − xnc ¯¯ ¯¯ y − xnc ¯ c ¯ ¯ ¯ hα+β |qi | ¯ ¯ ¯ ¯ ¯ ¯ ¯ ¯ β! α! h h h α≥p β≥p xi ∈I1 (l) ¯ µ l ¶¯ n ¯ X 1 X X 1 ¯ x − x c ¯ β −β/2 c |qi | rα 2−α/2 ¯¯hα+β ¯ r 2 β! α! h β≥p xi ∈I1 (l) α≥p ¶¯ µ l d ¯ Y n X X 1 X 1 ¯ ¯ (x ) − (x ) j j c c ¯hα +β ¯ rβ 2−β/2 |qi | ≤ rα 2−α/2 ¯ j j ¯ β! α! h j=1 α≥p β≥p xi ∈I1 (l) ¯ d ¯q Y X X 1 X 1 ¯ l n 2 2¯ ¯ (αj + βj )!2αj /2 2βj /2 e−((xc )j −(xc )j ) /2h ¯ rβ 2−β/2 |qi | ≤ rα 2−α/2 ¯ ¯ β! α! j=1 α≥p β≥p xi ∈I1 (l) d q Y XX 1 1 rα ≤ Ql (αj + βj )! rβ α! β! j=1 α≥p β≥p ≤ = ≤ ≤ ≤ ≤ p XX (αj + βj )! αj +βj Ql r αj !βj ! α≥p β≥p j=1 s s d XXY 1 (αj + βj )! αj +βj Ql r α !β ! αj !βj ! j j α≥p β≥p j=1 s s d d XXY XXY √ αj +βj 1 1 ³√ ´αj +βj αj +βj Ql 2 ≤ Ql 2r r αj !βj ! αj !βj ! α≥p β≥p j=1 α≥p β≥p j=1 s s d d ´ ´ ³ ³ Y X XY α β √ √ j j 1 1 Ql 2r 2r α ! β j j! α≥p j=1 β≥p j=1 s à √ !d−k d d−1 µ ¶ ³ XY X √ p ´k ( 2r)p 1 ³√ ´αj 1 d √ √ Ql 2r 1 − ( 2r) αj ! p! (1 − 2r)d k=0 k α≥p j=1 à √ !d−k 2 d−1 µ ¶ ³ X √ p ´k ( 2r)p √ d Ql [r < 1/ 2] √ √ 1 − ( 2r) p! (1 − 2r)2d k=0 k d Y FMM tutorial April 8, 2006 (60)