800 likes | 942 Views
USolids Multi-Union notes. Marek Gayer. GetCandidatesVoxelArray method, on which Inside very much relies. Finding voxel candidates did not delete buffers Error in creating the final bit mask:
E N D
USolids Multi-Union notes Marek Gayer
GetCandidatesVoxelArray method, on which Inside very much relies • Finding voxel candidates did not delete buffers • Error in creating the final bit mask: • ((unsigned int *)(maskResult))[iIndex] = ((unsigned int *)(maskX))[iIndex] & ((unsigned int *)(maskY))[iIndex] & ((unsigned int *)(maskX))[iIndex]; • Was possible to merge with Intersect method • => No buffers was in the end necessary at all • Optimization and reducing of original 140 lines of code to 67 • Performance of new method faster in hundreds % • Root timer precision is +-0.01s =>Number of points tested increased to 10M
New script to measure scalability based on splitting of Multi-Union integration with Root graphics • void TestMultiUnion() • { • // TestMultiUnionWithGraphics(); return; • int numNodes[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 34, 36, 38, 40, 45, 50, 60, 70, 80, 90, 100, 500, 1000, 5000, 8000}; • int numNodesCount = sizeof (numNodes) / sizeof (int); • // numNodesCount -= 3; • ofstream nodes("nodes.dat"), times("times.dat"); • for (int i = 0; i < numNodesCount; i++) • { • int numNodesImpl = numNodes[i]; • double chrono = TestMultiUnionOneStep(numNodesImpl); • nodes << numNodesImpl << endl; • times << chrono << endl; • } • nodes.close(), times.close(); • printf("[> END\n"); • }
TestMultiUnionOneStep (listed simplified)based on splitting of Multi-Union integration with Root graphics • doubleTestMultiUnionOneStep(intnumNodesImpl) • { • UMultiUnion*multiUnion = CreateMultiUnion(numNodesImpl); • multiUnion->Voxelize(); • DisplayMultiUnionInfo(multiUnion); • doublebmin[3], bmax[3]; • multiUnion->Extent(bmin, bmax); • intnpoints = 10000000; • int n10 = npoints/10; • UVector3 point; • TStopwatch timer; • timer.Start(); • for (inti = 0; i < npoints; i++) • { • if (numNodesImpl > 100) • if (n10 && (i % n10)==0) printf("test inside ... %d%%\n",int(100*i/npoints)); • point.x = gRandom->Uniform(-2000, 2000); • point.y = gRandom->Uniform(-2000, 2000); • point.z = gRandom->Uniform(-2000, 2000); • VUSolid::EnumInside result = multiUnion->Inside(point); • } • timer.Stop(); • doublechrono = timer.CpuTime(); • returnchrono; • }
Scaling of Multi-Union Inside: source with original conditions by Jean-Marie
Scaling of Multi-Union Inside: source with original conditions by Jean-Marie
Multi-Union with 21 boxes. Faster than one box in original measurement
Scaling of Multi-Union – fixed the ranges to be really [-2000; 2000]
Profiling of Inside method content (listed simplified) • VUSolid::EnumInsideUMultiUnion::Inside(const UVector3 &aPoint) const • { • UVector3 tempPointConv; • VUSolid::EnumInsidetempInside = eOutside; • boolboolSurface = false; • vector<int> vectorOutcome; • vectorOutcome = fVoxels->GetCandidatesVoxelArrayOld(aPoint); • fVoxels->GetCandidatesVoxelArray(aPoint, vectorOutcome); • int limit = vectorOutcome.size(); • for(int i = 0 ; i < limit ; i++) • { • VUSolid*tempSolid= ((*fNodes)[vectorOutcome[i]])->fSolid; • UTransform3D *tempTransform= ((*fNodes)[vectorOutcome[i]])->fTransform; • tempPointConv= tempTransform->LocalPoint(aPoint); • tempInside = tempSolid->Inside(tempPointConv); • if(tempInside == eSurface) boolSurface = true; • if(tempInside == eInside) returneInside; • } • tempInside= boolSurface ? eSurface : eOutside; returntempInside; • }
valgrind --tool=callgrind root.exe run.Ckcachegrind [“New Implementation”]std::vector<int> GetCandidatesVoxelArray(const UVector3 &point);
valgrind --tool=callgrind root.exe run.Ckcachegrind [“New Implementation 2”]void GetCandidatesVoxelArray(const UVector3 &point, std::vector<int> &list);
Touch up of sources • Added missing deletes • Adding references to objects in methods headers • double* arrMin = new double[3] => double arrMin[3] • In debug mode, all results are compared to InsideDummy to make sure of consistency • iIndex, jIndex etc. => i,j • Optimization of loops • Removing underscores • Avoiding calling Inside 2x times in Normal method • Several similar, small changes
New performance setup • All boxes are located in -1 to 1 for xyz container • Volume ratio of test are / boxes is same, ratio of inside vs. outside points is a bit lower than ½, because boxes can collide each other • Random points located in -1.1 to 1.1 for xyz
Arrays improvements 1/2 • Replace C style dynamic arrays (e.g. double *) with new and detete • Using instead resizable std::vector<T> for boxes and boundaries • => no need for various temporary variables • Verified that performance did not drop • No new and delete constructs => no possible memory leaks by missing or wrong deletes
Arrays improvements 2/2 • No indexes like “boxes[6*i + 3 + k]” anymore, since structure used: • structUVoxelBox • { • UVector3 hlen; // half length of the box • UVector3 pos; // position of the box • }; std::vector<UVoxelBox> boxes;
Bitmasks abstrastion • Bitmasks have special class UBits, based on adaptation of ROOT Tbits • One for each whole axis • Other alternatives, problematic for one or another reason… • boost dynamic array: • Vector<bool>, • bitset<1> or other online free alternatives • Makes more clean and clear code
Exclusions (“crossings”) • Ubits data structure is used for making additional exclusion masks (and type) for excluding selected candidates • Used in DistanceToIn and DistanceToOut • Local var passed as parameter => thread safe • Usage philosophy example: • UBits exclusion(voxels->GetBitsPerSlice()); • exclusion.ResetAllBits(true); // set all bits to 1 • exclusion.ResetBitNumber(intexcludedCandidate=1); • VUSolid::EnumInside location = InsideWithExclusion(currentPoint, &exclusion); • exclusion.SetBitNumber(candidate);
Voxels bit masks memory padding • Changed bitmask sizes increase for each voxel from 8bits (size of char) to 32bit/64bit (size of int) • Faster memory addressing (each mask starts 4/8 bytes for each bitmask) and algorithm makes increase up to 10% in performance • Possible to make faster loops, because we do not have to count bits • We can and masks and compare them to zeros using whole unsigned int
1/3 Fast routines for finding set bit components of bitmask in u. int • for (int bit = 0; bit < (int) (8*sizeof(unsignedint)); bit++) • { • if (mask & 1) • { • list.push_back(8*sizeof(unsignedint)*i+bit); • if (!(mask >>= 1)) break; // new • } • else mask >>= 1; • }
2/3 Fast routines for finding set bit components of bitmask in u. int • for (int byte = 0; byte < (int) (sizeof(unsignedint)); byte++) • { • if (int maskByte = mask & 0xFF) • { • do • { • staticconstint firstBits[256] = { • 8,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, • 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, • 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, • 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, • 6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, • 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, • 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, • 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, • 7,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, • 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, • 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, • 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, • 6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, • 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, • 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0, • 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0}; • int bit = firstBits[maskByte]; • list.push_back(8*(sizeof(unsignedint)*i+ byte) + bit); • maskByte -= 1 << bit; • } • while (maskByte); • } • mask >>= 8; • }
3/3 Fast routines for finding set bit components of bitmask in u. int • for (int byte = 0; byte < (int) (sizeof(unsignedint)); byte++) • { • if (intmaskByte = mask & 0xFF) • { • for (int bit = 0; bit < 8; bit++) • { • if (maskByte & 1) • { • list.push_back(8*(sizeof(unsignedint)*i+ byte) + bit); • if (!(maskByte >>= 1)) break; // new • } • elsemaskByte >>= 1; • } • } • mask >>= 8; • }
Preferring references • Except for bitmasks values, using references instead of pointers because in C++ “References are the preferred way of indirectly accessing a variable”. • [ see e.g. C/C++ Pointers vsReferences, http://www.dgp.toronto.edu/~patrick/csc418/wi2004/notes/PointersVsRef.pdf ]
Other effort at VoxelFinder/MultiUnion 1/2 • Renamed all and removed some unnecessary fields in voxel finder • UVoxelFinder is separated and independent on UMultiUnion and can be used separately • UVoxelFinderUsed as common C++ property of UMultiUnion • Overall cleanup and reducing of code • Fixed wrong construction of bit masks, which for certain points was making inconsistencies
Other effort at VoxelFinder/MultiUnion 2/2 • Boundaries, bit slices counts and bitmasks are stored as arrays [0..2] • This allowed to rewrite all loops to for cycles, instead of having special codes for each axis => significant reducing of code + readability • Separate cycles for x, y, z merged when possible to one. • Removed all “1e-4” type constants • No need to assign fields in constructor, their own constructor properly initializes • Removed UNode class
Inside • => Inside very fast, integrity is checked vs. InsideDummy, and is100% conformal • Based on calling InsideWithExclusion with NULL as exclusion mask
Safety’s • Their algorithm is simple • With utilization of TGeoShapeAssembly::Safety and TGeoVoxelFinder, checkup and small fixes of J.M.’s Safety’s
Normal • Code is straightforward • Based on J.M. code + TGeoShapeAssembly::ComputeNormal • If there is a solid from union where point on surface, return normal from that point • Otherwise, find solid with smallest SafetyFromInside/ SafetyFromOutside to the point and return the normal for that solid and point
DistanceToIn • Integrity is checked vs. DistanceToInDummyand is100% conformal within VUSolid::Tolerance() • Checked TGeoShapeAssembly::DistFromOutside + incomplete J.M. code + comments from Andrei • Clear code divided to methods: voxels.DistanceToFirst, voxels.DistanceToNext, voxels.GetCandidatesVoxelArray and DistanceToInCandidates • Algorithm based on calculating DistanceToNextVoxel in voxel finder: • Finding distance (“shift”) to first voxel on the direction • Shifting the point • Finding solids contained in the voxel for shifted point (candidates), if any • Computing minimum distance from original point for all candidates using solid->DistanceToIn() • From now, do not take into account candidates that was already checked • If distance less than shift to next voxel return it, otherwise update minimal distance and go to step 2. • Return minimal distance (or infinity if it was not set)
DistanceToOut • Integrity is checked vs. DistanceToOutDummy, and is100% conformal within VUSolid::Tolerance() • Improved non-functional J.M. code + comments from Andrei • Based on: • Find solid from the voxel where the point is located on surface or inside with maximum distance to out, otherwise return -1 (we are outside) • Propagate the point using DistanceToOutof such solid • Cumulate the returned distance • If inside (exclude the current solid when figuring it out) another solid, go to 1., but there ignore the solid that we just passed from • Return cumulated distance • If we were outside, take into consideration voxels located +-tolerance on all axis from given point
Tests • SBT cannot be used directly, no Geant4 multi-union • Optical Escape used. Tests Inside, Safety’s, Distance’s but not Normal method • It’s passing tests for boxes and orbs with count of objects 1 - 10000
Optical Escape, for Orbs/Boxes multi-unions, no particles escape
Optical Escape, for Orbs/Boxes multi-unions, no particles escape
Questions • Is it possible to make .q in root in a script? • (root -b -q run.C) • VUSolid::Extent. Why not use UVector3 instead of double[3] • Maybe make [] operator for UVector3? • Uvector3 (usolids::tolerance, usolids::tolerance, usolids::tolerance) • UVector3 (double x=usolids::tolerance , x, x); • If you have this constructor, Uvector3(double) • Then this will be automatically done by compiler (no warning) • Uvector3 Vec = (Uvector3) usolids::tolerance;
Questions • UVector3 /= operator, Set(value), SetNull(), • Operator[] • Using UVector3 inside loops
Voxel cells tolerance padding, by adding tolerance to solid extents • Added tolerance to extent of each solid, from which boundaries are being created • This removes many errors which are gained on points on borders, for example in case of Inside and Distance* methods. No hack attempts to get it work necessary anymore. • Also it allows significant simplification of the core routine GetCandidatesVoxelArray
Boxes with tolerance padding vs. no padding performance • The only practical difference is, that in worst case the number of boundaries doubles • But because we use binary search, it does not have any significant performance impact • Up to 4 percent for number of boxes > 1000 for inside e.g. {1.28 2.08 2.9} vs. {1.26 2 2.83} • I have modified J.M. boxes test, to have zero spacing with each other and run test 1:10000
UUtils::BinarySearch (OLD) • //______________________________________________________________________________ • intUUtils::BinarySearch(int n, constdouble *array, double value) • { • // Binary search in an array of doubles. If match is found, function returns • // position of element. If no match found, function gives nearest • // element smaller than value. • //if (array[n-1] == value) return n - 2; // patch, let us discuss it • intnabove, nbelow, middle; • nabove = n+1; • nbelow = 0; • while(nabove-nbelow > 1) { • middle = (nabove+nbelow)/2; • if(value == array[middle-1]) • { • nbelow = middle; • break; • } • if (value < array[middle-1]) nabove = middle; • elsenbelow = middle; • } • return nbelow-1; • }
UUtils::BinarySearch (NEW) • intUUtils::BinarySearch(conststd::vector<double> &vec, double value) • { • vector<double>::const_iterator begin=vec.begin(), end=vec.end(); • int res = upper_bound(begin, end, value) - begin - 1; • #ifdef DEBUG • intresold = UUtils::BinarySearch(vec.size(), &vec[0], value); • if (res != resold) • res = resold; • #endif • return res; • }
Using std::upper_bound for binary search instead of UUtils::BinarySearch
Commented out this method, using std::sort better • voidUUtils::Sort(int n, constdouble* a, int* index, bool down) • { • // Sort the n elements of the array a of generic templated type Element. • // In output the array index of type Index contains the indices of the sorted array. • // If down is false sort in increasing order (default is decreasing order). • // NOTE that the array index must be created with a length >= n • // before calling this function. • // NOTE also that the size type for n must be the same type used for the index array • // (templated type Index) • for(int i = 0; i < n; i++) { index[i] = i; } • if ( down ) • std::sort(index, index + n, CompareDesc<constdouble*>(a) ); • else • std::sort(index, index + n, CompareAsc<constdouble*>(a) ); • } • => std::sort(boundary.begin(), boundary.end());
Vector access operator • inlinedouble UVector3::operator[] (int index) const • { • // TODO: test performance of both versions on Linux • if (true) • { • doublevec[3] = {x, y, z}; • returnvec[index]; • } • switch (index) • { • case 0: return x; • case 1: return y; • case 2: return z; • default: return 0; • } • } • inlinedouble &UVector3::operator[] (int index) • { • switch (index) • { • case 0: return x; • case 1: return y; • case 2: return z; • default: return x; • } • }
Integrated multi-union into SBT • As a primary solid, G4UnionSolid is created • In data-analysis and performance test, we traverse through this solid and we create • UMultiUnion • TGeoCompositeShape • Advantage, by adding a new test, it is propagated to all 3 solid
Polyhedron received from Geant4 -figure; sbtpolyhedra(Normal);
DistanceToIn was not conformal • Problem #1 – DistanceToFirst did not work • => solution bounded multi-union extent, wrapped as Ubox, calling DistanceToIn • doubleUVoxelFinder::DistanceToFirst(UVector3 &point, const UVector3 &direction) const • { • UVector3 pointShifted = point - boundingBoxCenter; • double shift = boundingBox->DistanceToIn(pointShifted, direction); • returnshift; • } • Problem #2 – DistanceToNextVoxel not working. Rounding problems + bad algorithm • I consider OpticalEscape to be less suitable for validity testing than SBT