A very short presentation of the Perpignan group

People involved
• Permanent staff:
  • Philippe Langlois (30%)
  • Marc Daumas (20%)
  • David Defour (10%)
• Non-permanent staff:
  • Nicolas Louvet
  • Sylvain Collange

Progress report
• Automated proofs (M. Daumas)
  • Using statistics to characterize the behavior of floating-point code
• Specification (M. Daumas, D. Defour)
  • Exotic arithmetics (GPU)
• Algorithms for evaluating floating-point expressions (N. Louvet, P. Langlois, M. Daumas, D. Defour)
  • Compensated algorithms
  • Table-based bivariate approximation

Statistics and errors in floating-point arithmetic

Systems are now running fast enough and long enough for their errors to affect their functionality
• Worst-case analysis is meaningless for applications that run for a long time
• For example:
  • A process adds numbers in the range ±1 in single precision
  • Each addition produces a round-off error of ±2^-25
  • This process adds 2^25 items
  • The accumulated worst-case error is ±1
• Note that 10 hours of flight time at an operating frequency of 1 kHz is approximately 2^25 operations
• Provided round-off errors are not correlated, the actual accumulated error will be much smaller
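The gap between the worst case and the uncorrelated case can be sketched numerically. The snippet below is only an illustration: it assumes each round-off error is independent and uniformly distributed on [-u, u] (an assumption not stated on the slide), and the function names are hypothetical.

```python
import math
import random

U = 2.0 ** -25   # per-addition round-off bound quoted on the slide
N = 2 ** 25      # number of additions quoted on the slide


def worst_case_error(n, u):
    """Bound reached when every error has the same sign and maximal size."""
    return n * u


def independent_error_std(n, u):
    """Standard deviation of the sum of n independent errors.

    Assumption (not from the slides): each error is uniform on [-u, u],
    so its variance is u**2 / 3.
    """
    return u * math.sqrt(n / 3.0)


def simulate_accumulated_error(n, u, rng):
    """Draw n independent errors and return their sum (one sample path)."""
    return sum(rng.uniform(-u, u) for _ in range(n))


if __name__ == "__main__":
    print("worst case      :", worst_case_error(N, U))
    print("std, independent:", independent_error_std(N, U))
    rng = random.Random(0)
    print("one sample (n = 10**5):", simulate_accumulated_error(10**5, U, rng))
```

Under this model the standard deviation grows like the square root of the number of operations, whereas the worst-case bound grows linearly.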

FAA regulations for aircraft require
• The probability of an error to be below 10^-9 for a 10-hour flight
• Provides a bound on the number of numeric operations (fixed- or floating-point) that can safely be performed before accuracy is lost
• Important implications for control systems with safety-critical software
• Worst-case analysis would blindly advise the replacement of existing systems that have been running successfully for years
• Set of formal theorems validated by the PVS proof assistant
• Allows code analysis tools to produce formal certificates
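To make the idea of "a bound on the number of operations for a target probability" concrete, here is a minimal sketch based on Hoeffding's inequality. This is a swapped-in textbook bound, not the set of theorems validated in PVS that the slide refers to. It assumes independent, zero-mean errors bounded by u in absolute value; the tolerance `t` and the function names are hypothetical.

```python
import math


def hoeffding_tail(n, u, t):
    """Upper bound on P(|sum of n errors| >= t).

    Assumptions: errors are independent, zero-mean, and lie in [-u, u].
    Hoeffding's inequality then gives 2 * exp(-t^2 / (2 n u^2)).
    """
    return min(1.0, 2.0 * math.exp(-t * t / (2.0 * n * u * u)))


def max_operations(u, t, p):
    """Largest n for which the Hoeffding bound stays below probability p."""
    return math.floor(t * t / (2.0 * u * u * math.log(2.0 / p)))


if __name__ == "__main__":
    u = 2.0 ** -25      # per-operation error bound (from the earlier example)
    t = 2.0 ** -10      # hypothetical accuracy tolerance
    p = 1e-9            # target failure probability
    n = max_operations(u, t, p)
    print("operations allowed:", n)
    print("bound at that n   :", hoeffding_tail(n, u, t))
    print("worst-case error  :", n * u)
```

With these hypothetical values, the worst-case error `n * u` far exceeds the tolerance `t`, while the probabilistic bound still meets the 10^-9 target.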

Some easy ways to obtain worst-case behavior
• Systematic ad hoc errors may lead to the slow accumulation of small quantities of the same sign
  • Biased measurements
  • Synchronized time shift

Developing probabilities on floating-point arithmetic
• Formal proof assistants such as ACL2, HOL, Coq and PVS are used in areas where
  • Errors can cause loss of life or significant financial damage
  • Common misunderstandings can falsify key assumptions
• Developments in probability share many features with developments in floating-point arithmetic:
  • Each result usually relies on a long list of hypotheses, and slight variations induce a large number of results that look almost identical
  • Most people want a trustworthy result, but they are not proficient enough to either select the best scheme or detect minor faults that can quickly lead to huge problems
• Validation of safety-critical numeric software using probability should be done with an automatic proof checker

The Central Limit Theorem in action (n = 1, 2 or 5)

Limitations of the Central Limit Theorem to target probability 10^-9 (n = 5, 40, 100 or 200)
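One way to see why the Central Limit Theorem is unreliable for very small target probabilities is to compare an exact tail with its normal approximation. The sketch below uses the sum of n independent uniform [0, 1] variables (the Irwin-Hall distribution) as a stand-in. The choice of distribution and the function names are assumptions made for illustration, not taken from the slides.

```python
import math
from fractions import Fraction


def irwin_hall_tail(x, n):
    """Exact P(S >= x), S being the sum of n independent uniform [0, 1] variables."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(1)
    if x >= n:
        return Fraction(0)
    # Irwin-Hall CDF: (1/n!) * sum_{k=0}^{floor(x)} (-1)^k C(n, k) (x - k)^n
    cdf = sum((-1) ** k * math.comb(n, k) * (x - k) ** n
              for k in range(math.floor(x) + 1))
    return 1 - cdf / math.factorial(n)


def normal_tail(x, n):
    """Central-limit (normal) approximation of the same probability."""
    mean, std = n / 2.0, math.sqrt(n / 12.0)
    return 0.5 * math.erfc((x - mean) / (std * math.sqrt(2.0)))


if __name__ == "__main__":
    n, x = 5, Fraction(49, 10)
    print("exact tail :", float(irwin_hall_tail(x, n)))
    print("normal tail:", normal_tail(float(x), n))
```

For n = 5 the two values differ by several orders of magnitude this far into the tail, even though the densities look alike near the mean.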

Exotic arithmetics

Problem statement
• Our expertise: IEEE-754 arithmetic
  • A very precise framework
  • Precision, rounding, exception handling
  • Portability
• New architectures (GPUs):
  • Do not comply with the standard
  • Rounding and exception handling
• Open questions:
  • How will algorithms behave on these architectures?
  • Is it possible to define robust algorithms?

Characteristics of GPU arithmetic
• Depends on the generation and the vendor
• Several computation units
  • 3 MAD units
    • A.x + B
  • 1 unit for special functions (exp, log, cos, sin, 1/x, 1/√x)
  • 1 bilinear, trilinear, anisotropic interpolator
    • Example: a.x + b.y, a0.(a1.x1 + b1.y1) + b0.(a2.x2 + b2.y2)
  • 1 blending unit
    • Example: r = a.r + b.y
• Each unit sits along a pipeline
  • Constraints on how they can be used

Block diagram of a GPU (figure). Blocks shown: command and data fetch, vertex shader, cull/clip/setup, rasterization, Z-cull, shared L2 texture cache, pixel shader, fragment/pixel crossbar, Z-compare and blend, four memory partitions, each attached to 128 MB of GDDR3.

Programmable vertex shader (figure). Blocks shown: vertex data, FP32 vector unit (VLIW, MIMD, 4-way MAD), vertex texture fetch, FP32 scalar unit (1-way: sin, cos, log, exp, rcp, rsq), L1 cache, shared L2 texture cache, branch unit, primitive assembly, viewport processing, texture memory, triangle setup.
• Vertex engine:
  • Multithreaded
  • Branching without penalty
  • 2 instructions per cycle
  • 9 FLOPS

Pixel shader (figure). Blocks shown: texture data, pixel data, a first FP32 vector unit (4-way SIMD MADD + texture address computation + mini-ALU + FP16 normalization), an FP texture processor (mip-mapping, filtering), L1 cache, a second FP32 vector unit (4-way SIMD MADD + mini-ALU), shared L2 texture cache, branch unit, fog unit, texture memory, fragment/pixel crossbar.
• Pixel engine:
  • Multithreaded
  • SIMD

Our work
• Characterization of the MAD units
  • A.x + B with an intermediate rounding (unlike an FMA)
  • Rounding mode: truncation
  • Between 0 and 2 extra bits
  • Multiplication without computing all the partial products
  • Possible addition of a bias constant
  • No denormal support (flushed to 0)
  • No qNaN
  • Precision
• Definition of working float-float algorithms
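Two of the properties listed above, the intermediate rounding of a MAD and truncation as rounding mode, can be emulated on a CPU to see their effect. The sketch below is not the group's test suite: it models binary32 with Python's `struct` module and uses double precision as a wide intermediate format, so the "fused" result is only exact when a*x + b fits in a double. All function names are hypothetical.

```python
import struct


def to_f32_nearest(x):
    """Round a Python float to the nearest binary32 value (ties to even)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_f32_trunc(x):
    """Round a Python float to binary32 toward zero (truncation)."""
    r = to_f32_nearest(x)
    if abs(r) > abs(x):
        # Nearest rounding moved away from zero: step one binary32 value back.
        bits = struct.unpack("<I", struct.pack("<f", r))[0]
        r = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    return r


def mad(a, x, b, rnd):
    """Multiply-add with an intermediate rounding after the product."""
    return rnd(rnd(a * x) + b)


def fma(a, x, b, rnd):
    """Fused multiply-add model: a single rounding of a*x + b."""
    return rnd(a * x + b)


if __name__ == "__main__":
    a = 1.0 + 2.0 ** -12
    b = -(1.0 + 2.0 ** -11)
    print("fma, nearest:", fma(a, a, b, to_f32_nearest))
    print("mad, nearest:", mad(a, a, b, to_f32_nearest))
    print("mad, trunc  :", mad(a, a, b, to_f32_trunc))
```

In this example the low-order part of the product survives only in the fused version, which is why algorithms that rely on an FMA cannot be ported unchanged to a MAD with intermediate rounding.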

Discussion
• Goals:
  • Define "robust" algorithms in the absence of a floating-point standard
  • Quantify the resulting overhead
• Example:
  • Float-float addition / multiplication with faithful arithmetic
D. M. Priest, On properties of floating point arithmetics: Numerical stability and the cost of accurate computations. PhD thesis, 1992

Float-float operators
D. M. Priest, Algorithms for arbitrary precision floating point arithmetic, Proceedings of the 10th IEEE Symposium on Computer Arithmetic (Arith-10), 1991
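For readers unfamiliar with float-float operators, here is a minimal sketch of the classic building blocks (Knuth's TwoSum, Dekker's product with Veltkamp splitting). It is not Priest's formulation and not the GPU-robust variant discussed in these slides. It assumes IEEE round-to-nearest arithmetic and uses Python doubles as the base format, so each value is an unevaluated pair (hi, lo) of doubles.

```python
def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def fast_two_sum(a, b):
    """Same as two_sum, but requires |a| >= |b|."""
    s = a + b
    return s, b - (s - a)


def split(a):
    """Veltkamp splitting of a double into two non-overlapping halves."""
    c = 134217729.0 * a          # 2**27 + 1, the constant for binary64
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a, b):
    """Return (p, e) with p = fl(a * b) and p + e == a * b exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def ff_add(x, y):
    """Add two (hi, lo) pairs; a simple, non-rigorous variant."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return fast_two_sum(s, e)


def ff_mul(x, y):
    """Multiply two (hi, lo) pairs; the lo*lo term is dropped."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    return fast_two_sum(p, e)
```

The exactness claims in the docstrings hold only under round-to-nearest without overflow or underflow. With truncation or missing guard bits, as on the GPUs described earlier, these sequences need to be re-examined.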

Floating-point arithmetic on GPUs
