# A Comprehensive Study of Real-World NumericalBug Characteristics真实世界数值的综合研究 缺陷特征

A Comprehensive Study of Real-World Numerical Bug Characteristics Anthony Di Franco*, Hui Guo*, and Cindy Rubio-González Department of Computer Science University of California, Davis, USA {acdifranco, higuo, crubio}@ucdavis.edu Abstract—Numerical software is used in a wide variety of ap- plications including safety-critical systems, which have stringent correctness requirements, and whose failures have catastrophic consequences that endanger human life. Numerical bugs are known to be particularly diffi cult to diagnose and fi x, largely due to the use of approximate representations of numbers such as fl oating point. Understanding the characteristics of numerical bugs is the fi rst step to combat them more effectively. In this paper, we present the fi rst comprehensive study of real-world numerical bugs. Specifi cally, we identify and carefully examine 269 numerical bugs from fi ve widely-used numerical software libraries: NumPy, SciPy, LAPACK, GNU Scientifi c Library, and Elemental. We propose a categorization of numerical bugs, and discuss their frequency, symptoms and fi xes. Our study opens new directions in the areas of program analysis, testing, and automated program repair of numerical software, and provides a collection of real-world numerical bugs. I. INTRODUCTION Numerical software provides the foundation for a wide variety of software applications, including safety-critical sys- tems such as control systems for vehicles, medical equipment, and industrial plants. Libraries for machine learning (e.g., TensorFlow, scikit-learn), computer graphics (e.g., OpenGL), computer vision (e.g., OpenCV), and data analysis (e.g., Pandas) rely on numerical libraries such as NumPy [7], SciPy [24], and LAPACK [6] to perform numerical calculations. Domain- specifi c languages such as GNU Octave for scientifi c pro- gramming and R for statistical computing integrate numerical libraries to provide support for numerical computations. Numerical software relies heavily on fl oating point arithmetic, which brings additional challenges in terms of software reliability and performance. Floating point is a widely used representation that approximates real numbers. By nature, fl oating point introduces imprecision in numerical calculations. Sources of numerical errors include extreme sensitivity to roundoff, fl oating-point exceptions such as overfl ow/underfl ow, and nonreproducibility across machines or even across runs on the same machine. This has led, in part, to numerical bugs that have caused catastrophic failures [2, 4, 8, 41]. Numerical bugs are those related to either the fi nite approxi- mate representation of real numbers, or to mathematical aspects of the computation. Techniques have been proposed to estimate roundoff error [18,20,36], generate inputs that maximize errors [14 ], or trigger fl oating-point exceptions [11], and to detect *Both authors contributed equally. accuracy [9,12,45] and instability problems [10,26]. However, there is a gap of knowledge in understanding the characteristics of numerical bugs in real-world numerical software. What are the most common numerical bugs? How prevalent are these bugs? Are existing tools capable of detecting them? How are such bugs fi xed? Are existing program repair tools suitable to fi x them? This paper takes a fi rst step towards answering these questions by conducting the fi rst comprehensive study of real-world numerical bug characteristics. The goal of this paper is to study the causes, symptoms, and fi xes of numerical bugs in numerical libraries, and provide a high-level categorization that can serve as a guide for researchers to develop tools for fi nding and fi xing numerical bugs. Empirical studies of bug characteristics have been conducted in the past to learn about concurrency bugs [27,33], perfor- mance bugs [23,40], and error handling bugs [13,16], among many others. These studies have revealed patterns for both bug detection and bug fi xing. To the best of our knowledge, no study thus far has focused on numerical bugs, or has considered inspecting numerical libraries, despite the reputation of being particularly subtle and error prone. We faced a number of challenges while conducting this study. First, numerical bugs are not as numerous in general- purpose applications, which motivated us to focus our search on numerical libraries, in particular, NumPy, SciPy, LAPACK, the GNU Scientifi c Library (GSL) [5], and Elemental [37]. Second, examining bugs from these libraries was particularly challenging due to three main reasons: (1) the libraries use distinct version control systems and bug tracking systems (if any), (2) the libraries range from 7 to 25 years old, thus not all bugs are documented in the same manner, (3) the libraries are implemented in several different programming languages (Python, C/C++, and Fortran,) often with several languages involved in the same library. Finally, we found that, in many cases, examining and classifying numerical bugs required signifi cant domain knowledge, and understanding of the code under inspection. We identify and carefully examine a total of 269 numerical bugs. We propose to categorize numerical bugs into accuracy bugs, special-value bugs, convergence bugs, and correctness bugs. We fi nd that correctness bugs are the most frequent (37%) in our dataset, followed by special-value bugs (28%), convergence bugs (21%), and accuracy bugs (14%). We discuss their characteristics, including common symptoms and fi xing strategies, and present real-world examples. Finally, based on our fi ndings, we discuss new research directions to develop tools to improve the reliability and effi ciency of numerical software. This paper makes the following contributions: ?We identify and examine 269 real-world numerical bugs found across fi ve widely-used numerical libraries: NumPy, SciPy, LAPACK, GNU Scientifi c Library (GSL), and Elemental (Sections III and IV). ?We present a categorization of numerical bugs, and discuss their symptoms and fi xes (Section IV). ?We discuss new directions to test and analyze numerical programs (Section V). The rest of the paper is organized as follows. Section II presents background on fl oating point, Section VI discusses related work, and Section VII concludes. II. FLOATINGPOINTPRELIMINARIES Floating point is the most common representation of real numbers. Unfortunately, fl oating point requires navigating many subtle tradeoffs and inevitably introduces semantics that differ from those of the reals. Indeed, many of the diffi culties faced by numerical programmers are inherent in the semantics of fl oating point. Here we review the IEEE 754 standard [1] for fl oating point and its semantics. A fl oating-point value is one of the following: ?Anumberrepresented asscbqwheresis the sign (±1), c is the coeffi cient or signifi cand, andqis the exponent. The base or radixbis usually 2 (binary), though the standard also describes a base 10 (decimal) representation. The signifi cand is represented in the same base. Binary formats range in size from half precision (16 bits) to octuple precision (256 bits) in power-of-two bit widths. Most computations are carried out in single (32 bit) or double (64 bit) precision in practice. ?The infi nities ±∞ representing infi nite quantities. The sign of an infi nity will correspond to the signs of the operands used to produce it, i.e.,1/0and?1/ ? 0yield ∞ while ?1/0 and 1/ ? 0 yield ?∞. ?TheNaN valuerepresenting a result that is not a representable number, such as the result of an attempt to take the square root of a negative number.NaNs generally propagate through computations once they occur, and come in quiet and signaling varieties to implement sensible exception-generating behavior given this propagation. Because the signifi cand is scaled by a function of the exponent, the absolute precision of representable numbers varies with the exponent over their range. The width of the interval enclosed by a number with its last signifi cand bit set to zero and the same number with its last signifi cand bit set to one is called a Unit in the Last Place or ULP. Overfl ow, underfl ow, and subnormal numbers.Results of computations that exceed the greatest representable number in magnitude in the current precision are said to overfl ow, and those that fall between zero and the smallest representable nonzero number are said to underfl ow. TABLE I FLOATING-POINTEXCEPTIONBEHAVIOR ExceptionCondition Default Result Underfl ow Result between smallest normal number and zero subnormal number Overfl ow Result larger than largest representable number in magnitude ±∞ Inexact Result was not exactly rep- resentable Rounded result Invalid Result was indeterminate or not representable as a number NaN Divide by zero Finite, nonzero number is divided by zero ±∞ A subnormal number uses an exponent of zero to indicate that the exponent is the lowest representable and that the signifi cand contains leading zero bits. This convention allows all remaining precision in the signifi cand to be gradually exhausted as representable numbers approach zero. Roundoff and truncation errors.Converting from a higher to a lower precision requires truncating the representations of the signifi cand and exponent to the lengths allowed by the lower precision, which reduces both the available dynamic range and precision. The loss of precision is called truncation error while the loss of dynamic range can result in overfl ow. Roundoff error is the consequence of needing to express the result of an operation in terms of representable numbers, which due to varying absolute precision across the representable range and other effects incurs errors even for operations on numbers that are themselves representable. Roundoff error of basic arithmetic operations is specifi ed to be at most one-half of an ULP, though some implementations that favor speed (such as GPUs) violate this requirement. Floating-point exceptions.Table I shows a summary of the fl oating-point exceptions. An underfl ow exception occurs when a result is too small to be represented by a normal number, while an overfl ow exception occurs when the result of an operation is too large to be represented. An inexact exception occurs when the result of a fl oating point operation was rounded. An invalid exception occurs when an indeterminate form, such as0/0is evaluated. A divide by zero exception occurs when division by zero is attempted. The environment may choose to mask exceptions, in which case the appropriate subnormal numbers, infi nities orNaNs are used to represent the results and no exception is raised. Catastrophic cancellation.Subtracting two nearly equal numbers means that most of the bits in their representations will cancel, incurring a large risk that the result will denormalize (underfl ow). The resulting approximation error will then propagate through the rest of the computation. Two notable techniques intended to address the diffi culties in dealing with the many undesired behaviors that can arise in computing with IEEE fl oating point and to track the error that accumulates in the course of computation are interval arithmetic, standardized as IEEE 1788 in 2015 [22] and unum arithmetic [21]. Interval arithmetic represents a compact set of real numbers by its bounds, which allows tracking the set of possible solutions computed by a program, whose size measures the error. Unum arithmetic is related, and uses representations that partition the real line into exactly representable points and open sets between them. In addition to building error tracking into the representations, unum arithmetic improves the semantics of operations involving infi nities orNaNversus IEEE 754 fl oating point. III. METHODOLOGY This section describes the critera for selecting numerical libraries and the bugs to be examined, and the threats to the validity of our study. A. Selection of Numerical Libraries We selected fi ve representative numerical libraries for our study: NumPy, SciPy, LAPACK, GSL, and Elemental. These libraries are characterized by their maturity (7 to 25 years old), popularity (used by thousands of projects), and active development. NumPy and SciPy are Python libraries that combine Python numerical code with wrappers for low-level routines written in C and Fortran, targeting general numerical and scientifi c computing, respectively. LAPACK is a C/Fortran library of low-level numerical routines, and a building block for many higher-level libraries such as NumPy. GSL mainly implements special functions and probability distributions essential to scientifi c computing. Finally, we searched for public GitHub C++ repositories using keywords associated with numerical computation. We ranked the results by number of stars (project popularity). The Elemental library was ranked fi rst. Elemental is a library that provides effi cient and general linear algebra routines suitable for numerical analysis, scientifi c computing, and work in theoretical mathematics, via support for features such as arbitrary precision, and large-scale distributed computation. For each library, Table II shows its language of implementa- tion, the earliest date for which data is available, whether it is hosted in GitHub, bug tracking system used, current size (LOC), number of commits, contributors, and releases. B. Selection and Characterization of Numerical Bugs We chose to examine bug reports (as opposed to commits) to have access to more bug information that includes original reports from users and developer discussions. We imposed a few requirements on our bug report selection to include bugs (1) confi rmed by developers (e.g., status is closed), (2) considered important (e.g., bug fi x is available), and (3) more likely to be numerical bugs. NumPy, SciPy, and Elemental are hosted on GitHub, and use GitHub’s integrated issue tracking system. This is often used by developers to track bugs, enhancements, and other tasks. Additionally, it includes GitHub pull requests. Pull requests are used by developers to contribute code (after review and approval) to a repository. LAPACK, by contrast, hosted its own development before version3.6.1, when it was migrated to GitHub (adopting GitHub’s issue tracking system). Thus, there are two sources of bug reports for LAPACK, (1) the netlib page, which lists the bugs fi led between LAPACK3.0(released in 2000) and LAPACK3.6.1(released in 2016); and (2) the GitHub repository’s issue tracker since LAPACK3.6.1. We refer to these as LAPACK1 (before GitHub) and LAPACK2 (after GitHub) through the rest of the paper. Finally, GSL maintains a Savannah bug tracking system. We found that all libraries, except for LAPACK1 and GSL, make use of labels to classify issues. In our selection, we fi ltered out any issues with labels related to build issues, documentation issues, and feature requests. After fi ltering out open issues, issues without patches, and issues without r