# Introduction

Figure 1. Total cost of one million computing operations over time. Data from Nordhaus Nordhaus_01. code
Figure 2. Storage cost, in US dollars per Mbyte, of mass market technologies over time. Data from McCallum McCallum_16. code
Figure 3. Initial growth of time-sharing systems available in the US. Data extracted from Glauthier Glauthier_67. code
Figure 4. Growth of transport and product distribution infrastructure in the USA (underlying data is measured in miles). Data from Grübler et al Grubler_91. code
Figure 5. Market capitalization of IBM, Microsoft and Apple (upper), and expressed as a percentage of the top 100 listed US tech companies (lower). Data extracted from the Economist website Economist_15. code
Figure 6. Total annual sales of computer families over the last 60 years. Data from Gordon Gordon_87 (mainframes and minicomputers), Reimer Reimer_12 (PCs) and Gartner Gartner_17 (smartphones). code
Figure 7. Total investment in tangible and intangible assets by UK companies, based on their audited accounts. Data from Goodridge et al Goodridge_14. code
Figure 8. Billions of dollars of worldwide semiconductor sales per month. Data from World Semiconductor Trade Statistics WSTs_16. code
Figure 9. Changing habits in men’s facial hair. Data from Robinson Robinson_76. code
Figure 10. Number of papers, in each year between 1987 and 2003, associated with a particular IT topic. The E-commerce paper count peaks at 1,775 in 2000 and in 2003 is still off the scale compared to other topics. Data kindly provided by Wang Wang_10. code
Figure 11. Normal distribution with total percentage of values enclosed within a given number of standard deviations. code
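
The percentages shown in Figure 11 can be reproduced from the error function. The following is a minimal sketch, not the code behind the figure; the function name `fraction_within` is an illustrative assumption.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean.

    Uses the identity P(|X - mu| <= k*sigma) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} standard deviation(s): {100 * fraction_within(k):.2f}%")
```

The result depends only on the number of standard deviations, not on the mean or variance of the distribution.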

# Human cognitive characteristics

Figure 12. Unless cognition and the environment in which it operates closely mesh together, no problems are solved; the blades of a pair of scissors need to closely mesh for cutting to occur. code
Figure 13. The assumption of light shining from above creates the appearance of bumps and pits. Could be more convincing hemispheres with light shining from above and below… code
Figure 14. Probability that rat N1 will press a lever a given number of times before pressing a second lever to obtain food, when the target count is 4, 8, 12 and 16. Data extracted from Mechner Mechner_58. code
Figure 15. Boy/girl (aged 11-12 years) verbal reasoning, quantitative reasoning, non-verbal reasoning and mean CAT score over the three tests; each stanine band is 0.5 standard deviations wide. Data from Strand et al Strand_06. code
Figure 16. Rotate text in the real world, by tilting the head, or in the mind? code
Figure 17. Two objects paired with another object that may be a rotated version. Based on Shepard et al Shepard_71. code
Figure 18. Error rate, with standard error, for the left/right hand in a study of the SNARC effect. Data from Nuerk et al Nuerk_05. code
Figure 19. Structure of mammalian long-term memory subsystems; brain areas in red. Based on Squire et al Squire_15.
Figure 20. Percentage correct answers to questions about binary operator precedence against occurrence in source code. Data from Jones Jones_06a. code
Figure 21. Response time (left axis) and error percentage (right axis) on reasoning task with given number of digits held in memory. Data extracted from Baddeley Baddeley_09. code
Figure 22. Major components of working memory: working memory in yellow, long-term memory in orange. Based on Baddeley Baddeley_12. code
Figure 23. Yes/no response time (in milliseconds) as a function of the number of digits held in memory. Data extracted from Sternberg Sternberg_69. code
Figure 24. Parse tree of a sentence with no embedding, upper "S 1", and a sentence with four degrees of embedding, lower "S 4". Based on Miller et al Miller_64. code
Figure 25. Sequencing errors (as percentage) after interruptions of various length (red), including 95% confidence intervals, normal sequence error rate in green; lines are fitted model predictions. Data from Altmann et al Altmann_17. code
Figure 26. Semantic memory representation of alphabetic letters (the numbers listed along the top are place markers and are not stored in subject memory). Readers may recognize the structure of a nursery rhyme in the letter sequences. Derived from Klahr Klahr_83. code
Figure 27. Probability of correct recall of words by serial presentation order (each word visible for 1 or 2 seconds, last digit in legend). Data extracted from Murdoch Murdoch_62. code
Figure 28. Time taken to solve the same jig-saw puzzle 35 times, followed by a two-week interval and then another 35 times, with power law and exponential fits. Data extracted from Alteneder Alteneder_35. code
Figure 29. Completion times of eight solo (upper) and eight pairs (lower) for each implementation round, along with fitted equation…. Data kindly provided by Lui Lui_06. code
Figure 30. Subjects' belief response curves for positive weak–strong, negative weak–strong, and positive–negative evidence. Based on Hogarth et al Hogarth_92. code
Figure 31. Country boundaries distort judgement of relative city locations. Based on Stevens et al Stevens_78.
Figure 32. Orthogonal representation of shape, color and size stimuli. Based on Shepard Shepard_61.
Figure 33. The six unique configurations of selecting four times from eight possibilities, i.e., it is not possible to rotate one configuration into another within these six configurations. Based on Shepard Shepard_61.
Figure 34. Percentage of correct answers given by one subject, against boolean-complexity of category, colored by number of positive cases needed to define the category. Data kindly provided by Feldman Feldman_00. code
Figure 35. The Berlin and Kay Berlin_69 language color hierarchy. The presence of any color term in a language implies the existence, in that language, of all terms below it. Papuan Dani has two terms (black and white), while Russian has eleven (Russian may also be an exception, in that it has two terms for blue). code
Figure 36. Cup- and bowl-like objects of various widths (ratios 1.2, 1.5, 1.9, and 2.5) and heights (ratios 1.2, 1.5, 1.9, and 2.4). The percentage of subjects who selected the term cup or bowl to describe the object they were shown (the paper did not explain why the figures do not sum to 100%). Based on Labov Labov_73. code
Figure 37. A commercial event involving a buyer, seller, money, and goods; as seen from the buy, sell, pay, or charge perspective. Based on Fillmore Fillmore_77. code
Figure 38. Lines of code correctly recalled after a given number of 2 minute memorization sessions; upper plot actual program, lower plot line order scrambled. Data extracted from McKeithen et al McKeithen_81. code
Figure 39. Examples of features that may be preattentively processed (parallel lines and the junction of two lines are the odd ones out). Based on Ware Ware_00.
Figure 40. Continuity: upper left plot is perceived as two curved lines; Closure: when the two perceived lines are joined at their end (upper right), the perception changes to one of two cone-shaped objects; Symmetry and parallelism: where the direction taken by one line follows the same pattern of behavior as another line; Proximity: the horizontal distance between the dots in the lower left plot is less than the vertical distance, causing them to be perceptually grouped into lines (the relative distances are reversed in the right plot); Similarity: a variety of dimensions along which visual items can differ sufficiently to cause them to be perceived as being distinct; rotating two line segments by 180° does not create as big a perceived difference as rotating them by 45°. code
Figure 41. Perceived grouping of items on a line may be by shape, color or proximity. Based on Kubovy et al kubovy_08. code
Figure 42. Examples of unique items among visually similar items. Those at the top include an item that has a distinguishing feature (a vertical line or a gap); those underneath them include an item that is missing this distinguishing feature. Based on displays used by Treisman et al Treisman_85. code
Figure 43. The foveal, parafoveal and peripheral vision regions when three characters visually subtend 3°. Based on Schotter et al Schotter_12. code
Figure 44. Local context can change the interpretation given to the surrounding symbols. code
Figure 45. Example object layout and the corresponding ordered tree produced from the answers given by one subject. Data extracted from McNamara et al McNamara_89. code
Figure 46. Heat map of one subject’s cumulative fixations (black dots) on a screen image. Data kindly provided by Ali Ali_12. code
Figure 47. The four cards used in the Wason selection task. Based on Wason Wason_68. code
Figure 48. Probability a subject will successfully distinguish a difference between the number of dots displayed and a specified target number (x-axis is the difference between these two values). Data extracted from van Oeffelen et al van_Oeffelen_82. code
Figure 49. Line locations chosen for the numeric values seen by each of four subjects; color of fitted loess line changes at one million boundary. Data kindly provided by Landy Landy_17. code
Figure 50. Number of errors, in 132 simple multiplication trials (e.g., $3\times7$), upper plot shows operand values (a loess fit in yellow) and lower plot result value (points where both operands have the same value are in blue). Data from Campbell Campbell_97. code
Figure 51. Number of change requests having a given recorded time to decide whether needed and to implement. Data from Basili et al Basili_84. code
Figure 52. One subject’s response time over successive blocks of command line trials and fitted loess (in green). Data kindly provided by Remington Remington_16. code
Figure 53. Risk neutral (green, $u(w)=w$), risk loving (red, quadratic) and risk averse (blue, square-root) utility functions. code
Figure 54. Subjects' estimate of their ability (x-axis) to correctly answer a question and actual performance in answering on the left scale. The responses of a person with perfect self-knowledge are given by the solid line. Data extracted from Lichtenstein et al Lichtenstein_77. code
Figure 55. Each row shows a scaled version of the three stripes, along with actual lengths in inches, from which subjects were asked to select the longest. Based on Asch Asch_56. code
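
The three utility functions named in Figure 53 can be compared by the certainty equivalent each assigns to the same gamble. The following is a minimal sketch: the concrete forms $u(w)=w^2$ and $u(w)=\sqrt{w}$, the 50/50 gamble, and all names are illustrative assumptions, not taken from the figure's code.

```python
import math

# Each entry pairs a utility function with its inverse (assumed forms).
UTILITIES = {
    "risk neutral": (lambda w: w, lambda u: u),
    "risk loving": (lambda w: w ** 2, lambda u: math.sqrt(u)),
    "risk averse": (lambda w: math.sqrt(w), lambda u: u ** 2),
}


def certainty_equivalent(kind, outcomes, probabilities):
    """Guaranteed wealth having the same utility as the gamble's expected utility."""
    utility, inverse = UTILITIES[kind]
    expected_utility = sum(p * utility(w) for w, p in zip(outcomes, probabilities))
    return inverse(expected_utility)


if __name__ == "__main__":
    outcomes, probabilities = (0.0, 100.0), (0.5, 0.5)  # expected value is 50
    for kind in UTILITIES:
        value = certainty_equivalent(kind, outcomes, probabilities)
        print(f"{kind}: certainty equivalent {value:.2f}")
```

Under these assumptions, a risk-averse subject values the gamble below its expected value of 50, a risk-loving subject values it above, and a risk-neutral subject values it at exactly 50.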

Figure 56. Company revenue ($millions) against total software development costs. Data from Mulford et al Mulford_16. code Figure 57. Average Return On Invested Capital of various U.S. industries between 1992-2006. Data from Porter Porter_08. code Figure 58. Ratio of actual to estimated hours of effort to enhance an existing product, for 25 versions of one application. Data from Huijgens et al Huijgens_16. code Figure 59. Accounting practice for breaking down income from sales… code Figure 60. Average effort (in days) used to fix a defect detected in a given phase (x-axis) that had been introduced in an earlier phrase (colored lines), introduced in an earlier phase (total of 38,120 defects in projects at Hughes Aircraft). Data extracted from Willis et al Willis_98. code Figure 61. Months of developer effort needed to produce systems containing a given number of lines of code… Data from Gayek et al Gayek_04. code Figure 62. Introductory price and performance (measured using wPrime32 benchmark) of various Intel processors between 2003-2013. Data from Sun Sun_14. code Figure 63. Example supply and demand curves. code Figure 64. Rates at which product sales are made on Gumroad at various prices; lines join prices that differ in 1¢s;, e.g.,$1.99 and $2. Data from Nichols Nichols_13. code Figure 65. Growth of Github users during its first 58 months. Data from Irving Irving_16. code Figure 66. Sales of game software (solid lines) for the corresponding three major seventh generation hardware consoles (dotted lines). Data from VGChartz VGChartz_17. code Figure 67. Percentage of sales closed in a given week of a quarter, with average discount given. Data from Larkin Larkin_13. code Figure 68. Facebook’s ARPU and cost of revenue per user. Data from Facebook’s 10-K filings Facebook_14Facebook_16. code Figure 69. Top 100 software companies ranked by total revenue (in millions of dollars) and ranked by Software-as-a-Service revenue. Data from PwC PwC_13PwC_14PwC_16. 
code Figure 70. Various vendor’s retail price and upgrade prices for C and C++ compilers available under MS-DOS and Microsoft Windows between 1987 and 1998. Data kindly provided by Viard Viard_07. code Figure 71. Interval between product preannouncement date and its promised availability date against delay between promised date and actual date product became available. Data from Bayus et al Bayus_01. code # Ecosystems Figure 72. Total gigabytes of DRAM shipped world-wide in given year, along with shipments by device capacity (in bits). Data from Victor et al Victor_02. code Figure 73. Mean age of installed mainframe computers, 1968-1983. Data from Greenstein Greenstein_94. code Figure 74. Mobile phone operating system shipments, as percentage of total per year. Data from Reimer Reimer_12 (before 2007) and Gartner Gartner_17 (after 2006). code Figure 75. Maximum speed achieved by vehicles over the surface of the Earth and in the air, over time. Data from Lienhard Lienhard_06. code Figure 76. Number of transistors, frequency and SPEC performance of cpus when first launched. Data from Danowitz et al Danowitz_12. code Figure 77. Number of process model change requests made in three years of a banking Customer registration project. Data kindly supplied by Branco Branco_12. code Figure 78. Total instructions in the software shipped with various models of IBM computer, plus Datatron from Burroughs. Data extracted from Naur et al Naur_69. code Figure 79. Size of 40 operating systems (Kbytes, measured in 1975) capable of controlling a given number of unique devices, plus quadratic regression model. Data from Elci Elci_75. code Figure 80. Total value of custom and packaged software (hardware vendor+third-party) sales in the US. Data from Phister Phister_79. code Figure 81. Estimated number of comments written in German, in the LibreOffice source code. Data from Meeks Meeks_17. code Figure 82. 
Percentage of function definitions in embedded applications, the SPECint95 benchmark???, and the translated form of C source benchmark programs declared to have a given number of parameters. Data for embedded and SPECint95 kindly supplied by Engblom Engblom_99a, C book data from Jones Jones_05a. code Figure 83. Hours required to build a car radio after the production of a given number of radios, with break periods (shown in days above x-axis); lines are models fitted to each production period. Data extracted from Nembhard et al Nembhard_01. code Figure 84. Man-hours required to build a particular kind of ship, at the Delta Shipbuilding yard, delivered on a given date (x-axis). Data from Thompson Thompson_07. code Figure 85. Total computer systems purchased and rented by the US Federal Government in the respective fiscal years ending June 30. Data from US Government General Accounting Office Staats_71. code Figure 86. Yearly development cost and lines of code delivered to the US Air Force between 1960 and 1986. Data extracted from NeSmith NeSmith_86. code Figure 87. Total sales of various kinds of processors. Data from Hilbert et al Hilbert_11. code Figure 88. Monthly unit sales (in millions) of microprocessors having a given bus width. Data kindly supplied by Turley Turley_02. code Figure 89. TSMC revenue from wafer production, as a percentage of total revenue, at various line widths. Data from TSMC TSMC_17. code Figure 90. Number of new UK companies registered each month, whose SIC description includes the word software or computer (case not significant). Data extracted from OpenCorporates OpenCorporates_15. code Figure 91. Connections between companies in a Dutch software business network. Data kindly provided by Crooymans Crooymans_15. code Figure 92. Reported worldwide software industry Mergers and Acquisitions (M&A). Data from Solganick Solganick_16. code Figure 93. 
Loess fits to time taken to publish a RFC having Standard or non-Standard status, for IETF committees having a given percentage of commercial membership (people wearing suits). Data from Simcoe Simcoe_13. code Figure 94. Percentage of employment by US industry sector 1850-2009. Data kindly provided by Kossik Kossik_11. code Figure 95. Total value of bug bounties earned by researchers between 2014-2016. Data from Maillart et al Maillart_16. code Figure 96. Decade in which newly designed US Air Force aircraft first flew, with colors indicating current operational status. Data from Echbeth el at Eckbreth_11. code Figure 97. Daily minutes spent using an App, from Apple’s AppStore, … Data extracted from Ansar <book Ansar_1?>. code Figure 98. Number of optional features selected by a given number of flags. Data kindly provided by Berger Berger_12. code Figure 99. Cumulative percentage of configuration options impacting a given number of source files in the Linux kernel. Data kindly provided by Ziegler Ziegler_16. code Figure 100. Ratio of development costs to total five-year maintenance costs for 158 IBM software systems sorted by size; curve is a beta distribution fitted to the data (in red). Data from Dunn Dunn_11. code Figure 101. Number of software systems surviving to a given number of years and exponential equation fits. Data from Tamai Tamai_92. code Figure 102. Age of systems, developed using one of two methodologies, and corresponding monthly maintenance time, along with loess fits. Data extracted from Dekleva Dekleva_92. code Figure 103. Percentage of patches submitted to WebKit (34,535 in total) transitioning between various stages of code review. Data from Baysal et al Baysal_13. code Figure 104. Number of forked projects identified in Wikipedia during August 2011. Data from Robles et al Robles_12b. Figure 105. 
Percentage of code ported from NetBSD to various versions of OpenBSD, broken down by version of NetBSD in which it first occurred (denoted by incrementally changing color). Data kindly provided by Ray Ray_13. Figure 106. Survival curve for Linux distributions derived from various widely-used distributions. Data from Lundqvist et al Lundqvist_12. code Figure 107. Survival curve for packages included in the standard Debian distribution. Data from Caneill et al Caneill_14. code Figure 108. Number of pdf files created using a given version of the portable document format appearing on sites having a .uk web address between 1996 and 2010. Data from Jackson Jackson_12. code Figure 109. Percentage share of total Android market at days since launch for various versions of Android. Data from Villard Villard_15. code Figure 110. Words in Intel x86 architecture manuals and code-points in Unicode Standard over time. Data for Intel x86 manual kindly provided by Baumann Baumann_16. code Figure 111. Number of gcc compiler flags and options over time, and fitted regression models. Data from Fursin et al Fursin_14. code Figure 112. Number of new programming languages, per year, described in a published paper. Data from Pigott et al Pigott_15. code Figure 113. Number of monthly developer job related tweets specifying a given language. Data kindly provided by Destefanis Destefanis_14. code Figure 114. Number of projects making use of a given number of different languages in a sample of 100,000 GitHub project. Data kindly supplied by Bissyande Bissyande_13. code Figure 115. Ranked order of number of Android/Ubuntu (1.1 million apps)/(71,199 packages) linking to each supported POSIX function. Data from Atlidakis et al Atlidakis_16. code Figure 116. Survival curves for Debian package lifetime and for a package to contain its first dependency conflict. Data from Drobisz et al Drobisz_15. code Figure 117. Dependencies between the Java packages in various versions of ANTLR. 
Data from Al-Mutawa Al-Mutawa_13. code Figure 118. Fraction of source in 130 releases of Linux (x-axis) that originates in an earlier release (y-axis). Data extracted from png file kindly supplied by Matsushita Livieri_07. code Figure 119. Number of functions (in Evolution; the point at zero are incorrect counts) modified a given number of times (upper) or modified by a given number of different people (lower); red line is a straight line fit, green line a quadratic fit. Data from Robles et al Robles_12a. code Figure 120. Number of functions (in Evolution) modified a given number of times broken down by number of authors. Data from Robles et al Robles_12a. code Figure 121. Density plot of time interval, in hours, between each modification of a function in Evolution. Data from Robles et al Robles_12a. code Figure 122. Survival curves of clones in the Linux high/medium/low level SCSI subsystems. Data from Wang Wang_12. code Figure 123. Number of identifiers renamed, each month, in the source of Eclipse-JDT; version released on given date shown. Data from Eshkevari et al Eshkevari_11. code Figure 124. Changes in the number of tables in the Mediawiki and Ensembl project database schema over time. Data from Skoulis Skoulis_13. code Figure 125. Survival curve for tables in Wikimedia and Ensembl database schema. Data from Skoulis Skoulis_13. code # Projects Figure 126. Number of projects having a given duration (upper; 2,992 projects), producing a given number of SLOC (middle; 1,859 projects), and having a given percentage effort out sourced (lower; 1,267 projects). Data extracted from Akita et al Akita_12. code Figure 127. Firm bid price against schedule estimate, received from 14 companies, for the same tender specification. Data from Anda et al Anda_09. code Figure 128. Distribution of effort (person hours) during the development of four engine control systems projects, plus non-project work and holidays, at Rolls-Royce. Data extracted from Powell Powell_01. 
code Figure 129. Commits within a particular hour and day of week for Linux and FreeBSD. Data from Eyolfson et al Eyolfson_11. code Figure 130. Estimate given by three groups of subjects after seeing a statement by a middle manager containing an estimate (2 months or 20 months) or no estimate (control). Data from Aranda Aranda_05. code Figure 131. Estimated and actual project implementation effort. Data from Jørgensen Jorgensen_04b and Kitchenham et al Kitchenham_02. code Figure 132. Two estimates (in work hours), made by seven subjects, for each of six tasks. Data from Grimstad et al Grimstad_07. code Figure 133. Density plot of number of projects investing a given fraction of their total effort in a given project phase. Data kindly provided by Wang Wang_17. code Figure 134. Mean and median effort (hours) for projects having elapsed time between four and 20 months (lines a fitted quadratics). Data from Wang et al Wang_17. code Figure 135. Estimated project cost from 12 estimating models. Data from Mohanty Mohanty_81. code Figure 136. Elapsed weeks (x-axis) against effort in man-hours per week (y-axis) for a project, plus three fitted curves. Data extracted from Basili et al Basili_81. code Figure 137. Function points and corresponding normalised costs for 149 projects from one large institution. Data extracted from Kampstra el al <book Kampstra_0?>. code Figure 138. Cost per requirement, function point and story point for two projects, over 13 monthly releases. Data from Huijgens Huijgens_13. code Figure 139. Cost per requirement, function point and story point for two projects, over 13 monthly releases. Data from Huijgens Huijgens_13. code Figure 140. Percentage profit/loss on 145 fixed-price software development contracts. Data extracted from Coombs Coombs_03. code Figure 141. IBM’s profit margin on all System 360s sold in 1966, by system memory capacity in kilobytes; monthly rental cost during 1967 in parentheses. Data from DeLamarter DeLamarter_88. 
code Figure 142. COSMIC function-points and compiled size (in kilobytes) of components in four different ECU modules; lines show fitted regression model. Data from Lind et al Lind_12. code Figure 143. SLOC again standard deviation for multiple implementations of seven different problems (grey line is fitted regression). Data from: Anda et al Anda_09, Jørgensen Jorgensen_16b, Lauterbach Lauterbach_87, McAllister et al McAllister_89, Selby et al Selby_85, Shimasaki et al Shimasaki_80, van der Meulen van_der_Meulen_07. code Figure 144. Bids made by 19 estimators from the same company (divided into two groups for the experiment). Data from Jørgensen et al Jorgensen_04c. code Figure 145. Initial implementation schedule, with employee number(s) given for each task (percentage given when not 100%) for a project. Data from Ge et al Ge_16. code Figure 146. Estimated cost of developing a bespoke software system by the three companies contracted to do the work. Data from Yu Yu_03. code Figure 147. Number of citations from Standard documents within protocol level to documents in the same and other levels (RTG routing, INT internet, TSV transport, RAI realtime applications and infrastructure, APP Applications, W3C recommendations). Data from Simcoe Simcoe_15. code Figure 148. Effort, in person hours per month, used in the implementation of the five components making up the PAVE PAWS project (grey line shows total effort). Data extracted from Curtis et al Curtis_80. code Figure 149. Percentage of actual project duration elapsed when 882 schedule estimates were made, during 121 projects, against estimated/actual time ratio (boundary maximum in red). Data kindly provided by Little Little_06. code Figure 150. Initial estimated project duration against number of schedule estimates made before completion, for 121 projects; line is a loess fit. Data kindly provided by Little Little_06. code Figure 151. 
Percentage change in 882 estimated delivery dates announced at a given percentage of the estimated elapsed time of the corresponding project, for 121 projects (red is a loess fit); blue line is a density plot of percentage estimated duration when estimate made. Data kindly provided by Little Little_06. code Figure 152. Number of work packages completed within a given time; colored lines are work packages having the same estimated lead time. Data extracted from van Oorschot et al van_Oorschot_05. code Figure 153. Phase during which work on a given activity of development was actually performed, average percentages over 13 projects. Data from Zelkowitz Zelkowitz_87. code Figure 154. Percentage distribution of effort time (red) and schedule time (blue) across design/coding/testing for 38 NASA projects. Data from Condon et al Condon_93. code Figure 155. Percentage distribution of effort across design/coding/testing for 10 ICL projects (red), 11 BT projects (green), 11 space projects (blue) and 12 defense projects (purple). Data from Kitchenham et al Kitchenham_85 and Graver et al Graver_77. code Figure 156. Percentage of requirements added/deleted/modified in eight features (colored lines) of a product over 22 releases. Data extracted from Felici Felici_04. code Figure 157. Pagerank of the stakeholders in the network created from the Open (red) and Closed (blue) stakeholder responses (values for each have been sorted). Data from Lim Lim_10. code Figure 158. Average value assigned to requirements (red) and one standard deviation bounds (blue) based on omitting one stakeholder’s priority value list. Data from Regnell et al Regnell_01. code Figure 159. Average number of days taken to implement a feature, over time; smoothed using a 25-day rolling mean. Data kindly supplied by 7Digital 7Digital_12. code Figure 160. Number of features whose implementation took a given number of elapsed workdays; upper first 650-days, lower post 650-days. 
Fitted zero-truncated negative binomial distribution in green. Data kindly supplied by 7Digital 7Digital_12. code Figure 161. Number of feature developments started on a given work day (red new features, green bugs fixes, blue ratio of two values; 25-day rolling mean). Data kindly supplied by 7Digital 7Digital_12. code Figure 162. Survival curve of IT outsourcing suppliers continuing to work for 2,382 Credit Unions. Data kindly provided by Peukert Peukert_10. code # Reliability Figure 163. Transition counts of the order in which five distinct faults were discovered in 50 runs of Program A2. Data from Nagel et al Nagel_82. code Figure 164. Number of input cases that occurred before a particular fault was experienced by program A2; the list was sorted for each fault. Data from Nagel et al Nagel_82. code Figure 165. Number of accesses to memory address blocks, per 100,000 instructions, executing gzip on two different inputs. Data from Brigham Young Brigham_Young via Feitelson. code Figure 166. Number of reported incidents reported in each of 800 applications installed on over 120,000 desktop machines. Data from Lucente Lucente_15. code Figure 167. Power analysis (50 and 10 runs at various p-values) of detecting a difference between two runs having a binomial distribution (runs needed to achieve power=0.8 at various p-values). code Figure 168. Percentage of usability problems found by a given number of test subjects. Data extracted from Nielsen et al Nielsen_93. code Figure 169. Problems reported in the POSIX standard by problem classification. Data kindly provided by Josey OpenGroup_17. code Figure 170. Survival rate of faults in Linux device drivers and other Linux subsystems… Data from Palix et al Palix_10b. code Figure 171. Defects found against hours of testing… Data from Wood Wood_96. code Figure 172. Percentage of reported problems having a given mean time to first problem occurrence (in months, summed over all installations of a product) for none products. 
Data from Adams Adams_84. code Figure 173. Survival curve of the two most common warnings reported by Splint in Samba and Squid. Data from De Penta et al Di_penta_09. code Figure 174. Reported faults against number of installations (upper) and age (lower)… Data from the "wheezy" version of Debian UDD_14. code Figure 175. Number of various kinds of fault found during code review of nine implementations of the same specification and how located. Data extracted from Finifter Finifter_13b. code Figure 176. Input case on which a failure occurred, for a total of 500,000 inputs. Data from Dunham et al Dunham_86. code Figure 177. Number of input cases processed before a given fault is experienced. Data from Dunham et al Dunham_86. code Figure 178. Number of input cases processed before a given number of program failures is experienced; 25 replications. Data from Dunham et al Dunham_86. code Figure 179. Time taken, in 10 distinct runs, to discover a thread safety violation in 22 different Java classes. Data kindly supplied by Pradel Pradel_12. code Figure 180. Fraction of mutated programs, in various languages, that successfully compiled/executed/produced same output. Data from Spinellis et al Spinellis_12. code Figure 181. Total number of failures per 30-day interval for each LANL system. Data from Los Alamos National Lab (LANL). code Figure 182. Total number of failures for each node in the given LANL system. Data from Los Alamos National Lab (LANL). code Figure 183. For systems 2 and 18, number of uptime intervals, binned into 10 hour intervals, red line is fitted negative binomial distribution. Data from Los Alamos National Lab (LANL). code Margin Fault slip throughs for a development project at Ericsson (left column list when fault could have been detected, bottom row when fault was detected). Data from Hribar Hribar_08. code Figure 184. Various test suite coverage measures and mutants killed in 300 or so Java projects; black line is a loess fit. 
Data from Gopinath et al Gopinath_14. code Figure 185. Statement (triangles) and branch (stars) coverage achieved using a program’s test suite… Data from Marinescu et al Marinescu_14. code Figure 186. Amount of source (millions of lines) in each version broken down by the version in which it first appears. Data extracted Massacci et al Massacci_11. code Figure 187. Market share of Firefox versions between official release and end-of-support. Data from w3schools.com. code Figure 188. Number of people with Internet access per 100 head of population in the developed world and the whole world. Data from ITU ITU_12. code Figure 189. Amount of end-user usage of code originally written for Firefox version 1.0 by various other versions. Data extracted from Massacci et al Massacci_11. code # Source code in before, after, famous, and fictitious groups. Based on Dooling and Christiaansen Dooling_77. Figure 190. The time taken for subjects to read a page of text in a particular orientation, as they read more pages. Results are for the same six subjects in two tests more than a year apart. Based on Kolers Kolers_76. Figure 191. Boxplot of ratings given to snippets 1 to 50 by second year students (colors used to help distinguish boxplots for each snippet). code Figure 192. Aggregated ranking of snippets by subjects in years 1 and 2 (red and black) and years 2 and 4 (black and blue). Snippets have been sorted by year 2 ranking. code Figure 193. Correlation, using Kendall’s tau, between each subject and their corresponding year aggregate ranking. code Figure 194. The same program visually presented in three different ways; illustrating how a reader’s existing knowledge of words can provide a significant benefit in comprehending source code. By comparison, all the other tokens combined provide relatively little information. Based on an example from Laitinen Laitinen_95. Figure 195. Number of files and lines of code in 3,782 projects on Sourceforge. Data from Herraiz Herraiz_08. 
code
Figure 196. Total number of C functions measured, their total unused parameters and two fitted models. Data from Jones <book Jones_??>. code
Figure 197. Occurrences of sequences of java.lang.StringBuilder methods called on the same object in 11 GB of Java bytecode. Data from Mendez et al Mendez_13. code
Figure 198. For each class the percentage of method sequences containing a given number of calls (in 11 GB of Java bytecode). Data from Mendez et al Mendez_13. code
Figure 199. Number of commits of a given length, in lines added/deleted to fix various faults in Linux file systems. Data from Lu et al Lu_13. code
Figure 200. "Worth estimate" for identifier visibility ordering preferences declarations within a Java class. Data from Biegel et al Biegel_12. code
Figure 201. "Worth estimate" for the kind of method activity attribute. Data from Biegel et al Biegel_12. code
Figure 202. Number of method calls to Java APIs and non-APIs in 6,286 open source projects. Data from Lämmel et al Lammel_11. code
Figure 203. Percentage occurrence of values appearing as the most significant digit of floating-point, integer and hexadecimal literals in C source code. Data from Jones Jones_05a. code
Figure 204. Lines of code, Halstead’s volume and cyclomatic complexity of Linux version 2.6.9. Data from Israeli et al Israeli_10. code
Figure 205. Number of feature constants against LOC for 40 large C programs and two fitted regression lines (red and green; blue is one confidence interval). Data from Liebig et al Liebig_10. code

# Stories told by data

Figure 206. Years of professional experience in a given language for experimental subjects. Data from Prechelt Prechelt_07. code
Figure 207. Plots of sample values having various visual patterns. code
Figure 208. Total number of lines of C code, in .c and .h files, having a given length, i.e., containing a given number of characters (upper) and tokens (lower). Data from Jones Jones_05a. code
Figure 209.
Various measurements of work performed implementing the same functionality, number of lines of Haskell and C implementing functionality, CFP (COSMIC function points; based on user manual) and length of formal specification. Data kindly provided by Staples Staples_13. code
Figure 210. Effort, in hours (log scale), spent in various development phases of projects written in Ada (blue) and Fortran (red). Data from Waligora et al Waligora_95. code
Figure 211. Performance of experts (e) and novices (n) in a test driven development experiment. Data from Muller et al Muller_07. code
Figure 212. Correlations between pairs of attributes of 12,799 GitHub pull requests to the Homebrew repo, represented using colored ellipses. Data from Gousios et al Gousios_14. code
Figure 213. Correlations between pairs of attributes of 12,799 GitHub pull requests to the Homebrew repo, represented using pie charts and shaded boxes. Data from Gousios et al Gousios_14. code
Figure 214. Hierarchical cluster of correlation between pairs of attributes of 12,799 GitHub pull requests to the Homebrew repo. Data from Gousios et al Gousios_14. code
Figure 215. Effort invested in project definition (as percentage of original estimate) against cost overrun (as percentage of original estimate). Data extracted from Gruhl Gruhl_9x. code
Figure 216. Relative clock frequency of cpus when first launched (1970 == 1). Data from Danowitz et al Danowitz_12. code
Figure 217. Year and age at which survey respondents started contributing to FLOSS, i.e., made their first FLOSS contribution. Data from Robles et al Robles_14. code
Figure 218. SPECint results, summed over all distinct values (upper) and summed within equal width bins (lower). Data from SPEC website SPEC_14. code
Figure 219. Kernel density plot of the number of computers having the same SPECint result. Data from SPEC SPEC_14. code
Figure 220.
Number of commits containing a given number of lines of code made when making various categories of changes to the Linux filesystem code (upper) and a density plot of the same data (lower). Data from Lu et al Lu_13. code
Figure 221. Three commonly used kernel density smoothing functions: gaussian, rectangular and triangular. code
Figure 222. Developer estimated effort against actual effort (in hours), for various maintenance tasks, e.g., adaptive, corrective and perfective; upper as-is, middle jittered values and lower size proportional to the log of the number of measurements. Data from Hatton Hatton_07. code
Figure 223. Number of installations of Debian packages against the age of the package; middle plot was created by smoothScatter and lower plot by contour. Data from the "wheezy" version of the Ultimate Debian Database project UDD_14. code
Figure 224. Number of lines added to glibc each week. Data from González-Barahona et al Gonzalez-Barahona_14. code
Figure 225. Boxplot of time between a bug in Eclipse being reported and the first response to the report; right plot is notched. Data from Breu et al Breu_10. code
Figure 226. Violin plots (left using vioplot, right using beanplot) of time between bug being reported in Eclipse and first response to the report. Data from Breu et al Breu_10. code
Figure 227. Time taken for developers to debug various programs using batch processing or online (i.e., time-sharing) systems. Data kindly provided by Prechelt Prechelt_99a. code
Figure 228. Pairs of languages used together in the same GitHub project with connecting line width, color and transparency related to number of occurrences. Data kindly supplied by Bissyande Bissyande_13. code
Figure 229. References from one document to another in the Microsoft Server Protocol specifications. Data extracted by the author from the 2009 document release WSPP_15. code
Figure 230. Alluvial plot of relative prioritization order of selection and application of GitHub pull requests.
Data from Gousios et al Gousios_15a. code
Figure 231. Intel Sandy Bridge L3 cache bandwidth in GB/s at various clock frequencies and using combinations of cores (0-3 denotes cores zero-through-three, 0,2,4 denotes the three cores zero, two and four). Data from Schone et al Schone_12. code
Figure 232. Contour plot of the number of sessions executed on a computer having a given processor speed and memory capacity. Data kindly provided by Thereska Thereska_10. code
Figure 233. Root source of 1,257 faults and where fixes were applied for 21 large safety critical applications. Data from Hamill et al Hamill_14. code
Figure 234. Ternary plots drawn with two possible visual aids for estimating the position of a point (red plus at x=0.1, y=0.35, z=0.55); axis names appear on the vertex opposite the axis they denote. code
Figure 235. Earth relative positions of NASA’s Orbview-2 spacecraft when it experienced a single event upset (in blue) on 12 July 2000. Data kindly provided by LaBel Poivey_03. code
Figure 236. Estimated market share of Android devices by brand and product, based on downloads from 682,000 unique devices in 2015. Data from OpenSignal OpenSignal_15. code
Figure 237. Variables having a given number of read accesses, given 25, 50, 75 and 100 total accesses, calculated from running the weighted preferential attachment algorithm (red), the smoothed data (blue) and a fitted exponential (green). code
Figure 238. Throughput when running the SPEC SDM91 benchmark on a Sun SPARCcenter 2000 containing 8 CPUs, with the predictions from three fitted queuing models. Data from Gunther Gunther_05. code
Figure 239. Illustration of the difference in cognitive effort needed to locate points differing by shape or color. code
Figure 240. The three, seven and twelve color palettes returned by calls to the diverge_hcl, sequential_hcl, rainbow_hcl and rainbow functions. code
Figure 241. Percentage share of the Android market by successive Android releases between 2010 and 2015.
Data from Villard Villard_15. code
Figure 242. Values plotted using a linear (upper) and logarithmic (lower) x-axis. Data from Dunham et al Dunham_86. code
Figure 243. Illustration of U-shape created when y-axis values are a ratio calculated from x-axis values. code
Figure 244. Mean time to fail for systems of various sizes (measured in lines of code); linear y-axis left, log y-axis right. Data extracted from Figure 8.3 of Putnam et al Putnam_92. code
Figure 245. Alternative representation of numeric values in Table. Data from Scott Scott_16. code
Figure 246. What’s up doc? Not the fitted model you were expecting. Equations from White White_12. code

# Probability

Figure 247. Probability that three (red) or four (blue) consecutive false positive warnings occur in some total number of warnings (false positive rate appears on line). code
Figure 248. The relationship between words for tracts of trees in various languages. The interpretation given to words (boundary indicated by the zigzags) in one language may overlap that given in other languages. Adapted from DiMarco et al DiMarco_93.
Figure 249. Relationships between common discrete and continuous probability distributions.
Figure 250. Shapes of commonly encountered discrete probability distributions (upper to lower: Uniform, Geometric, Binomial and Poisson). code
Figure 251. Cumulative density plots of the discrete probability distributions in Figure. code
Figure 252. Commonly encountered continuous probability distributions (upper to lower: Uniform, Exponential, Normal, beta). code
Figure 253. Samples of randomly selected values drawn from the same normal distribution (left: 100 points in each sample, right: 1,000 points in each sample). code
Figure 254. Reading rate for text printed using a serif (blue) and sans-serif (red) font; data has been normalised and displayed as a density. Data from Veytsman et al Veytsman_12. code
Figure 255.
Probability, with 95% confidence, that shapiro.test correctly reports that samples drawn from various distributions are not drawn from a Normal distribution, and probability of an incorrect report when the sample is drawn from a Normal distribution. code
Figure 256. Number of conditionally compiled code sequences dependent on a given number of feature macros (red overwritten by blue: Linux, blue: FreeBSD). Data from Berger et al Berger_10. code
Figure 257. Percentage occurrence of statements for each of 100 or so C, C++ and Java programs, plotted as a density on the y-axis. Data from Zhu et al Zhu_15. code
Figure 258. A Cullen and Frey graph for the $3n+1$ program length data. Data kindly provided by van der Meulen van_der_Meulen_07. code
Figure 259. Number of $3n+1$ programs containing a given number of lines and four distributions fitted to this data. Data kindly provided by van der Meulen van_der_Meulen_07. code
Figure 260. A zero-truncated Negative Binomial distribution fitted to the number of features whose implementation took a given number of elapsed workdays; first 650 days used. Data kindly provided by 7digital 7Digital_12. code
Figure 261. Density plot of MPI micro-benchmark runtime performance for calls to MPI_Scan with 10,000 Bytes (upper) and to MPI_Allreduce with 1,000 Bytes (lower). Data kindly supplied by Hunold Hunold_14. code
Figure 262. Mixture model fitted by the normalmixEM function to the performance data from calls to MPI_Allreduce. Data kindly supplied by Hunold Hunold_14. code
Figure 263. Density plot of accesses to one article on Slashdot, in minutes since its publication. The distinct Normal distributions (colored and fitted to the log of the data) contained in the mixture models fitted by the REBMIX (upper) and normalmixEM (lower) functions. Data kindly supplied by Kaltenbrunner Kaltenbrunner_07. code
Figure 264. Cumulative probability distribution of file size (red) and of number of bytes occupied in a file system (blue).
Data from Irlam Irlam_93. code
Figure 265. Graph of available state transitions for Alaris volumetric infusion pump (the button presses that cause transitions between states are not shown). Data kindly supplied by Oladimeji Oladimeji_08. code
Figure 266. Discrete-time Markov chain for created/modified/deleted status of Linux kernel files at each major release from versions 2.6.0 to 2.6.39. Data from Tarasov Tarasov_12. code
Figure 267. Directed graph of emails between FreeBSD and OpenBSD developers, plus a few people involved in both discussions, with developers who sent/received fewer than four emails removed. Data from Canfora et al Canfora_11. code
Figure 268. Expected probability of a single instance (y-axis) against the probability of a measured struct type having grouped member types (x-axis); when both probabilities are the same, points lie along the blue line. Data from Jones Jones_09b. code

# Statistics for software engineering

Figure 269. Example of a sample drawn from a population. code
Figure 270. Date of introduction of a cpu against its commercial lifetime. Data from Culver Culver_10. code
Figure 271. A population of items having one of three colors and three strata sampled from it. code
Figure 272. Power consumed by three SERT benchmark programs at various levels of system load; crosses at 2% load intervals, lines based on 10% load intervals. Data kindly provided by Kistowski Kistowski_15. code
Figure 273. Distribution of 4,000 sample means for two sample sizes drawn from exponential (left), lognormal (center) and Pareto (right) distributions; vertical lines are 95% confidence bounds. The blue curve is the Normal distribution predicted by theory. code
Figure 274. Mean (red) and standard deviation (grey lines; they are not symmetrical because of the log scaling) of samples of 3 items drawn from a population of 1,000 items (blue line mean, green line standard deviation). Data kindly provided by Chen Chen_12. code
Figure 275.
Density plot of mean of samples containing 3 or 12 items randomly selected from a data set of 1,000 items; process repeated 1,000 times for each sample size. Data kindly provided by Chen Chen_12. code
Figure 276. Number of commits to glibc for each day of the week, for the years from 1991 to 2012. Data from González-Barahona et al Gonzalez-Barahona_14. code
Figure 277. A Normal distribution with mean=4 and variance=8 and a Chi-squared distribution with four degrees of freedom having the same mean and variance (the vertical lines are at the distributions' median value). code
Figure 278. Density plot of execution time of 1,000 input data sets, with lines marking the mean, median and mode. Data kindly supplied by Chen Chen_12. code
Figure 279. Impact of serial correlation, AR(1) in this example, on the calculated mean (upper) and standard deviation (lower) of a sample (the legends specify the amount of serial correlation). code
Figure 280. Occurrence of sample median and mean values for 1,000 samples drawn from a binomial distribution. code
Figure 281. A contaminated normal, values drawn from two normal distributions with 10% of values drawn from a distribution having a standard deviation five times greater than the other. code
Figure 282. Regression model (red line; pvalue=0.02) fitted to the number of correct/false security code review reports made by 30 professionals; blue lines are 95% confidence intervals. Data from Edmundson et al Edmundson_13. code
Figure 283. Bootstrapped regression lines fitted to random samples of the number of correct/false security code review reports made by 30 professionals. Data from Edmundson et al Edmundson_13. code
Figure 284. Kernel density plot, with 95% confidence interval, of the number of computers having the same SPECint result. Data from SPEC SPEC_14. code
Figure 285. The four related quantities in the design of experiments. code
Figure 286.
Examples of the impact of population prevalence, statistical power and p-value on number of false positives and false negatives. code
Figure 287. Visualization of Cohen’s $d$ for two normal distributions having different means and the same standard deviation (two left) and both different (right). code
Figure 288. The impact of differences in mean and standard deviation on the overlap between two populations ($\alpha$: probability of making a false positive error, and $\beta$: probability of making a false negative error). code
Figure 289. The power of a statistical test at detecting that a difference exists between the mean value of two samples drawn from two populations, both having a Normal distribution. code

# Regression modeling

Figure 290. Relationship between data characteristics (edge labels) and applicable techniques (node labels) for building regression models.
Figure 291. Total lines of source code in FreeBSD by days elapsed since the project started (in 1993). Data from Herraiz Herraiz_08. code
Figure 292. Estimated cost and duration of 73 large Dutch federal IT projects, along with fitted model and 95% confidence intervals. Data from Kampstra et al Kampstra_09. code
Figure 293. Number of updates and fixes in each Linux release between version 2.6.11 and 3.2. Data from Corbet et al Corbet_12. code
Figure 294. The number of commits made and the number of contributing developers for Linux versions 2.6.0 to 3.12. The green line in the right plot is the regression model fitted by switching the x/y values. Data from Kroah-Hartman Kroah-Hartman_14. code
Figure 295. Effort/Size of various projects and regression lines fitted using Effort as the response variable (red, with green 95% confidence intervals) and Size as the response variable (blue). Data from Jørgensen et al <book Jorgensen_0?>. code
Figure 296.
Lines of code in every initial release (i.e., excluding bug-fix versions of a release) of the Linux kernel since version 1.0, along with fitted straight line (upper) and quadratic (lower) regression models. Data from Israeli et al Israeli_10. code
Figure 297. Actual (left of vertical line) and predicted (right of vertical line) total lines of code in Linux at a given number of days since the release of version 1.0, derived from a regression model built from fitting a cubic polynomial to the data (dashed lines are 95% confidence bounds). Data from Israeli et al Israeli_10. code
Figure 298. Number of classes in the Groovy compiler at each release, in days since version 1.0. Data from Vasa Vasa_10. code
Figure 299. For each distinct language, the number of lines committed on GitHub and the number of questions tagged with that language. Data from Kunst Kunst_13. code
Figure 300. Percentage of vulnerabilities detected by developers working a given number of years in security. Data extracted from Edmundson et al Edmundson_13. code
Figure 301. Hours to develop software for 29 embedded consumer products and the amount of code they contain, with fitted regression model and loess fit (yellow). Data from Fenton et al Fenton_08. code
Figure 302. Points remaining after removal of overly influential observations, repeatedly applying Cook’s distance and Studentized residuals. Data from Fenton et al Fenton_08. code
Figure 303. Points remaining after removal of overly influential observations, also taking into account the Bonferroni p-value of the Studentized residuals; the line shows the fitted model and 95% confidence interval (loess fit in yellow). Data from Fenton et al Fenton_08. code
Figure 304. influenceIndexPlot for the model having the fitted line shown in Figure. Data from Fenton et al Fenton_08. code
Figure 305. Number of medical devices reported recalled by the US Food and Drug Administration, in two week bins.
Upper: fitted straight line and confidence bounds, with loess fit (green); Lower: straight line (purple) fitted after two outliers replaced by mean and original fit (red). Data from Alemzadeh et al Alemzadeh_13. code
Figure 306. influenceIndexPlot of data from Alemzadeh et al Alemzadeh_13. code
Figure 307. Two fitted straight lines and confidence intervals, one up to the end of 2010 and one after 2010. Data from Alemzadeh et al Alemzadeh_13. code
Figure 308. Results from various studies of software requirements function points counted using COSMIC and FPA; lines are loess fits to studies based on industry and academic counters. Data from Amiri et al Amiri_11. code
Figure 309. Five different equations fitted to the Embedded subset of the COCOMO 81 data before influential observation removal (upper) and after influential observation removal (lower). Data from Boehm Boehm_81. code
Figure 310. Anscombe data sets with Pearson correlation coefficient, mean, standard deviation, and line fitted using linear regression. Data from Anscombe Anscombe_73. code
Figure 311. Residual of the straight line fit to the Linux growth data analysed in Figure (upper) and data+straight line fit (red) and loess fit (blue). Data from Israeli et al Israeli_10. code
Figure 312. Array element assignment benchmark compiled with gcc using the O0 (upper) and O3 (lower) options (measurements were grouped into runs of 2,000 executions). Data from Flater et al Flater_13. code
Figure 313. Number of installations of Debian packages against the age of the package, plus fitted model and loess fit. Data from the "wheezy" version of the Ultimate Debian Database project UDD_14. code
Figure 314. Quadratic relationship with various amounts of added noise fitted using a quadratic and exponential model. code
Figure 315. Author workload against number of activity types per author (upper) and ratio test (lower). Data from Vasilescu et al Vasilescu_12. code
Figure 316.
Change-points detected by cpt.mean, upper using method="AMOC" and lower using method="PELT". Data from Alemzadeh et al Alemzadeh_13. code
Figure 317. Number of flags (y-axis jittered) used to control the selection of optional features in a system containing a given total number of features; loess curve (red), regression line (green). Data from Berger et al Berger_12. code
Figure 318. Monthly unit sales (in thousands) of 4-bit microprocessors. Data kindly supplied by Turley Turley_02. code
Figure 319. Fitted regression line to points (in red) and 3-D illustration of assumed Normal distribution of errors. code
Figure 320. Number of vulnerabilities detected by professional developers with web security review experience; upper: technically correct plot of model fitted using a Poisson distribution, lower: easier to interpret curve representation of fitted regression models that assume the error has a Poisson distribution (continuous lines) or a Normal distribution (dashed lines). Data extracted from Edmundson Edmundson_13. code
Figure 321. Number of functions containing a given number of break statements and a fitted Negative Binomial distribution. Data from Jones Jones_05a. code
Figure 322. Code review meeting duration for a given number of non-comment lines of code. Fitted regression model, assuming errors have a Gamma distribution (red, with confidence interval in blue) or a Normal distribution (green). Data from Porter et al Porter_98. code
Figure 323. Number of APIs used in Java programs containing a given number of lines and three fitted models. Data from Starek Starek_10. code
Figure 324. Yearly development cost and lines of Fortran code delivered to the US Air Force between 1962 and 1984; with fitted regression models. Data extracted from NeSmith NeSmith_86. code
Figure 325. Maintenance task effort and lines of code added+updated, with fitted regression model (red) and SIMEX adjusted for 10% error (blue). Data from Jørgensen Jorgensen_95. code
Figure 326.
Regression modeling 0/1 data with a straight line and a logistic equation. code
Figure 327. ROC curve for the data listed in Table. code
Figure 328. Percentage of mutants killed at various percentage of path coverage for 300 or so Java projects; fitted Beta (red) and glm (blue) regression models. Data from Gopinath et al Gopinath_14. code
Figure 329. SPECint 2006 performance results for processors running at various clock rates, memory chip frequencies and processor family. Data from SPEC SPEC_14. code
Figure 330. Component+residual plots for three explanatory variables in a fitted SPECint model. code
Figure 331. Individual contribution of each explanatory variable to the response variable in a quadratic model of SPECint performance. code
Figure 332. Estimated and actual effort broken down by communication frequency, along with individually fitted straight lines. Data from Moløkken-Østvold et al Molokken_Ostvold_07. code
Figure 333. Illustration of the shared and non-shared contributions made by two explanatory variables to the response variable Y. code
Figure 334. pairs plot of lines added/modified/removed, growth and number of files and total lines in versions 2.6.0 through 3.9 of the Linux kernel. Data from Kroah-Hartman Kroah-Hartman_14. code
Figure 335. Example plots of functions listed in Table. These equations can be inverted, so they start high and go down. code
Figure 336. Time to execute a computational biology program on systems containing processors with various L2 cache sizes. Data kindly provided by Hazelhurst Hazelhurst_10. code
Figure 337. A logistic equation fitted to the lines of code in every non-bugfix release of the Linux kernel since version 1.0. Data from Israeli et al Israeli_10. code
Figure 338. Predictions by logistic equations fitted to Linux SLOC data, using subsets of data up to 2,900, 3,650 and 4,200 days, and all days, since the release of version 1.0. Data from Israeli et al Israeli_10. code
Figure 339.
Increase in areal density of hard disks entering production over time. Data from Grochowski et al Grochowski_12. code
Figure 340. Lines of code in the GNU C library against days since 1 January 1990. Data from González-Barahona Gonzalez-Barahona_14. code
Figure 341. Number of failing programs caused by unique faults in gcc (upper) and SpiderMonkey (lower). Fitted model in green, with two exponential components in red and blue. Data kindly provided by Chen Chen_13. code
Figure 342. Power law (red) and exponential (blue) fits to feature macro usage in 20 systems written in C; fail to reject p-value for 20 systems is 0.64. Data from Queiroz et al Queiroz_17. code
Figure 343. Power consumption of six different Intel Core i5-540M processors running at various frequencies; colored lines denote fitted regression models for each processor. Data from Balaji et al Balaji_12. code
Figure 344. Example showing the three ways of structuring a mixed effects model, i.e., different intersections/same slope (upper), same intersection/different slopes (middle) and different intersections/slopes (lower). code
Figure 345. Confidence intervals, 95%, for within-subject intercept and slope (right plots) of mixed-effect models in the adjacent code. code
Figure 346. The three components of the hourly rate of commits, during a week, to the Linux kernel source tree; components extracted from the time series by stl. Data from Eyolfson et al Eyolfson_11. code
Figure 347. Autocorrelation of number of defects found on a given day, for development project C. Data kindly provided by Buettner Buettner_08. code
Figure 348. Autocorrelation of two AR models (upper plots) and two MA models (lower plots). code
Figure 349. Partial autocorrelation of same two AR models (upper plots) and two MA models (lower plots) shown in Figure. code
Figure 350. Autocorrelation of indentation of source code written in various languages. Data from Hindle et al Hindle_08. code
Figure 351.
Number of features started for each day and fitted regression trend line (left) and number of features after subtracting the trend (right), over the entire period of the 7digital data. Data kindly supplied by 7Digital 7Digital_12. code
Figure 352. Autocorrelation (left) and partial autocorrelation (right) of the number of features started on a given day (after differencing the log transformed data), over the entire period of the 7digital data. Data kindly supplied by 7Digital 7Digital_12. code
Figure 353. Predicted daily difference in the number of new feature starts (red) and 95% confidence intervals (blue). Data kindly supplied by 7Digital 7Digital_12. code
Figure 354. Time series whose values are uncorrelated (upper), but whose squared values are correlated (lower); see code for generation process. code
Figure 355. Cross correlation of feature release ‘size’ (upper non-bugfix releases, lower all releases) and date when bugs are prioritised. Data kindly supplied by 7Digital 7Digital_12. code
Figure 356. Estimated staff working on a project during every week. Data from Buettner Buettner_08. code
Figure 357. Market share of Firefox version 3.0 fitted using loess regression with various values of the span option. Data from W3Counter W3Counter_14. code
Figure 358. Cross-correlation of source lines added/deleted per week to the glibc library. Data from González-Barahona Gonzalez-Barahona_14. code
Figure 359. Visualization of alignment between weekly time series of lines of code in NetBSD (blue) and FreeBSD (red). Data from Herraiz Herraiz_08. code
Figure 360. Effort distribution (person hours) over the eight main tasks of a development project at Rolls-Royce and a hierarchical clustering of each task effort time series based on pair-wise correlation and Euclidean distance metrics. Data extracted from Powell Powell_01. code
Figure 361.
Two commonly used hazard functions; Weibull is monotonic (always increases, decreases or remains the same) and Lognormal, which can increase and then decrease. code
Figure 362. Observation period with events inside and outside the study period. code
Figure 363. The Kaplan-Meier curve for survivability of new releases: (blue) ETPs using only official APIs, (red) ETPs calling internal APIs; dotted lines are 95% confidence intervals. Data from Businge Businge_13. code
Figure 364. The Kaplan-Meier curve for survivability of ETPs' ability to be built using SDK released in subsequent years: (blue) ETPs using only official APIs, (red) ETPs calling internal APIs; dotted lines are 95% confidence intervals, with plus signs, +, indicating censored data. Data from Businge Businge_13. code
Figure 365. Kaplan-Meier curves for time-to-fix…. Data from Arora et al Arora_10. code
Figure 366. Survival curve after adjustment for explanatory variables… code
Figure 367. Cumulative incidence curves for problems reported by the splint tool in Samba and Squid (time is measured in number of snapshot releases). Data from Di Penta et al Di_penta_09. code
Figure 368. Rose diagram of number of commits in each 3 hour period of a day for Linux and FreeBSD. Data from Eyolfson et al Eyolfson_11. code
Figure 369. The Cartwright (red; dcarthwrite), wrapped Cauchy (green; dwrappedcauchy) and wrapped von Mises (blue; dvonmises) circular probability distributions for various values of their parameters. code
Figure 370. Asymmetric extended wrapped forms of the Cardioid (upper), von Mises (middle) and Cauchy (lower) probability distributions for various values of their parameters. code
Figure 371. Number of commits (upper) and number of commits in which a fault was detected (lower) by hour of day of the commit, for Linux. Data from Eyolfson et al Eyolfson_14. code
Figure 372.
Number of commits per hour for weekdays and fitted model (upper) and number of commits in which a fault was detected (lower), for Linux. Data from Eyolfson et al Eyolfson_14. code
Figure 373. Number of commits per hour for each weekday, fitted using $\cos(...\cos...)$ (upper) and $\cos(...\cos+\sin...)$ (lower), for Linux; in both cases the fitted fault model (red) has been rescaled to allow comparison. Data from Eyolfson et al Eyolfson_14. code
Figure 374. Application source lines against percentage of covered lines achieved by both Human & Dynodroid tests, only by Dynodroid tests and only by Human tests. Data from Machiry et al Machiry_13. code
Figure 375. Percentage of source lines covered by both Human & Dynodroid tests, only by Dynodroid tests and only by Human tests; fitted regression line and prediction points for various total source lines, red plus. Data from Machiry et al Machiry_13. code

# Other techniques

Figure 376. Volume of unit sphere in 1 to 50 dimensions, e.g., a sphere has volume $\frac{4}{3}\pi$ in three dimensions. code
Figure 377. Top levels of the decision tree built from the reopened fault data. Data from Shihab et al Shihab_10a. code
Figure 378. A Bertin plot for items included in the same data structure as ‘Antibiotics used’, for each subject, after reordering by seriate. Data from Jones Jones_09b. code
Figure 379. A visualization of the Robinson matrix based on number of times pairs of items co-occur in the same data structure (the closer to the diagonal the more often they occur together). Data from Jones Jones_09b. code

# Experiments

Figure 380. Time taken, by the same person, to implement 12 algorithms from the Communications of the ACM, with four iterations of the implementation process. Data from Zislis Zislis_73. code
Figure 381. Time taken to transfer and multiply 2-dimensional matrices of various sizes on a GTX 480 GPU. Data kindly supplied by Gregg and Hazelwood Gregg_11. code
Figure 382. Relative performance (y-axis) of libraries optimized to run on various processors (x-axis). Data from Bird Bird_10. code
Figure 383. Number of integer constants having the lexical form of a decimal-constant (the literal 0 is also included in this set) and hexadecimal-constant that have a given value. Data from Jones Jones_05a. code
Figure 384. One and two-sided significance testing. code
Figure 385. A cube plot of three configuration factors and corresponding benchmark results (blue) from the Memo table experiment. Data from Citron et al Citron_03b. code
Figure 386. Design plot showing the impact of each configuration factor on Memo table benchmark performance. Data from Citron et al Citron_03b. code
Figure 387. Interaction plot showing how cint changes with size for given values of associativity and mapping. Data from Citron et al Citron_03b. code
Figure 388. Number of Reflection benchmark results achieving a given score, reported for GTX 970 cards from three third-party manufacturers. Data extracted from UserBenchmark.com. code
Figure 389. Density plots of project bids submitted by companies before/after seeing a requirements document. Data from Jørgensen et al Jorgensen_04c. code
Figure 390. Density plot of task implementation estimates: with no instructions (red) and with instruction on what to do (blue). Data from Jørgensen et al Jorgensen_04. code
Figure 391. Examples of correlation between samples of two value pairs, plotted on x- and y-axis. code
Figure 392. Number of software faults having a given consequence, based on an analysis of faults in Cassandra. Data from Gunawi et al Gunawi_14. code
Figure 393. Performance and rental cost of early computers, with straight line fits for a few years. Data from Knight Knight_66. code
Figure 394. Feature size, in silicon atoms, of microprocessors. Data from Danowitz et al Danowitz_12. code
Figure 395. Maximum number of records sorted in 1 minute and using 1 penny’s worth of system time (upper). SPEC2006 integer benchmark results (lower). Data from Gray et al Gray_14 and SPEC SPEC_14. code
Figure 396. Total system power consumed when sorting 10, 20, 30, 40, 50 million integers (colored pluses) using three techniques running on the same processor at different clock frequencies. Data from Götz et al Gotz_14. code
Figure 397. Power consumed by 10 Atmel SAM3U microcontrollers at various temperatures when sleeping or running. Data from Wanner et al Wanner_10. code
Figure 398. Power spectrum of electrical power consumed by an app running on a ???. Data from Saborido et al Saborido_15. code
Figure 399. Read bandwidth at various offsets for new disks sold in 2002 (upper) and 2006 (lower). Data kindly provided by Krevat Krevat_13. code
Figure 400. Average power consumed by one server’s CPU (four Pentium 4 Xeons; red) and memory (8 GB PC133 DIMMs; blue) running the SPEC CPU2006 benchmark (upper) and breakdown by system component when executing various programs (lower). Data from Bircher Bircher_10. code
Figure 401. FFT benchmark executed 2,048 times followed by system reboot, repeated 10 times. Data kindly provided by Kalibera Kalibera_05. code
Figure 402. Percentage change, relative to no environment variables, in perlbench performance as characters are added to the environment. Data extracted from Mytkowicz et al Mytkowicz_08. code
Figure 403. Changes in SPEC CPU2006 benchmark performance caused by cache and memory bus contention for one dual processor Intel Xeon E5345 system. Data kindly provided by Babka Babka_12. code
Figure 404. Execution time of 330.art_m, an OpenMP benchmark program, using different compilers, number of threads and setting of thread affinity. Data kindly provided by Mazouz Mazouz_13. code
Figure 405. Access times when walking through memory using three fixed stride patterns (i.e., 32, 64 and 128 bytes) on a quad-core Intel Xeon E5345; grey lines at one standard deviation. Data kindly provided by Babka Babka_09. code
Figure 406. Performance variation of programs from the Talos benchmark run on original OS and a stabilised OS. Data from Larres Larres_12. code
Figure 407. Operations per second of a file-server mounted on one of ext2, ext3, rfs and xfs filesystems (same color for each filesystem) using various options. Data kindly supplied by Huang Zhou_12. code
Figure 408. Percentage change in SPEC number, relative to version 4.0.4, for 12 programs compiled using six different versions of gcc (compiling to 64-bits with the O3 option). Data from Makarow Makarow_14. code
Figure 409. Execution time of xy file compressor, compiled using gcc with various optimization options, running on various systems (lines are mean execution time when compiled using each option). Data kindly supplied by Petkovich de_Oliveira_13. code
Figure 410. Execution time of Perlbench, from SPEC benchmark, on six systems, when linked in three different orders and address randomization on/off. Data kindly supplied by Reidemeister de_Oliveira_13. code
Figure 411. Performance of PassMark memory benchmark on 783 Intel Core i7-3770K systems; lower plot created by trimming 10% of values from the ends of what appears in the upper plot. Data kindly supplied by David Wren PassMark_14. code
Figure 412. Ubench cpu performance on small (upper) and large (lower) EC2 instances, Europe in red and US in green. Data kindly provided by Dittrich Schad_10. code
Figure 413. Lines of code that 101 professional developers, with a given number of years experience, estimate they have written. Data from Jones Jones_06a, Jones_08a, Jones_09b. code

# Overview of R

Figure 414. Plot produced by hello_world.R program. code
Figure 415. The unique bytes per window (256 bytes wide) of a pdf file. code

# Data preparation

Figure 416. Screen height and width reported by 682,000 unique devices that downloaded an App from OpenSignal in 2015 (upper), reported measurements ordered so height is always the larger value (lower). Data from OpenSignal OpenSignal_15. code
Figure 417. Number of reported vulnerabilities, per day, in the US National Vulnerability Database for 2003. Data from the National Vulnerability Database NVD_14. code
Figure 418. Percentage occurrence of the first digit of hexadecimal numbers in C source and estimated from Google book data. Data from Jones Jones_05a and Michel et al Michel_11. code
Figure 419. Number of processes executing for a given amount of time, with measurements expressed using two and six significant digits. Data from Feitelson Feitelson_14. code