# Newcomb–Benford law and the detection of frauds in international trade


Edited by Alex Kossovsky, University of Panama, Panama City, Panama, and accepted by Editorial Board Member Donald B. Rubin October 30, 2018 (received for review April 17, 2018)

## Significance

The detection of frauds is one of the most prominent applications of the Newcomb–Benford law for significant digits. However, no general theory can exactly anticipate whether this law provides a valid model for genuine, that is, nonfraudulent, empirical observations, whose generating process cannot be known with certainty. Our first aim is then to establish conditions for the validity of the Newcomb–Benford law in the field of international trade data, where frauds typically involve huge amounts of money and constitute a major threat for national budgets. We also provide approximations to the distribution of test statistics when the Newcomb–Benford law does not hold, thus opening the door to the development of statistical procedures with good inferential properties and wide applicability.

## Abstract

Combating fraud in international trade is a crucial task of modern economic regulation. We develop statistical tools for the detection of frauds in customs declarations that rely on the Newcomb–Benford law for significant digits. Our first contribution is to identify, in the context of a European Union market, the features of the traders for which the law should hold in the absence of fraudulent data manipulation. Our results shed light on a relevant and debated question, since no known general theory can exactly predict the validity of the law for genuine empirical data. We also provide approximations to the distribution of test statistics when the Newcomb–Benford law does not hold. These approximations open the door to the development of modified goodness-of-fit procedures with wide applicability and good inferential properties.

Combating fraud in international trade, and thereby protecting national budgets, is a crucial task of modern economic regulation. To give an idea of the volumes involved, in 2016 the customs duties flowing into the European Union (EU) budget amounted to more than 20 billion euros and provided about 15% of the total own resources of the EU. Huge losses thus occur when the value of imported goods is underreported (e.g., ref. 1). Most statistical antifraud techniques for international transactions fall in the class of unsupervised methods, with outlier detection and (robust) cluster analysis playing a prominent role (2–5). The rationale is that the bulk of international trade data consists of legitimate transactions, so that major frauds may stand out as highly suspicious anomalies. Considerable emphasis is also put on procedures that provide stringent control of the number of false positives (6), since substantial investigations like the one reported in ref. 1 are demanding and time consuming. A related crucial requirement is the ability to deal with massive datasets of traders and to provide, as automatically as possible, a ranking of their degree of anomaly. This information is essential for the design of efficient and effective audit plans, a major task for customs offices.

In this work we consider fraud detection through the Newcomb–Benford law (NBL). This law defines a probability distribution for patterns of significant digits in real positive numbers. It relies on the intriguing fact that in many natural and human phenomena the leading (that is, the first significant) digits are not uniformly scattered, as one could naively expect, but follow a logarithmic-type distribution. We refer to refs. 7–10 for a historical summary of the NBL, an extensive review of its challenging mathematical properties, and a survey of its most relevant applications.

Despite its long history, the mathematical and statistical challenges of the NBL have been recognized only recently. From a mathematical perspective, appropriate versions of the law appear in integer sequences, such as the celebrated Fibonacci sequence (8) or the factorial sequence (11). The law also emerges in the context of floating-point arithmetic (12), while a deep probabilistic study was carried out by Hill (13). A seminal note by Varian (14) suggested the idea that agreement with the NBL could validate the “reasonableness” of data. It has since become well known, mainly due to the work of Nigrini (see ref. 7 for a review of such studies), that the NBL can be used as a forensic accounting and auditing tool for financial data. The law has been shown to be a valuable starting point for forensic accountants and to be applicable in a number of auditing contexts, such as external, internal, and governmental auditing. It has also proved successful in detecting misconduct in other domains, including irregularities in electoral data (15, 16), campaign finance (17), and economic data (18).
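The appearance of the law in the Fibonacci sequence is easy to check numerically. The sketch below is only an illustration (the function names are ours, not taken from the cited references): it tabulates the leading digits of the first 1,000 Fibonacci numbers next to the first-digit probabilities log10(1 + 1/d).

```python
import math
from collections import Counter


def fibonacci(n_terms):
    """Return the first n_terms Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    a, b = 1, 1
    terms = []
    for _ in range(n_terms):
        terms.append(a)
        a, b = b, a + b
    return terms


def leading_digit_frequencies(numbers):
    """Relative frequency of the first significant digit (1-9) of positive integers."""
    counts = Counter(int(str(n)[0]) for n in numbers)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


freqs = leading_digit_frequencies(fibonacci(1000))
for d in range(1, 10):
    # observed frequency versus the first-digit NBL probability
    print(d, round(freqs[d], 3), round(math.log10(1 + 1 / d), 3))
```

Exact integer arithmetic is used on purpose, so the leading digit is read from the full decimal expansion rather than from a rounded floating-point value.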

Although the cited advances may suggest applicability of the NBL to international trade, there remain major unanswered questions that we address in our work. The first one concerns the trustworthiness of the NBL for genuine (that is, nonfraudulent) transactions. As shown in ref. 19, no known general theory can exactly predict whether the NBL should hold in any specific application, whose data-generating process cannot be known with certainty, even in the absence of fraud or other data manipulations; see also refs. 20–22 for related concerns. Our first goal is then to provide insight on the suitability of the NBL for modeling the distribution of digits of genuine transaction values arising in international trade. We use the Italian import market as a specimen for our study, but our approach is general and can be replicated for any country for which detailed customs data are available. Knowledge of the conditions under which the NBL should be expected to hold in the absence of data manipulation is an essential ingredient for the implementation of large-scale monitoring processes in which tens (or even hundreds) of thousands of traders are screened in an automatic and fast way with the aim of identifying the most suspicious cases. If instead the NBL does not provide a suitable model for genuine transactions, it may be very difficult to ascertain whether an anomaly should be attributed to fraud or to model failure; see also ref. 23, p. 193, for a similar concern. In *SI Appendix*, section 7 we describe a web application that has been developed to assist customs officers and auditors in this screening task, which can be executed in full autonomy on their own datasets.

Our second goal is to deepen our knowledge of the empirical behavior of NBL-conformance tests by investigating their power under different contamination schemes. The adoption of such tests for antifraud screening is based on the assumption that fabricating data that closely follow the law is difficult and that fraudsters might be biased toward simpler digit distributions, such as the discrete uniform or a Dirac (point-mass) distribution. We also quantify the corresponding false positive rates, to make explicit the different and possibly conflicting facets that empirical researchers have to balance in practice.

The third aim of our work is to provide corrections to test statistics when the NBL does not hold. This is typically the case for traders who operate on a limited number of products, so that there is not enough variability in their transactions. Even if the NBL is not a suitable model for genuine transaction digits, the conformance tests based on our modified statistics have the appropriate empirical size in the absence of data manipulation, while the usual tests turn out to be potentially very liberal. We therefore recommend the conformance tests based on our modified statistics, since they have the required size under general trade conditions and are competitive in terms of power. They extend the applicability of large-scale monitoring processes of international trade data to a wider range of practical situations.

## The NBL

### Statistical Background.

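Let $X$ be a positive random variable and let $D_1(X), D_2(X), \ldots$ denote its significant digits, so that $D_1(X) \in \{1, \ldots, 9\}$ and $D_j(X) \in \{0, \ldots, 9\}$ for $j \geq 2$. The NBL states that, for $k = 1, 2, \ldots$,

$$P\{D_1(X) = d_1, \ldots, D_k(X) = d_k\} = \log_{10}\left(1 + \frac{1}{\sum_{j=1}^{k} 10^{k-j} d_j}\right). \qquad [1]$$

For $k = 1$, Eq. **1** reduces to the first-digit law $P\{D_1(X) = d_1\} = \log_{10}(1 + 1/d_1)$.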
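As a numerical companion to the definition of the law, the following sketch evaluates the standard NBL probability of a block of leading digits, log10(1 + 1/m), where m is the integer formed by those digits. The function name `nbl_prob` and the dictionaries built from it are illustrative choices, not notation from the paper.

```python
import math
from itertools import product


def nbl_prob(digits):
    """NBL probability that the first k significant digits equal `digits`.

    `digits` is a sequence (d1, ..., dk) with d1 in 1..9 and later digits in 0..9.
    """
    if not digits or not 1 <= digits[0] <= 9:
        raise ValueError("the first digit must lie in 1..9")
    if any(not 0 <= d <= 9 for d in digits[1:]):
        raise ValueError("digits after the first must lie in 0..9")
    # integer d1*10^(k-1) + ... + dk formed by the leading digits
    value = int("".join(str(d) for d in digits))
    return math.log10(1 + 1 / value)


# first-digit law and joint law of the first two digits
first_digit_law = {d: nbl_prob((d,)) for d in range(1, 10)}
two_digit_law = {
    (d1, d2): nbl_prob((d1, d2)) for d1, d2 in product(range(1, 10), range(10))
}

# summing the two-digit law over the second digit recovers the first-digit law
marginal_of_one = sum(two_digit_law[(1, d2)] for d2 in range(10))
print(round(first_digit_law[1], 5), round(marginal_of_one, 5))
```

The final check reflects the fact that the lower-dimensional marginals of the law are obtained by summing over the trailing digits.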

### Relevance for International Trade.

Our applied focus is on transactions involving EU traders; we refer to *SI Appendix*, sections 3 and 7 for the institutional regulations supporting their analysis. By international trade data we mean the data collected by EU member states for imports and exports that are declared by national traders and shipping agents using the form called the Single Administrative Document (SAD). The value that we analyze for antifraud purposes is the “statistical value” reported in each SAD, which also includes the costs of insurance and freight (CIF) and is given in euros by taking into account the exchange rate (26). Our interest is then on random variables **[3]** in the context of trade, n corresponds to the number of transactions made by the trader of interest, so that

There are different economic reasons suggesting that the distribution of the significant digits contained in **[3]**, it is then sensible to anticipate good conformance to the NBL when a trader operates by importing or exporting a sufficiently large number of different goods, even if none of the product-specific marginal distributions of digits follows the law. The economic literature also shows that traders have different degrees of market power. Trading operations are affected by market and country features, such as different trade costs and different access to credit (e.g., ref. 28). Therefore, transactions made with different counterparties may be characterized by different economic processes, yielding distributions for transaction values that can be conceived to vary randomly from one product to another for each trader. The significant-digit distribution in international transactions can thus be expected to adhere to the NBL when the trader makes a sufficiently large number of operations, with a sufficiently large number of counterparties, possibly located in different countries.
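The aggregation argument can be illustrated with a toy simulation. This is not the market-based experiment of the paper: the lognormal product-level distributions, their parameter ranges, and all function names are assumptions made only for illustration. The sketch compares the first-digit distribution of a hypothetical trader dealing in one product with that of a trader whose transactions are spread over many products.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x):
    """First significant digit of a positive float, read from scientific notation."""
    return int(f"{x:.15e}"[0])


def max_abs_deviation(values):
    """Largest gap between observed first-digit frequencies and the NBL."""
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    return max(abs(c / len(values) - p) for c, p in zip(counts, BENFORD))


def simulate_trader(n_products, n_transactions, rng):
    """Transaction values for a hypothetical trader dealing in n_products goods."""
    # Assumed product-level model: lognormal values whose typical size is spread
    # over three orders of magnitude (10 to 10,000), each with modest dispersion.
    products = [
        (math.log(10) * rng.uniform(1.0, 4.0), rng.uniform(0.1, 0.3))
        for _ in range(n_products)
    ]
    values = []
    for _ in range(n_transactions):
        mu, sigma = rng.choice(products)  # each transaction picks one product
        values.append(rng.lognormvariate(mu, sigma))
    return values


rng = random.Random(12345)
dev_one_product = max_abs_deviation(simulate_trader(1, 20000, rng))
dev_many_products = max_abs_deviation(simulate_trader(500, 20000, rng))
print(f"1 product:    max deviation from NBL = {dev_one_product:.3f}")
print(f"500 products: max deviation from NBL = {dev_many_products:.3f}")
```

In this toy setting no single product follows the law, yet the mixture over many products comes much closer to it, in line with the argument above.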

## A Contamination Model for Fraud

### The Model.

We phrase our antifraud approach within the framework of a trader-specific contamination model where each fraud corresponds to an outlier. For this purpose, we need a slight change in notation and we write

For **5** has a counterpart in the transaction space defined by *SI Appendix*, section 1.

Model **5** provides a principled framework for antifraud analysis of international trade data. Indeed, trader t may be considered a potential fraudster if the null hypothesis

A useful tractable version of contamination model **5** assumes that the probability of observing a given k-ple of digits in a genuine transaction of trader t depends on the trader features only through the values of **[6]** again stating the absence of fraud. Model **7** implies that the random vector

A further bonus of models **5** and **7** is that they make clear the antifraud advantages of our methodology over the often uninformative analysis of aggregated data, as given, for example, in ref. 18. In the latter instance, for each **[6]** for some **5** and **7** acknowledge the existence of a trader-specific propensity to fraud.
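A contamination model of this kind can be sketched for first digits as a two-component mixture. The sketch assumes that the genuine digit distribution is the NBL and that the contaminant is either a discrete uniform or a point mass; the mixing weight `lam` and the function names are illustrative and do not reproduce the paper's notation.

```python
import math
import random

NBL = [math.log10(1 + 1 / d) for d in range(1, 10)]  # assumed genuine first-digit law
UNIFORM = [1 / 9] * 9  # discrete uniform contaminant


def point_mass(digit):
    """Dirac-type contaminant: every manipulated value starts with `digit`."""
    return [1.0 if d == digit else 0.0 for d in range(1, 10)]


def contaminated_probs(genuine, contaminant, lam):
    """Mixture (1 - lam) * genuine + lam * contaminant over the digits 1..9."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be a proportion in [0, 1]")
    return [(1 - lam) * g + lam * c for g, c in zip(genuine, contaminant)]


def sample_digits(probs, n, rng):
    """Draw n first digits from the distribution `probs` over 1..9."""
    return rng.choices(range(1, 10), weights=probs, k=n)


rng = random.Random(0)
probs = contaminated_probs(NBL, UNIFORM, lam=0.3)  # 30% manipulated transactions
digits = sample_digits(probs, 500, rng)
print([round(p, 3) for p in probs])
```

Setting `lam=0` gives back the genuine distribution, which corresponds to the absence of fraud for the trader under scrutiny.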

### Testing the Absence of Fraud.

The usual hypothesis of interest in the antifraud literature (7, 10) is**[6]** when **[8]** for a given value of k, the simplest one being the χ^{2} statistic**[8]** is true, with **[9]**, that is, **[2]**, while the corresponding 1D marginal hypotheses are tested through

In our empirical study we also consider the multiple-stage approach proposed by Barabesi et al. (6) with the aim of introducing a more stringent control on the proportion of false discoveries. This approach tests a decreasing sequence of lower-dimensional marginals of the NBL through their exact conditional distributions. Specifically, in the simple two-step version that we consider here, the method of Barabesi et al. (6) first tests the two-digit marginal **2** of the NBL by comparing **[2]**.

Since χ^{2} tests may also have some shortcomings (ref. 10, chap. 37), additional procedures not based on **[9]** and less formal methods are considered in *SI Appendix*, sections 5 and 6. Qualitative findings are similar in all cases. Nevertheless, for our purposes it is instructive to look at the results for χ^{2} tests, because their distribution (either exact or asymptotic) is known under the NBL. We can thus look at the agreement between the empirical and the nominal distribution of the test statistics to assess whether genuine transactions actually follow the law, that is, if **[5]** (or **[7]**) is the NBL.
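For concreteness, a first-digit Pearson χ^{2} conformance test can be sketched as follows. It relies on the asymptotic χ^{2} distribution with 8 degrees of freedom (the critical values are hard-coded from standard tables), whereas the paper also considers exact distributions; the function names are illustrative.

```python
import math
from collections import Counter

NBL_FIRST = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
# upper quantiles of the chi-square distribution with 8 degrees of freedom
CHI2_CRIT_8DF = {0.05: 15.507, 0.01: 20.090}


def first_digit(x):
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{x:.15e}"[0])


def chi2_first_digit(values):
    """Pearson chi-square statistic of observed first digits against the NBL."""
    values = [v for v in values if v > 0]  # digits are defined for positive values
    n = len(values)
    counts = Counter(first_digit(v) for v in values)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in NBL_FIRST.items()
    )


def rejects_nbl(values, alpha=0.05):
    """True if first-digit conformance to the NBL is rejected at level alpha."""
    return chi2_first_digit(values) > CHI2_CRIT_8DF[alpha]


powers_of_two = [2 ** k for k in range(300)]  # a sequence known to follow the NBL closely
equal_digits = list(range(1, 10)) * 50  # every first digit equally frequent
print(rejects_nbl(powers_of_two), rejects_nbl(equal_digits, alpha=0.01))
```

The two toy inputs mark the extremes: a sequence that conforms to the law and a sample with uniformly distributed first digits.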

## Adequacy of the NBL for Trade Data

Although the theoretical results sketched in the statistical background and the subsequent economic arguments broadly motivate the adoption of the NBL as a sensible model for genuine transactions in the context of international trade, it is unclear how they may fit to empirical transactions whose generating mechanism cannot be exactly known and obviously involves only a finite number of terms. One goal of our study is then to provide evidence on the quality of the NBL assumption **1** to the digit distribution of transaction values for noncheating traders that operate in real international markets. For this purpose, we assume that our contamination model holds with **[7]** as a sensible and practically workable approximation to this model in the absence of a priori information on the trader.

We simulate nonmanipulated statistical values, according to definition **4**, for *SI Appendix*, section 2. In our experimental setting the values of *SI Appendix*, section 3, where their structure is explained. A description of our code is also given in *SI Appendix*, section 3.
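The notion of estimated test size can be made concrete with a small Monte Carlo sketch. Unlike the experiment described above, which simulates statistical values from market data, this illustration draws first digits directly from an assumed distribution; the sample size, number of replicates, and function names are arbitrary choices.

```python
import math
import random

NBL = [math.log10(1 + 1 / d) for d in range(1, 10)]
CRIT_05 = 15.507  # 95% quantile of the chi-square distribution with 8 df


def chi2_from_counts(counts, n):
    """Pearson chi-square statistic of first-digit counts against the NBL."""
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, NBL))


def simulate_counts(n, probs, rng):
    """Counts of the digits 1..9 in n independent draws from `probs`."""
    counts = [0] * 9
    for index in rng.choices(range(9), weights=probs, k=n):
        counts[index] += 1
    return counts


def rejection_rate(n, probs, n_rep, rng, crit=CRIT_05):
    """Share of simulated samples in which NBL conformance is rejected."""
    rejections = sum(
        chi2_from_counts(simulate_counts(n, probs, rng), n) > crit
        for _ in range(n_rep)
    )
    return rejections / n_rep


rng = random.Random(2019)
size = rejection_rate(n=200, probs=NBL, n_rep=2000, rng=rng)  # digits follow the NBL
power = rejection_rate(n=200, probs=[1 / 9] * 9, n_rep=2000, rng=rng)  # uniform digits
print(f"estimated size:  {size:.3f}")
print(f"estimated power: {power:.3f}")
```

When the digits are generated from the NBL the rejection rate estimates the size of the test; generating them from another distribution, such as the discrete uniform, turns the same function into a power estimate.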

For each idealized trader t and a chosen value of k, we compare the observed distribution of digits to the theoretical NBL values **1** through the test statistic **7**. Formally, let

The bulk of our results deal with the simple first-digit statistic **[11]** for a wide range of pairs

Table 1 displays the estimated sizes of the test of the first-digit marginal hypothesis for both **3**. An interesting remark is that **[3]** is faster when *SI Appendix*, section 5, we also investigate the fit of the whole empirical distribution of

Our results point to the conclusion that the NBL is not a satisfactory model when **[3]** does not hold if **6** and **8** cannot be taken any longer to be equivalent.

We conclude this section with a glimpse of the performance of the two-digit statistic

## Enemy Brothers: Power and False Positive Rate

When model **7** holds with **6** that turn out to be wrong, since they refer to traders that belong to

Our first contamination instance assumes that the first two digits of **2** for most digit pairs *SI Appendix*, section 4.

We consider the simplified case where *SI Appendix*, section 2. We restrict our analysis to the market configurations for which the NBL approximation to

Table 3 shows the estimated values of P and FPR under the uniform contamination model **12** for **[12]**. The value of FPR is much higher with **6** rejected by TS is very small and the estimate of FPR is overwhelmed by its sampling variability. The choice between

Table 4 repeats the analysis under the Dirac-type scheme **13**. The contaminant distribution is now well separated from

## Corrections to Goodness-of-Fit Statistics

We now focus on the trading configurations for which the NBL does not provide a satisfactory representation of the genuine digit distribution **[9]** to obtain valid tests of hypothesis **6**. Since **[6]**. Similar testing procedures have proved to be useful in other domains, in the case of correlated observations and other distributional misspecifications (e.g., ref. 29 and the references therein).

If t is the trader of interest, let *SI Appendix*, section 2, and the resulting statistical values are collected in vector **9** computed for trader **7**, the significant-digit random variables associated to the elements of **6** at nominal test size α, and we consider trader t a potential fraudster, if

Motivated by large-scale applications, Efron (30) describes a related methodology for empirically estimating a null distribution when the standard theoretical model (such as the NBL in the case of digit counts) does not hold. This approach uses the available data to estimate an appropriate version of the distribution of the test statistic under the null hypothesis. However, it is apparent that empirical null estimation is not directly feasible when recast in the framework of models **5** and **7**. One reason is that the method generally requires a known parametric form for the null distribution, whose parameters are then estimated from the available realizations of the test statistic. Even more fundamentally, in our applied context there is no guarantee that the proportion of genuine transactions is large for each trader, that is, that **5** and **7**, thus violating a key assumption for empirical null estimation (ref. 30, p. 98).

On the other hand, the proportion of transactions that involve manipulated data and their impact on **10**. First, both **4**, since the product of independent random variables follows the NBL if only one of the factors does, regardless of the other factors (ref. 8, p. 188). We may thus expect a reduction in the contamination effect produced by a manipulated element of

Table 5 reports the estimated sizes **15** is performed at **[14]** is computed on *SI Appendix*, section 5. In all instances, comparison with the estimated sizes of the liberal **[6]** even when the asymptotic framework does not comply with the requirements of Hill’s limit theorem.

We then compute P and FPR for test **15**, under the uniform contamination model **12** and the Dirac-type contamination scheme **13**, using the same sets of *SI Appendix*, section 5, for **15** can have severe difficulties in discriminating between **[14]**, but the specific goods for which the digit distribution is obtained usually vary from trader to trader. This variability inflates the quantile estimate

We can obtain an improved estimate of the required quantile **7**. In this specification the genuine digit distribution depends not only on **[14]**. Then,**6** is rejected at nominal test size α if

The performance of the refined test procedure **16** is displayed in Table 5 (for *SI Appendix*, section 5 (for **[15]**. Power values are comparable for the three reported tests when the genuine and the contaminant digit distributions are well separated. However, our proposals are still preferred since their FPR is considerably lower than for **16** ensures that the reduction in power with respect to the **15** and **16** are recommended whenever the attained levels of FPR can be tolerated in practice.
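The idea of replacing the nominal χ^{2} critical value with a simulated quantile can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: `narrow_basket` is a hypothetical generator of genuine values for a trader dealing in a single product, and the critical value is a plain empirical order statistic of the simulated statistics.

```python
import math
import random

NBL = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x):
    """First significant digit of a positive float, read from scientific notation."""
    return int(f"{x:.15e}"[0])


def chi2_stat(values):
    """Pearson chi-square statistic of observed first digits against the NBL."""
    n = len(values)
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, NBL))


def simulated_critical_value(simulate_genuine, n, alpha, n_rep, rng):
    """Empirical (1 - alpha) quantile of the statistic over simulated genuine samples."""
    stats = sorted(chi2_stat(simulate_genuine(n, rng)) for _ in range(n_rep))
    index = min(n_rep - 1, math.ceil((1 - alpha) * n_rep) - 1)
    return stats[index]


def narrow_basket(n, rng):
    """Hypothetical genuine values of a one-product trader (assumed lognormal)."""
    return [rng.lognormvariate(math.log(300.0), 0.25) for _ in range(n)]


rng = random.Random(7)
crit = simulated_critical_value(narrow_basket, n=100, alpha=0.05, n_rep=500, rng=rng)
observed = chi2_stat(narrow_basket(100, rng))  # a new genuine sample for this trader
print(f"simulated critical value: {crit:.1f} (nominal chi-square value: 15.507)")
print(f"observed statistic: {observed:.1f}, flagged: {observed > crit}")
```

In this toy example the genuine digits are far from the NBL, so the nominal critical value would flag an honest trader almost surely, while the simulated quantile restores a rejection rate close to the chosen `alpha`.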

## Case Studies

To illustrate the use of the proposed procedure and its ability to detect relevant value manipulations, we first discuss the case of a trader extracted from an archive of fraudulent declarations provided by the Italian customs after appropriate data anonymization. The same archive was also used in ref. 6. The trader under scrutiny has *SI Appendix*, section 7 for further details. However, the plots for this trader do not provide clear evidence of substantial undervaluation or of other major anomalies, although two of the declarations displayed in Fig. 1, *Center* were found to be fraudulent after substantial investigation. Our testing procedure instead produces a strong signal of contamination of the digit distribution. In fact, restricting for simplicity to the first digit, we obtain **15**, we can thus conclude that hypothesis **6** can be safely rejected when the focus is shifted from individual transactions, as in Fig. 1, to the whole trader activity, as in our test.

The strength of evidence against the null may suggest the existence in the administrative records of this trader of a larger number of manipulated declarations than the two already detected. It also suggests that our method could be helpful in providing authorities with evidence of potential fraud among traders not previously classified as fraudsters or even not considered as suspicious. In view of contamination models **5** and **7**, and of our simulation results, we expect this information gain to be higher in the case of serial misconduct. Additional investigations for this trader are given in *SI Appendix*, section 6. Although all methods point to the same conclusion, we remark that simple graphical tools for conformance checking—such as histograms—require substantial human interpretation and thus cannot be routinely applied on thousands of traders.

We now move to (anonymized) data provided by the customs office of another EU member state, not disclosed for its specific confidentiality policy, that we label as MS2. The data were collected in the context of a specific operation on undervaluation, focusing on a limited set of products traded by fraudulent operators that have systematically falsified the import values. The traders classified as nonfraudulent were audited by the customs officers of MS2 and no indications of possible manipulation of import values were found. Although the absence of fraud can never be anticipated with certainty, we can thus place good confidence on these statements of genuine behavior. In *SI Appendix*, section 6 and Table S7 we provide empirical investigations of the first-digit distribution of the 15 traders in this small benchmark study for which **16** instead of test **15**, since the available database is limited to a basket of fraud-sensitive products, and we keep *P* value of each test, computed as *P* value from the *SI Appendix*, Table S7, whose small basket of traded products may imply spurious deviation from the NBL when the classic *SI Appendix*, section 6.

## Discussion

We have developed a principled framework for goodness-of-fit testing of the NBL for antifraud purposes, with a focus on customs data collected in international trade. Our approach relies on a trader-specific contamination model, under which fraud detection has close connections with outlier testing. We have given simulation evidence, in the context of a real EU market, showing the features of the traders for which we can expect the genuine digit distribution to be well approximated by the NBL. Our simulation experiment is an empirical study addressing this issue in detail in the context of international trade, where the fight against fraud has become a crucial task and substantial investigations are often demanding and time consuming. We have also provided simulation-based approximations to the distribution of test statistics when the conditions ensuring the validity of the NBL do not hold. These approximations open the door to the development of goodness-of-fit procedures with good inferential properties and wide applicability.

Our methodology is general and potentially applicable to any country, or year, for which detailed customs data are available. Being mostly automatic, it is suited to be implemented in large-scale monitoring processes in which thousands of traders are screened to find the most suspicious cases. It can also be a valuable aid to the design of efficient and effective audit plans. Although we expect our general guidelines to remain valid in other empirical studies, the specific quantitative findings may clearly vary from one country (year) to another.

A bonus of our contamination approach is that it makes clear the setting in which statistical antifraud analysis takes place. Our conformance testing procedures mainly aim at the detection of serial fraudsters, for which information accumulates in the corresponding transaction records. The generation of low-price clusters of anomalous transactions is a typical consequence of this cheating behavior, and robust clustering techniques can also be used for its detection (e.g., ref. 4). However, rejection of our goodness-of-fit null hypotheses often provides more compelling evidence of fraud, also because it may not be easy to identify the low-price clusters that actually correspond to illegal declarations. Testing conformance to the NBL, or to another suitable distribution for genuine digits, thus shifts the detection focus from individual transactions to the full set of data from each trader.

A word of caution concerns the fact that not all possible frauds can be detected by our method, even when we restrict attention to manipulation of transaction values. For instance, we cannot expect any statistical procedure (including our own proposal) to have high power against data fabrication methods that preserve the validity of the NBL, at least approximately, and against occasional frauds for which statistical tests are not powerful enough. Therefore, we do not see our methodology as the ultimate antifraud tool, but as a powerful procedure to be possibly coupled with additional information. We support integration of the signals provided by our method with those obtained through alternative statistical techniques and with less technical model-free analyses, such as those developed in refs. 7 and 10, that can be applied to a restricted number of traders. Indeed, we see our approach as a suitable automatic tool for selecting the most interesting cases for additional qualitative and quantitative investigations, while ensuring control of the statistical properties of the adopted tests.

## Acknowledgments

We are grateful to Emmanuele Sordini for his contribution to the development of Web Ariadne, to Alessio Farcomeni for discussion on a previous draft, and to the reviewers for their helpful comments. The Joint Research Centre of the European Commission supported this work through the “Technology Transfer Office” project of the 2014–2020 Work Programme, in the framework of collaboration with EU member states customs and with the EU Anti-Fraud Office. This research line would not be feasible without factual collaboration of the customs services, enabled by the Hercule III Anti-fraud Programme of the European Union.

## Footnotes

- ^{1}To whom correspondence may be addressed. Email: andrea.cerioli@unipr.it or domenico.perrotta@ec.europa.eu.

Author contributions: A. Cerioli, L.B., A. Cerasa, and D.P. designed research; M.M. contributed the study of economic implications of research; A. Cerioli, L.B., A. Cerasa, and D.P. performed research; A. Cerioli, L.B., A. Cerasa, and D.P. contributed new analytic tools; A. Cerioli, L.B., A. Cerasa, and D.P. analyzed data; and A. Cerioli, L.B., M.M., and D.P. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. A.K. is a guest editor invited by the Editorial Board.

Data deposition: The available pseudo-data files have been deposited at the Athena repository maintained by the Joint Research Centre (JRC). *SI Appendix*, section 3 provides details on how to access them.

See Commentary on page 11.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1806617115/-/DCSupplemental.

- Copyright © 2019 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).

## References

1. European Commission
2. Fogelman-Soulie F, Perrotta D, Piskorski J, Steinberger R
3. …
4. Cerioli A, Perrotta D
5. Cerasa A, Cerioli A
6. Barabesi L, Cerasa A, Cerioli A, Perrotta D
7. Nigrini MJ
8. Berger A, Hill TP
9. Miller SJ
10. Kossovsky AE
11. …
12. Knuth DE
13. Hill TP
14. Varian HR
15. …
16. Fernandez-Gracia J, Lacasa L
17. …
18. Michalski T, Stoltz G
19. Berger A, Hill TP
20. …
21. Klimek P, Yegorov Y, Hanel R, Thurner S
22. Goodman W
23. Nigrini M. In: Miller SJ, ed.
24. Durtschi C, Hillison W, Pacini C
25. Schatte P
26. European Commission. *EUR-Lex, Official Journal of the European Union* L 229:14–26.
27. Samaniego RM, Sun JY
28. Fan H, Li YA, Yeaple SR
29. …
30. Efron B
31. Perrotta D, Torti F. In: Palumbo F, Lauro CN, Greenacre MJ, eds.
32. World Customs Organization


## Article Classifications

- Physical Sciences
- Statistics

- Social Sciences
- Political Sciences
