Quantifying the sensing power of vehicle fleets
Edited by Michael F. Goodchild, University of California, Santa Barbara, CA, and approved May 14, 2019 (received for review December 19, 2018)

Significance
Attaching sensors to crowd-sourced vehicles could provide a cheap and accurate way to monitor air pollution, road quality, and other aspects of a city’s health. But for so-called drive-by sensing to be practically useful, the sensor-equipped vehicle fleet needs to have large “sensing power”; that is, it needs to cover a large fraction of a city’s area during a given reference period. Here, we provide an analytic description of the sensing power of taxi fleets, which agrees with empirical data from nine major cities. Our results show that taxis’ sensing power is unexpectedly large: in Manhattan, just 10 random taxis cover one-third of street segments daily. This indicates that drive-by sensing can be readily implemented in the real world.
Abstract
Sensors can measure air quality, traffic congestion, and other aspects of urban environments. The fine-grained diagnostic information they provide could help urban managers to monitor a city’s health. Recently, a “drive-by” paradigm has been proposed in which sensors are deployed on third-party vehicles, enabling wide coverage at low cost. Research on drive-by sensing has mostly focused on sensor engineering, but a key question remains unexplored: How many vehicles would be required to adequately scan a city? Here, we address this question by analyzing the sensing power of a taxi fleet. Taxis, being numerous in cities, are natural hosts for the sensors. Using a ball-in-bin model in tandem with a simple model of taxi movements, we analytically determine the fraction of a city’s street network sensed by a fleet of taxis during a day. Our results agree with taxi data obtained from nine major cities and reveal that a remarkably small number of taxis can scan a large number of streets. This finding appears to be universal, indicating its applicability to cities beyond those analyzed here. Moreover, because taxis’ motion combines randomness and regularity (passengers’ destinations being random, but the routes to them being deterministic), the spreading properties of taxi fleets are unusual; in stark contrast to random walks, the stationary densities of our taxi model obey Zipf’s law, consistent with empirical taxi data. Our results have direct utility for town councilors, smart-city designers, and other urban decision makers.
Monitoring urban environments is a challenging task; pollution, infrastructural strain, and other quantities of interest vary widely over many spatial and temporal scales, so measuring them accurately requires considerable effort. The field of urban sensing seeks to solve this problem (1–3). With the proliferation of powerful and affordable sensors, it promises a cost-efficient way to monitor urban phenomena at the required fine-scale spatiotemporal resolutions. Even so, traditional approaches to urban sensing, which fall into two main categories (Fig. 1), have limitations. At one extreme, airborne sensors such as satellites scan wide areas, but only during certain time windows. At the other extreme, stationary sensors collect data over long periods of time, but with limited spatial range. Recently, however, “drive-by sensing” has emerged as a new sensing strategy that offers good coverage in both space and time (4–7). Here, sensors are mounted on “crowd-sourced” urban vehicles, such as cars, taxis, buses, or trucks. This piggy-backing lets the sensors scan the wide areas traversed by their hosts, so that the spatiotemporal profile of a city can be explored with great ease and accuracy.
Fig. 1. Comparison of different sensing methods. Airborne sensors, such as satellites, provide good spatial coverage, but their temporal coverage is limited to the time interval when the sensors pass over the location being sensed. Conversely, stationary sensors collect data for long periods of time, but have limited spatial range. Drive-by sensing offers some advantages of both methods. By using host vehicles as “data mules,” drive-by sensing offers a cheap, scalable, and sustainable way to accurately monitor cities in both space and time.
Research on drive-by sensing has so far been technological, focused on the engineering difficulties of the sensors (8), managing the dynamic network they comprise (9, 10), and parsing the data they collect (11–13). Yet, the efficiency of the crowd-sourced aspect of drive-by sensing, on which the viability of the approach rests, has not been analyzed: How many vehicles are required to accurately monitor a city’s environment? The answer to this question hinges on the mobility patterns of the host fleet; wide coverage requires the vehicles to densely explore a city’s spatiotemporal “volume.” We call the extent to which a vehicle fleet achieves this its sensing power. In what follows, we present a case study of the sensing power of taxi fleets. We choose to study taxis as sensor hosts because they are pervasive in cities and because datasets characterizing their mobility patterns are publicly available.
Consider a fleet of sensor-equipped vehicles V moving through a city, sampling a reference quantity X during a time period T. We represent the city by a street network S, whose nodes represent possible passenger-pickup and -dropoff locations and whose edges represent street segments potentially scannable by the vehicle fleet during T. We use the proviso “potentially scannable” since some segments are never traversed by taxis in our datasets and so are permanently out of reach of taxi-based sensing, as further discussed in SI Appendix. To model the taxis’ movements, we introduce the taxi-drive process, a schematic of which is presented in Fig. 2 A–C. The model assumes that taxis travel to randomly chosen destinations via shortest paths, with ties between multiple shortest paths broken at random. Once a destination is reached, another destination is chosen, again at random, and the process repeats. To reflect heterogeneities in real passenger data, destinations in the taxi-drive process are not chosen uniformly at random. Instead, previously visited nodes are chosen preferentially: The probability
Fig. 2. Taxi-drive process. A–C show a schematic of the taxi-drive process. (A) A taxi picks up a passenger at node A. Then, a destination node B (blue circle) is randomly chosen. (B) The shortest path between A and B is taken (dashed arrow). No edges have yet been sensed. (C) After the edges connecting A and B have been traversed by the sensor-equipped taxi, they become “sensed,” which we denote by coloring them red. Now, at B, the taxi proceeds to its next pickup at, say, C. There are two shortest paths connecting B and C, so one is chosen at random. This process then repeats. (D) Distribution of street-segment popularities p predicted by the taxi-drive process (blue histogram) agrees with empirical data from Manhattan (brown histogram). (E) By contrast, a random-walk model of taxi movement (i.e., a random walk performed on the street) incorrectly predicts a skewed, unimodal distribution of street-segment popularities, in qualitative disagreement with the data. For D and E, the (directed) Manhattan street network on which the taxi-drive and random-walk processes were run was obtained by using the Python package “osmnx.” The taxi-drive parameter β was 1.5, and the process was run for
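As an illustration of the taxi-drive process described above, the following is a minimal sketch, not the authors' implementation. It runs on a small undirected grid rather than a real street network, and the helper names (`grid_graph`, `random_shortest_path`, `taxi_drive`) are hypothetical. The exact preferential-return rule is cut off in this text, so the sketch assumes a destination weight of 1 + v^β, where v is the number of previous visits to a node; ties between shortest paths are broken by picking a random neighbor that is one step closer to the destination, which is only one of several ways to randomize among shortest paths.

```python
import random
from collections import deque


def grid_graph(n):
    """Undirected n x n grid as an adjacency dict (toy stand-in for a street network)."""
    adj = {}
    for x in range(n):
        for y in range(n):
            nbrs = []
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                u = (x + dx, y + dy)
                if 0 <= u[0] < n and 0 <= u[1] < n:
                    nbrs.append(u)
            adj[(x, y)] = nbrs
    return adj


def random_shortest_path(adj, src, dst, rng):
    """Return one shortest path from src to dst, breaking ties at random."""
    # Breadth-first search from dst gives every node's hop distance to dst.
    dist = {dst: 0}
    queue = deque([dst])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    # Walk from src, always stepping to a random neighbor that is one hop closer.
    path = [src]
    while path[-1] != dst:
        u = path[-1]
        closer = [w for w in adj[u] if dist[w] == dist[u] - 1]
        path.append(rng.choice(closer))
    return path


def taxi_drive(adj, n_trips, beta=1.5, seed=0):
    """Simulate one taxi and return segment popularities (fraction of traversals per edge)."""
    rng = random.Random(seed)
    nodes = list(adj)
    visits = {v: 0 for v in nodes}
    edge_counts = {}
    current = rng.choice(nodes)
    visits[current] += 1
    for _ in range(n_trips):
        # ASSUMPTION: preferential return with weight 1 + visits**beta;
        # the current node gets weight 0 so every trip has nonzero length.
        weights = [0.0 if v == current else 1.0 + visits[v] ** beta for v in nodes]
        dest = rng.choices(nodes, weights=weights, k=1)[0]
        path = random_shortest_path(adj, current, dest, rng)
        for a, b in zip(path, path[1:]):
            edge = frozenset((a, b))  # undirected edge key
            edge_counts[edge] = edge_counts.get(edge, 0) + 1
        visits[dest] += 1
        current = dest
    total = sum(edge_counts.values())
    return {e: c / total for e, c in edge_counts.items()}


if __name__ == "__main__":
    popularity = taxi_drive(grid_graph(6), n_trips=2000, beta=1.5, seed=42)
    ranked = sorted(popularity.values(), reverse=True)
    print("segments traversed:", len(ranked))
    print("five most popular segments:", [round(p, 4) for p in ranked[:5]])
```

Sorting the returned popularities by rank gives a quick, informal look at how unevenly traversals are spread over segments. A directed street network, such as one obtained with osmnx, would need a directed shortest-path routine in place of the undirected search used here.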
Results
To compare our model to data, we quantify the sensing power of a vehicle fleet as its covering fraction
We have computed
We find that, despite its simplicity, the taxi-drive process captures the statistical properties of real taxis’ movements. Specifically, it produces realistic distributions of segment popularities
Having obtained the segment popularities
Fig. 3 compares the analytic predictions for
Fig. 3. Sensing power
Fig. 4 tests for universality in the
Fig. 4. Scaling collapse. Empirical street-covering fractions
The fast saturation of the
Discussion
Requiring that segments be scanned just once a day, as assumed in our analysis, could be too coarse a temporal resolution for some urban quantities which we desire to monitor. Air quality, for instance, has large temporal variations and would therefore require multiple readings dispersed evenly over time to be adequately sensed. To see if drive-by sensing can accommodate more demanding temporal requirements, we derived (SI Appendix) an expression for the adjusted sensing power
Because taxis are concentrated in commercial and tourist areas, taxi-based drive-by sensing has an inherent spatial bias. This bias could have harmful consequences, such as underserving socioeconomically disadvantaged neighborhoods. A hybrid approach to sensing could overcome this pitfall. Sensor-equipped taxis could be used to scan popular areas of a city, while the remaining hard-to-reach areas could be scanned by vehicles dedicated exclusively to sensing (as opposed to third-party vehicles on which drive-by sensing “parasitically” relies). We discuss the spatial bias of drive-by sensing comprehensively in SI Appendix.
There are many ways to extend our results. To keep things simple, we characterized the sensing power of taxi fleets with respect to the simplest possible cover metric: the raw number of segments traversed by a taxi at least once,
Taxis traveling in cities share some of the features of nonstandard diffusive processes. Like Lévy walks (17) or the run-and-tumble motion of bacteria (18), their movements are partly regular and partly random. As such, they produce stationary densities on street networks that obey Zipf’s law, contrary to a standard random walk. Future work could examine if other aspects of taxis’ spreading behavior are also unusual. Perhaps the hybrid motion exemplified by taxis offers advantages in graph exploration (19), foraging (20), and other classic applications of stochastic processes (21).
The work most closely related to drive-by sensing is on “vehicle-sensor networks” (22). Here, sensors capable of communicating with each other are fitted on vehicles, resulting in a dynamic network. The ability to share information enables more efficient, “cooperative” sensing, but has the drawback of large operational cost. Most studies of vehicle-sensor networks are therefore in silico (23). Since the sensors used in drive-by sensing do not communicate, drive-by sensors are significantly cheaper to implement than vehicle-sensor networks.
Vehicles other than taxis can be used for drive-by sensing. Candidates include private cars, trash trucks, or school buses. Since putting sensors on private cars might lead to privacy concerns, city-owned buses or trucks seem better choices for sensor hosts. The mobility patterns of school buses and trash trucks are, however, different from those of taxis; they follow fixed routes at fixed times, limiting their sensing power.
The diverse data supplied by drive-by sensing have broad utility. High-resolution air-quality readings can help combat pollution, while measurements of air temperature and humidity can help improve the calibration of meteorological models (24, 25) and are useful in the detection of gas leaks (26). Degraded road segments can be identified with accelerometer data, helping inform preventive repair (27, 28), while pedestrian-density data can be helpful in the modeling of crowd dynamics (29). Finally, information on parking-spot occupancy, Wi-Fi access points, and street-light infrastructure—all obtainable with modern sensors—will enable advanced city analytics as well as facilitate the development of new big-data and internet-of-things services and applications.
In short, drive-by sensing will empower urban leaders with rich streams of useful data. Our study reveals these to be obtainable with remarkably small numbers of sensors.
Materials and Methods
We derive an expression for the sensing power of a vehicle fleet. We quantify this by their covering fraction
Imagine we have a population P of taxi trajectories. We define a taxi trajectory
Trajectories with Unit Length.
Let L be the random length of a trajectory. The special case of
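To make the ball-in-bin picture concrete, here is a hedged sketch of the classical occupancy calculation that the unit-length case resembles. It assumes each unit-length trajectory is an independent “ball” that lands on segment i with probability p_i, in which case the expected fraction of segments hit at least once by N balls is 1 − (1/N_S) Σ_i (1 − p_i)^N, where N_S is the number of segments. The function names and the Zipf-like popularities in the usage example are illustrative assumptions, not values or formulas taken from the paper.

```python
import random


def expected_coverage(p, n_balls):
    """Expected fraction of bins hit at least once by n_balls independent draws."""
    # A bin with probability p_i is missed by all draws with probability (1 - p_i)**n_balls.
    return 1.0 - sum((1.0 - pi) ** n_balls for pi in p) / len(p)


def simulated_coverage(p, n_balls, n_runs=2000, seed=0):
    """Monte Carlo estimate of the same quantity, used as a sanity check."""
    rng = random.Random(seed)
    bins = range(len(p))
    total = 0.0
    for _ in range(n_runs):
        hit = set(rng.choices(bins, weights=p, k=n_balls))
        total += len(hit) / len(p)
    return total / n_runs


if __name__ == "__main__":
    # ASSUMPTION: Zipf-like popularities p_i proportional to 1 / rank, for illustration only.
    n_segments = 50
    raw = [1.0 / (i + 1) for i in range(n_segments)]
    norm = sum(raw)
    p = [r / norm for r in raw]
    for n in (10, 50, 200):
        print(n, round(expected_coverage(p, n), 3), round(simulated_coverage(p, n), 3))
```

With heterogeneous popularities, coverage grows quickly at first, because popular segments are hit almost immediately, and then slowly, because the rare segments take many draws to reach.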
Trajectories with Fixed Length.
Trajectories of fixed (i.e., nonrandom) length
Trajectories with Random Lengths.
Generalizing to random L is straightforward. Let
Extension to Vehicle Level.
Translating our analysis to the level of vehicles is straightforward. Let B be the random number of segments that a random vehicle in V covers in the reference period T (in SI Appendix, Fig. S4, we show how B is distributed in our datasets). Then, we simply replace
Model Parameters.
The parameters
Acknowledgments
We thank Allianz, Amsterdam Institute for Advanced Metropolitan Solutions, Brose, Cisco, Ericsson, Fraunhofer Institute, Liberty Mutual Institute, Kuwait–MIT Center for Natural Resources and the Environment, Shenzhen, Singapore–MIT Alliance for Research and Technology, UBER, Victoria State Government, Volkswagen Group America, and all of the members of the MIT Senseable City Laboratory Consortium for supporting this research. S.H.S. was supported by NSF Grants DMS-1513179 and CCF-1522054.
Footnotes
- 1. To whom correspondence may be addressed. Email: kokeeffe@mit.edu.
Author contributions: K.P.O., A.A., P.S., and C.R. designed research; K.P.O., A.A., S.H.S., P.S., and C.R. performed research; K.P.O. and A.A. analyzed data; and K.P.O., S.H.S., and P.S. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1821667116/-/DCSupplemental.
- Copyright © 2019 the Author(s). Published by PNAS.
This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).
References
- 1. N. D. Lane, S. B. Eisenman, M. Musolesi, E. Miluzzo, A. T. Campbell
- 2. D. Cuff, M. Hansen, J. Kang
- 3. T. Rashed, C. Jürgens
- 4. U. Lee, M. Gerla
- 5. B. Hull et al.
- 6. P. Mohan, V. N. Padmanabhan, R. Ramjee
- 7. A. Anjomshoaa et al.
- 8. J. H. Ahnn, M. Potkonjak
- 9. A. Skordylis, N. Trigoni
- 10. M. J. Piran, G. R. Murthy, G. P. Babu, E. Ahvar
- 11. I. Turcanu, P. Salvo, A. Baiocchi, F. Cuomo
- 12. R. Bridgelall
- 13. G. Alessandroni et al.
- 14.
- 15.
- 16. Y. Song, H. J. Miller, X. Zhou, D. Proffitt
- 17. M. F. Shlesinger, J. Klafter, B. J. West
- 18. M. J. Schnitzer
- 19. B. Tadić
- 20. G. M. Viswanathan, M. G. Da Luz, E. P. Raposo, H. E. Stanley
- 21. D. Ben-Avraham, S. Havlin
- 22. D. Van Le, C. K. Tham, Y. Zhu
- 23. M. Gerla, J. T. Weng, E. Giordano, G. Pau
- 24. M. I. Mead et al.
- 25.
- 26. P. S. Murvay, I. Silea
- 27. T. M. Nadeem, M. T. Loiacono
- 28. M. Wang, R. Birken, S. S. Shamsabadi
- 29. M. B. Kjærgaard, M. Wirz, D. Roggen, G. Tröster
- 30. B. R. Cobb, R. Rumi, A. Salmerón
Article Classifications
- Physical Sciences
- Sustainability Science
- Social Sciences
- Environmental Sciences