Mapping Crops with Smartphone Crowdsourcing, Satellite Imagery, and Deep Learning

StanfordFoodSecurity
Oct 1, 2020
Farmers with smartphones use the Plantix app to take photos of sick crops, and then a deep neural network diagnoses the plant with a disease or nutrient deficiency.

The Smallholder Data Gap

Smallholder farms (holdings of less than 2 hectares) produce one-third of the food consumed globally, employ 2 billion people, and make up 84% of the world's farms. Yet our collective knowledge of their food production remains limited: basic questions like what crop types smallholders are growing and where they grow them remain unanswered, making it hard to track yield progress, study farming practices, and design agricultural policies. This data gap exists because most smallholders are located in countries where the capacity to conduct surveys, the traditional way of obtaining farm-level information, is still nascent.

In recent years, satellite imagery has become available at higher spatial and temporal resolutions than ever before, offering another way to observe smallholder systems at low cost and global scale. At the same time, advances in machine learning, deep learning in particular, have enabled the automatic extraction of insights from unstructured data like images, text, and audio. By applying machine learning methods to satellite imagery, researchers have created maps of forest cover and cropland around the world. But training these methods requires ground truth labels, and in agriculture, where labels are usually collected through surveys, this brings us back to the problem of finding sources of ground truth in smallholder systems.

The Potential of Smartphone Crowdsourcing

While surveys and other infrastructure are often lacking in developing countries, one technology has become widespread: the mobile phone. Even in Sub-Saharan Africa, there are now 82 mobile cellular subscriptions for every 100 people, compared to just 48 percent of the population with access to electricity (World Bank Open Data 2018). Such widespread mobile phone use opens up the possibility of collecting data directly from individuals, and previous work has explored this option for gathering household data and mapping poverty (Dillon 2012, Blumenstock et al. 2015). For classifying crop types in satellite imagery, the ideal crowdsourced dataset would be people telling us what type of crop is growing at specific geolocated coordinates.

“Hey Siri, what’s wrong with my crop?”

Where can we find such a dataset? Enter Plantix, a free mobile app that helps farmers diagnose what’s wrong with their crops. Farmers submit a photo of their plant through the app and receive a diagnosis of disease or nutrient deficiency along with a treatment plan. To produce this diagnosis, Plantix feeds the farmer’s photo through a deep neural network that classifies the crop type and illness present in the photo. The location of the farmer’s mobile phone is logged when the photo is taken, resulting in a dataset of crop types at geolocated coordinates — exactly what we’re looking for!
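To make this concrete, here is a minimal PyTorch sketch of the kind of two-head network that could jointly predict crop type and illness from a single photo. This is not Plantix's actual model; the backbone and class counts are illustrative assumptions.

```python
# A minimal sketch of a two-head network predicting crop type and illness
# from one photo. NOT Plantix's actual model; backbone and class counts
# are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

class PlantDiagnosisNet(nn.Module):
    def __init__(self, n_crops=10, n_illnesses=50):  # class counts assumed
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.crop_head = nn.Linear(512, n_crops)         # which crop is pictured
        self.illness_head = nn.Linear(512, n_illnesses)  # disease or deficiency

    def forward(self, x):
        z = self.features(x).flatten(1)  # (batch, 512) image embedding
        return self.crop_head(z), self.illness_head(z)

model = PlantDiagnosisNet()
crop_logits, illness_logits = model(torch.randn(1, 3, 224, 224))  # one photo
```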

The geographic distribution of submissions received by Plantix in Andhra Pradesh and Telangana across 10 crop types.

The Plantix app has seen huge uptake in smallholder regions, where high-quality farming advice and agricultural products can be hard to come by. Eighty percent of the app's 1.1 million monthly active users are in India, specifically in the states of Andhra Pradesh (AP) and Telangana. To test whether Plantix data can be used to map crop types, we started with these two states for the years 2017–2019, during which the app received 1.8 million submissions, and focused on the region's two main crops: rice and cotton.

Quality Control in Crowdsourcing

Although crowdsourcing can collect a lot of data, it has a downside: that data can be very noisy. The Plantix submissions had a few key sources of noise: mobile phone locations are imprecise because the OS trades location accuracy for battery life, farmers sometimes take photos outside of crop fields, and the Plantix neural network sometimes misclassifies crop types. The submissions were also highly clustered around metropolitan areas, reflecting mobile phone and internet access rather than the distribution of cropland. We found that training a satellite image-based crop type classifier on the raw submissions didn't work at all; we had to construct a training set that was higher in quality and geographically representative of the region.

We trained a CNN to classify whether a Plantix submission came from inside a crop field.

Performing quality control required a mix of strategies, some more complex than others. For example, location inaccuracy was handled simply by filtering on the accuracy estimate reported by the mobile OS, which is the radius (in meters) of a circle expected to contain the phone's true location 67% of the time. Meanwhile, removing submissions taken outside of crop fields, which happened when farmers stood at the edge of a field or took a plant sample home, was more involved. We trained a convolutional neural network (CNN) to detect whether a submission came from inside a crop field using high-resolution DigitalGlobe images. Since labels for this task could not be derived from the Plantix app or farmer submissions, we hand-labeled 3,000 DigitalGlobe images to train the CNN.
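As an illustration, here is a minimal sketch of such an in-field classifier, assuming 224x224 RGB tiles cropped around each submission's coordinates; the architecture is illustrative rather than the exact network we trained.

```python
# A minimal sketch of a binary in-field/not-in-field CNN over imagery tiles.
# The architecture is illustrative, not the exact network from the study.
import torch
import torch.nn as nn

class InFieldCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 2),  # logits: {outside field, inside field}
        )

    def forward(self, x):  # x: (batch, 3, 224, 224) imagery tiles
        return self.net(x)

model = InFieldCNN()
logits = model(torch.randn(4, 3, 224, 224))  # 4 example tiles
```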

In the end, we kept submissions whose reported location accuracy was 50 m or better and that our CNN classified as coming from inside a crop field. We also resampled the remaining submissions in a spatially uniform manner to avoid overfitting to areas that are overrepresented in the dataset.
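The filtering and resampling logic is simple to express in code. Below is a minimal pandas sketch in which the column names, grid size, and per-cell cap are hypothetical placeholders rather than our exact settings.

```python
# A minimal sketch of the filtering and spatial resampling steps. The column
# names (gps_accuracy_m, in_field_prob), the 0.1-degree grid size, and the
# 20-per-cell cap are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("submissions.csv")  # hypothetical file of Plantix submissions

# Keep submissions with reported GPS accuracy of 50 m or better that the
# in-field CNN scored as likely inside a crop field.
kept = df[(df["gps_accuracy_m"] <= 50) & (df["in_field_prob"] > 0.5)].copy()

# Bin points into grid cells and keep at most 20 per cell, so densely
# sampled peri-urban areas do not dominate the training set.
cell = 0.1  # degrees
kept["cell_id"] = (
    (kept["lat"] // cell).astype(int).astype(str)
    + "_"
    + (kept["lon"] // cell).astype(int).astype(str)
)
balanced = kept.groupby("cell_id", group_keys=False).apply(
    lambda g: g.sample(min(len(g), 20), random_state=0)
)
```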

Classifying Crop Types

Next, we used the filtered submissions to supervise the classification of satellite images. Different crop types reflect light differently, both in visible color and at other wavelengths sensed by satellites, and may also be planted and harvested at different times of year. But these patterns vary by region, making the problem more suitable for supervised machine learning than for rule-based or unsupervised methods. We tried a few machine learning methods and found that a 1-dimensional CNN trained on satellite image time series performed the best. The 1D CNN distinguished rice, cotton, and "other crops" with 74% accuracy, compared with a 39% baseline from always predicting the most common class, rice. That is far from perfect, but much better than random guessing.
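For readers curious what such a model looks like, here is a minimal 1D CNN sketch; the layer sizes and the number of input bands are illustrative, not the exact architecture we trained.

```python
# A minimal sketch of a 1D CNN over per-pixel satellite time series. Layer
# sizes and the number of input bands are illustrative assumptions.
import torch
import torch.nn as nn

class CropType1DCNN(nn.Module):
    def __init__(self, n_bands=10, n_classes=3):  # rice, cotton, other
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_bands, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over the time dimension
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):  # x: (batch, bands, timesteps)
        z = self.features(x).squeeze(-1)
        return self.classifier(z)  # logits over {rice, cotton, other}

model = CropType1DCNN()
logits = model(torch.randn(8, 10, 30))  # 8 pixels, 10 bands, 30 dates
```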

Satellite imagery is turned into machine-ingestible features, which are then used with crowdsourced labels to classify pixels into crop types (rice, cotton, or other). Rice was often classified along riverbanks.

Training Set vs. Validation Set Noise

We also experimented with adding low-quality submissions to the training and validation sets, and came away with two conclusions.

We added noisy Plantix submissions — submissions not taken inside a crop field — to the training and validation sets to explore the effect of dataset noise on the 1D CNN’s real and perceived performance.

First, it's important to have a high-quality validation set to evaluate the performance of a model. In our case, adding submissions with uncertain locations, or submissions not taken inside a crop field, to the validation set biased the validation accuracy downward significantly: when a label is wrong, the model is penalized even for correct predictions, so measured accuracy understates true performance.

On the other hand, adding noisy submissions to the training set only slightly degraded performance on the validation set, suggesting that the 1D CNN was robust to moderate levels of training noise. In remote sensing applications, there is often a tradeoff between quality and quantity when gathering ground truth labels. Our experiments suggest that resources should be invested to collect high-quality labels for validation, but this calculus shifts in favor of quantity when it comes to the training set. More training points, even if lower in quality on average, can build more robust models that generalize across larger geographic regions.
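This tradeoff is easy to reproduce in miniature. The sketch below uses synthetic data and a stand-in random forest classifier (not our 1D CNN) to show how 30% label noise in the validation set corrupts the measurement far more than the same noise in the training set hurts the model.

```python
# A miniature version of the noise experiment, using synthetic data and a
# stand-in random forest (not the study's 1D CNN). Flipping 30% of labels in
# the validation set corrupts the *measurement*; flipping them in the
# training set degrades the model itself far less.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 20))
y = np.digitize(X[:, 0] + 0.3 * X[:, 1], bins=[-0.5, 0.5])  # 3 synthetic classes
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

def flip_labels(labels, frac, n_classes=3, seed=1):
    """Replace a fraction of labels with random classes (label noise)."""
    noise_rng = np.random.default_rng(seed)
    noisy = labels.copy()
    idx = noise_rng.choice(len(noisy), size=int(frac * len(noisy)), replace=False)
    noisy[idx] = noise_rng.integers(0, n_classes, size=len(idx))
    return noisy

for noisy_split in ("train", "val"):
    y_tr_used = flip_labels(y_tr, 0.3) if noisy_split == "train" else y_tr
    y_val_used = flip_labels(y_val, 0.3) if noisy_split == "val" else y_val
    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr_used)
    acc = accuracy_score(y_val_used, clf.predict(X_val))
    print(f"30% noise in {noisy_split} set -> measured val accuracy: {acc:.2f}")
```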

We applied the crop type classifier to all of AP and Telangana (over 230,000 square km) using Google Earth Engine.
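As a simplified sketch of this scale-up, here is how per-pixel features over both states might be assembled with the Earth Engine Python API; the dataset, bands, months, and export settings are illustrative assumptions, and the actual deployment may differ.

```python
# A simplified sketch of assembling per-pixel features over both states with
# the Earth Engine Python API. Dataset, bands, months, and export settings
# are illustrative; the study's actual Earth Engine deployment may differ.
import ee

ee.Initialize()  # requires prior Earth Engine authentication

states = ee.FeatureCollection("FAO/GAUL/2015/level1").filter(
    ee.Filter.inList("ADM1_NAME", ["Andhra Pradesh", "Telangana"])
)
aoi = states.geometry()

def monthly_composite(month):
    """Median Sentinel-2 composite of selected bands for one month of 2019."""
    start = ee.Date.fromYMD(2019, month, 1)
    return (
        ee.ImageCollection("COPERNICUS/S2")
        .filterBounds(aoi)
        .filterDate(start, start.advance(1, "month"))
        .select(["B2", "B3", "B4", "B8"])  # blue, green, red, near-infrared
        .median()
    )

# Stack June-November composites into one multi-band image (the time series).
stack = ee.ImageCollection([monthly_composite(m) for m in range(6, 12)]).toBands()

# Export the features; the trained crop type classifier can then be applied
# to the exported pixels.
task = ee.batch.Export.image.toDrive(
    image=stack.clip(aoi),
    description="ap_telangana_features",
    region=aoi,
    scale=10,  # Sentinel-2 resolution in meters
    maxPixels=1e13,
)
task.start()
```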

Final Thoughts

Our study is the first to explore using crowdsourced data to augment or replace ground surveys for land use mapping at a large scale. With quality control steps like filtering on location accuracy and resampling geographically, datasets like Plantix can be used to map land use, and a moderate amount of noise is acceptable in the training set. Work remains to remove lingering noise in the validation set, mask out clouds in satellite imagery, and ensure that crowdsourced samples are representative of the population. Still, as mobile phones and internet access become even more widespread around the world, crowdsourcing is poised to become an increasingly useful alternative to traditional fieldwork.

References:

  1. World Bank Open Data (2018). Mobile Cellular Subscriptions (per 100 people) and Access to Electricity (% of population). https://data.worldbank.org/
  2. Dillon, B. (2012). Using mobile phones to collect panel data in developing countries. Journal of International Development 24(4), 518–527.
  3. Blumenstock, J., Cadamuro, G., and On, R. (2015). Predicting poverty and wealth from mobile phone metadata. Science 350(6264), 1073–1076.

About the Author:

Sherrie Wang is a Ph.D. candidate at Stanford’s Institute of Computational and Mathematical Engineering and the Center on Food Security and the Environment.

