Regional Mapping of Vineyards Using Machine Learning and LiDAR Data

Adriaan Jacobus Prins, Adriaan van Niekerk
Copyright © 2020 | Pages: 22
DOI: 10.4018/IJAGR.2020100101

Abstract

This study evaluates the use of LiDAR data and machine learning algorithms for mapping vineyards. Vineyards are planted in rows spaced at various distances, which can cause spectral mixing within individual pixels and complicate image classification. Four resolutions were used for generating normalized digital surface model (nDSM) and intensity derivatives from the LiDAR data. In addition, texture measures with window sizes of 3x3 and 5x5 were generated from the LiDAR derivatives. The different combinations of resolutions and window sizes resulted in eight data sets that were used as input to 11 machine learning algorithms. A larger window size was found to improve the overall accuracy for all classifier–resolution combinations. The results showed that random forest with texture measures generated at a 5x5 window size outperformed the other experiments, regardless of the resolution used. The authors conclude that the random forest algorithm applied to LiDAR derivatives with a resolution of 1.5m and a window size of 5x5 is the recommended configuration for vineyard mapping using LiDAR data.
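The windowed texture measures described in the abstract can be sketched as follows. The article does not specify its software stack or which texture measures were computed; the snippet below is a minimal illustration using SciPy's `generic_filter` with local variance as a stand-in texture measure, applied at 3x3 and 5x5 windows to a hypothetical nDSM raster.

```python
import numpy as np
from scipy.ndimage import generic_filter

rng = np.random.default_rng(42)

# Hypothetical nDSM tile: vineyard rows appear as raised ridges
# (vine canopy height above ground) plus measurement noise.
ndsm = np.tile([0.0, 0.0, 1.8, 1.8], (20, 5)) + rng.normal(0, 0.1, (20, 20))

def texture(raster: np.ndarray, window: int) -> np.ndarray:
    """Local variance computed in a window x window neighbourhood."""
    return generic_filter(raster, np.var, size=window)

tex3 = texture(ndsm, 3)  # 3x3 window
tex5 = texture(ndsm, 5)  # 5x5 window: smoother, larger spatial support
```

The larger window aggregates over more of the row/inter-row pattern, which is one plausible reason the 5x5 texture measures improved accuracy in the study.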

Introduction

Agriculture directly contributes about 2.5% towards the gross domestic product (GDP) of South Africa (Greyling, 2015), with another 14% contributed through related manufacturing and processing (World Wide Fund for Nature, 2018). Fruits and vegetables, including grapes, make up 50.8% of food production, with about 90% produced under irrigation (Tibane, 2016). Being able to accurately assess the area covered by crops (by creating crop maps) is vital to government and agriculture-related agencies (Myburgh, 2015; Yalcin & Günay, 2016). Digital crop maps are often used to obtain agricultural statistics such as crop yield, water stress and soil properties, and can be used in agricultural regions to aid decision making (Delenne, Durrieu, Rabatel, & Deshayes, 2010; Lee et al., 2010; Turker & Kok, 2013; Van Niekerk et al., 2018).

The traditional approach to mapping field boundaries is to manually digitize them from aerial or satellite imagery. However, manual digitizing is time-consuming, labour-intensive, costly, subjective and open to human error (Yalcin & Günay, 2016). A variety of semi-automated image classification techniques have consequently been attempted to improve efficiencies and reduce costs (Yan, Shaker, & El-Ashmawy, 2015). Machine learning algorithms are increasingly being used for differentiating crop types from satellite imagery (Gilbertson, Kemp, & Van Niekerk, 2017; Möller et al., 2016). These non-parametric algorithms are robust under high dimensionality (i.e. a large number of input variables) and are able to deal with non-normally distributed data (Al-doski, Mansor, Zulhaidi, & Shafri, 2013; Gilbertson & Van Niekerk, 2017). Popular machine learning algorithms include decision tree (DT), neural network (NN), random forest (RF), k-nearest neighbour (k-NN) and support vector machine (SVM) (Al-doski et al., 2013).

DT recursively separates a dataset into smaller subdivisions according to defined tests at each branch (node) in the tree (Friedl & Brodley 1997). The DT consists of a start node, a set of internal nodes and a set of end nodes (leaves). The starting node is created using the entire dataset and splits it (based on the value of one variable) into internal nodes. When all the samples at an internal node belong to a single class, it is turned into a leaf (end) node (Rutkowski et al. 2014). RF is an ensemble classifier consisting of multiple DTs. The DTs are generated on subsets of training samples drawn with replacement (bootstrap aggregation, i.e. bagging). The final classification is a majority vote over the classifications produced by the different DTs (Möller et al. 2007). Extreme gradient boosting (XGBoost) is an extension of traditional boosting ensemble techniques in the DT family. Boosting sequentially generates models and combines the weaker classifiers into one strong model, with each new classifier correcting the errors of its predecessors (Xia et al. 2017). k-NN is a distance-based classification algorithm (Weinberger, Blitzer & Saul 2006) that assigns a label to an unknown sample based on the labels of the training samples that are closest in feature space (Adejuwon & Mosavi 2010). Logistic regression (LR) is a linear model used for classification. It finds a multivariate regression relationship between a dependent variable and several independent variables (Pradhan 2010). Naïve Bayes (NB) is a probabilistic classifier based on Bayes' theorem from Bayesian statistics (Zelinsky 2009), while SVM is a non-parametric supervised classification algorithm that builds a model by mapping the training dataset into a higher-dimensional space. It then attempts to separate the different classes using hyperplanes that minimize classification errors (Zheng et al. 2015).
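As a rough illustration of how several of the classifiers above might be compared on per-pixel LiDAR-derived features, the sketch below fits a few of them on synthetic data. The article does not name its software; the scikit-learn classes and the synthetic two-class features (a stand-in for nDSM, intensity and texture derivatives) are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for per-pixel LiDAR derivatives:
# two classes, e.g. vineyard vs. non-vineyard.
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

classifiers = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),  # bagged DTs
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
}

# Overall accuracy on the held-out split, per classifier.
scores = {name: clf.fit(X_tr, y_tr).score(X_te, y_te)
          for name, clf in classifiers.items()}
```

In the study itself, the equivalent comparison was run over eight data sets (four resolutions, two texture window sizes) and 11 algorithms, with overall accuracy as the yardstick.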
NN is modelled after the constructs of the human brain, where "intelligence" is stored in neural pathways as well as in memory. In an NN, the knowledge is stored in the weights applied to each node (neuron) (Miller, Kaminsky & Rana 1995). Another form of NN is the deep neural network (d-NN), which refers to an NN with multiple hidden layers.
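The idea that an NN's knowledge lives in its node weights can be shown with a minimal one-hidden-layer network in plain NumPy. The layer sizes, activation and softmax output below are illustrative assumptions, not details taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# All of the network's "knowledge" is held in these weight matrices
# and bias vectors; training would adjust them, nothing else.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(4)   # input -> hidden layer
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)   # hidden -> output layer

def forward(x: np.ndarray) -> np.ndarray:
    """Forward pass: weighted sum at each node, ReLU, then softmax."""
    h = np.maximum(0.0, x @ W1 + b1)             # hidden activations
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)     # class probabilities

probs = forward(rng.normal(size=(3, 8)))         # 3 samples, 8 features
```

A d-NN would simply stack more hidden layers (more weight matrices) between input and output.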
