Optimized Data Mining Techniques for Outlier Detection, Removal, and Management Zone Delineation for Yield Prediction

Roopa G. M., Arun Kumar G. H., Naveen Kumar K. R., Nirmala C. R.
DOI: 10.4018/978-1-5225-9632-5.ch010


Sensors now collect enormous volumes of agricultural data, and crop management decisions based on spatial data with soil parameters such as N, P, K, pH, and EC can improve crop growth for a given soil type. Spatial data play a vital role in decision support systems (DSS), but inconsistent values lead to improper inferences. Exploratory data analysis (EDA) shows that a few observations are outliers that distort crop management assessments. In the spatial data context, outliers are observations whose non-spatial attributes are distinct from those of other observations. Treating an entire field as a uniform area is therefore simplistic and leads farmers to use expensive fertilizers. The Iterative-R algorithm is applied for outlier detection to reduce masking and swamping effects. Outlier-free data yield interpretable field patterns and satisfy statistical assumptions. For heterogeneous farms, the aim is to identify sub-fields and the percentage of fertilizers to apply. Management zone delineation (MZD) is achieved with an interpolation technique that predicts unobserved values from known neighboring points. MZD gives farmers better knowledge of soil fertility, field variability, and fertilizer application rates.
Chapter Preview


Agriculture plays a major role in the overall socio-economic development of a country, but its contribution to economic growth is steadily declining as the wider economy expands. Even so, agriculture remains demographically the largest economic sector. In most countries, agriculture combines traditional and modern farming techniques. Current agricultural practices are neither economically nor environmentally sound. The average land holding is very small and subject to fragmentation, and such small holdings are frequently over-manned, which results in low productivity. Precision Agriculture is therefore suggested as a novel approach to tackle these problems, and it also creates new opportunities for data-intensive science in the multi-disciplinary agro-environmental domain (Rob Lokers et al., 2016).

Precision Agriculture is a generic term that covers knowledge of plant and animal science and the practical application of procedures (machines, treatments, tools, supplies). In a broad sense, Precision Farming or Precision Agriculture can be defined as the use of information technology to improve the decision-making process in agricultural production. The data chain interacts with farm processes and farm management processes through various decision-making processes in which information plays an important role (Sjaak Wolfert et al., 2017). Precision Agriculture contributes to improving production efficiency and decreasing environmental impact. Since the beginning of the Precision Agriculture technology epoch, patterns of crop variability have been considered crucial for variable-rate nutrient management. It is also a distinctive crop production business that depends on many climatic and economic factors, including soil, climate, cultivation, irrigation, fertilizers, temperature, rainfall, harvesting, pesticides, weeds, and other factors (Jharna M. et al., 2017).

Components Involved in Precision Agriculture

Precision agriculture is an information-intensive field: it requires several essential layers of data to provide the information needed for precise decision making. The initial stage of implementing Precision Agriculture is collecting georeferenced crop yield data, referred to as yield mapping, which produces a document representing the spatial pattern of crop yield and the variables present during the plantation period. Yield maps serve as evidence of whether a farm has sufficient spatial variability to justify site-specific nutrient/fertilizer management. Where variability is sufficient, soil sampling is required to characterize soil properties and formulate management zones for application inputs. Quantitative methods are then used in the digital soil mapping phase to process and interpret the soil information. Whereas general decision making treats the entire field area as homogeneous, in the management zone phase the application inputs are unique to each zone. Next, Variable-Rate Technology (VRT) allows agricultural inputs such as fertilizers, pesticides, and herbicides to be applied on the go across the field at suitable rates according to the application map. Finally, in the site-specific crop management phase, resource applications are matched with soil attributes and crop requirements for better yield productivity.
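The digital soil mapping step can be illustrated with a small interpolation sketch. The abstract says only that an "interpolation technique" predicts unobserved values from known neighbor points; inverse distance weighting (IDW) is used here purely as an assumed example of such a technique. The function name `idw_predict` and its parameters are hypothetical, not taken from the chapter.

```python
import math

def idw_predict(samples, target, power=2.0, k=None):
    """Estimate a soil attribute at `target` by inverse distance weighting.

    samples: list of ((x, y), value) pairs from soil sampling.
    target:  (x, y) location with no observation.
    power:   distance exponent; larger values favour closer samples.
    k:       optional number of nearest samples to use (all if None).
    IDW itself is an assumption; the chapter preview does not name a method.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    tx, ty = target
    # Rank the samples by distance to the target location.
    ranked = sorted((math.hypot(x - tx, y - ty), v) for (x, y), v in samples)
    if k is not None:
        ranked = ranked[:k]
    weighted_sum = 0.0
    weight_total = 0.0
    for dist, value in ranked:
        if dist == 0.0:
            # The target coincides with a sample point: return it directly.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

For instance, with two pH samples of 6.0 at (0, 0) and 8.0 at (2, 0), the estimate at the midpoint (1, 0) is their average, and the estimate moves toward 6.0 as the target approaches the first sample. Interpolated values on a regular grid could then be grouped into zones, though the preview does not describe how the chapter performs that grouping.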

Outliers in Spatial Data Context

With the growing use of spatial data in precision agriculture, the challenge is to retrieve useful spatial information, which calls for better data pre-processing techniques. When spatial yield datasets are obtained from remote sensors, yield monitoring sensors, and GPS, various random and systematic errors may occur because of natural topographic conditions and measurement errors. Such errors have a large impact on yield measurements, producing unrealistic values and inaccurate inferences. To gain a better understanding of the spatial data, such errors should be removed from the crop yield dataset (Miguel et al., 2014; Qiao Cai et al., 2013). Detecting and removing errors in spatial yield data is of significant importance in site-specific crop management. In the spatial data context, observations whose non-spatial attributes deviate markedly from those of other observations within their spatial proximity are termed outliers. Outlier analysis is carried out to identify abnormal activities (B. A. Sabarish et al., 2018) that comprise a small portion of the whole dataset, reside in small clusters in sparse regions, and behave differently from the majority of the normal data (Yuan Wang et al., 2016; Nita M. et al., 2015).
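The definition above compares each observation with its spatial neighbors, and the abstract mentions an Iterative-R algorithm chosen to reduce masking and swamping. The preview does not give that algorithm's steps, so the following is only a minimal sketch of an assumed ratio-based iterative scheme: at each pass, the observation with the largest ratio to its neighborhood mean is flagged, its value is replaced by that mean, and the ratios are recomputed so that one extreme value does not distort its neighbors. The names `nearest_neighbours` and `iterative_ratio_outliers`, the neighbor count `k`, and the threshold `theta` are all hypothetical, and the sketch assumes positive attribute values.

```python
import math

def nearest_neighbours(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    xi, yi = points[i]
    ranked = sorted(
        (math.hypot(x - xi, y - yi), j)
        for j, (x, y) in enumerate(points) if j != i
    )
    return [j for _, j in ranked[:k]]

def iterative_ratio_outliers(points, values, k=4, theta=2.0, max_outliers=None):
    """Flag spatial outliers one at a time using a value/neighbourhood-mean ratio.

    points: list of (x, y) locations; values: the non-spatial attribute.
    Returns (flagged indices in detection order, cleaned copy of values).
    This is an illustrative scheme, not the chapter's exact algorithm.
    """
    if k < 1 or len(points) <= k:
        raise ValueError("need k >= 1 and more than k points")
    vals = list(values)  # work on a copy; the input stays untouched
    nbrs = [nearest_neighbours(points, i, k) for i in range(len(points))]
    limit = len(points) if max_outliers is None else max_outliers
    flagged = []
    while len(flagged) < limit:
        best_i, best_ratio = None, theta
        for i in range(len(points)):
            if i in flagged:
                continue
            local_mean = sum(vals[j] for j in nbrs[i]) / k
            if local_mean <= 0:
                continue  # ratio undefined; assumes positive attributes
            ratio = vals[i] / local_mean
            if ratio >= best_ratio:
                best_i, best_ratio = i, ratio
        if best_i is None:
            break  # no remaining ratio reaches the threshold
        flagged.append(best_i)
        # Replace the outlier with its neighbourhood mean so that it no longer
        # inflates the means of nearby points in the next pass.
        vals[best_i] = sum(vals[j] for j in nbrs[best_i]) / k
    return flagged, vals
```

On a 4 by 4 grid where every value is 10.0 except a single 30.0, the function flags only that one location and returns a cleaned copy in which it has been replaced by 10.0. A ratio test of this form detects only unusually high values; unusually low values would need the inverse ratio or a difference-based score.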
