Soybean Price Pattern Discovery Via Toeplitz Inverse Covariance-Based Clustering

The high volatility of world soybean prices has caused uncertainty and vulnerability, particularly in developing countries. Time series clustering is a useful tool for discovering soybean price patterns in temporal data. However, traditional clustering methods cannot represent the continuity of price data very well, nor do they account for the correlation between factors. In this work, the authors apply Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data (TICC) to soybean price pattern discovery. This is a new method for multivariate time series clustering, which can simultaneously segment and cluster the time series data. Each pattern in the TICC method is defined by a Markov random field (MRF), characterizing the interdependencies between different factors of that pattern. Based on this representation, the characteristics of each pattern and the importance of each factor can be portrayed. The work provides a new way of thinking about market price prediction for agricultural products.

The instability of soybean prices brings huge risks to farmers, governments, consumers, and other commercial entities involved in the soybean market. Therefore, accurate analysis of soybean market prices needs to be taken seriously.
Compared with traditional clustering, TICC is a new type of model-based multivariate time series clustering method that can find accurate and interpretable structure in the data. TICC describes each cluster with a different MRF and solves the problem of simultaneous segmentation and clustering through alternating minimization, using a variation of the expectation maximization (EM) algorithm.

For simplicity of notation, we consider a time series of T sequential observations, where x_i is the i-th n-dimensional observation. However, instead of looking at each x_t separately, we take a short subsequence of size w ending at time t, denoted X_t. The window size w is a parameter, though the TICC algorithm is relatively robust to its selection. TICC clusters these T observations into K clusters or states, each with its own set of assigned observations. Each cluster i is described by an inverse covariance matrix Θ_i. Here, the submatrix at block position (P, Q) describes the inverse covariance between time P and time Q. In other words, Θ_i is a partitioned (block) Toeplitz matrix.

TICC clusters the data by solving two key problems in turn: Assign Points to Clusters, where we use a dynamic programming (DP) algorithm, and Update Cluster Parameters, where we solve the Toeplitz graphical lasso problem using an algorithm based on the alternating direction method of multipliers (ADMM).
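The formulation described here can be sketched in display form as follows. This assumes the notation of the original TICC formulation by Hallac et al.; the block names A^{(0)}, ..., A^{(w-1)}, the sparsity weight λ, the assignment sets P_i, and the log-likelihood symbol ℓℓ are taken from that formulation and are assumptions here, not symbols defined in this text.

```latex
% Stacked subsequence of window size w ending at time t (assumed notation)
\[
X_t = \begin{bmatrix} x_{t-w+1}^{\top} & \cdots & x_{t}^{\top} \end{bmatrix}^{\top} \in \mathbb{R}^{nw}
\]

% Block Toeplitz inverse covariance of cluster i, with blocks A^{(k)} in R^{n x n}
\[
\Theta_i =
\begin{bmatrix}
A^{(0)}   & (A^{(1)})^{\top} & (A^{(2)})^{\top} & \cdots  & (A^{(w-1)})^{\top} \\
A^{(1)}   & A^{(0)}          & (A^{(1)})^{\top} & \ddots  & \vdots             \\
A^{(2)}   & A^{(1)}          & \ddots           & \ddots  & (A^{(2)})^{\top}   \\
\vdots    & \ddots           & \ddots           & A^{(0)} & (A^{(1)})^{\top}   \\
A^{(w-1)} & \cdots           & A^{(2)}          & A^{(1)} & A^{(0)}
\end{bmatrix}
\]

% Overall objective: sparsity term, negative log-likelihood, and smoothness penalty
\[
\operatorname*{arg\,min}_{\Theta \in \mathcal{T},\, P}
\sum_{i=1}^{K} \Big[ \|\lambda \circ \Theta_i\|_1
+ \sum_{X_t \in P_i} \big( -\ell\ell(X_t, \Theta_i) + \beta\, \mathbb{1}\{X_{t-1} \notin P_i\} \big) \Big]
\]
```

Under this reading, the block at position (P, Q) depends only on the lag between P and Q, which is what makes each cluster's structure time-invariant within the window.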

Assign Points to Clusters
Given Θ_j, the cost of assigning X_t to cluster j is the negative log-likelihood of X_t under that cluster. In addition, to account for the continuity of the observations, a smoothness penalty β is imposed whenever adjacent observations do not belong to the same cluster. Together, these two costs form a typical assembly-line scheduling problem, which can be solved efficiently by dynamic programming.

Dynamic programming (usually referred to as DP) is a powerful technique for solving complex problems by decomposing them into a set of simpler subproblems, solving each of those subproblems just once, and storing their solutions in memory-based data structures such as arrays and maps (Bellman, 1957). The core of the algorithm is that only the cost of the (i − 1)-th observation needs to be considered when assigning the i-th observation.
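As an illustration, the assignment step can be sketched in Python as a Viterbi-style dynamic program. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `neg_log_likelihood` and `assign_points` are hypothetical, the cluster mean `mu` is an assumed parameter, the per-point cost is taken to be the standard Gaussian negative log-likelihood with precision matrix Θ_j, and β is assumed to be nonnegative.

```python
import numpy as np


def neg_log_likelihood(X, mu, theta):
    """Gaussian negative log-likelihood of each row of X.

    X: (T, d) stacked subsequences; mu: (d,) assumed cluster mean;
    theta: (d, d) inverse covariance (precision) matrix of the cluster.
    """
    d = X.shape[1]
    diff = X - mu
    # Quadratic form (X_t - mu)^T theta (X_t - mu) for every row t.
    quad = np.einsum("ti,ij,tj->t", diff, theta, diff)
    _, logdet = np.linalg.slogdet(theta)
    return 0.5 * quad - 0.5 * logdet + 0.5 * d * np.log(2 * np.pi)


def assign_points(cost, beta):
    """Minimize the sum of cost[t, path[t]] plus beta per cluster switch.

    cost: (T, K) array, cost[t, j] = cost of assigning X_t to cluster j.
    beta: nonnegative smoothness penalty.
    """
    T, K = cost.shape
    best = np.empty((T, K))          # best[t, j]: min cost of a path ending in j at t
    back = np.zeros((T, K), dtype=int)
    best[0] = cost[0]
    for t in range(1, T):
        prev = best[t - 1]
        j_min = int(np.argmin(prev))
        switch = prev[j_min] + beta  # cheapest way to arrive from another cluster
        stay = prev <= switch        # staying in j is at least as cheap
        best[t] = cost[t] + np.where(stay, prev, switch)
        back[t] = np.where(stay, np.arange(K), j_min)
    # Trace the optimal path backwards from the cheapest final state.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmin(best[-1]))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

In use, one would build the (T, K) cost matrix by calling `neg_log_likelihood` once per cluster and then pass it to `assign_points`. Because each step only looks at the previous time step's best costs, the run time is linear in T, and a larger β yields longer contiguous segments.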

EXPERIMENTS
Heilongjiang accounts for about 60 percent of China's soybean acreage, producing 1.5 million tonnes a year, about 11% of the country's output. In this section, we attempt to model the common patterns of soybean prices in Heilongjiang province. In Hallac's work, the TICC method showed good performance when compared with several state-of-the-art baselines such as the Gaussian Mixture Model (GMM) and Dynamic Time Warping (DTW). To provide a complete evaluation of the models, Hallac validated the TICC approach against these baselines in a series of synthetic experiments and on an automobile sensor dataset. The finding was that, compared with several other well-known time series clustering approaches, TICC not only has high accuracy but also significantly outperforms the baselines in scalability and interpretability. For these reasons, the Toeplitz Inverse Covariance-based Clustering (TICC) algorithm can be applied to find price patterns properly. The main experimental steps are as follows:

Step 1: Choose the factors.

Figure 2.
Figure 1. Multiple time series and clustering results. Each curve represents a set of data.

Figure 3. Schematic diagram of soybean market price in the same period