A technology-based startup is defined as a group of people organized around an innovative, technology-based idea with a replicable and scalable business model (Nadežda et al., 2019); it is an innovative venture that solves emerging problems or creates new demand by developing new forms of business (OECD, 2005). It is widely established that entrepreneurship is important for the wealth and economic growth of countries (Cabrera & Mauricio, 2017). In this regard, startups in information technology (SITs) are important because they revitalize economies, directly contributing to the creation of jobs and of high-value-added products and services. Moreover, various World Bank studies show that, in 2017, emerging technology companies contributed more than 5% of the gross domestic product of developed countries, with a turnover of 42,300 million euros, compared with 3.1% and 34,900 million euros in the previous year (World Bank, 2018); likewise, financing for SITs surpassed 600,000 million dollars by 2021 (Jurgens, 2022). Despite their importance, eight out of 10 startups still fail within five years, that is, they do not achieve success (Bernard & Tariskova, 2017; Honorine & Emmanuelle, 2019). To improve this alarming situation, various efforts are under way, including management models and indicators (Gbadegeshin et al., 2022; Satyanarayana et al., 2021), critical success factors (Santisteban & Mauricio, 2017), promotion policies (Horne & Fichter, 2022), and expanded financing. In general, both the government and the private sector need to estimate the future success of a venture in order to direct resources and minimize risk.
Methods for predicting the success of a SIT can be classified into statistical methods and machine learning (ML) methods. Studies that use statistical methods rely on logistic regression and a heuristic solution approach, with a best reported precision of 74.8% (Asmoro et al., 2018). Studies based on ML, on the other hand, generally obtain better results, reaching a best precision of 89% with Extreme Gradient Boosting (XGBoost) and k-nearest neighbors (KNN) (Ross et al., 2021). These results show that there is still room to improve precision, which depends on several elements, such as the factors considered, the data set, the preprocessing method, and the ML method (Krishna et al., 2016; Ross et al., 2021; Tomy & Pardede, 2018). Together, these elements call for a systematic method for obtaining a predictive model that achieves the best possible prediction results.
In this study, the authors propose a method for building an ML-based model that predicts the success of a SIT; the method comprises three processes: data extraction, preprocessing, and prediction. The main contributions of this paper are:
- Providing a systematic method, applicable to any scenario, for building an ML model that predicts the success of a SIT.
- Showing the usability of the proposed method by applying it to build nine ML models, two of them hybrid, on a data set of 256 SITs.
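The three processes named above (data extraction, preprocessing, and prediction) can be illustrated with a minimal pipeline sketch. This is not the authors' method: the field names (`funding_rounds`, `team_size`, `months_active`), the z-score scaling, the 75/25 train/test split, and the choice of KNN (one of the ML methods cited above) are illustrative assumptions. Precision is computed in the standard classification sense, TP / (TP + FP), with success as the positive class.

```python
import math
import random
from statistics import mean, pstdev

# Hypothetical schema: these factor names are illustrative, not the paper's feature set.
FIELDS = ("funding_rounds", "team_size", "months_active")


def extract(raw_rows):
    """Data extraction: keep complete, labelled rows as (features, label) pairs."""
    data = []
    for row in raw_rows:
        if all(row.get(f) is not None for f in FIELDS) and row.get("success") in (0, 1):
            data.append(([float(row[f]) for f in FIELDS], row["success"]))
    return data


def fit_scaler(X):
    """Preprocessing: per-column mean and population std (1.0 for constant columns)."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return mus, sds


def transform(X, mus, sds):
    """Apply z-score scaling with parameters learned on the training split only."""
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def knn_predict(train_X, train_y, x, k=3):
    """Prediction: majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted((math.dist(x, tx), ty) for tx, ty in zip(train_X, train_y))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def precision(y_true, y_pred):
    """Precision = TP / (TP + FP), with success (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def run_pipeline(raw_rows, k=3, test_fraction=0.25, seed=0):
    """Extraction -> preprocessing -> prediction, scored by precision on a held-out split."""
    data = extract(raw_rows)
    random.Random(seed).shuffle(data)
    n_test = max(1, int(len(data) * test_fraction))
    test, train = data[:n_test], data[n_test:]
    train_X = [x for x, _ in train]
    train_y = [y for _, y in train]
    mus, sds = fit_scaler(train_X)
    train_Xs = transform(train_X, mus, sds)
    test_Xs = transform([x for x, _ in test], mus, sds)
    preds = [knn_predict(train_Xs, train_y, x, k) for x in test_Xs]
    return precision([y for _, y in test], preds)


if __name__ == "__main__":
    # Synthetic toy rows for illustration only; these are not real startup data.
    rows = [{"funding_rounds": 5 + i % 3, "team_size": 10 + i,
             "months_active": 30 + i, "success": 1} for i in range(10)]
    rows += [{"funding_rounds": i % 2, "team_size": 2 + i % 3,
              "months_active": 5 + i, "success": 0} for i in range(10)]
    print(run_pipeline(rows))
```

Fitting the scaler on the training split alone keeps test information out of preprocessing; in a real study, the factors, preprocessing steps, and ML method would be the design choices that the proposed method is meant to systematize.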
The success of a SIT has been defined in many ways. Martens et al. (2011) defined success as sales growth and good profitability, while Elhedhli et al. (2014) defined it as good financial performance. Santisteban, Mauricio et al. (2021) compiled nine definitions of success: