Organizational Influencers in Open-Source Software Projects

Traditional software development is shifting toward the open-source development model, particularly in the current environment of competitive challenges to develop software openly. The author employs a case study approach to investigate how organizations and their affiliated developers collaborate in the open-source software (OSS) ecosystem TensorFlow (TF). The analysis of the artificial intelligence OSS library TF combines social network analysis (SNA) and an examination of archival data by mining software repositories. The study looks at the structure and evolution of code-collaboration among developers and with the ecosystem’s organizational networks over the TF lifespan. These involved organizations play a particularly critical role in development. The research also looks at productivity, homophily, development, and diversity among developers. The results deepen the understanding of OSS communities’ collaborative developer and organization patterns. Furthermore, the study emphasizes the importance and evolution of social networks, diversity, and productivity in ecosystems.


INTRODUCTION
In an environment of technological complexity and competitive challenges, software is no longer developed only in-house by a single firm. Instead, there is a collaboration among a community of volunteers, in-house developers, developers from partner companies, universities, and even competitors (Bengtsson & Kock, 2000;Ghobadi & D'Ambra, 2012). Numerous famous firms support OSS projects and cooperate with others (Capiluppi et al., 2012).
This collaboration with competitors has given rise to the concept of "coopetition," a term describing the coexistence of competition and cooperation and the interaction between companies that have a partial congruence of interests (Linåker et al., 2016; Nguyen Duc et al., 2017). There are many examples of open-source initiatives exemplifying coopetition, such as WebKit, Blink, OpenStack, and CloudFoundry (Teixeira & Lin, 2014; Nguyen Duc et al., 2017). Another example of large-scale programming cooperation is autonomous vehicles. In this context, the experience & Finnegan, 2014). However, coopetitive relationships are complex, difficult to manage, and create organizational conflicts (Tidström, 2009).
Software forges are especially useful for large OSS projects and OSS ecosystems (Cosentino et al., 2017). The platform GitHub represents the newest generation of software forges, combining traditional capabilities (e.g., free hosting and a version control system) with several social features (Guendouz et al., 2015;Squire, 2014). It also supports the Git version-control system, with its features, social interactions (e.g., bug-tracking, issue-tracking, pull requests support, and profiles), and a powerful GitHub API to provide access to metadata around its hosted software projects (Rashid & Prakash, 2022).
Software development in OSS ecosystems highly depends on social aspects such as effective interaction among people, companies, and human factors that have been identified as key to successful software projects (Amrit et al., 2014;Biçer et al., 2011;Holmstrom et al., 2006;Oliveira et al., 2018;Yilmaz et al., 2016). The success of effective and efficient software development in ecosystems continues to be influenced significantly by collaboration in social networks (Latorre & Suárez, 2017). The inadequate exchange of information and communication between developers and users, which is entrenched in social networks, can cause the downfall of software development endeavors (Charette, 2005). This is why these networks should be developed and managed purposefully to enhance the effectiveness and efficiency of software development (Pryke & Smyth, 2012). For this reason, it is essential to carefully construct, manage, and analyze these networks to maximize software development productivity (Hinds & Lee, 2009).
These social aspects are the focus of this research study. Social network analysis (SNA) of OSS ecosystems has the potential to reveal hidden structural issues, top influencers, and collaboration patterns (Fischbach et al., 2009; Šmite et al., 2017), knowledge of which can aid in ensuring success.
There is considerable research into the static features of the communication networks of community members and the structural characteristics of developer collaboration networks in OSS ecosystems. The development of software in large OSS ecosystems is knowledge-intensive, and the interactions between the members are complex and self-organized (Behfar et al., 2018; Shah, 2006). Research to date has focused in particular on social project structure and clique analysis, which includes the topics of core periphery (core team and enhanced team) and cluster (Concas et al., 2017; El Asri et al., 2017; Joblin et al., 2017; Toral et al., 2009; Yu et al., 2016). Other findings support that cohesive social ties between team members in their social networks lead to more productivity (Lee et al., 2013; Singh et al., 2011). However, research on the dynamics of social networks in software development ecosystems over the entire course of the ecosystem lifecycle is missing. In addition, there is a lack of work on the formation of subgroups and their behavior within software development ecosystem communities (Herbold et al., 2021; Schreiber & Zylka, 2020; Seker et al., 2022).
We examine the TensorFlow (TF) OSS ecosystem community as a case study for investigating the developer and organizational collaboration network. The TF ecosystem is important and mature. TF is an end-to-end platform for machine learning (ML) with a comprehensive, flexible ecosystem of tools, libraries, and a strong community. It is a structured ML platform that provides a variety of built-in capabilities and supports add-on libraries and APIs for key company concerns, such as model building, development of deployment pipelines, and powerful experimentation. Its main feature is that it allows developers to create and deploy state-of-the-art ML-powered applications. The OSS ecosystem was initiated by Google Brain and released under the Apache 2.0 open-source license in November 2015 to accelerate its evolution with the support of a larger community. The TF ecosystem involves about 3,000 developers, 86 subprojects, 3 million lines of code, and 161,000 dependent repositories. It has many supporters, including Google, Dropbox, Airbus, and JD.com. Moreover, TF is a popular and dominant ML framework in commercial production (Dinghofer & Hartung, 2020; Han et al., 2020).

RESEARCH QUESTIONS
Our research focus is the TF coding-collaboration relationship between developers and the organizations for which they work. We analyze the different forms of collaboration, rivalry, competition, and cooperation that take place within the OSS development of the TF ecosystem (Table 2).

METHODOLOGY
Our case study combines qualitative analysis of mining software repositories and the use of SNA on publicly available data from the TF OSS ecosystem. Figure 1 summarizes the study design.

Data Collection
The entire project we explored comprises three principal projects and 86 subprojects. The principal projects are tensorflow (the basic library of TF, written in lowercase), docs (documentation), and community (a project for documents used by TF developers). We then mined the code and logs of the software repositories to obtain deeper insights into the community. The Git system records all commits of the ecosystem, with comments, and documents them in the changelog. Therefore, we extracted the relevant information from the changelog for our study (Figure 2). We then connected the developers and firms that worked on the same files and constructed a social network based on the collaboration patterns.
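As a minimal sketch of the extraction step, the following Python fragment parses commit records from marked-up `git log` text. It is an illustration under stated assumptions, not the tooling used in this study: the `@@` marker, the `|` separator, the `Commit` record, the function name `parse_git_log`, and the sample log content are all hypothetical.

```python
from collections import namedtuple

# Hypothetical record type for one commit; field names are illustrative.
Commit = namedtuple("Commit", ["sha", "email", "date", "files"])

# Assumed to resemble the output of a command such as:
#   git log --no-merges --name-only --date=short --pretty=format:'@@%H|%ae|%ad'
# The '@@' prefix and '|' separator are arbitrary choices for this sketch,
# and the commits below are invented sample data.
SAMPLE_LOG = """\
@@a1b2c3|alice@company-a.example|2017-02-11
tensorflow/core/ops/math_ops.cc
tensorflow/python/ops/math_ops.py

@@d4e5f6|bob@company-b.example|2017-02-12
tensorflow/python/ops/math_ops.py
"""


def parse_git_log(text):
    """Turn the marked-up log text into a list of Commit records."""
    commits = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("@@"):
            # Header line: commit hash, author email, author date.
            sha, email, date = line[2:].split("|", 2)
            commits.append(Commit(sha, email, date, []))
        elif line and commits:
            # Any other non-empty line is a file path touched by the last commit.
            commits[-1].files.append(line)
    return commits


commits = parse_git_log(SAMPLE_LOG)
for c in commits:
    print(c.email, c.date, len(c.files))
```

Each record keeps the author email and the touched files, which is the information needed later to link developers who worked on the same file.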

Research Method
During this research, we constructed the collaboration network by going through the multiple stages of the primary project tensorflow. Our data, drawn from the extraction process, spanned the period from November 9, 2015, to July 24, 2020. TF is a fast-moving, community-supported OSS ecosystem.

No.   Research question
RQ1   How does collaboration in the TF OSS ecosystem evolve?
RQ2   Which organizations contribute to the ecosystem?
RQ3   Which organizations collaborate in the TF OSS ecosystem?
RQ4   Is there a tendency toward clustering among developers from the same company (homophily)?

Initially, the developers were distinguished by email address and were assigned a company if indicated in the changelog records. Depending on the subsequent evaluation, a node represents a developer or a company in SNA. A collaboration exists when different developers edit the same file; it is modeled as an edge between the corresponding nodes. Figure 3 shows the specific construction process of the network of developers and companies, with the example of different release versions and distinct developers. The developers A, B, C, and D were employed by various software development firms, such as Company A, Company B, or Company C. Every new version is treated as an independent knowledge product produced by the developers who contributed to the code for that version. For a collaborative relationship between two developers to be established, they must both have had an active role in the release of the same version of the source code. That allows the construction of different collaboration networks of developers and companies for each time slice. In Figure 3, there is an aggregation in version 1.2 from the developers' network to the companies' network because Developer A and Developer B are employed by Company A while Developer C is employed by Company B.
Based on this method and the extracted relationships, we could construct the social network at a clearly defined time (Figure 3). Finally, we assembled the entire social network for SNA at different evolution stages by time and release dates.
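The construction rule can be sketched in a few lines of Python. This is a hedged illustration, not the implementation used in the study: the edit records, file names, and function names are hypothetical, and dropping same-company pairs during aggregation is an assumption made for this sketch.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical edit records for a single release: (developer, file).
edits_v1_2 = [
    ("Developer A", "ops.py"),
    ("Developer B", "ops.py"),
    ("Developer B", "graph.cc"),
    ("Developer C", "graph.cc"),
]

# Hypothetical affiliations, loosely mirroring the Figure 3 example.
company_of = {
    "Developer A": "Company A",
    "Developer B": "Company A",
    "Developer C": "Company B",
}


def developer_network(edits):
    """Link every pair of developers who edited the same file in the release."""
    editors = defaultdict(set)
    for developer, path in edits:
        editors[path].add(developer)
    edges = set()
    for devs in editors.values():
        for a, b in combinations(sorted(devs), 2):
            edges.add((a, b))
    return edges


def company_network(dev_edges, affiliation):
    """Collapse developer edges to company edges.

    Assumption of this sketch: pairs inside the same company are dropped,
    and developers without a known affiliation are skipped.
    """
    edges = set()
    for a, b in dev_edges:
        ca, cb = affiliation.get(a), affiliation.get(b)
        if ca and cb and ca != cb:
            edges.add(tuple(sorted((ca, cb))))
    return edges


dev_edges = developer_network(edits_v1_2)
org_edges = company_network(dev_edges, company_of)
print(sorted(dev_edges))
print(sorted(org_edges))
```

Running the same two functions on the edit records of each release would yield one developer network and one company network per time slice.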
In line with existing guidelines on combining digital trace data with SNA (Howison et al., 2011), we constructed the different collaboration networks of developers and companies for each time slice to examine the evolution of social networks in TF ecosystems. That way, we could assess how the collaboration has evolved and uncover interesting patterns. We analyzed the social network data and visualizations with Gephi (v0.9.2) (Bastian et al., 2009) and the plugins MultiMode Networks Transformation, Node Color Manager, and Groups by partition. Table 3 lists the major releases of TF in the main project tensorflow addressed in this study.
Furthermore, we applied different SNA methods to investigate the constructed networks more deeply. As a result, we can describe social networks through relational and structural characteristics. The relational dimension focuses on the links between pairs of individuals. Table 4 lists the basic social network properties used in our investigation and the dimensions we encountered in our case study.
Based on these basic social network properties, many social network theories provide additional perspectives on complex social ties in ecosystems. We identified relevant SNA theories for our research, which are summarized in Table 5.

An Overview of the Developer Collaboration Over Time
In this section, we address RQ1 for an overview of the developer collaboration evolution over time in the TF ecosystem. We used archival history data from the source version control system and covered more than five years (2015-2020) of the ecosystem's life. The research is based on the main versions of the TF OSS.
The entire TF ecosystem works with pull requests to ensure the project quality and goals, meaning that one TensorFlow team member will check the pull request with code changes and approve it. After the approval, the pull request is merged automatically on GitHub. Final releases are tagged with version labels (e.g., v1.1.3). After that, the master branch of TF is primed for further feature development and bug fixes to be implemented.
Table 4. Basic social network properties

Density             The number of direct ties in a network as a ratio of the total number of possible links (Wasserman & Faust, 1994).
Degree centrality   The number of direct ties to other nodes (Scott, 2013).
Diversity           Simpson's Diversity Index shows the diversity of the open-source developer community based on the developers' associated organizations (Jiang et al., 2018).

Table 5. Relevant SNA theories

Theory concept           Definition
Clique analysis          This analysis of social structure focuses on how connections between large social structures can be built with small and tight components (Kappelhoff, 1987).
Embeddedness             Trusting relationships between actors tend to expand through broker exchange. Trust acts as the primary governance structure in cooperation (structural effect of transitivity) (Uzzi, 1997).
Structural holes         Individuals hold certain positional advantages or disadvantages based on how they are embedded in social structures. A structural hole is a gap between two nodes with complementary sources of information (Burt, 2009).
Power-law distribution   A few actors have many incoming social ties, while most actors have only a few ties (Barabási & Albert, 1999).

To get a better view of the developer collaboration over time, we analyze structural social network metrics: the number of relevant developers, network density, and the number of communities. We also show the number of commit activities without merge commits to represent the developer community's productivity. Figure 4 illustrates the number of actively contributing developers (nodes) over releases. The number of actors involved in development increases with each new software release, signifying increasing developer engagement.
The TF ecosystem's network density decreased almost steadily as the developer community and its collaborations grew during the ecosystem's lifetime, as Figure 5 shows. This decrease shows that, as the network became more extensive, the ratio of direct ties to possible ties fell over the different TF software releases. Figure 6 shows the increasing number of communities present throughout successive TF releases. From the first two releases onward, the network remained segregated, and we observed numerous adjacent subgraphs emerging. Over the TF development period, a larger major component gradually emerged.
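To illustrate why density tends to fall as a network grows, the sketch below computes undirected density (realized ties divided by n(n-1)/2 possible ties) for two invented release snapshots. It also counts connected components, which is a simpler stand-in chosen for this sketch and not the community detection performed in the study. All node names, edges, and function names are hypothetical.

```python
def density(num_nodes, edges):
    """Undirected density: realized ties / possible ties, n*(n-1)/2."""
    if num_nodes < 2:
        return 0.0
    return len(edges) / (num_nodes * (num_nodes - 1) / 2)


def connected_components(nodes, edges):
    """Group nodes into components with a simple depth-first traversal."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


# Two hypothetical release snapshots (invented, not TF data).
releases = {
    "early": (["a", "b", "c", "d"],
              {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")}),
    "later": (["a", "b", "c", "d", "e", "f", "g", "h"],
              {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
               ("e", "f"), ("g", "h")}),
}

for name, (nodes, edges) in releases.items():
    comps = connected_components(nodes, edges)
    print(name, round(density(len(nodes), edges), 3), len(comps))
```

In the invented data, the later snapshot has more ties in absolute terms but a lower density, because the number of possible ties grows quadratically with the number of nodes.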
The number of commit activities without merge commits represents the increasing developer activity. Figure 7 illustrates that the longer the ecosystem lives and the more developers collaborate on it across different versions, the greater the number of commits. In summary, the number of commit activities without merge commits can serve as an indicator of the increasing overall productivity of the TF ecosystem.
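A minimal sketch of this count, assuming commit records that carry their parent hashes: a commit with two or more parents is treated as a merge commit and excluded. The records, release labels, and function name below are hypothetical.

```python
# Hypothetical commit records: (sha, release, parent_shas).
commit_records = [
    ("c1", "v1.0", ["c0"]),
    ("c2", "v1.0", ["c1"]),
    ("c3", "v1.0", ["c1", "c2"]),  # two parents: a merge commit
    ("c4", "v1.1", ["c3"]),
]


def non_merge_commits_per_release(records):
    """Count commits with fewer than two parents, grouped by release."""
    counts = {}
    for _sha, release, parents in records:
        if len(parents) < 2:
            counts[release] = counts.get(release, 0) + 1
    return counts


non_merge_counts = non_merge_commits_per_release(commit_records)
print(non_merge_counts)
```

On a real repository, a command such as `git rev-list --count --no-merges <older-tag>..<newer-tag>` gives a comparable count between two release tags; whether the study used this command is not stated here.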

Identify Top Organizations That Contribute to the TF OSS Ecosystem
To address RQ2, we investigate the relationship between the developer community and its affiliated organizations. We employed Simpson's Diversity Index to better understand the TF ecosystem's diversity. Figure 8 shows that Simpson's Diversity Index increases over the different versions. The greater the diversity index, the greater the diversity of the open-source developer community and its associated organizations (Jiang et al., 2018). The dominance of individual organizations decreases, and the diversity of organizations increases over the releases. A high level of diversity indicates a healthy software ecosystem (Silveira & Prikladnicki, 2019; Mens & Grosjean, 2015; Vasilescu et al., 2015). This increase is consistent with general research findings that larger open-source developer communities are more diverse than smaller communities (Jiang et al., 2018).
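As an illustration of how such an index behaves, the sketch below uses one common form of Simpson's index, 1 - sum(p_i^2), where p_i is the share of developers affiliated with organization i. Which variant was used in this study is not stated in the text, so the formula choice is an assumption, and the affiliation lists are invented.

```python
from collections import Counter


def simpson_diversity(affiliations):
    """Gini-Simpson form: 1 - sum(p_i^2) over organization shares p_i."""
    counts = Counter(affiliations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical developer affiliations in two releases (invented data).
early_release = ["Org A"] * 9 + ["Org B"]
later_release = ["Org A"] * 5 + ["Org B"] * 3 + ["Org C"] * 2

print(round(simpson_diversity(early_release), 2))
print(round(simpson_diversity(later_release), 2))
```

With this form, the index is 0 when every developer belongs to the same organization and approaches 1 as developers spread evenly across many organizations.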
As a general result, the TF ecosystem demonstrated rising productivity with increased diversity and commits (Figure 9). Furthermore, as Figure 9 demonstrates, the developer community and its diversity expand over releases. Therefore, the overall productivity of the TF OSS ecosystem appears to be growing.
Our findings reveal that 20 organizations play key roles in contributing to TF source code. Table 6 lists these firms.

The Organizations Collaborate Over Time in the TF Ecosystem
To answer RQ3, we analyzed the organizations' collaboration networks over the releases of the TF ecosystem. The visualizations in Figure 10 show how key players collaborate in the TF software ecosystem.
In Figure 10, the SNA visualizations illustrate the organizations' code collaboration using combinations of color, size, and location information (Scott & Carrington, 2011). Each node represents one organization with at least one developer. A direct link shows that, in the release history, a developer from one organization edited code in a file previously written by a developer from another organization. The node colors are based on Table 6; other organizations are colored gray. The size of a node depends on its degree centrality: the larger the node, the more code-collaboration connections the organization has and the more it collaborates with others. The topology of the network is centralized around two organizations (Google and TensorFlow), whereas the other organizations change across the individual releases. Furthermore, the number of collaborating organizations changes over the releases. The top organizations and their developers are responsible for most code changes within the TF OSS ecosystem, which emphasizes their importance. Here, a power-law distribution is recognizable. Google dominates the network initially, but a wide variety of organizations are included in the various releases. This leads to more external input and a more dynamic TF OSS ecosystem community.
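The node-size rule can be sketched as follows, using the raw count of direct ties as degree centrality. The edge list and function names are hypothetical, and the share held by the best-connected nodes is only a rough indication of a skewed degree distribution, not a power-law fit.

```python
from collections import Counter


def degree_centrality(edges):
    """Raw degree: the number of direct ties each node has."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def top_share(degree, k):
    """Share of all tie endpoints held by the k best-connected nodes."""
    total = sum(degree.values())
    top = sum(d for _, d in degree.most_common(k))
    return top / total if total else 0.0


# Hypothetical organization-level collaboration edges (invented data).
org_edges = [
    ("Org A", "Org B"),
    ("Org A", "Org C"),
    ("Org A", "Org D"),
    ("Org A", "Org E"),
    ("Org B", "Org C"),
]

degree = degree_centrality(org_edges)
for org, d in degree.most_common():
    print(org, d)
print(top_share(degree, 1))
```

In the invented data, one organization holds a disproportionate share of the ties, which is the kind of concentration that a centralized topology would show.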

Figure 10. Topologies of the Organization's Collaboration Network Over Different Release Versions

Tendency Towards Clustering from Developers of the Same Organization
In theory, most organizations' social homogeneity creates strong baseline homophily in the networks formed (McPherson et al., 2001). However, the visualizations presented in Figure 10 reveal a different collaborative network structure in the TF OSS ecosystem. In addition, the sub-community detection process revealed a small degree of homophily in code collaboration. These organization collaboration networks are highly heterogeneous over the different TF releases.
The SNA visualizations in Figure 11 illustrate the developers' code collaboration using combinations of colors for organizations and location information. Each node represents one developer associated with one organization. A direct link shows that, in the development history, one developer edited code in a file previously written by another developer in the same software release. Thanks to the TF ecosystem's openness, companies can access each other's resources and share developer costs (Bengtsson & Kock, 2014; Teixeira & Lin, 2014). Figure 11 shows that the sub-communities are highly heterogeneous, including developers from many firms. While Google and TensorFlow dominate early versions, new organizations gradually joined the network. In addition, developers often collaborate with peers affiliated with competing firms. For example, Intel and Microsoft work together on the TF OSS ecosystem in specific development release cycles. This collaboration leads to more external input, higher diversification, and a more dynamic TF OSS ecosystem community.
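One simple way to quantify such a tendency is the E-I index, computed as (external ties - internal ties) / (all ties), where a tie is internal if both developers share an organization. This is a swapped-in measure for illustration only; the study relied on sub-community detection and visual inspection, not on this index. The developers, affiliations, and edges below are invented.

```python
def ei_index(dev_edges, org_of):
    """E-I index: -1 means only same-organization ties, +1 only cross-organization ties."""
    internal = external = 0
    for a, b in dev_edges:
        if org_of[a] == org_of[b]:
            internal += 1
        else:
            external += 1
    total = internal + external
    return (external - internal) / total if total else 0.0


# Hypothetical developers and affiliations (invented data).
org_of = {"a1": "Org A", "a2": "Org A", "b1": "Org B", "c1": "Org C"}
dev_edges = [("a1", "a2"), ("a1", "b1"), ("a2", "c1"), ("b1", "c1")]

print(ei_index(dev_edges, org_of))
```

A value near -1 would indicate strong homophily, while values above 0 indicate that most ties cross organizational boundaries, as in the invented example.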

LIMITATIONS
This work analyzes the importance of top-influencing organizations within TensorFlow's software ecosystem by considering the developers' social relationships. Based on a case study approach (Runeson & Höst, 2009), we explore how organizations collaborate in the open-source space and how complex open-source ecosystems, such as TensorFlow, operate. The scheme of validity and threats distinguishes between construct validity, internal validity, external validity, and conclusion validity (Runeson, 2012;Wohlin, 2012).
Construct validity refers to how the operational measures studied represent the phenomena we investigated. One problem is that the collaboration was narrowed to working on the same source code files in the TF GitHub repository and excluded other software development activities, such as testing, code-review/code merging, design, and specification. In addition, we counted only modifications to the same file made during a specific software release as collaboration. The developer's company email address identified the individual developer, which also determined company affiliation. In addition, commissioned third-party developers of a software supplier firm could not be assigned to the original company.
Regarding the research questions, we defined five metrics to get a deeper view of developer collaboration. We used basic quantitative structural network characteristics, namely the number of contributing nodes, the network density, and the number of communities over the different TF releases, to analyze the TF developer community. The contributing nodes metric shows the number of actively contributing developers over releases and is useful for showing trends in the TF developer community. We further relied on the network density, which reflects the share of direct ties over different software releases. To operationalize the concept of core and enhanced developers, a key measure is the number of communities. Furthermore, the number of commits without merge commits should be considered when examining developer activity. By applying and evaluating these SNA metrics, it is possible to get an overview of the developer collaboration over time and to achieve a solid basis for SNA visualizations. In addition, prior work on applying SNA to developer networks suggests that the metrics are reliable and valid (Meneely & Williams, 2011). Moreover, we used Simpson's Diversity Index to analyze the diversity of the TF developer community in more depth and to apply an established indicator of the health of the software ecosystem (Silveira & Prikladnicki, 2019). Our methods were supported by visualizing the results using SNA graphs; other visualizations and measures could also be helpful.
Regarding the internal validity of our study, the results could be inaccurate if we have misinterpreted the SNA visualizations and other metrics. We carried the process out with various measurements and implemented different methods to minimize this limitation. To prevent bias in the results, we endeavored to make the entire research traceable and straightforward in this paper.

Figure 11. Collaboration Network in Different Release Versions with Developer Nodes Associated with 20 Top-Influencing Organizations

External validity concerns the degree to which the results of a study can be generalized (Yin, 2014). This research focuses on a single case study of a large open-source software development ecosystem. We consider the external validity of the findings reasonable, and the results may generalize to other software development ecosystems; however, further investigation is required to explore other collaborations related to OSS.
We were intent on developing conclusion validity that would enable us to draw the correct conclusions about relationships in data. In addition, in our study, we were concerned with the ability to replicate our findings. Through a combination of SNA visualizations and other measures, we sought to achieve a balance that would provide an accurate perspective of the research results. To ensure accuracy, each step in the case study was rigorously validated, and researchers regularly carried out assessments.

CONCLUSION
Through a longitudinal study of the TF OSS ecosystem for over five years, this research explored the collaboration of developers and organizations' coding activities. We used a combination of SNA methods and mined data from the TF software repository to gain insight into structural evolution. As a machine learning platform with an ever-expanding range of functionalities, TF necessitates significant maintenance resources and presents an opportunity to leverage the collective strengths of the open-source ML community.
The outcomes show that as the network expands (number of nodes), the number of communities increases over the TF releases. Along with these trends, the number of commits and changed source lines also rises. This growth is accompanied by decreasing network density, which means many clusters connect. For a deeper understanding, we examined the developers and their affiliations with an organization, using Simpson's Diversity Index over the evolution of the TF ecosystem. The index shows that the ecosystem's diversity expanded throughout its lifespan as the number of developers grew. As a general result, with increasing diversity and commits to the ecosystem, the entire TF ecosystem's productivity increases. Furthermore, our findings reveal that 20 top-influencing organizations contribute to the source code. These companies are partly in a competitive relationship but also develop together and are critical for the evolution of the TF ecosystem. This coopetition illuminates how the identified organizations collaborate with regard to their strategic importance in framework software development projects.
We also analyzed the organizations' collaboration network. While Google dominates the network, various TF releases include numerous organizations. Another remarkable aspect is that the top-influencing organizations contribute more than others to the source code. Therefore, these top organizations play a critical role in developing TF software. However, we identified a slight trend that the influence of the top-contributing organizations decreases with increasing diversity. We also observed a low degree of homophily in code cooperation providing advantages in terms of access, low barriers, and external resources for the entire TF ecosystem.
These results help better understand software developers' and organizations' code-collaborative patterns in OSS ecosystems' evolution. The methodical SNA approach also visualizes human and organizational activities in software development. The findings of this work are essential and deepen the understanding of ecosystem evolution for practitioners and academics in the software engineering discipline. We also established clear theoretical references to essential theories in SNA and software engineering.
Further research is needed to bolster and advance our TF ecosystem study findings. Moreover, further case-study research is needed to explore open coopetition in greater depth and its implications for OSS ecosystems.