Degree of Similarity of Web Applications

Doru Anastasiu Popescu (Faculty of Mathematics and Computer Science, University of Pitesti, Romania) and Dragos Nicolae (National College “Radu Greceanu” – Slatina, Romania)
DOI: 10.4018/978-1-4666-4490-8.ch029

Abstract

In this chapter, the authors present a way of measuring the similarity between two Web applications. To do so, they define the degree of similarity between two Web applications, taking into account only the Web pages composed of HTML tags. The authors also introduce an algorithm for calculating this value, which is implemented in the Java programming language.

The Degree Of Similarity

Let WA1 and WA2 be two web applications. WA1 is composed of the web pages p1, p2, ..., pn and WA2 is composed of the web pages q1, q2, ..., qm. We also fix a set TG of tags, which are ignored when pages are compared.

For a web page pi we build a sequence with all its tags, excluding those which are also in TG, keeping their order and removing their attributes.
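The construction of a tag sequence can be sketched in Java as follows. This is only an illustration under stated assumptions, not the authors' code: the class name TagSequenceSketch and the method tagSequence are hypothetical, only opening (and self-closing) tags are counted, tag names are compared in lower case, and a regular expression stands in for a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not from the chapter): builds the tag sequence of one page. */
final class TagSequenceSketch {

    // Matches an opening or self-closing tag and captures its name.
    // Closing tags ("</p>"), comments and doctype declarations do not match,
    // because the character right after '<' must be a letter.
    // Assumption: attribute values do not contain '>'.
    private static final Pattern OPENING_TAG =
            Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*>");

    /**
     * Returns the tag names of the page in document order, without attributes,
     * skipping every tag whose (lower-case) name belongs to the set tg.
     */
    static List<String> tagSequence(String html, Set<String> tg) {
        List<String> tags = new ArrayList<>();
        Matcher m = OPENING_TAG.matcher(html);
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            if (!tg.contains(name)) {
                tags.add(name);
            }
        }
        return tags;
    }
}
```

For instance, with TG = {br}, the page `<html><body><p class="x">Hi</p><br/><p>Yo</p></body></html>` yields the sequence html, body, p, p. Whether closing tags should also be counted is not stated in the text; counting them would only require a different pattern.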

Definition 1

Let T1 and T2 be the sequences of tags associated with the web pages pi from WA1 and qj from WA2. We define the degree of similarity between pi and qj, written nrij, as the maximum length of a common subsequence of T1 and T2.
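The value nrij of Definition 1 is the length of a longest common subsequence of two tag sequences. The sketch below computes it with the standard dynamic-programming table. It is an assumption that this matches the chapter's own algorithm, which is not shown in this preview; the names LcsSketch and lcsLength are hypothetical.

```java
import java.util.List;

/** Hypothetical helper (not from the chapter): computes nrij from Definition 1. */
final class LcsSketch {

    /**
     * Returns the maximum length of a common subsequence of t1 and t2,
     * using the classic O(|t1| * |t2|) dynamic-programming table.
     */
    static int lcsLength(List<String> t1, List<String> t2) {
        // dp[a][b] = LCS length of the first a tags of t1 and the first b tags of t2.
        int[][] dp = new int[t1.size() + 1][t2.size() + 1];
        for (int a = 1; a <= t1.size(); a++) {
            for (int b = 1; b <= t2.size(); b++) {
                if (t1.get(a - 1).equals(t2.get(b - 1))) {
                    // Matching tags extend the best common subsequence of the prefixes.
                    dp[a][b] = dp[a - 1][b - 1] + 1;
                } else {
                    // Otherwise drop the last tag of one of the two prefixes.
                    dp[a][b] = Math.max(dp[a - 1][b], dp[a][b - 1]);
                }
            }
        }
        return dp[t1.size()][t2.size()];
    }
}
```

For T1 = (html, body, p, p) and T2 = (html, body, div, p), a longest common subsequence is (html, body, p), so nrij = 3.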

Definition 2

For a web page p from WA1, we define the similarity degree of p with WA2 as the number degpage(p,WA2) = k/NT, where k = max{nrij | 1 ≤ j ≤ m}, NT is the number of tags of p that are not in TG, and i is the index, 1 ≤ i ≤ n, for which p = pi.

Definition 3

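We define the degree of similarity between WA1 and WA2 as the number deg(WA1,WA2) = s/n, where s = degpage(p1,WA2) + degpage(p2,WA2) + ... + degpage(pn,WA2). In other words, deg(WA1,WA2) is the average of the page-level similarity degrees over the n pages of WA1.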
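Definitions 2 and 3 can be combined into a short Java sketch. It is not the authors' implementation: the names SimilaritySketch, lcsLength, degPage and deg are hypothetical, each page is assumed to be given already as its tag sequence (tags in TG removed), and returning 0 for a page with NT = 0 or for an empty WA1 is an assumption, since the definitions leave these cases open. The block carries its own compact helper for Definition 1 so that it is self-contained.

```java
import java.util.List;

/** Hypothetical sketch (not from the chapter) of degpage and deg. */
final class SimilaritySketch {

    /** Definition 1: longest-common-subsequence length, keeping only two table rows. */
    static int lcsLength(List<String> t1, List<String> t2) {
        int[] prev = new int[t2.size() + 1];
        int[] curr = new int[t2.size() + 1];
        for (String tag : t1) {
            for (int b = 1; b <= t2.size(); b++) {
                curr[b] = tag.equals(t2.get(b - 1))
                        ? prev[b - 1] + 1
                        : Math.max(prev[b], curr[b - 1]);
            }
            int[] tmp = prev; // swap rows; curr is fully overwritten on the next pass
            prev = curr;
            curr = tmp;
        }
        return prev[t2.size()];
    }

    /** Definition 2: degpage(p, WA2) = k / NT, with k the best nrij over all pages of WA2. */
    static double degPage(List<String> page, List<List<String>> wa2) {
        int nt = page.size();          // NT: number of tags of p that are not in TG
        if (nt == 0) {
            return 0.0;                // assumption: the source does not define this case
        }
        int k = 0;
        for (List<String> q : wa2) {
            k = Math.max(k, lcsLength(page, q));
        }
        return (double) k / nt;
    }

    /** Definition 3: deg(WA1, WA2) = s / n, the mean of degpage over the pages of WA1. */
    static double deg(List<List<String>> wa1, List<List<String>> wa2) {
        if (wa1.isEmpty()) {
            return 0.0;                // assumption: the source does not define this case
        }
        double s = 0.0;
        for (List<String> p : wa1) {
            s += degPage(p, wa2);
        }
        return s / wa1.size();
    }
}
```

As a worked example, take WA1 with tag sequences (html, body, p, p) and (html, body, a, img), and WA2 with (html, body, div, p) and (html, head, body, p, p, a). The first page of WA1 gives k = 4 and NT = 4, so degpage = 1; the second gives k = 3 and NT = 4, so degpage = 0.75; hence deg(WA1,WA2) = (1 + 0.75)/2 = 0.875. Note that the measure is not symmetric: deg(WA2,WA1) divides by m and by the tag counts of the pages of WA2.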

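  • Remark 1: 0 ≤ deg(WA1,WA2) ≤ 1, because nrij never exceeds NT. The value is strictly positive as soon as one page of WA1 shares at least one tag outside TG with a page of WA2.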

  • Remark 2: If deg(WA1,WA2) = 1, then for every web page pi from WA1 there is a web page qj in WA2 such that T1 is a subsequence of T2, where T1 is the sequence of tags of pi that are not in TG and T2 is the sequence of tags of qj that are not in TG. Indeed, an average of 1 forces degpage(pi,WA2) = 1 for every i, so the longest common subsequence has the full length NT of T1.
