Near Duplicate Detection-Based Image Spam Filters

Copyright: © 2017 |Pages: 14
DOI: 10.4018/978-1-68318-013-5.ch005

Abstract

A picture is worth a thousand words. Spam images give us many hints; one of them is that they are near duplicates of one another. Spam images are often generated from the same templates (designed by spammers), because they are sent to many recipients at the same time in batches. Different spam images are generated by randomizing the contents of these templates; as a result, a similarity persists among the spam images. Spam detectors can exploit this similarity among visually similar spam images to discriminate them from ham. The detectors can be further trained on new data if spam images are generated from different templates, which is infrequent because it is resource intensive. Detection schemes that exploit the near-duplicate characteristics of image spam use different types of image characteristics to calculate the similarity among spam images. This chapter describes near-duplicate-detection-based image spam filters, reviews the literature on these filters, and discusses their limitations.
Chapter Preview

5.1. Examples Of Similar Images

Some examples of similar images that spammers may generate using different randomizing techniques are as follows.

5.1.1. Illustration Substitution

Here some illustrations are replaced with others while everything else remains the same. Figure 1 shows an example: the text in both images is the same, but the bunch of leaves at the top right corner is replaced by a ball of red ribbon. The two images are almost identical and differ only at the top right corner. Because the spammer's changes are small, the features of the two images differ very little, so a technique based on near-duplicate detection can readily detect that both images were made from the same base or template image.

Figure 1.

Spam images with illustration substitution
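
The illustration-substitution case can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of any work cited in this chapter: the images are tiny synthetic grids rather than real spam, a coarse color histogram is the only feature, and every name, the 4-level quantization, and the 0.2 threshold are hypothetical choices made for the example. It contrasts an exact signature, which changes as soon as one patch is swapped, with a histogram distance, which stays small.

```python
import hashlib
from collections import Counter


def make_image(width, height, background, patch=None):
    """Build a toy RGB image as rows of (r, g, b) tuples.

    `patch` is an optional (x0, y0, x1, y1, color) rectangle standing in
    for an illustration that a spammer swaps inside a template.
    """
    img = [[background for _ in range(width)] for _ in range(height)]
    if patch is not None:
        x0, y0, x1, y1, color = patch
        for y in range(y0, y1):
            for x in range(x0, x1):
                img[y][x] = color
    return img


def color_histogram(img, bins=4):
    """Normalized coarse histogram: each channel is quantized to `bins` levels."""
    step = 256 // bins
    counts = Counter(
        (r // step, g // step, b // step) for row in img for (r, g, b) in row
    )
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def l1_distance(h1, h2):
    """Sum of absolute differences over all histogram bins (0 = identical)."""
    return sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in set(h1) | set(h2))


def exact_signature(img):
    """SHA-256 of the raw pixel bytes, as a simple signature-based filter would use."""
    raw = bytes(channel for row in img for pixel in row for channel in pixel)
    return hashlib.sha256(raw).hexdigest()


def is_near_duplicate(a, b, threshold=0.2):
    """Assumed decision rule: histograms closer than `threshold` count as near duplicates."""
    return l1_distance(color_histogram(a), color_histogram(b)) <= threshold


# Template with a green "leaves" patch in the top right corner, a variant with
# a red "ribbon" patch in the same place, and an unrelated dark blue image.
WHITE = (255, 255, 255)
template = make_image(40, 40, WHITE, patch=(32, 0, 40, 8, (0, 128, 0)))
variant = make_image(40, 40, WHITE, patch=(32, 0, 40, 8, (200, 0, 0)))
unrelated = make_image(40, 40, (0, 0, 128))

print(exact_signature(template) == exact_signature(variant))  # exact match fails
print(is_near_duplicate(template, variant))
print(is_near_duplicate(template, unrelated))
```

In this toy setting the swapped patch covers only 4% of the pixels, so the histogram distance between the template and the variant is small while their exact signatures differ. A real filter would need features and thresholds tuned on actual spam and ham images.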

<b>5.1.2. Different Visual Features</b></div><p>Here the visual features are changed along with minor text changes, while everything else remains the same. Figure 2 shows an example: the text in both images is the same apart from small variations, while the images differ visually on the left and right sides. The text changes are marked in red, and the changed visual features are marked with blue rectangles. These two images are therefore also made from the same base image.</p><div class="xmlReaderFig"><i>Figure 2. </i><div class="xmlReaderFig"><p>Spam images with almost identical text content but totally different visual features</p></div><div style="width: 100%;"><a href="https://igiprodst.blob.core.windows.net:443/source-content/9781683180135_147407/978-1-68318-013-5.ch005.f02.png?sv=2015-12-11&sr=c&sig=wRWGARMD8e4%2FyxrfKx9Cj1ggckYqT36aaqwufeisUyY%3D&se=2019-12-23T08%3A12%3A32Z&sp=r" target="_blank"><img src="https://igiprodst.blob.core.windows.net:443/source-content/9781683180135_147407/978-1-68318-013-5.ch005.f02.png?sv=2015-12-11&sr=c&sig=wRWGARMD8e4%2FyxrfKx9Cj1ggckYqT36aaqwufeisUyY%3D&se=2019-12-23T08%3A12%3A32Z&sp=r" alt="978-1-68318-013-5.ch005.f02" style="max-width: 100%;" /></a></div></graphic></div></div><div><h3 id="5.1.3.-text-and-background-color-changes">5.1.3. Text and Background Color Changes</h3><p>Here the visual features are the same, but some of the words and their colors are changed. Figure 3 shows an example in which the words in the two images differ. The changed words and sentences are marked in red. These two images are therefore also made from the same base image.</p><div class="xmlReaderFig"><i>Figure 3. 
</i><div class="xmlReaderFig"><p>Examples of word substitution, illustration alteration/replacement, text and background color changes</p></div><div style="width: 100%;"><a href="https://igiprodst.blob.core.windows.net:443/source-content/9781683180135_147407/978-1-68318-013-5.ch005.f03.png?sv=2015-12-11&sr=c&sig=wRWGARMD8e4%2FyxrfKx9Cj1ggckYqT36aaqwufeisUyY%3D&se=2019-12-23T08%3A12%3A32Z&sp=r" target="_blank"><img src="https://igiprodst.blob.core.windows.net:443/source-content/9781683180135_147407/978-1-68318-013-5.ch005.f03.png?sv=2015-12-11&sr=c&sig=wRWGARMD8e4%2FyxrfKx9Cj1ggckYqT36aaqwufeisUyY%3D&se=2019-12-23T08%3A12%3A32Z&sp=r" alt="978-1-68318-013-5.ch005.f03" style="max-width: 100%;" /></a></div></graphic></div><p>From these examples of similar images, we can say that spammers tend to produce many small variations of a template image in order to circumvent simple signature-based anti-spam filters, using tricks such as translation, rotation, scaling, local changes, and added random noise.</p></div><h2 id="5.2.-previous-work">5.2. Previous Work</h2><div class="headerdivider"></div><p>Near-duplicate spam detection methods exploit the similarity among the received spam images. These techniques are based on the assumption that, although spammers add randomization to the spam images generated from templates, they still want to deliver clear information to end users. At the same time, they want to use efficient methods to generate a huge volume of unique spam images without obscuring the template image too much. 
Generally, a set of similar spam images with various minor changes implies a common origin of image spam.</p><p>The datasets used and the results (as ranges, where applicable) achieved by the works described below are presented in Table 1.</p><p>To achieve low false positives, the authors (Wang, 2007) exploited the similarity property of spam images using three filters in the suggested model: color histogram, Haar wavelet, and orientation histogram. Figure 4 and Figure 5 show the proposed basic image spam detection system framework and the image spam filter, respectively (Wang, 2007).</p><div class="xmlReaderFig"><i>Figure 4. </i><div class="xmlReaderFig"><p>Image spam detection system architecture</p></div><div style="width: 100%;"><a href="https://igiprodst.blob.core.windows.net:443/source-content/9781683180135_147407/978-1-68318-013-5.ch005.f04.png?sv=2015-12-11&sr=c&sig=wRWGARMD8e4%2FyxrfKx9Cj1ggckYqT36aaqwufeisUyY%3D&se=2019-12-23T08%3A12%3A32Z&sp=r" target="_blank"><img src="https://igiprodst.blob.core.windows.net:443/source-content/9781683180135_147407/978-1-68318-013-5.ch005.f04.png?sv=2015-12-11&sr=c&sig=wRWGARMD8e4%2FyxrfKx9Cj1ggckYqT36aaqwufeisUyY%3D&se=2019-12-23T08%3A12%3A32Z&sp=r" alt="978-1-68318-013-5.ch005.f04" style="max-width: 100%;" /></a></div></graphic></div><div class="xmlReaderFig"><i>Figure 5. 
</i><div class="xmlReaderFig"><p>An image spam filter</p></div><div style="width: 100%;"><a href="https://igiprodst.blob.core.windows.net:443/source-content/9781683180135_147407/978-1-68318-013-5.ch005.f05.png?sv=2015-12-11&sr=c&sig=wRWGARMD8e4%2FyxrfKx9Cj1ggckYqT36aaqwufeisUyY%3D&se=2019-12-23T08%3A12%3A32Z&sp=r" target="_blank"><img src="https://igiprodst.blob.core.windows.net:443/source-content/9781683180135_147407/978-1-68318-013-5.ch005.f05.png?sv=2015-12-11&sr=c&sig=wRWGARMD8e4%2FyxrfKx9Cj1ggckYqT36aaqwufeisUyY%3D&se=2019-12-23T08%3A12%3A32Z&sp=r" alt="978-1-68318-013-5.ch005.f05" style="max-width: 100%;" /></a></div></graphic></div></div></div></div></div></form></body></html>