|Year : 2011 | Volume : 57 | Issue : 5 | Page : 423-429|
A Fake Content Remove Scheme using Binomial Distribution Characteristics of Collective Intelligence in Peer-to-Peer Environment
ByungRae Cha1, Sun Park2, JongWon Kim3
1 SCENT Center, GIST, GwangJu, Korea
2 Institute Research of Information Science and Engineering, Mokpo National University, GIST, GwangJu, Korea
3 School of Information and Communications, GIST, GwangJu, Korea
|Date of Web Publication||24-Nov-2011|
SCENT Center, GIST, GwangJu
| Abstract|| |
A P2P (peer-to-peer) network can be created or destroyed autonomously because it is built on the free participation of peer communities. While users can share the resources they want in a P2P network, they also encounter many resources they do not want, such as fake content. This study proposes collective intelligence as one method of removing fake content and designs an efficient content reputation system that uses the collective intelligence of dynamic users in a P2P environment. Thanks to the emergence of new information and communication technology, collective intelligence has become increasingly important in decision making; experts use the term to describe the creation of novel insight by combining the behaviors, selections, and ideas of a group of people. Collecting the decisions of a group of people makes it possible to draw a statistical conclusion that no individual could reach alone. To verify this, we simulate the real reputation of contents under trustworthy evaluation and unbiased evaluation.
Keywords: Collective intelligence, Component, Fake content remove scheme, Peer-to-peer
|How to cite this article:|
Cha B, Park S, Kim J. A Fake Content Remove Scheme using Binomial Distribution Characteristics of Collective Intelligence in Peer-to-Peer Environment. IETE J Res 2011;57:423-9
|How to cite this URL:|
Cha B, Park S, Kim J. A Fake Content Remove Scheme using Binomial Distribution Characteristics of Collective Intelligence in Peer-to-Peer Environment. IETE J Res [serial online] 2011 [cited 2013 May 20];57:423-9. Available from: http://www.jr.ietejournals.org/text.asp?2011/57/5/423/90151
| 1. Introduction|| |
A peer-to-peer network, commonly abbreviated to P2P, is any distributed network architecture composed of participants that make a portion of their resources (such as processing power, disk storage or network bandwidth) directly available to other network participants, without the need for central coordination instances (such as servers or stable hosts) [1]. Peers are both suppliers and consumers of resources, in contrast to the traditional client-server model, where only servers supply and clients consume. P2P was popularized by file sharing systems like Napster. P2P file sharing networks have inspired new structures and philosophies in other areas of human interaction. In such social contexts, P2P as a meme refers to the egalitarian social networking that is currently emerging throughout society, enabled by internet technologies in general.
A P2P network provides the structural possibility of mutually sharing and distributing knowledge content, represented by UCC, video, digital music, and files, as well as computer and network resources, across a variety of environments in an N-to-N form. However, a P2P network based on "open, dynamic, and anonymous" characteristics is in practice exposed to potential security threats, because it lacks reliable ID creation and management and proper authentication support.
A P2P network can be created or destroyed autonomously because it is built on the free participation of peer communities. Such an autonomous structure is possible fundamentally because P2P supports anonymity. Anonymity improves network scalability and secures easy and equitable access for all peer communities that need specific resources or services. While users can share the resources they want in a P2P network, they also encounter many resources they do not want, such as fake content. As one method of removing fake content, this paper proposes the use of collective intelligence in a P2P environment.
| 2. Related Work|| |
Several studies and protocols address reputation management on P2P networks; we discuss them briefly in this section. These studies can be classified as unstructured or structured according to the base architecture of the P2P network. Because many well-known P2P file sharing applications [2,3] are implemented on unstructured P2P networks for practical reasons, most previous works on reputation management systems [4,5,6] are based on unstructured P2P networks. Among them, Xrep [4] is similar to ours in that it combines the reputations of peers and resources to recognize untrustworthy resources regardless of their provider. However, it has several weak points: it is not scalable, it does not use the reputation information effectively, and it lacks a reliable method to verify the trustworthiness of voters. Recently, several reputation systems for structured P2P networks have been proposed. EigenTrust [5] and PeerTrust [7] are reputation management systems built on structured P2P networks such as CAN [8] and P-Grid [9], respectively.
One of the earliest works in this area is the protocol by Aberer and Despotovic [10], which aims to identify dishonest peers through a complaint-based system. A shortcoming of this protocol is that it maintains only negative feedback, providing no means of distinguishing a trustworthy peer from a newcomer. The trust evaluation is also rather simplistic, classifying every peer as either trustworthy or untrustworthy. Another protocol is the EigenTrust scheme proposed by Kamvar et al. [11], which weights the trust information provided by peers according to their own trustworthiness (i.e., it uses the trust ratings as credibility). The core of the protocol is a special normalization process in which the trust ratings held by a peer are normalized so that their sum equals 1. Although it has some interesting properties, this normalization may result in the loss of important trust information. For example, if there are n identical trust ratings in the database, their normalized value will be 1/n, whether the originals were the highest or the lowest possible value. Another proposal with a similar scope is the protocol of Damiani et al. [12], which assesses the trustworthiness of a file to be downloaded by "voting" among the peers. The protocol makes no distinction between votes from trustworthy and untrustworthy peers, and the vote messages are not authenticated. Also, no quantitative trust metric is specified for choosing among alternative versions. An important idea of [12] is to maintain reputations for resources as well as for peers. A study with a different but relevant scope is a recent paper by Xiong and Liu [13] on trust evaluation in P2P e-commerce communities. Although they do not deal with the details of trust evaluation functions, they run an experimental system that uses a modification of the P-Grid scheme.
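The information loss attributed to sum-to-one normalization can be seen in a small sketch. The function name `normalize_ratings` and the 1 to 5 rating scale are illustrative assumptions; this is not EigenTrust's actual implementation.

```python
def normalize_ratings(ratings):
    """Scale non-negative local trust ratings so that they sum to 1."""
    total = sum(ratings)
    if total == 0:
        # Assumption: with no positive rating, fall back to a uniform split.
        return [1.0 / len(ratings)] * len(ratings)
    return [r / total for r in ratings]


# Four peers all rated at the top of a hypothetical 1..5 scale ...
high = normalize_ratings([5, 5, 5, 5])
# ... and four peers all rated at the bottom of the same scale.
low = normalize_ratings([1, 1, 1, 1])

print(high, low)
```

Both lists normalize to 1/n per peer, so the normalized values alone no longer say whether the original ratings were high or low.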
| 3. Design of Content Reputation System Using Collective Intelligence|| |
This study designs an efficient content reputation system that uses the collective intelligence of dynamic users in a P2P environment. Thanks to the emergence of new information and communication technology, collective intelligence has become increasingly important in decision making; experts use the term to describe the creation of novel insight by combining the behaviors, selections, and ideas of a group of people. Collecting the decisions of a group of people makes it possible to draw a statistical conclusion that no individual could reach alone. In collective intelligence, a new conclusion emerges from inferences that participants make individually. Although collective intelligence existed before, the ability to collect information from thousands or even millions of people on the web has opened up a new leap in its potential. The reputation system is designed to evaluate content downloaded from a P2P network by collecting the inspection results of users. Because it evaluates content from the opinions of many users, it can protect content from a few anonymous malicious users.
To achieve this, this study proposes a trust manager model that collects content information and host information for each download, as shown in [Figure 1], and assesses the evaluations made by users who downloaded the same content, as shown in [Figure 2] and [Figure 3]. Before the reputation system is designed, the terms evaluation and reputation are defined.
|Figure 1: User's evaluation information collection model for query and reputation evaluation of downloaded contents.|
|Figure 2: Trustworthy management model for collected fair evaluation about downloaded same content by normal peers.|
|Figure 3: Trustworthy management model for collected malicious evaluation about downloaded same content by malicious peers.|
Definition 1: Evaluation and Reputation
Evaluation (eval_i) is defined as an individual user's decision on downloaded content. Reputation (Rep_i) is defined as a statistical conclusion drawn from the collection of individual users' decisions (population of n>30), which is referred to as normalization.
3.1 Content Event Information and Host Information
To construct collective intelligence and use its decision making, information should be collected randomly from individual users. For this purpose, transactions of content event information and host information are used. Content event information is a transaction that records the circumstances in which a user downloads content through a P2P file sharing application. A transaction of host information about the downloading user is also created to support it.
3.1.1 Content Event Information
Content event information does not mean the content itself, but a transaction created by events between users and the content downloaded from the P2P network. It consists of downloaded content information, a time stamp, and an event flag. The content information consists of the content ID and name, hash value, download count, and hash value check.
The event flag consists of 5 bits: Download Completed, Open, Exist, Delete, and Check, as shown in [Figure 4]. The Download Completed flag represents completion of the content download; the Open flag, whether the user has opened the downloaded content; the Exist flag, whether the downloaded content still exists among the downloaded files; the Delete flag, deletion of the downloaded content; and the Check flag, the integrity of the content at the time of downloading. Combining these flags yields event information that captures the situation between the downloaded content and the user. This information is used in the decision making of collective intelligence. For example, if the logical AND of the Download Completed, Open, and Delete flags is 1, the content evaluation value is False, as shown in Equation (3).
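The five-bit event flag and the single rule quoted in the text can be sketched as follows. The bit positions, the `EventFlag` and `evaluate` names, and the "opened and kept means fair" rule are assumptions made for illustration; only the Download Completed AND Open AND Delete rule comes from the text.

```python
from enum import IntFlag


class EventFlag(IntFlag):
    """Five event bits named in the text; the bit positions are an assumption."""
    DOWNLOAD_COMPLETED = 0b00001
    OPEN = 0b00010
    EXIST = 0b00100
    DELETE = 0b01000
    CHECK = 0b10000


def evaluate(flags):
    """Return False for the rule stated in the text, otherwise True or None."""
    rejected = EventFlag.DOWNLOAD_COMPLETED | EventFlag.OPEN | EventFlag.DELETE
    if (flags & rejected) == rejected:
        # Downloaded, opened and then deleted: the user rejected the content.
        return False
    kept = EventFlag.DOWNLOAD_COMPLETED | EventFlag.OPEN | EventFlag.EXIST
    if (flags & kept) == kept:
        # Assumption: content that was opened and kept counts as a fair vote.
        return True
    return None  # not enough events to infer an evaluation


deleted_after_open = EventFlag.DOWNLOAD_COMPLETED | EventFlag.OPEN | EventFlag.DELETE
kept_after_open = (EventFlag.DOWNLOAD_COMPLETED | EventFlag.OPEN
                   | EventFlag.EXIST | EventFlag.CHECK)

print(evaluate(deleted_after_open), evaluate(kept_after_open))
```

Returning `None` for incomplete event histories keeps users who never opened a file out of the vote, which is one possible design choice rather than something the paper specifies.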
3.1.2 Host Information
Host information is a transaction that describes the host on which a content event was created. It is used to support the optional anonymity of users and to distinguish between users, and it consists of the IP address, MAC address, system name, the date certified by a trust manager, and the date on which the event was created.
Reputation information about contents is stored in two places: a global storage and a local storage. The local storage is managed individually and holds an individual's evaluation information about shared content resources. The global storage periodically collects the evaluation information of individually shared contents; the content reputation system then evaluates content reputation and stores the result there. The global storage also provides reputation information to the local storage of each individual user who contributes content evaluation information. The information in the local storage can then be shared globally through a broker, as shown in [Figure 5].
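A minimal sketch of the two storage tiers, assuming that an evaluation is a boolean (True for fair) and that reputation is the share of fair votes. The class names `LocalStorage` and `GlobalStorage` and their methods are hypothetical and do not come from the paper.

```python
from collections import defaultdict


class LocalStorage:
    """Per-user store of that user's own content evaluations."""

    def __init__(self, host_id):
        self.host_id = host_id
        self.evaluations = {}   # content_id -> bool (True = fair)
        self.reputations = {}   # content_id -> fair share fed back globally

    def record(self, content_id, is_fair):
        self.evaluations[content_id] = is_fair


class GlobalStorage:
    """Collects local evaluations and derives one reputation per content."""

    def __init__(self):
        self.votes = defaultdict(dict)  # content_id -> {host_id: bool}

    def collect(self, local):
        # Keyed by host, so a repeated collection overwrites instead of double counting.
        for content_id, is_fair in local.evaluations.items():
            self.votes[content_id][local.host_id] = is_fair

    def reputation(self, content_id):
        votes = self.votes.get(content_id, {})
        if not votes:
            return None
        return sum(votes.values()) / len(votes)

    def publish(self, local):
        # Feed reputation back only for contents this user has evaluated.
        for content_id in local.evaluations:
            local.reputations[content_id] = self.reputation(content_id)


store = GlobalStorage()
peers = [LocalStorage(f"host-{i}") for i in range(4)]
for i, peer in enumerate(peers):
    peer.record("content-A", i != 0)   # host-0 votes fake, the rest vote fair
    store.collect(peer)
store.publish(peers[0])
```

In this sketch the broker role is folded into `collect` and `publish`; a real deployment would place those calls behind a network interface.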
| 4. Fake Content Remove by Collective Intelligence|| |
The evaluation results of collective intelligence can provide discriminative information about users whom collective intelligence judges to be malicious. [Figure 6] shows the model of content evaluation by malicious users.
|Figure 6: Malicious evaluation model about downloaded contents by malicious user.|
Using collective intelligence, it is possible to identify fake content when the number of malicious users is small. Moreover, fake content can then be deleted while fair content is maintained and managed.
4.1 Normalization of CLT
Since files are evaluated on a binomial (fair or fake) basis in a P2P content reputation system, the central limit theorem (CLT) is applied. The Laplace theorem (the de Moivre-Laplace theorem) demonstrates that when the sample size n is big enough, the binomial distribution becomes close to the normal distribution.
Put simply, when a sample of size n is drawn from the whole population, the distribution of its sample statistic becomes close to the normal distribution. In other words, when n is big enough, the statistics can be treated as reliable under the assumption that their distribution is approximately normal. How large n must be depends on the characteristics of the population, but n>30 is taken here as the minimum for applying the CLT.
However, the fact that n is big enough does not guarantee unbiased sampling. For example, if sampling is done only in a specific area and period, the survey results can be biased by this limitation. Care is therefore needed before applying the CLT to every case of n > t (where t is a sufficiently large positive integer), because doing so ignores such sampling error.
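For reference, the normal approximation that this subsection relies on can be written out. The symbols X (the number of users who judge a content fair), p (the probability that a single user judges it fair), and Z are introduced here for illustration and are not taken from the paper's own equations.

```latex
X \sim \mathrm{Binomial}(n, p), \qquad
Z = \frac{X - np}{\sqrt{np(1-p)}} \;\xrightarrow{d}\; N(0, 1)
\quad \text{as } n \to \infty .
```

As a worked instance under these assumptions, n = 100 and p = 0.5 give np = 50 and \sqrt{np(1-p)} = 5, so observing X = 60 fair votes corresponds to Z = 2. The approximation says nothing about how the n users were chosen, which is why the sampling bias discussed above must be handled separately.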
4.2 Criteria of a Fake Content
To distinguish fake content using collective intelligence, a strong boundary is established by Equation (6): the first evaluation checks that the population is n>30, and the second evaluation uses the binomial values (true or false) that the population assigned to the content, as shown in [Figure 7].
When the population of the first evaluation is n<30, the result is only a majority evaluation, as given in Equation (7). In this case, only a weak boundary can be drawn to distinguish fair content from fake content by the second evaluation.
1) In case of population n>30
When the population of the first evaluation is n>30, the evaluation of the population follows the normal distribution N(0, 1), and the decision making by collective intelligence is unbiased (Equation (6)); this is the reputation Rep_j. It is therefore possible to clearly distinguish whether downloaded content is fair or fake by the second evaluation.
- Evaluation of population: T>70%
When more than 70% of the population judges the downloaded content fair in the second evaluation, the content can be guaranteed as fair even if random malicious users claim that it is fake. In this case, because the downloaded content is fair, users who evaluate it as fake can be added to the candidate list of malicious users and managed there.
- Evaluation of population: T<20%
When less than 20% of the population judges the downloaded content fair in the second evaluation, no action by random malicious users can make the content pass as fair, and the downloaded content is judged fake despite the few evaluations that call it fair. In this case, because the downloaded content is fake, users who evaluate it as fair can be added to the candidate list of malicious users and managed there.
2) In case of population n<30
When the population of the first evaluation is n<30, the evaluation of the population does not follow the normal distribution N(0, 1). Fair content must therefore be distinguished from fake content under the assumption that the decision making by collective intelligence is biased.
- Evaluation of population: T>70%
When more than 70% of the population judges the downloaded content fair in the second evaluation, the downloaded content can be guaranteed as fair content.
- Evaluation of population: T<40%
However, when less than 40% of the population judges the downloaded content fair in the second evaluation, the downloaded content cannot be guaranteed as fair content.
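The strong and weak boundaries of this subsection can be sketched as a small classifier. The thresholds (70%, 20%, 40%) and the n>30 split come from the text; the "undecided" band between thresholds, the treatment of n = 30 as a small population, and the function names are assumptions, since the paper's Equations (6) and (7) are not reproduced here.

```python
def classify_content(fair_votes, total_votes):
    """Classify content from binomial fair/fake votes using the stated thresholds."""
    if total_votes <= 0:
        raise ValueError("at least one evaluation is required")
    share = fair_votes / total_votes
    if total_votes > 30:
        # Strong boundary: population large enough for the normal approximation.
        if share > 0.70:
            return "fair"
        if share < 0.20:
            return "fake"
        return "undecided"
    # Weak boundary: small population, so the decision may be biased.
    if share > 0.70:
        return "fair"
    if share < 0.40:
        return "not guaranteed fair"
    return "undecided"


def malicious_candidates(votes, verdict):
    """Return users whose vote contradicts a strong-boundary verdict.

    votes maps a user id to True (fair) or False (fake).
    """
    if verdict == "fair":
        return sorted(u for u, v in votes.items() if not v)
    if verdict == "fake":
        return sorted(u for u, v in votes.items() if v)
    return []


# 40 users, of whom the first 4 claim that the content is fake.
votes = {f"u{i}": i >= 4 for i in range(40)}
verdict = classify_content(sum(votes.values()), len(votes))
suspects = malicious_candidates(votes, verdict)
print(verdict, suspects)
```

Users on the losing side of a strong-boundary verdict are only candidates; the text does not say how many contradictions are needed before a user is treated as malicious.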
4.3 Normalization Through Trust Management
When the population in the first assessment is n<30, a trust manager actively searches for other distributed trust managers using the ID-HDT of the content, in order to normalize the binomially distributed assessment to a normal distribution. As shown in [Figure 8], the normalization process shares assessment information among trust managers that hold the same ID-HDT and thereby expands the population toward a normal distribution. Trust management brokers create reputation information from the collected assessment information and share it with the trust managers that were found.
For trust management brokers to normalize binomially distributed decision making to a normal distribution when n<30, trust managers are searched and the population is expanded to n>30, as shown in Equation (8). Before the population size is expanded, bias in the construction of the population with respect to a specific area and period should be removed, using the IP address in the host information. Bias with respect to a specific period will be resolved as the system stabilizes through continued operation of the reputation system.
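The population expansion step can be sketched as a merge of votes held by several trust managers. Here `content_id` stands in for the ID-HDT of the content, each trust manager is modeled as a plain dictionary, and the function name and stopping rule are assumptions; the paper's Equation (8) is not reproduced.

```python
def expand_population(local_votes, remote_managers, content_id, minimum=31):
    """Merge votes for one content from other trust managers until n > 30.

    local_votes and each remote manager map content_id -> {host_ip: bool}.
    Votes are keyed by host IP so the same host is never counted twice.
    """
    merged = dict(local_votes.get(content_id, {}))
    for manager in remote_managers:
        if len(merged) >= minimum:
            break  # population is already large enough
        for host_ip, is_fair in manager.get(content_id, {}).items():
            merged.setdefault(host_ip, is_fair)
    return merged


local = {"c1": {f"10.0.0.{i}": True for i in range(10)}}
remote_a = {"c1": {f"10.0.1.{i}": i % 5 != 0 for i in range(15)}}
remote_b = {"c1": {f"10.0.2.{i}": True for i in range(15)}}

merged = expand_population(local, [remote_a, remote_b], "c1")
print(len(merged), sum(merged.values()))
```

Keying votes by host IP gives a simple duplicate check, but it does not by itself remove the area or period bias that the text requires to be handled before expansion.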
| 5. Simulation|| |
We performed a simulation of the content reputation system using Matlab in a Windows environment. The simulation used the Poisson distribution and a random probability function to model the download and evaluation of 10 random contents on 100 sites in 4 areas.
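The setup described above can be approximated with a short sketch. It is written in Python rather than the Matlab used by the authors, and the Poisson rate `lam`, the per-content fair probabilities, and the even assignment of sites to areas are all assumptions chosen for illustration, not the paper's parameters.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def simulate(seed=0, sites=100, areas=4, contents=10, lam=5.0):
    """Return per-content [fair_votes, total_votes] and per-area download counts."""
    rng = random.Random(seed)
    # Hypothetical probability that a user judges each content fair.
    p_fair = [rng.uniform(0.3, 0.95) for _ in range(contents)]
    votes = [[0, 0] for _ in range(contents)]
    per_area = [0] * areas
    for site in range(sites):
        area = site % areas                     # assumption: sites spread evenly
        for _ in range(poisson(rng, lam)):      # downloads made at this site
            c = rng.randrange(contents)
            votes[c][0] += rng.random() < p_fair[c]
            votes[c][1] += 1
            per_area[area] += 1
    return votes, per_area


votes, per_area = simulate()
```

Fixing the seed makes a run repeatable, which helps when comparing the fair share of each content against thresholds such as those in Section 4.2.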
5.1 Simulation of Real Reputation by Trustworthy Evaluation
[Figure 9] presents the trustworthy rate of the 10 contents and shows almost the same trustworthy rate (85%, 86% and 84%) for contents 6, 7 and 8. We applied the scheme proposed in this paper and present the resulting reputation rate in [Figure 10]. Although contents 6, 7 and 8 have almost the same trustworthy rate, their reputation rates differ: 90%, 47% and 63%. Content 7 cannot be trusted despite its trustworthy rate of 86%, because its reputation rate of 47% is the lowest. By contrast, content 6 can be trusted at its trustworthy rate of 85%, because its reputation rate of 90% is high. [Figure 11] presents the real reputation rate derived from the trustworthy rate and the reputation rate of the contents in the simulation.
5.2 Simulation of Unbiased Evaluation
[Figure 12] presents a pie chart of how downloads of the 10 contents are distributed across the 4 areas. It shows that bias in the construction of the population with respect to area was removed before the population size was expanded. [Figure 13] presents the distribution of the 10 contents across the four download site areas.
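One simple way to check for area bias before expanding a population is to compare each area's share of downloads with the uniform share. The function names and the 10-percentage-point tolerance below are assumptions for illustration; the paper does not state its criterion.

```python
def area_shares(downloads_per_area):
    """Convert per-area download counts into shares that sum to 1."""
    total = sum(downloads_per_area)
    if total == 0:
        raise ValueError("no downloads recorded")
    return [count / total for count in downloads_per_area]


def is_area_balanced(downloads_per_area, tolerance=0.10):
    """True if no area's share deviates from the uniform share by more than tolerance."""
    shares = area_shares(downloads_per_area)
    uniform = 1.0 / len(shares)
    return all(abs(share - uniform) <= tolerance for share in shares)


balanced = is_area_balanced([26, 24, 27, 23])   # shares close to 25% each
skewed = is_area_balanced([70, 10, 10, 10])     # one area dominates
print(balanced, skewed)
```

A stricter check could use a chi-square goodness-of-fit test, but the fixed tolerance keeps the sketch free of external dependencies.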
| 6. Conclusions|| |
P2P file sharing applications that provide only a file sharing function suffer from a serious problem: fake content is distributed indiscriminately and fair content is contaminated. This study proposes a model that uses collective intelligence to distinguish fake content, a problem common to P2P applications, and to provide information for limiting sharing and for deletion. As a result, untrustworthy files can be prevented from spreading even when malicious peers submit malicious evaluations of fair content. Moreover, because the model controls downloading by classifying fake content on the file sharing side, it is expected to contribute to reducing bandwidth consumption and traffic.
| 7. Acknowledgment|| |
- This work was supported in part by the National Research Foundation of Korea Grant funded by the Korean Government (NRF-2009-353-D00048).
- This research was also supported in part by the Korea Communications Commission (KCC), Korea, under the R&D program supervised by the KCA(Korea Communications Agency) (KCA-2011-09913-05006).
| References|| |
|1.||R Schollmeier, "A Definition of Peer-to-Peer Networking for the Classification of Peer-to-Peer Architectures and Applications". Proceedings of the First International Conference on Peer-to-Peer Computing, IEEE 2002, 2002. |
|2.||Gnutella homepage. Available from: http://www.gnutella.com. [Last cited on 10 Apr 2010]. |
|3.||Kazaa homepage. Available from: http://www.kazaa.com. [Last cited on 10 Apr 2010]. |
|4.||E Damiani, D C di Vimercati, S Paraboschi, P Samarati, and F Violante, "Reputation-based approach for choosing reliable resources in peer-to-peer networks". Proceedings of the 9th ACM Conference on Computer and Communications Security, 2002. |
|5.||S D Kamvar, M T Schlosser, and H Garcia-Molina, "The eigentrust algorithm for reputation management in p2p networks". Proceedings of the 12th International World Wide Web Conference, May 2003. |
|6.||Selcuk, E Uzun, and M Pariente, "A reputation-based trust management system for p2p networks," Proceedings of the International Workshop on Global and Peer-to-Peer Computing, IEEE/ACM CCGRID, 2004. |
|7.||L Xiong, and L Liu, "Peertrust: Supporting reputation-based trust for peer-to-peer electronic communities." IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 7, pp. 843-57, July 2004. |
|8.||S Ratnasamy, P Francis, M Handley, R Karp, and S Shenker, "A scalable content addressable network," Proceedings of the ACM 2001 SIGCOMM Conference, August 2001. |
|9.||K Aberer, "P-grid: A self-organizing access structure for P2P information systems," Proceedings of ACM Conference on Information and Knowledge Management (CIKM), 2001. |
|10.||K Aberer, and Z Despotovic, "Managing trust in a peer-2-peer information system," In Ninth International Conference on Information and Knowledge Management (CIKM), 2001. |
|11.||S D Kamvar, M T Schlosser, and H Garcia-Molina, "The eigentrust algorithm for reputation management in P2P networks," In Proc. of the Twelfth International World Wide Web Conference (WWW2003), 2003. |
|12.||E Damiani, D C di Vimercati, S Paraboschi, P Samarati, and F Violante, "Reputation-based approach for choosing reliable resources in peer-to-peer networks." In Proc. of the 9th ACM Conference on Computer and Communications Security, 2002. |
|13.||L Xiong, and L Liu, "A reputation-based trust model for peer-to-peer ecommerce communities". In IEEE Conference on E-Commerce (CEC'03), 2003. |
| Authors|| |
ByungRae Cha is a research professor at Super Computing & Collaboration Environment Technology (SCENT) Center, GIST, Korea. He received the Ph.D. degree in Computer Engineering from National Mokpo University in 2004 and the M.S. degree in Computer Engineering from Honam University in 1997. Prior to becoming a research professor at GIST, he has worked as a research professor in Department of Information and Communication Eng., Chosun University, and professor in Department of Computer Engineering, Honam University, Korea. His research interests include Computer Security of IDS and P2P, Neural Networks Learning, Cloud Computing, and Future Internet.
Sun Park is a research professor at Institute Research of Information Science and Engineering, Mokpo National University, Korea. He received the Ph.D degree in Computer & Information Engineering from Inha University in 2007, the M.S. degree in Information & Communication Engineering from Hannam University in 2001, and the B.S. degree in Computer Engineering from Jeonju University in 1996. Prior to becoming a researcher at Mokpo National University, he has worked as a postdoctoral at Chonbuk National University, and professor in Dept. of Computer Engineering, Honam University, Korea. His research interests include Data Mining, Information Retrieval, and IT-MT (marine technology) Convergence technology.
JongWon Kim received the B.S., M.S. and Ph.D. degrees from Seoul National University, Seoul, Korea, in 1987, 1989 and 1994, respectively, all in control and instrumentation engineering. In 1994-1999, he was with the Department of Electronics Engineering at the KongJu National University, KongJu, Korea, as an Assistant Professor. From 1997 to 2001, he was visiting the Signal and Image Processing Institute (SIPI) of Electrical Engineering - Systems Department at the University of Southern California, Los Angeles, CA. USA, where he has served as a Research Assistant Professor since Dec. 1998. From September 2001, he has joined as an Associate Prof. at the Department of Information & Communications, Gwangju Institute of Science and Technology (GIST, formerly known as K-JIST), Gwangju, Korea, where he is now serving as a Professor. He is focusing on networked media systems and protocols including multimedia signal processing and communications. Dr. Kim is a senior member of IEEE, a member of ACM, SPIE, KICS, IEEK, KIISE, and KIPS.