Advances in Environmental Biology
Advances in Environmental Biology, 8(17) September 2014, Pages: 1245-1250
AENSI Journals, Advances in Environmental Biology, ISSN-1995-0756, EISSN-1998-1066
Journal home page: http://www.aensiweb.com/AEB/

A Recommendation System Presenting to Improve Performance of Search Engines Based on Web Mining and Formal Models

1Maryam Jafari, 2Ali Harounabadi and 3Seyyed Javad Mirabedini
1 Department of computer engineering, Bushehr branch, Islamic Azad University, Bushehr, IRAN
2 Department of computer engineering, Islamic Azad University, Central Tehran branch, IRAN
3 Department of computer engineering, Islamic Azad University, Central Tehran branch, IRAN

ARTICLE INFO
Article history: Received 25 September 2014; Received in revised form 26 October 2014; Accepted 25 November 2014; Available online 29 December 2014
Key words: Color Petri Net, Recommendation System, Web Mining, Web Personalization

ABSTRACT
Web personalization plays an important role in improving the interaction between users and the web, with the aim of providing results that match users' interests. Search engines are now the main tools for finding information on the web, but despite their extensive use they cannot return results that suit most users. Personalizing search engines is therefore considered essential to help users find information according to their interests. In this paper, in addition to the frequency and observation duration of web pages, the observation date of web pages, as saved in the log files, is used to analyze the behavior of web users and to build user profiles. A personalized search engine modeled with color Petri nets is also presented. After a period of time, the proposed method identifies the degree of a user's interest in the various categories of pages from the user's search record and recommends pages related to those interests. The results obtained with the color Petri net show an improvement in the accuracy criterion compared with previous methods.
© 2014 AENSI Publisher All rights reserved.
To Cite This Article: Maryam Jafari, Ali Harounabadi and Seyyed Javad Mirabedini., A recommendation system presenting to improve performance of search engines based on web mining and formal models. Adv. Environ. Biol., 8(17), 1245-1250, 2014

INTRODUCTION
Recommendation systems use the ideas and opinions of groups of users to help each individual in the group identify and select content more effectively and efficiently. The aim of a web personalization system is to provide the information that users want or need without their explicit request. In fact, a recommendation system is a powerful mechanism for information filtering. Building intelligent, personalized online systems such as web-based recommendation systems generally requires modeling the users' behavior. Web mining has recently received more attention because it meets the requirements of web personalization. This study draws on web mining, recommendation systems, user sessions and Petri nets; each is reviewed below.

Web mining: Web mining is the use of data mining methods to automatically discover and extract information from Web documents and services [11].

Web recommendation systems: These systems predict users' requirements and offer them as recommendations to guide users. Such systems are expected to have a bright future in e-commerce and search engines. Recommended items can be products such as books, movies and music, online resources such as web pages, or online activities such as path prediction. In general, a web recommendation system consists of two modules: offline and online. The offline module preprocesses data to produce user models, while the online module uses these models at run time to identify the user's goals and to predict and update a list of recommendations.
In its simplest and most common form, the recommendation problem is that of estimating a user's rating of items that he or she has not yet visited [17].

Session: A user session is the set of pages visited by that user during a particular visit to a website [5]. A set of pages visited by a particular user is considered a session when those pages were requested within a time interval less than or equal to a given threshold [12].

Corresponding Author: Maryam Jafari, Department of computer engineering, Bushehr branch, Islamic Azad University, Bushehr, IRAN. E-mail: [email protected]

Petri Nets: A Petri net is a tool for studying systems. Petri net theory allows a system to be modeled by a Petri net, which is in effect a mathematical representation of that system. Analyzing the Petri net yields useful information about the structure and dynamic behavior of the modeled system, and this information can be used to evaluate the system and to suggest improvements or changes [1].

Color Petri net: These nets were extended with the purpose of creating a modeling language. They are called "color Petri nets" because they allow the use of tokens that carry different data values and can be distinguished from one another. A color Petri net model is represented graphically as a directed bipartite graph. In the model, places are drawn as circles and transitions as rectangles. Tokens are drawn inside the places as black dots.

Related works: In recent years, more attention has been paid to building recommendation systems and, more generally, personalization methods that do not need explicit information from users [6]. Systems of this type obtain the user's behavior model and interests implicitly.
In these systems, the degree of the user's interest in different items is not stated explicitly; the system builds a model of users' interests implicitly by observing their behavior. The input of these systems consists only of the user's transactions or, put simply, the items viewed by each user.

In Yektaparast's thesis, user profiles are built using a page weight criterion, and users with similar interests are placed in the same clusters using clustering techniques. Advertisements are then proposed to users according to the degree of interest of the users in each cluster. Petri nets are used to model this method: each user is a member of one or more clusters with a degree of membership, which is represented using a fuzzy colored Petri net [2].

Chen presents a method for modeling web structure using Petri nets. In this method, places represent the web pages of a site and transitions represent the links between pages. The paper emphasizes how to use a parsing algorithm to retrieve the content of a site's pages, analyze the content and find the intersection matrix that represents the structure of the web. It also shows how the main page and the evolution path can be identified using the availability property [3].

Web data mining methods are divided into three categories according to the kind of data they examine: web content mining, web usage mining and web structure mining [16]. Recently, web usage mining techniques have been used extensively to discover users' navigation patterns. Zhou's sequential pattern mining [18], Lin's association rule mining [8], and the clustering approaches of Phatak and Mobasher [13], [10] discover patterns that can be used for personalization in web recommendation systems. Most activity-based recommender systems are based on Web usage mining. These systems run on web servers and have access to information about the users of the website [4].
Moatamedimehr proposed a hybrid algorithm based on distributed learning automata, graph partitioning and the PageRank ranking algorithm, which combines users' usage data with the structural data of web pages [9].

MATERIALS AND METHODS
Like other recommendation systems, the proposed system includes offline and online phases. In the offline phase, we first store the attributes of the pages of a web site whose pages are categorized by content: the observation duration, the observation date and the frequency of each page per user. The pages of each category are then ranked on the basis of these attributes. In the online phase, the user enters the site, selects a category of pages and views the desired pages. The system then follows the user's session, determines his or her interest in each category from the average duration and frequency of the pages of that category, and recommends a list of pages to the user. Figure 1 shows the general schematic of the proposed recommendation system.

Fig. 1: General schematic of the proposed recommendation system.

Previous research used criteria such as the observation duration of each page and its frequency to compute the rate of the user's interest in pages. The proposed method also uses the observation date of pages in the computation. The reasons for selecting these criteria are as follows:

Page frequency: It indicates how often the web page is visited. The frequency is normalized by the number of all pages visited in a session. fp(P) is the page frequency [7] (equation (1)).

Observation duration of page by user: It is defined as the time spent on a page. A quick jump past a page may simply be due to the short length of the page. Therefore, the observation duration of a page should be normalized by the length of the page, that is, its number of bytes [14].
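The two normalizations described above can be sketched in Python. The exact forms of equations (1) and (2) are not available here, so the formulas below are assumptions: frequency is taken as the share of a session's page views that fall on a page, and duration as viewing time per byte, scaled by the largest per-byte value in the session so that both quantities lie in [0, 1]. The function names and the sample session are hypothetical.

```python
from collections import Counter


def page_frequency(session_pages):
    """Assumed form of fp(P): visits to P divided by all page views in the session."""
    counts = Counter(session_pages)
    total = len(session_pages)
    return {page: n / total for page, n in counts.items()}


def page_duration(seconds_on_page, page_bytes):
    """Assumed form of dp(P): seconds per byte, scaled by the session maximum."""
    per_byte = {p: seconds_on_page[p] / page_bytes[p] for p in seconds_on_page}
    top = max(per_byte.values())
    if top == 0:
        # No time was spent on any page, so every duration is zero.
        return {p: 0.0 for p in per_byte}
    return {p: value / top for p, value in per_byte.items()}


# Hypothetical session: "news" is viewed twice, the other pages once.
session = ["news", "sports", "news", "shop"]
fp = page_frequency(session)
dp = page_duration(
    {"news": 60.0, "sports": 10.0, "shop": 30.0},   # total seconds per page
    {"news": 2000, "sports": 1000, "shop": 3000},   # page length in bytes
)
```

Dividing by page length keeps a short page that was read completely from looking less interesting than a long page that was only skimmed.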
dp(P) is the observation duration of the page, shown in equation (2). Since the two criteria are equally important, the system uses the harmonic mean of the frequency and the duration to express the rate of the user's interest:

w(P) = [2 × fp(P) × dp(P)] / [fp(P) + dp(P)] (3)

Date: The date matters because users are more likely to request new pages than old ones, since they seek new information. To reflect the importance of new pages, their weights are multiplied by larger numbers. The resulting rate of interest is also called the weight of the page.

W= i w i i+1 (4) i 0 i

Mapping the elements of the proposed system to Petri net components: When the data are of different types, they are modeled by placing colored tokens in a color Petri net. Petri nets are built from two main components: a set of places (P) and a set of transitions (T). The relation between places and transitions is determined by two functions, input and output, that connect transitions to places.
Places: in the proposed method, pages are represented as places in the color Petri net simulation tool.
Transitions: hyperlinks create the interaction between pages and are represented as transitions.
Arcs: arcs show the direction of the interaction between pages.
Tokens: each token carries the information of a user's session.

Proposed method procedure: After entering the site, the user can choose one of the page categories: political, social, entertainment or commercial. Each category is a subnet consisting of places (pages) connected by transitions (links between pages). After the user enters a category and views some pages, the attributes of those pages are taken into account in the user's interest. The observation duration is measured by a timer.
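As a small illustration of the interest rate in equation (3), the sketch below computes the harmonic mean named in the text. It assumes that both inputs are already normalized to [0, 1]; the function name and the zero-input convention are hypothetical additions.

```python
def interest_weight(fp, dp):
    """Harmonic mean of normalized frequency fp and duration dp (equation (3))."""
    if fp + dp == 0:
        # Assumed convention: a page that was never viewed has zero interest.
        return 0.0
    return 2 * fp * dp / (fp + dp)


balanced = interest_weight(0.5, 0.5)   # frequency and duration agree
lopsided = interest_weight(0.9, 0.1)   # visited often, but only briefly
```

The harmonic mean stays low unless both criteria are high, so the second page scores well below the first even though the two inputs sum to the same value.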
The frequency and the last observation date of the page are also sent as a token to the page/attribute matrix, and the information in this matrix is updated. Figure 2 shows the page/attribute matrix.

          Observation duration of page    Page's frequency    The last observation date of page
Page 1
Page 2
.
.
Page N

Fig. 2: Page/attribute matrix.

Because the number of web pages keeps growing, keeping all attributes of all pages viewed by all users would increase the time complexity of the proposed method. Therefore, only the attributes of recently viewed pages are kept in the page/attribute matrix. The weights of all pages are computed with formula (4) and sorted in descending order. The sums of the observation duration and the observation frequency of each category are accumulated by a counter, and at the end the rate of the user's interest in the current category is calculated with formula (3).

The number of pages proposed to the user from each category at each step is determined as follows. Assume that the rates of the user's interest in the 4 page categories, obtained with formula (3), are A, B, C and D respectively, and that at most M new pages of each category are to be proposed. Each rate is multiplied by M and the result is rounded to an integer. The resulting number is the maximum number of pages that are chosen from the page/attribute matrix of that category and shown to the user. Each category has a plan subnet that sorts the pages by weight, computes the number of proposed pages and presents the pages of interest to the user.

Results: To model the proposed method with color Petri nets, the CPN Tools software was used.
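The allocation step of the method (multiply each category's interest rate by M, round, then take that many of the highest-weighted pages) can be sketched as follows. The text says only that the result is rounded, so rounding half up is an assumption, and the function names, interest rates and page weights are hypothetical.

```python
import math


def pages_to_recommend(interest, max_pages):
    """Per-category page count: interest rate times M, rounded half up (assumed)."""
    return {cat: math.floor(rate * max_pages + 0.5) for cat, rate in interest.items()}


def top_pages(page_weights, k):
    """The k pages with the largest weight, in descending order of weight."""
    ranked = sorted(page_weights.items(), key=lambda item: item[1], reverse=True)
    return [page for page, _ in ranked[:k]]


# Hypothetical interest rates A, B, C and D for the four categories, with M = 10.
interest = {"political": 0.62, "social": 0.18, "entertainment": 0.41, "commercial": 0.07}
quota = pages_to_recommend(interest, 10)

# Hypothetical page weights for the social category.
social_weights = {"s1": 0.2, "s2": 0.9, "s3": 0.5}
social_picks = top_pages(social_weights, quota["social"])
```

With these numbers the social category receives two slots, which are filled by its two highest-weighted pages.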
Here, color denotes the page category. Figures 3 and 4 relate to the social category and are two of the 8 modeled subnets. The proposed method assumes that the number of web pages is fixed every day. The system was run over a 100-day period for various users.

Fig. 3: Social subnet modeled with CPN Tools.

Fig. 4: Plan Social subnet modeled with CPN Tools.

To evaluate the recommendation system, the retrieval precision criterion is used, and the precision of the proposed system is compared with the precision of static algorithms [15].

Precision= (5)

In static algorithms, the proposed items are chosen only according to the observation frequency of previous users and are the same for all users.

Fig. 5: Comparing precision of proposed method with static algorithm.

Figure 5 shows that after some days the system learns the user's interest and proposes pages that he or she likes.

Fig. 6: Comparing proposed method with previous algorithms.

Figure 6 compares the proposed method with previous algorithms that use only the frequency and observation duration parameters for weighting.

Conclusions: Given the development of science in various fields, the time consumed by applied research, and the probability of problems or run-time errors during implementation, simulation is needed instead of implementation. The aim of this project is to introduce a method that improves the performance of web recommendation systems using the formal model of color Petri nets. The analyses show that, because of the history parameter used, the introduced method has high accuracy and is suitable for web personalization.

REFERENCES
[1] Bahadori, M., 2013.
Modeling of operator's geodesic behavior in website by color Petri nets with the aim of identifying operator's interest to present dynamic web pages recommendation system, M.Sc. thesis, Islamic Azad University, Dezfool branch.
[2] Yektaparast, A., 2012. A colored Petri net behavior of web users using clustering to targeted advertising, M.Sc. thesis, Islamic Azad University, Science and Research branch of Khouzestan.
[3] Chen, P., C. Sun and S. Yang, 2008. Modeling and Analysis the Web Structure Using Stochastic Timed Petri Nets, Journal of Software, Academy Publisher, 3(8): 19-26.
[4] Cule, B. and B. Goethals, 2010. Mining Association Rules in Long Sequences, In: Advances in Knowledge Discovery and Data Mining, 6118/2010, Springer Berlin/Heidelberg, 300-309.
[5] Ghaderyan, M., 2008. Improving of operator model in website as automatic by using of semantics with special meaning of domain, M.Sc. thesis, Amirkabir University of Technology.
[6] Godoy, D. and A. Amandi, 2009. Interest Drifts in User Profiling: A Relevance-Based Approach and Analysis of Scenarios, In: The Computer Journal Advance, 52(7): 771-788.
[7] Khademali, Z., 2012. Creating a profile to operator according to his interconnection in web by using of intelligent algorithm, M.Sc. thesis, Islamic Azad University, Dezfool branch.
[8] Lin, W., S.A. Alvarez and C. Ruiz, 2000. Collaborative Recommendation Via Adaptive Association Rule Mining, In: Second International Workshop: Web Mining for E-Commerce, Boston, MA, USA.
[9] Moatamedimehr, S., M. Taran, A. BaradaranHashemi and M. Mabidi, 2010. Web recommender systems using a Distributed Learning Automata and Partitioning Graph, Fourth Iranian Conference of Data Mining, Sharif University of Technology.
[10] Mobasher, B., H. Dai, T. Luo and M. Nakagawa, 2001. Effective Personalization Based On Association Rule Discovery From Web Usage Data, In: Proc. of the 3rd International Workshop on Web Information and Data Management, ACM Press, Atlanta, GA, USA, 9-15.
[11] Mustapasa, O., D. Karahoca, A. Karahoca, A. Yucel and H. Uzunboylu, 2010. Implementation of Semantic Web Mining on E-Learning, In: Proc. of Social and Behavioral Sciences, 2: 5820-5823.
[12] Pierrakos, D., G. Paliouras, C. Papatheodorou and C. Spyropoulos, 2003. Web Usage Mining as a Tool for Personalization: A Survey, In: User Modeling and User-Adapted Interaction, 13: 311-372.
[13] Phatak, D.S. and R. Mulvaney, 2002. Clustering For Personalized Mobile Web Usage, In: Proc. of the IEEE FUZZ'02, Hawaii, USA, 705-710.
[14] Rahmani, S. and M. Meybodi, 2009. Web personalization by using of developed council, Third Iranian Conference of Data Mining, Amirkabir University of Technology.
[15] Rashidi, F., 2012. Constructing a Profile for User According to his Web Interactions Using Intelligent Algorithms, M.Sc. thesis, Islamic Azad University, Science and Research branch of Khozestan.
[16] Sugiyama, K., K. Hatano and M. Yoshikawa, 2004. Adaptive Web Search Based on User Profile Constructed without Any Effort from Users, 13th International Conference on World Wide Web, ACM, New York, USA, 675-684.
[17] Zhan, S., F. Gao, C. Xing and L. Zhou, 2006. Addressing Concept Drift Problem in Collaborative, In: Proc. of the ECAI Workshop on Recommender Systems, in conjunction with the 17th European Conference on Artificial Intelligence, Riva del Garda, Italy, 34-39.
[18] Zhou, B., S.C. Hui and K. Chang, 2004. An Intelligent Recommender System Using Sequential Web Access Patterns, In: Proc. of the 2004 IEEE Conference on Cybernetics and Intelligent Systems, Singapore, 1-3.