Optimizing QoS-Aware Semantic Web Service Composition

Freddy Lécué
The University of Manchester
Booth Street East, Manchester, UK
{(firstname.lastname)@manchester.ac.uk}

Abstract. Ranking and optimization of web service compositions are some of the most interesting challenges at present. Since web services can be enhanced with formal semantic descriptions, forming the semantic web services, it becomes conceivable to exploit the quality of semantic links between services (of any composition) as one of the optimization criteria. For this we propose to use the semantic similarities between output and input parameters of web services. Coupling this with other criteria such as quality of service (QoS) allows us to rank and optimize compositions achieving the same goal. Here we suggest an innovative and extensible optimization model designed to balance semantic fit (or functional quality) with non-functional QoS metrics. To allow the use of this model in the context of a large number of services, as foreseen by the strategic EC-funded project SOA4All, we propose and test the use of Genetic Algorithms.

Key words: Semantic Web, Web service, Service composition, Quality of service and composition, Automated reasoning.

1 Introduction

The Semantic Web [1], where the semantic content of the information is tagged using machine-processable languages such as the Web Ontology Language (OWL) [2], is considered to provide many advantages over the current formatting-only version of the World-Wide-Web. OWL is based on concepts from Description Logics [3] and ontologies, i.e., formal conceptualizations of a particular domain. This allows us to describe the semantics of services, e.g., their functionality in terms of input and output parameters, preconditions, effects and invariants. Such descriptions can then be used for automatic reasoning about services and for automating their use to accomplish intelligent tasks such as selection, discovery and composition.
Here we focus on web service composition and more specifically on its functional level, where a set of services is composed to achieve a goal on the basis of the semantic similarities between input and output parameters as indicators of service functionality. To measure semantic similarity, we use the concept of (functional) semantic link [4], defined as a semantic connection (i.e., part of data flow) between an output and an input parameter of two services. Web service compositions could thus be estimated and ranked not only along well known non functional parameters such as Quality of Services (QoS) [5] but also along the dimension of semantic similarity as an indicator of functional fit [6]. Considering semantics on connections of services is useful in case the information required and provided by services does not match perfectly in every data flow. This is the case of semantic-based description of services. In this work we propose to unify both types of criteria in an innovative and extensible model allowing us to estimate and optimise the quality of service compositions.

Foundation Project: Supported by European Commission VII Framework IP Project Soa4All.

Maximizing the quality of service composition using this model is essentially a multi-objective optimization problem with constraints on quality of services and semantic links, which is known to be NP-hard [7]. Most approaches in the literature addressing optimization in web service composition are based on Integer Linear Programming (IP) e.g., [8]. However, IP approaches have been shown to have poor scalability [5] in terms of time taken to compute optimal compositions when the size of the initial set of services grows. Such a case can arise in the future semantic web, where a large number of semantic services will be accessible globally. This is the vision of SOA4All, a strategic EC-funded project. Rapid computation of optimal compositions is especially important for interactive systems providing service composition facilities for end users, where long delays may be unacceptable. Here we demonstrate that the optimisation problem can be automated in a more scalable manner using Genetic Algorithms (GAs), and propose an approach to tackle QoS-aware semantic web service composition.

The remainder of this paper is organised as follows. In the next section we briefly review i) semantic links, ii) their common descriptions and iii) the web service composition model. Section 3 introduces the quality criteria for QoS-aware semantic web service composition.
Section 4 details the GA-based evolutionary approach, including the strategies of the crossover, mutation and fitness function. Section 5 reports and discusses results from the experimentations. Section 6 briefly comments on related work. Finally Section 7 draws some conclusions and discusses possible future directions.

2 Background

In this section we describe how semantic links can be used to model web service composition. In addition we recall the definition of Common Description in semantic links.

2.1 Semantic Links between Web Services

In the semantic web, input and output parameters of services refer to concepts in a common ontology¹ or Terminology $T$ (e.g., Fig.2), where the OWL-S profile [9] or SA-WSDL [10] can be used to describe them (through semantic annotations). At functional level web service composition consists in retrieving some semantic links [4] noted $s_{i,j}$ (Fig.1) i.e.,

\[ s_{i,j} \doteq \langle s_i,\ Sim_T(Out\_s_i, In\_s_j),\ s_j \rangle \tag{1} \]

between output parameters $Out\_s_i \in T$ of services $s_i$ and input parameters $In\_s_j \in T$ of other services $s_j$. Thereby $s_i$ and $s_j$ are partially linked according to a matching function $Sim_T$. Given a terminology $T$, [11] and [12] value the range of the latter function along five matching types: i) Exact i.e., $Out\_s_i \equiv In\_s_j$, ii) PlugIn i.e., $Out\_s_i \sqsubseteq In\_s_j$, iii) Subsume i.e., $In\_s_j \sqsubseteq Out\_s_i$, iv) Intersection i.e., $\neg(Out\_s_i \sqcap In\_s_j \sqsubseteq \bot)$ and v) Disjoint i.e., $Out\_s_i \sqcap In\_s_j \sqsubseteq \bot$.

¹ Distributed ontologies are not considered here but are largely independent of the problem addressed in this work.

[Figure: two services $s_i$ and $s_j$ with their input and output parameters; the output parameter $Out\_s_i$ is connected to the input parameter $In\_s_j$ by the semantic link $s_{i,j}$, valued by $Sim_T(Out\_s_i, In\_s_j)$.]
Fig. 1. A Semantic Link $s_{i,j}$ between Services $s_i$ and $s_j$.

Example 1 (Matching Type)
Let $s_{1,2}$ (Fig.3) be a semantic link between two services $s_1$ and $s_2$ such that the output parameter NetworkConnection of $s_1$ is (semantically) linked to the input parameter SlowNetworkConnection of $s_2$. According to Fig.2 this link is valued by a Subsume matching type since $NetworkConnection \sqsupseteq SlowNetworkConnection$.

The matching function $Sim_T$ enables, at design time, finding some types of semantic compatibilities (i.e., Exact, PlugIn, Subsume, Intersection) and incompatibilities (i.e., Disjoint) among independently defined service descriptions. In this direction the latter function can be used to define the quality of data flow in web service composition at semantic level.

2.2 Common Description of a Semantic Link

Besides computing the matching type of a semantic link, the authors of [13] suggest computing a finer level of information i.e., the Extra and Common Descriptions between $Out\_s_i$ and $In\_s_j$ of a semantic link $s_{i,j}$. They adapt the definition of syntactic difference [14] for comparing ALE DL descriptions and then obtaining a compact representation. The Extra Description $In\_s_j \backslash Out\_s_i$:

\[ In\_s_j \backslash Out\_s_i \doteq \min_{\preceq_d} \{ E \mid E \sqcap Out\_s_i \equiv In\_s_j \sqcap Out\_s_i \} \tag{2} \]

refers² to information required by $In\_s_j$ but not provided by $Out\_s_i$ to ensure a correct data flow between $s_i$ and $s_j$. The Common Description of $Out\_s_i$ and $In\_s_j$, defined as their Least Common Subsumer [15] $lcs$, refers to information required by $In\_s_j$ and provided by $Out\_s_i$³.

² With respect to the subdescription ordering $\preceq_d$.
³ In case $Out\_s_i \not\sqsupseteq In\_s_j$, $In\_s_j \backslash Out\_s_i$ is replaced by its more general form i.e., $In\_s_j \backslash lcs(In\_s_j, Out\_s_i)$.
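The matching types of Section 2.1 and the descriptions of Section 2.2 can be illustrated with a deliberately simplified sketch. It assumes, purely for illustration, that every concept is a flat conjunction of opaque conjuncts, so that subsumption reduces to set inclusion, the Extra Description to a set difference and the Common Description to a set intersection. The names `matching_type`, `extra_description` and `common_description`, as well as the explicit list of disjoint conjuncts, are hypothetical; a real system would delegate these inferences to a DL reasoner.

```python
from typing import FrozenSet, Iterable, Tuple

# Assumption: a concept is a conjunction of opaque conjuncts (no real DL reasoning).
Concept = FrozenSet[str]


def subsumed_by(c: Concept, d: Concept) -> bool:
    """Toy check for c ⊑ d: c must carry every conjunct of d."""
    return d <= c


def matching_type(out: Concept, inp: Concept,
                  disjoint: Iterable[Tuple[str, str]] = ()) -> str:
    """Value Sim_T(Out, In) along the five matching types."""
    clashes = set(disjoint)
    if any((a, b) in clashes or (b, a) in clashes for a in out for b in inp):
        return "Disjoint"
    if out == inp:
        return "Exact"
    if subsumed_by(out, inp):   # Out ⊑ In
        return "PlugIn"
    if subsumed_by(inp, out):   # In ⊑ Out
        return "Subsume"
    return "Intersection"


def extra_description(out: Concept, inp: Concept) -> Concept:
    """Toy In \\ Out: conjuncts required by In but not provided by Out."""
    return inp - out


def common_description(out: Concept, inp: Concept) -> Concept:
    """Toy lcs(Out, In): conjuncts shared by both descriptions."""
    return out & inp


# Example 1, with the definitions of Fig. 2 written as conjunct sets.
NETWORK = frozenset({"∀netPro.Provider", "∀netSpeed.Speed"})
SLOW_NETWORK = NETWORK | {"∀netSpeed.Adsl1M"}

print(matching_type(NETWORK, SLOW_NETWORK))
print(sorted(extra_description(NETWORK, SLOW_NETWORK)))
```

Under these assumptions the link of Example 1 is valued Subsume, and the missing conjunct is the one on netSpeed.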
\[
\begin{aligned}
NetworkConnection &\equiv \forall netPro.Provider \sqcap \forall netSpeed.Speed \\
SlowNetworkConnection &\equiv NetworkConnection \sqcap \forall netSpeed.Adsl1M \\
Adsl1M &\equiv Speed \sqcap \forall mBytes.1M
\end{aligned}
\]
Fig. 2. Sample of an ALE Domain Ontology $T$.

Example 2 (Extra & Common Description)
Consider $s_{1,2}$ in Example 1. On the one hand the Extra Description missing in NetworkConnection to be used by the input parameter SlowNetworkConnection is defined by $SlowNetworkConnection \backslash NetworkConnection$ i.e., $\forall netSpeed.Adsl1M$. On the other hand the Common Description is defined by $lcs(SlowNetworkConnection, NetworkConnection)$ i.e., NetworkConnection.

2.3 Modelling Web Service Composition

In this work, the process model of web service composition and its semantic links is specified by a statechart [16]. Its states refer to services whereas its transitions are labelled with semantic links. In addition some basic composition constructs such as sequence, conditional branching (i.e., OR-Branching), structured loops, concurrent threads (i.e., AND-Branching), and inter-thread synchronization can be found. To simplify the presentation, we initially assume that all considered statecharts are acyclic and consist of only sequences, OR-Branching and AND-Branching.

Example 3 (Process Model of a Web Service Composition)
Suppose a composition extending Example 1 with six more services $s_i$, $1 \le i \le 8$, and eight more semantic links $s_{i,j}$. Its process model is depicted in Fig.3.

[Figure: tasks $T_1$ to $T_8$, each concretized by a service $s_1$ to $s_8$, connected by the semantic links $s_{1,2}$, $s_{1,4}$, $s_{2,3}$, $s_{3,5}$, $s_{4,5}$, $s_{5,6}$, $s_{5,7}$, $s_{6,8}$ and $s_{7,8}$, and including an OR-Branching and an AND-Branching. Legend: T: Task, s: Service, input parameter, output parameter, semantic link.]
Fig. 3. A (Concrete) Web Service Composition.

Example 3 illustrates a composition wherein tasks $T_i$ and abstract semantic links $s^A_{i,j}$ have been respectively concretized by one of their $n$ candidate services (e.g., $s_i$) and $n^2$ candidate links (e.g., $s_{i,j}$). Indeed some services with common functionality, preconditions and effects, although with different input and output parameters and quality, can be selected to perform a target task $T_i$ and obtain a concrete composition. Such a selection will have a direct impact on the semantic links involved in the concrete composition.

In the following we assume that compositions of tasks (achieving a goal) have been pre-computed. This computation can be performed by template-based and parametric-design-based composition approaches [17]. The control flow of the pre-computed compositions is therefore fixed, since it is pre-defined. The choice of the service (that fulfils a given task) will be done at composition time, based on both the quality of i) services and ii) their semantic links (i.e., quality of data flow).

3 Quality Model

Here we present a quality criterion to value semantic links. We then extend it with the non functional QoS to estimate both quality levels of any composition.

3.1 Quality of Semantic Link

We consider two generic quality criteria for semantic links $s_{i,j}$ defined by $\langle s_i, Sim_T(Out\_s_i, In\_s_j), s_j \rangle$: its i) Common Description rate, and ii) Matching Quality.

Definition 1 (Common Description rate of a Semantic Link)
Given a semantic link $s_{i,j}$ between $s_i$ and $s_j$, the Common Description rate $q_{cd} \in (0, 1]$ provides one possible measure for the degree of similarity between an output parameter of $s_i$ and an input parameter of $s_j$. This rate is computed using the following expression:

\[ q_{cd}(s_{i,j}) = \frac{|lcs(Out\_s_i, In\_s_j)|}{|In\_s_j \backslash Out\_s_i| + |lcs(Out\_s_i, In\_s_j)|} \tag{3} \]

This criterion estimates the proportion of descriptions which is well specified for ensuring a correct data flow between $s_i$ and $s_j$.

The expressions in between $|\cdot|$ refer to the size of ALE concept descriptions ([18] p.17) i.e., $|\top|$, $|\bot|$, $|A|$, $|\neg A|$ and $|\exists r|$ is 1; $|C \sqcap D| = |C| + |D|$; $|\forall r.C|$ and $|\exists r.C|$ is $1 + |C|$. For instance $|Adsl1M|$ is 3 in the ontology illustrated in Fig. 2.
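The size rules and expression (3) can be sketched as follows. The nested-tuple encoding of concepts and the names `size` and `common_description_rate` are assumptions made for this illustration only. The sizes for NetworkConnection and for the Extra Description are computed on the unfolded definitions of Fig. 2, which is also an assumption rather than a statement about how the reported experiments computed them.

```python
def size(concept) -> int:
    """Size of an ALE concept description, following the rules quoted from [18].

    Assumed encoding: ("top",), ("bottom",), ("atom", A), ("not", A),
    ("some", r) for an unqualified existential, ("some", r, C),
    ("all", r, C) and ("and", C1, C2, ...).
    """
    kind = concept[0]
    if kind in ("top", "bottom", "atom", "not"):
        return 1
    if kind == "some" and len(concept) == 2:   # |∃r| = 1
        return 1
    if kind in ("all", "some"):                # |∀r.C| = |∃r.C| = 1 + |C|
        return 1 + size(concept[2])
    if kind == "and":                          # |C ⊓ D| = |C| + |D|
        return sum(size(c) for c in concept[1:])
    raise ValueError(f"unknown constructor: {kind}")


def common_description_rate(lcs, extra=None) -> float:
    """q_cd of expression (3); extra=None stands for an empty Extra Description."""
    common = size(lcs)
    missing = size(extra) if extra is not None else 0
    return common / (missing + common)


# Unfolded definitions of Fig. 2.
ADSL1M = ("and", ("atom", "Speed"), ("all", "mBytes", ("atom", "1M")))
NETWORK = ("and",
           ("all", "netPro", ("atom", "Provider")),
           ("all", "netSpeed", ("atom", "Speed")))
EXTRA = ("all", "netSpeed", ADSL1M)   # Extra Description of Example 2

print(size(ADSL1M))                             # 3, as stated in the text
print(common_description_rate(NETWORK, EXTRA))  # 4 / (4 + 4) under these assumptions
```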
Definition 2 (Matching Quality of a Semantic Link)
The Matching Quality $q_m$ of a semantic link $s_{i,j}$ is a value in $(0, 1]$ defined by $Sim_T(Out\_s_i, In\_s_j)$ i.e., either 1 (Exact), 3/4 (PlugIn), 1/2 (Subsume) or 1/4 (Intersection).

The discretization of the matching types follows a partial ordering [19] where the assignment of values to matching types is driven by the data integration costs. Behind each matching type, tasks of XML (Extensible Markup Language) data type integration and manipulation are required. The PlugIn matching type is more penalized than the Exact matching type in this model. Indeed the cost of the data integration process (in terms of computation) is lower for the Exact matching type than for the PlugIn matching type.
Contrary to $q_{cd}$, $q_m$ does not estimate similarity between the parameters of semantic links but gives a general overview (discretized values) of their semantic relationships. Given these quality criteria, the quality vector of a semantic link $s_{i,j}$ is defined by:

\[ q(s_{i,j}) \doteq \big( q_{cd}(s_{i,j}),\ q_m(s_{i,j}) \big) \tag{4} \]

The quality of semantic links can be compared by analysing their $q_{cd}$ and $q_m$ elements. For instance $q(s_{i,j}) > q(s'_{i,j})$ if $q_{cd}(s_{i,j}) > q_{cd}(s'_{i,j})$ and $q_m(s_{i,j}) > q_m(s'_{i,j})$. Alternatively we can compare a weighted average of their normalised components in case the value of the first element of $s_{i,j}$ is better than the first element of $s'_{i,j}$ but worse for the second element [20].

Example 4 (Quality of Semantic Links)
Let $s'_2$ be another candidate service for $T_2$ in Fig.3 with NetworkConnection as an input. The link $s'_{1,2}$ between $s_1$ and $s'_2$ is better than $s_{1,2}$ since $q(s'_{1,2}) > q(s_{1,2})$.

In case $s_i$, $s_j$ are related by more than one link, the value of each criterion is retrieved by computing their average. This average is computed by means of the AND-Branching row of Table 1, independently along each dimension of the quality model.

3.2 QoS-Extended Quality of Semantic Link

We extend the latter quality model by exploiting the non functional properties of services (also known as QoS attributes [21], given by service providers or third parties) involved in each semantic link. We simplify the presentation by considering only:

Execution Price $q_{pr}(s_i) \in \mathbb{R}^+$ of service $s_i$ i.e., the fee requested by the service provider for invoking it.

Response Time $q_t(s_i) \in \mathbb{R}^+$ of service $s_i$ i.e., the expected delay between the request and result moments.

A quality vector of a service $s_i$ is then defined as follows:

\[ q(s_i) \doteq \big( q_{pr}(s_i),\ q_t(s_i) \big) \tag{5} \]

Thus a QoS-extended quality vector of a semantic link $s_{i,j}$:

\[ q^*(s_{i,j}) \doteq \big( q(s_i),\ q(s_{i,j}),\ q(s_j) \big) \tag{6} \]

Given an abstract link between tasks $T_i$, $T_j$, one may select the link with the best functional quality values (matching quality, common description rate) and non-functional quality values (the cheapest and fastest services), or a compromise (depending on the end-user preferences) between the four by coupling (4) and (5) in (6). Moreover the selection could be influenced by predefining some constraints e.g., a service response time lower than a given value.

Example 5 (QoS-Extended Quality of Semantic Link)
Suppose $T_2$ and its two candidate services $s_2$, $s'_2$ wherein $q(s'_2) < q(s_2)$. According to Example 4, $s'_2$ should be preferred regarding the quality of its semantic link with $s_1$, whereas $s_2$ should be preferred regarding its QoS. So what about the best candidate for $s^A_{1,2}$ regarding both criteria: $q^*$? Before addressing this question in Section 4 through equation (9), we first focus on quality of composition in Section 3.3.
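The quality vectors (4) to (6) and the two ways of comparing link qualities can be sketched as plain data classes. The class and function names, the equal default weights, and the numeric values used for Examples 4 and 5 are hypothetical; only the discretized matching quality values are taken from Definition 2.

```python
from dataclasses import dataclass

# Discretized matching quality of Definition 2.
MATCHING_QUALITY = {"Exact": 1.0, "PlugIn": 0.75, "Subsume": 0.5, "Intersection": 0.25}


@dataclass(frozen=True)
class ServiceQoS:
    """Quality vector (5) of a service: execution price and response time."""
    price: float
    response_time: float


@dataclass(frozen=True)
class LinkQuality:
    """Quality vector (4) of a semantic link."""
    q_cd: float
    q_m: float

    def better_than(self, other: "LinkQuality") -> bool:
        """q(s) > q(s') when both components are strictly better."""
        return self.q_cd > other.q_cd and self.q_m > other.q_m


@dataclass(frozen=True)
class ExtendedLinkQuality:
    """QoS-extended quality vector (6): source service, link, target service."""
    source: ServiceQoS
    link: LinkQuality
    target: ServiceQoS


def weighted_score(q: LinkQuality, w_cd: float = 0.5, w_m: float = 0.5) -> float:
    """Fallback comparison: weighted average of the (already normalised) components."""
    return w_cd * q.q_cd + w_m * q.q_m


# Hypothetical figures for the links s_{1,2} (Subsume) and s'_{1,2} (Exact).
link_s2 = LinkQuality(q_cd=0.5, q_m=MATCHING_QUALITY["Subsume"])
link_s2_prime = LinkQuality(q_cd=1.0, q_m=MATCHING_QUALITY["Exact"])
print(link_s2_prime.better_than(link_s2))
```

The strict comparison leaves pairs such as (0.9, 0.5) and (0.6, 1.0) incomparable, which is where the weighted average becomes useful.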
3.3 Quality of Composition

We present definitions for comparing and ranking different compositions along the common description rate and matching quality dimensions. The rules for aggregating quality values (Table 1) for any concrete composition $c$ are driven by them. In more detail, the approach for computing the semantic quality of $c$ is adapted from the application-driven heuristics of [6], while the computation of its non functional QoS is similar to [22].

Definition 3 (Common Description rate of a Composition)
The Common Description rate of a composition measures the average degree of similarity between all corresponding parameters of services linked by a semantic link.

The Common Description rate $Q_{cd}$ of both a sequential and an AND-Branching composition is defined as the average of the common description rates $q_{cd}(s_{i,j})$ of its semantic links. The common description rate of an OR-Branching composition is a sum of $q_{cd}(s_{i,j})$ weighted by $p_{s_{i,j}}$ i.e., the probability that semantic link $s_{i,j}$ be chosen at run time. Such probabilities are initialized by the composition designer, and then eventually updated considering the information obtained by monitoring the workflow executions.

Definition 4 (Matching Quality of a Composition)
The matching quality of a composition estimates the overall matching quality of its semantic links. Contrary to the common description rate, this criterion aims at easily distinguishing between very good and very bad matching quality.

The matching quality $Q_m$ of a sequential and an AND-Branching composition is defined as a product of $q_m(s_{i,j})$. All different (non empty) matching qualities involved in such compositions need to be considered together in such a (non-linear) aggregation function, to make sure that compositions that contain semantic links with low or high matching quality will be more easily identified, and then pruned from the set of potential solutions. The matching quality of an OR-Branching composition is defined like its common description rate, replacing $q_{cd}(s_{i,j})$ with $q_m(s_{i,j})$.

Details for computing Execution Price $Q_{pr}$ and Response Time $Q_t$ can be found in Table 1, and are further explained in [22].

| Composition Construct | $Q_{cd}$ (functional) | $Q_m$ (functional) | $Q_t$ (non functional) | $Q_{pr}$ (non functional) |
|---|---|---|---|---|
| Sequential | $\frac{1}{\lvert s_{i,j} \rvert} \sum_{s_{i,j}} q_{cd}(s_{i,j})$ | $\prod_{s_{i,j}} q_m(s_{i,j})$ | $\sum_{s_i} q_t(s_i)$ | $\sum_{s_i} q_{pr}(s_i)$ |
| AND-Branching | $\frac{1}{\lvert s_{i,j} \rvert} \sum_{s_{i,j}} q_{cd}(s_{i,j})$ | $\prod_{s_{i,j}} q_m(s_{i,j})$ | $\max_{s} q_t(s)$ | $\sum_{s_i} q_{pr}(s_i)$ |
| OR-Branching | $\sum_{s_{i,j}} q_{cd}(s_{i,j}) \cdot p_{s_{i,j}}$ | $\sum_{s_{i,j}} q_m(s_{i,j}) \cdot p_{s_{i,j}}$ | $\sum_{s_i} q_t(s_i) \cdot p_{s_i}$ | $\sum_{s_i} q_{pr}(s_i) \cdot p_{s_i}$ |

Table 1. Quality Aggregation Rules for Semantic Web Service Composition.

Using Table 1, the quality vector of any concrete composition can be defined by:

\[ Q(c) \doteq \big( Q_{cd}(c),\ Q_m(c),\ Q_t(c),\ Q_{pr}(c) \big) \tag{7} \]
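The aggregation rules of Table 1 translate into a handful of small functions. The function names are illustrative assumptions, and each function handles a single construct; combining them over a nested statechart is left out of this sketch.

```python
from math import prod
from typing import Sequence


def q_cd_sequence_or_and(link_q_cd: Sequence[float]) -> float:
    """Q_cd of a sequential or AND-Branching composition: average over its links."""
    return sum(link_q_cd) / len(link_q_cd)


def q_m_sequence_or_and(link_q_m: Sequence[float]) -> float:
    """Q_m of a sequential or AND-Branching composition: product over its links."""
    return prod(link_q_m)


def q_t_sequence(times: Sequence[float]) -> float:
    """Q_t of a sequence: response times add up."""
    return sum(times)


def q_t_and(times: Sequence[float]) -> float:
    """Q_t of an AND-Branching: the slowest concurrent branch dominates."""
    return max(times)


def q_pr_sequence_or_and(prices: Sequence[float]) -> float:
    """Q_pr of a sequential or AND-Branching composition: prices add up."""
    return sum(prices)


def or_branching(values: Sequence[float], probabilities: Sequence[float]) -> float:
    """Any criterion of an OR-Branching: sum weighted by the branch probabilities."""
    return sum(v * p for v, p in zip(values, probabilities))


# A sequence of three links valued Exact, Subsume and Intersection.
print(q_m_sequence_or_and([1.0, 0.5, 0.25]))   # 0.125: one weak link drags Q_m down
print(q_cd_sequence_or_and([1.0, 0.5]))        # 0.75
```

The product used for $Q_m$ is what makes a single poor match stand out, whereas the average used for $Q_{cd}$ smooths it.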
Although the adopted quality model has a limited number of criteria (for the sake of illustration), (4), (5), (6) as well as (7) are extensible: new functional criteria can be added without fundamentally altering the service selection techniques built on top of the model. In this direction the binary criterion of robustness [13] in semantic links can be considered⁴. In addition, other non-functional criteria such as reputation, availability, reliability, successful execution rate, etc., can also be considered in such an extension.

4 A Genetic Algorithm Based Optimization

The optimization problem (i.e., determining the best set of services of a composition with respect to some quality constraints) can be formalized as a Constraints Satisfaction Optimization Problem $(T, D, C, f)$ where $T$ is the set of tasks (variables) in the composition, $D$ is the set of services domains for $T$ (each $D_i$ representing a set of possible concrete services that fulfil the task $T_i$), $C$ is the set of constraints and $f$ is an evaluation function that maps every solution tuple to a numerical value. This problem is NP-hard. In case the number of tasks and candidate services are respectively $n$ and $m$, the naive approach considers an exhaustive search of the optimal composition among all the $m^n$ concrete compositions. Since such an approach is impractical for large-scale composition, we address this issue by presenting a GA-based approach [23] which i) supports constraints on QoS and also on quality of semantic links and ii) requires the set of selected services as a solution to maximize a given objective. Here compositions refer to their concrete form.

4.1 GA Parameters for Optimizing Composition

By applying a GA-based approach the optimal solution (represented by its genotype) is determined by simulating the evolution of an initial population (through generations) until survival of the best fitted individuals (here compositions) satisfying some constraints. The survivors are obtained by crossover, mutation and selection of compositions from previous generations.
Details of GA parameterization follow:

[Figure: an array with one cell per task $T_1$ to $T_8$, each cell pointing to the service $s_i$ selected for $T_i$ among its candidate services. Legend: T: Task, s: Service.]
Fig. 4. Genotype Encoding for Service Composition.

Genotype: it is defined by an array of integers. The number of items is equal to the number of tasks involved in the composition. Each item, in turn, contains an index to an array of candidate services matching that task. Each composition, as a potential solution of the optimization problem, can be encoded using this genotype (e.g., Fig.4 encodes the genotype of the composition in Fig.3).

⁴ Contrary to [6], we did not consider robustness because of its strong dependency on the matching quality criterion. Indeed they are not independent criteria since the robustness is 1 if the matching type is either Exact or PlugIn, and 0 otherwise.

Initial Population: it consists of an initial set of compositions (characterized by their genotypes) wherein services are randomly selected.

Global, Local Constraints have to be met by compositions $c$ e.g., $Q_{cd}(c) > 0.8$.

Fitness Function: this function is required to quantify the quality of any composition $c$. Such a function $f$ needs to maximize semantic quality attributes, while minimizing the QoS attributes of $c$:

\[ f(c) = \frac{\omega_{cd} \hat{Q}_{cd}(c) + \omega_m \hat{Q}_m(c)}{\omega_{pr} \hat{Q}_{pr}(c) + \omega_t \hat{Q}_t(c)} \tag{8} \]

where $\hat{Q}_{l \in \{pr,t,cd,m\}}$ refer to $Q_l$ normalized in the interval $[0, 1]$. $\omega_l \in [0, 1]$ is the weight assigned to the $l$-th quality criterion and $\sum_{l \in \{pr,t,cd,m\}} \omega_l = 1$. In this way preferences on quality of the desired compositions can be expressed by simply adjusting $\omega_l$ e.g., the Common Description rate could be weighted higher.

In addition $f$ must drive the evolution towards constraint satisfaction. To this end compositions that do not meet the constraints are penalized by extending (8) with respect to (9):

\[ f'(c) = f(c) - \omega_{pe} \sum_{l \in \{pr,t,cd,m\}} \left( \frac{\Delta \hat{Q}_l}{\hat{Q}_l^{max}(c) - \hat{Q}_l^{min}(c)} \right)^2 \tag{9} \]

where $\hat{Q}_l^{max}$, $\hat{Q}_l^{min}$ are respectively the maximal and minimal values of the $l$-th quality constraint, $\omega_{pe}$ weights the penalty factor and $\Delta \hat{Q}_{l \in \{pr,t,cd,m\}}$ is defined by:

\[ \Delta \hat{Q}_l =
\begin{cases}
\hat{Q}_l - \hat{Q}_l^{max} & \text{if } \hat{Q}_l > \hat{Q}_l^{max} \\
0 & \text{if } \hat{Q}_l^{min} \le \hat{Q}_l \le \hat{Q}_l^{max} \\
\hat{Q}_l^{min} - \hat{Q}_l & \text{if } \hat{Q}_l < \hat{Q}_l^{min}
\end{cases} \tag{10} \]

Contrary to [5], compositions that violate constraints do not all receive the same penalty. Indeed the factor of $\omega_{pe}$ in (9) further penalizes them according to the extent of the violation. This function avoids local optima by also considering compositions that disobey constraints. Unfortunately, (9) contains a penalty for concrete compositions which is the same at each generation. If, as usual, the weight $\omega_{pe}$ for this penalty factor is high, there is a risk that concrete compositions violating the constraints but close to a good solution could also be discarded.
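Expressions (8) to (10) can be sketched as below. The dictionary-based encoding, the function names and the sample values are assumptions made for illustration. The sketch further assumes that the weighted QoS term of (8) is non-zero and that each constraint interval has a non-zero width, since both appear as denominators.

```python
CRITERIA = ("pr", "t", "cd", "m")


def fitness(q_hat: dict, w: dict) -> float:
    """Expression (8): weighted semantic quality over weighted QoS.

    q_hat maps each criterion to its normalised value in [0, 1];
    w maps each criterion to its weight (the weights sum to 1).
    """
    gain = w["cd"] * q_hat["cd"] + w["m"] * q_hat["m"]
    cost = w["pr"] * q_hat["pr"] + w["t"] * q_hat["t"]
    return gain / cost


def violation(q: float, q_min: float, q_max: float) -> float:
    """Expression (10): distance of q to the allowed interval [q_min, q_max]."""
    if q > q_max:
        return q - q_max
    if q < q_min:
        return q_min - q
    return 0.0


def penalized_fitness(q_hat: dict, w: dict, bounds: dict, w_pe: float) -> float:
    """Expression (9): static penalty; bounds maps a criterion to (min, max)."""
    penalty = sum(
        (violation(q_hat[l], *bounds[l]) / (bounds[l][1] - bounds[l][0])) ** 2
        for l in CRITERIA
    )
    return fitness(q_hat, w) - w_pe * penalty


# Hypothetical composition that misses the constraint Q_cd > 0.8.
q_hat = {"pr": 0.5, "t": 0.5, "cd": 0.75, "m": 0.5}
weights = {l: 0.25 for l in CRITERIA}
bounds = {"pr": (0.0, 1.0), "t": (0.0, 1.0), "cd": (0.8, 1.0), "m": (0.0, 1.0)}

print(fitness(q_hat, weights))
print(penalized_fitness(q_hat, weights, bounds, w_pe=1.0))
```

Because the violation is squared and scaled by the interval width, a composition that barely misses a constraint loses much less fitness than one that misses it by a wide margin.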
The alternative is to adopt a dynamic penalty, i.e., a penalty having a weight that increases with the number of generations. This allows, for the early generations, to also consider some individuals violating the constraints. After a number of generations, the population should be able to meet the constraints, and the evolution will try to improve only the rest of the fitness function. The dynamic fitness function (to be maximized) is:

\[ f''(c, gen) = f(c) - \omega_{pe} \cdot \frac{gen}{maxgen} \cdot \sum_{l \in \{pr,t,cd,m\}} \left( \frac{\Delta \hat{Q}_l}{\hat{Q}_l^{max}(c) - \hat{Q}_l^{min}(c)} \right)^2 \tag{11} \]

where $gen$ is the current generation, while $maxgen$ is the maximum number of generations.

Operators on Genotypes: they define authorized alterations on genotypes not only to ensure evolution of the population of compositions along generations but also to prevent convergence to a local optimum. We use: i) composition mutation i.e., random selection of a task (i.e., a position in the genotype) in a concrete composition and replacement of its service with another one among those available, ii) the standard two-point crossover i.e., random combination of two compositions and iii) selection of compositions which is fitness-based i.e., compositions disobeying the constraints are selected proportionally from previous generations.

Stopping Criterion: it stops the evolution of a population. First of all we iterate until the constraints are met (i.e., $\Delta \hat{Q}_l = 0\ \forall l \in \{pr, t, cd, m\}$) within a maximum number of generations. Once the latter constraints are satisfied we iterate until the best fitness composition remains unchanged for a given number of generations.

4.2 GA for Optimizing Composition in a Nutshell

The execution of the GA consists in i) defining the initial population (as a set of compositions), and computing the fitness function (evaluation criterion) of each composition, ii) evolving the population by applying mutation and crossover of compositions (tasks with only one candidate service are disregarded), iii) selecting compositions, iv) evaluating compositions of the population, and v) going back to step (ii) if the stopping criterion is not satisfied. Section 5 further details the parameters.

In case no solution exists, users may relax constraints of the optimization problem. Alternatively, fuzzy logic could be used to address the imprecision in specifying quality constraints, estimating quality values and expressing composition quality.
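The steps above can be sketched as a compact loop over integer genotypes. This is a minimal illustration rather than the implementation evaluated in Section 5: the function names, the way roulette-wheel weights are shifted to stay positive, and the fixed number of generations used as stopping criterion are assumptions. The callables `fitness` and `penalty` stand for $f(c)$ of (8) and the summed squared violations of (9), and are expected to be supplied by the caller.

```python
import random


def mutate(genotype, domain_sizes, rng):
    """Replace the service of one random task; single-candidate tasks are skipped."""
    child = list(genotype)
    positions = [i for i, m in enumerate(domain_sizes) if m > 1]
    if positions:
        i = rng.choice(positions)
        child[i] = rng.choice([s for s in range(domain_sizes[i]) if s != child[i]])
    return child


def two_point_crossover(a, b, rng):
    """Swap the segment between two random cut points of the parents."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def roulette(population, scores, rng):
    """Fitness-proportional selection; scores are shifted to be positive."""
    low = min(scores)
    weights = [s - low + 1e-9 for s in scores]
    return rng.choices(population, weights=weights, k=1)[0]


def evolve(domain_sizes, fitness, penalty, w_pe=1.0, pop_size=200, max_gen=400,
           p_cross=0.7, p_mut=0.1, elite=2, seed=0):
    """Elitist GA maximising f(c) - w_pe * (gen / max_gen) * penalty(c), as in (11)."""
    rng = random.Random(seed)
    pop = [[rng.randrange(m) for m in domain_sizes] for _ in range(pop_size)]
    for gen in range(1, max_gen + 1):
        scores = [fitness(c) - w_pe * (gen / max_gen) * penalty(c) for c in pop]
        ranked = sorted(zip(scores, pop), key=lambda sp: sp[0], reverse=True)
        nxt = [list(c) for _, c in ranked[:elite]]       # keep the best as-is
        while len(nxt) < pop_size:
            a, b = roulette(pop, scores, rng), roulette(pop, scores, rng)
            if rng.random() < p_cross:
                a, b = two_point_crossover(a, b, rng)
            nxt.extend(mutate(c, domain_sizes, rng) if rng.random() < p_mut else list(c)
                       for c in (a, b))
        pop = nxt[:pop_size]
    return max(pop, key=lambda c: fitness(c) - w_pe * penalty(c))


# Toy run: 6 tasks with 4 candidates each, fitness = sum of indices, no constraints.
best = evolve([4] * 6, fitness=sum, penalty=lambda c: 0.0, pop_size=20, max_gen=30)
print(best)
```

The default values of `pop_size`, `max_gen`, `p_cross`, `p_mut` and `elite` mirror the settings reported in Section 5.1; the toy run uses smaller values only to keep the example fast.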
5 Experimental Results

We analyze the performance of our approach by i) discussing the benefits of combining QoS and functional criteria, ii) observing the evolution of the composition quality $f''$ in (11) (equal weights are assigned to the different quality criteria) over the GA generations by varying the number of tasks, iii) studying the behaviour of our approach regarding the optimisation of large scale compositions, iv) evaluating performance after decoupling the GA and the (on-line) DL reasoning processes, and v) comparing the convergence of our approach (11) with [5]. Before turning our attention to the latter five sets of experiments, we first describe the context of experimentation.

5.1 Context of Experimentation

Services, Semantic Links and their Qualities. Services⁵ are defined by their semantic descriptions using an ALE ontology (formally defined by 1100 concepts and 390 properties, 1753 individuals, without data property), provided by a commercial partner. We have incorporated estimated values for QoS parameters (price and response time). Common description rate and matching quality of semantic links are computed according to an on-line DL reasoning process.

⁵ The choice of proprietary services has been motivated by the poor quality of existing benchmark services in terms of number of services or expressivity (limited functional specification, no binding, restricted RDF-based description). We plan further experimentations with OWL-S TC 3.0 (http://www.semwebcentral.org/frs/?group_id=89) and SA-WSDL TC1 (http://projects.semwebcentral.org/projects/sawsdl-tc/).

Implementation Details. The common description rate (3) is calculated by computing the Extra Description (2), the Least Common Subsumer [15], and the size ([18] p.17) of DL-based concepts. These DL inferences and the matching types have been achieved by a DL reasoning process i.e., an adaptation of FaCT++ [24] for considering DL difference. The aggregation rules of Table 1 are then used for computing each quality dimension of any composition. Finally the combination of QoS with semantic calculation is computed by means of (11), thus obtaining the final quality score for the composition.

Our GA is implemented in Java, extending a GPL library⁶. The optimal compositions are computed by using an elitist GA where the best 2 compositions were kept alive across generations, with a crossover probability of 0.7, a mutation probability of 0.1 and a population of 200 compositions. The roulette wheel selection has been adopted as selection mechanism. We consider a simple stopping criterion i.e., up to 400 generations. We conducted experiments on an Intel(R) Core(TM)2 CPU, 2.4GHz with 2GB RAM.

⁶ http://jgap.sourceforge.net/

Compositions with up to 30 tasks and 35 candidates per task ($35^2$ candidate semantic links between 2 tasks) have been considered in Sections 5.2, 5.3, 5.5, 5.6, especially for obtaining convincing results towards their applicability in real (industrial) scenarios.

Experiment results reported in Section 5.2 illustrate some benefits of combining QoS and functional criteria for the overall quality of composition, whereas those in Sections 5.3, 5.4, 5.5 and 5.6 are related to scalability.

5.2 Benefits of Combining QoS and Functional Criteria

Fig.5 reports the benefits of combining QoS and functional criteria. In more detail, we studied the impact of functional quality on the costs of data integration, which enables the end-to-end composition of services. The data integration process aligns the data flow specification of a composition by manipulating and transforming the semantic descriptions of contents of outgoing and incoming messages of annotated services.

Our approach and [5] are compared on ten compositions $c_i$, $1 \le i \le 10$, with $Q_{cd}(c_i) = 10^{-1} \times i$ and $Q_m(c_i) = 10^{i-10}$ as functional quality to reflect gradually better quality.

On the one hand, as expected, the costs of data integration are low (actually trivial) for both approaches, regarding the composition with the best quality (i.e., $c_{10}$). Indeed the parameters of services match exactly, hence no further specification is needed. On the other hand, these costs decrease with the functional quality of compositions in our approach, whereas they are steady but very high for compositions computed by [5] (purely based on non functional quality of composition). This is due to i) the lack of specification of functional quality (hence a hard task to build semantic data flow from scratch), and ii) the manual approach used to link data in compositions.
[Figure: average times (s), in logarithmic scale, of data integration for the web service compositions $c_1$ to $c_{10}$. Series: Our Approach; [CanPEV05]; syntactic based service description.]
Fig. 5. Costs of Data Integration (through Data Flow Specification).

Appropriate qualities of semantic links are very useful i) to discover data flow in composition, ii) to ease the specification (semi-automated process with Assign/Copy elements + XPath/XQuery processes à la BPEL4WS) of syntactic (and heterogeneous) data connections (for the persons who develop these mappings), so limiting the costs of data integration. Indeed the better the quality of semantic links the better the semantic mapping between the outgoing and incoming (SA-WSDL for instance) messages.

5.3 Evolution of the Composition Quality

Fig.6 reports the evolution of the composition quality over the GA generations, by varying the number of tasks. This illustrates different levels of convergence to a composition that meets some constraints and optimizes its different criteria by maximizing the common description and matching quality while minimizing price and response time.

[Figure: fitness function (%), averaged over 50 executions, against the generation number (0 to 400), for compositions of 10, 20 and 30 tasks.]
Fig. 6. Evolution of the Composition Quality.

Table 2 and Fig.6 present the computation costs and the number of generations required to obtain the maximal fitness value. The more tasks (and services), the more time is needed to converge to the optimum. Obviously, the population size and the number of generations should be extended to reach the optimum of more complex compositions.

| Tasks Num. | Max. Fitness (%) | Generation Num. | Time (ms) |
|---|---|---|---|
| 10 | 99 | 120 | 1012 |
| 20 | 97 | 280 | 1650 |
| 30 | 95 | 360 | 3142 |

Table 2. Overview of Computation Costs.

5.4 Towards Large Scale based Compositions

In this experiment we study the behaviour of our approach regarding the optimisation of compositions with a large number of tasks (up to 500 tasks) and candidate services (500). To this end we focus on its scalability and the impact of the number of generations as well as the population size on the GA success.

| Tasks Num. | Max. Fitness (%) | Generation Num. / Population Size | Time (ms) |
|---|---|---|---|
| 100 | 85 | 400/200 | 4212 |
| 100 | 96 | 700/400 | 9182 |
| 300 | 47 | 400/200 | 5520 |
| 300 | 95 | 1500/500 | 19120 |
| 500 | 24 | 400/200 | 7023 |
| 500 | 95 | 3000/1000 | 51540 |

Table 3. Large Scale Compositions.

As illustrated in Table 3, increasing both the number of generations and the population size does actually result in better fitness values for problems with a larger number of tasks and candidate services. For example, regarding the optimisation of a composition of 500 tasks with 500 candidate services, a number of generations of 400 and a population size of 200 result in a low fitness value of 24% of the maximum, whereas considering a number of generations of 3000 and a population size of 1000 achieves 95% of the maximum. Note that better fitness values can be reached by further increasing the number of generations and the population size. However doubling these sizes only improves the fitness value by 2%. This shows that each optimisation problem converges to a limit.

5.5 Decoupling GA Process and DL Reasoning

Since our approach mainly depends on DL reasoning (i.e., Subsumption for $q_m$, Difference and $lcs$ for $q_{cd}$) and the GA-based optimization process, we decouple and detail the computation costs of Table 2 in Fig.7.

DL reasoning is the most time consuming process in the optimisation of QoS-aware semantic web service composition wherein the number of tasks and candidate services are greater than 10 and 35. This is caused by the critical complexity of $q_{cd}$ computation through DL Difference (even in ALE DL).

5.6 Convergence of GA-Based Approaches

In this experiment, we compare the convergence of our approach (11) with the main alternative at present [5].
To this end the functional criteria of our approach are disregarded in order to focus only on the GA-driven aspects of the optimisation process.
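The GA-driven aspects compared in this experiment can be illustrated with a minimal Python sketch. It is not the fitness function of (11), which is not reproduced in this section. It only assumes a quality score to maximise, a total price to keep under a global budget, and a penalty for constraint violation whose weight grows with the generation number, in the spirit of the dynamic penalty discussed in this section. All names (fitness, evolve, w_pe, budget, and the quality and price fields) are hypothetical and chosen for illustration.

```python
import random


def fitness(services, generation, max_generations, budget, w_pe=1.0):
    """Score one composition (one chosen service per task); higher is better.

    Illustrative form only: mean quality minus a dynamic penalty.
    """
    quality = sum(s["quality"] for s in services) / len(services)
    price = sum(s["price"] for s in services)
    # Relative amount by which the global price constraint is exceeded.
    violation = max(0.0, price - budget) / budget
    # Dynamic penalty: its weight increases linearly with the generation.
    penalty = w_pe * (generation / max_generations) * violation ** 2
    return quality - penalty


def evolve(candidates, budget, pop_size=50, max_generations=100,
           p_mut=0.1, seed=0):
    """Tiny GA: candidates[t] lists the candidate services of task t.

    A genome stores, for each task, the index of the selected candidate.
    Assumes pop_size >= 4 so that two distinct parents can be sampled.
    """
    rng = random.Random(seed)
    n_tasks = len(candidates)

    def decode(genome):
        return [candidates[t][g] for t, g in enumerate(genome)]

    population = [[rng.randrange(len(candidates[t])) for t in range(n_tasks)]
                  for _ in range(pop_size)]
    for gen in range(1, max_generations + 1):
        ranked = sorted(
            population,
            key=lambda g: fitness(decode(g), gen, max_generations, budget),
            reverse=True,
        )
        elite = ranked[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_tasks) if n_tasks > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            for t in range(n_tasks):
                if rng.random() < p_mut:  # mutation: pick another candidate
                    child[t] = rng.randrange(len(candidates[t]))
            children.append(child)
        population = elite + children
    best = max(
        population,
        key=lambda g: fitness(decode(g), max_generations, max_generations,
                              budget),
    )
    return best, decode(best)
```

Calling evolve(candidates, budget=...) with candidates given as a list of per-task lists of dictionaries returns the best genome found and the corresponding services. Because the penalty is weak in early generations, slightly infeasible compositions can survive and recombine before the constraint is enforced more strictly, which is one plausible reading of how a dynamic penalty helps to escape local optima.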
Fig. 7. DL and GA Processes in our Approach. [Plot: computation cost (ms), average of 50 executions, split into DL Difference for q_cd, DL lcs computation for q_cd, DL Subsumption for q_m, and the pure GA process, for 10, 20 and 30 tasks.]

Tasks Num.   Approach         Max. Fitness (%)   Generation Num.   Time (ms)
10           Our Model (11)   99                 120               1012
10           [5]              97                 156               1356
20           Our Model (11)   97                 280               1650
20           [5]              94                 425               2896
30           Our Model (11)   95                 360               3142
30           [5]              85                 596               6590
Table 4. Comparing GA-based Approaches (Population size of 200).

According to Table 4, the advantage of our approach is twofold. Firstly, we obtain better fitness values for the optimal composition than the approach of [5]. Secondly, our approach converges faster than the approach of [5]. In addition, our function avoids getting trapped in local optima by i) further penalizing compositions that disobey constraints (the factor of ω_pe in (9) and (11)) and ii) suggesting a dynamic penalty, i.e., a penalty having a weight that increases with the number of generations. These results support the adoption of our model in the cases where a large number of tasks and services are considered.

6 Related Work

A review of existing approaches to optimising web service compositions reveals that no approach has specifically addressed the optimisation of service composition using both QoS and semantic similarity dimensions in a context of significant scale. Indeed, the main approaches focus either on QoS [5, 8] or on functional criteria such as semantic similarities [6] between output and input parameters of web services for optimising web service composition. In contrast, we present an innovative model that addresses both types of quality criteria as a trade-off between data flow and non-functional quality for optimizing web service composition. Solving such a multi-criteria optimization problem can be approached using IP [8, 6], GA [5], or Constraint Programming [25].
The results of [5] demonstrate that GAs are better at handling non-linearity of aggregation rules, and provide better scaling up to a large number of services per task. In addition they show that dynamic programming (such as IP-based approaches) is preferable for smaller compositions. We follow [5] and suggest the use of GAs to achieve optimization in web service composition, yet we also extend their model by i) using semantic links to consider data flow in composition, ii) considering not only QoS but also semantic quality (and constraints) of composition, iii) revisiting the fitness function to avoid locally optimal solutions (i.e., compositions disobeying constraints are considered). The optimization problem can also be modelled as a knapsack problem [26], which [27] solved by dynamic programming. Unfortunately the previous QoS-aware service composition approaches consider only links valued by Exact matching types, hence no semantic quality of compositions. Towards the latter issue, [6] introduces a general and formal model to evaluate such a quality. From this they formulate an optimization problem which is solved by adapting the IP-based approach of [8]. All quality criteria are used for specifying both constraints and objective function.

7 Conclusion

We studied QoS-aware semantic web service composition in a context of significant scale, i.e., how to effectively compute optimal compositions of QoS-aware web services by considering their semantic links. On the one hand, the benefits of a significant domain such as the Web are clear, e.g., supporting a large number of service providers and considering a large number of services that have the same goals. On the other hand, the benefits of combining semantic links between services and QoS are as follows: the computation of web service compositions whilst optimising both the non-functional qualities and the quality of semantic fit along non-trivial data flow, where the information required and provided by services does not match perfectly in every data flow, using semantic-based descriptions of services.
By addressing non-trivial data flow in composition, we aimed at limiting the costs of (semantic heterogeneity) data integration between services by considering appropriate quality of semantic links. To this end we have presented an innovative and extensible model to evaluate the quality of i) web services (QoS), ii) their semantic links, and iii) their compositions. With regard to the latter criteria, the problem is formalized as an optimization problem with multiple constraints. Since one of our main concerns is the optimization of large-scale web service compositions (i.e., many services can achieve the same functionality), we suggested following a GA-based approach, which is faster than applying IP. The experimental results have shown acceptable computation costs for our GA-based approach despite the time consuming process of the on-line DL reasoning. In the case of models without semantic links, the benefits are mainly based on penalizing constraint violation (in the fitness function), which makes the approach faster than [5].

In future work we will consider a finer difference operator, which is also easy to compute in expressive DLs. Determining the most appropriate parameters for the GA phase requires further experimentation.

References

1. Berners-Lee, T., Hendler, J., Lassila, O.: The semantic web. Scientific American 284(5) (2001) 34-43
2. Smith, M.K., Welty, C., McGuinness, D.L.: OWL web ontology language guide. W3C recommendation, W3C (2004)
3. Baader, F., Nutt, W. In: The Description Logic Handbook: Theory, Implementation, and Applications. (2003)
4. Lécué, F., Léger, A.: A formal model for semantic web service composition. In: ISWC. (2006) 385-398
5. Canfora, G., Penta, M.D., Esposito, R., Villani, M.L.: An approach for QoS-aware service composition based on genetic algorithms. In: GECCO. (2005) 1069-1075
6. Lécué, F., Delteil, A., Léger, A.: Optimizing causal link based web service composition. In: ECAI. (2008) 45-49
7. Papadimitriou, C.H., Steiglitz, K.: Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall (1982)
8. Zeng, L., Benatallah, B., Dumas, M., Kalagnanam, J., Sheng, Q.Z.: Quality driven web services composition. In: WWW. (2003) 411-421
9. Ankolenkar, A., Paolucci, M., Srinivasan, N., Sycara, K.: The OWL-S coalition, OWL-S 1.1. Technical report (2004)
10. Kopecký, J., Vitvar, T., Bournez, C., Farrell, J.: SAWSDL: Semantic annotations for WSDL and XML schema. IEEE Internet Computing 11(6) (2007) 60-67
11. Paolucci, M., Kawamura, T., Payne, T., Sycara, K.: Semantic matching of web services capabilities. In: ISWC. (2002) 333-347
12. Li, L., Horrocks, I.: A software framework for matchmaking based on semantic web technology. In: WWW. (2003) 331-339
13. Lécué, F., Delteil, A.: Making the difference in semantic web service composition. In: AAAI. (2007) 1383-1388
14. Brandt, S., Küsters, R., Turhan, A.: Approximation and difference in description logics. In: KR. (2002) 203-214
15. Baader, F., Sertkaya, B., Turhan, A.Y.: Computing the least common subsumer w.r.t. a background terminology. In: DL. (2004)
16. Harel, D., Naamad, A.: The STATEMATE semantics of statecharts. ACM Trans. Softw. Eng. Methodol. 5(4) (1996) 293-333
17. E., M.: Reusable Components For Knowledge Modelling Case Studies. Parametric Design Problem Solving, IOS Press (Netherlands) (1999)
18. Küsters, R.: Non-Standard Inferences in Description Logics. Volume 2100 of Lecture Notes in Computer Science. Springer (2001)
19. Lécué, F., Delteil, O.B.A., Léger, A.: Web service composition as a composition of valid and robust semantic links. IJCIS 18(1) (March 2009)
20. Hwang, C.L., Yoon, K.: Multiple criteria decision making. Lecture Notes in Economics and Mathematical Systems (1981)
21. O'Sullivan, J., Edmond, D., ter Hofstede, A.H.M.: What's in a service? Distributed and Parallel Databases 12(2/3) (2002) 117-133
22. Cardoso, J., Sheth, A.P., Miller, J.A., Arnold, J., Kochut, K.: Quality of service for workflows and web service processes. J. Web Sem. 1(3) (2004) 281-308
23. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley Publishing Company, Inc., Reading, MA (1989)
24. Horrocks, I.: Using an expressive description logic: FaCT or fiction? In: KR. (1998) 636-649
25. Hassine, A.B., Matsubara, S., Ishida, T.: A constraint-based approach to web service composition. In: ISWC. (2006) 130-143
26. Yu, T., Lin, K.J.: Service selection algorithms for composing complex services with multiple QoS constraints. In: ICSOC. (2005) 130-143
27. Arpinar, I.B., Zhang, R., Aleman-Meza, B., Maduko, A.: Ontology-driven web services composition platform. Inf. Syst. E-Business Management 3(2) (2005) 175-199