Overview of Component Search System SPARS-J

Tetsuo Yamamoto*, Makoto Matsushita**, Katsuro Inoue**
*Japan Science and Technology Agency
**Osaka University

Outline
- Each part
- Analysis part
- Experiment
- Conclusion and Future work

Motivation: Reuse of Software Components
- Reuse is a technique for developing new software components by using components developed in the past.
- Examples of reusable components: source code, documents, etc.
- Reuse improves productivity and quality, and cuts development cost as a result.
- However, reuse of components is not used effectively.
  - A developer does not know that desirable components exist.
  - Although there are many components, they are not organized.
- To take advantage of reuse, components must be managed and suitable components must be easy to search for.

Research aim
We have built a system with the following functions:
- Collects software components eagerly, without preserving their inherent structures
- Manages the component information automatically
- Provides components suitable for the user's request
Targets
- Intranet: closed software development inside a company
- Internet: large open source software development web sites (SourceForge, Jakarta Project, etc.)

SPARS-J (Software Product Archive, analysis and Retrieval System for Java)
- A Java software product archiving, analyzing, and retrieving system
  - Many components are analyzed automatically.
  - The search engine is built on the analysis information.
- Component: the source code of a class or interface
- Features
  - Keyword search
  - Two ranking methods
    - Frequency in use of a word
    - Use relation
  - Analyzed information
    - Components using / used by a component
    - Package hierarchy
Structure of SPARS-J
- Library (Java source files): the input files
- Component analysis part
  - Extracts components from a file
  - Stores analyzed information to the database
  - Clusters and ranks components using the analyzed information
- Database: stores the analyzed information and the component information
- Component retrieval part
  - Searches the database for components corresponding to the query
  - Ranks components based on frequency in use of a keyword
  - Aggregates the two rankings
- User interface part
  - Delivers the user's query to the retrieval part
  - Shows the search results (hit components)

Ranking search results
- Component suited to a user request: ranking based on frequency in use of a word, Keyword Rank (KR)
- Component used mostly: ranking based on use relation, Component Rank (CR)
- A component is ranked high when both its KR and its CR are high.
- Search results are shown in the order obtained by aggregating the two ranks.

Component analysis part
- Extracts each component and its information from a Java source file
- The process
  1. Extract a component
  2. Index the component
  3. Extract use relations
  4. Cluster similar components
  5. Rank components based on use relations (CR method)

Extract and index a component
- Extracting
  - Find a class or interface block in a Java source file
  - Record its location in the file (start line number, end line number)
- Indexing
  - Extract index keys from the component
  - Index key: a word and the kind of it
  - No reserved words are extracted
  - Count the frequency in use of each word
- Example

    public final class Sort {
        /* quicksort */
        private static void quicksort( ) {
            int pivot;
            quicksort( );
            quicksort( );
        }
    }

    word        kind
    Sort        Class name
    quicksort   Comment
    quicksort   Method name
    pivot       Variable name
    quicksort   Method call

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Extract use relations
- Extract use relations among components using semantic analysis
- Make a component graph from the use relations
  - Node: component
  - Edge: use relation
- Example

    public class Test extends Data {
        public static void main( ) {
            Sort.quicksort(super.array);
        }
    }

  Component graph: Test -> Data (Inherit, Field access), Test -> Sort (Method call)
- The kinds of use relation
  - Inherit
  - Interface implementation
  - Variable type
  - Instance creation
  - Field access
  - Method call
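The component graph described above can be sketched as a small adjacency structure. This is only an illustration: the class `ComponentGraphSketch` and its methods are hypothetical names, not part of SPARS-J, and the edges are entered by hand instead of being produced by semantic analysis.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a component graph: nodes are components,
// edges are use relations labeled with their kind.
class ComponentGraphSketch {
    // using component -> (used component -> kinds of use relation)
    private final Map<String, Map<String, Set<String>>> edges = new LinkedHashMap<>();

    // Record that component `from` uses component `to` through relation `kind`.
    void addUse(String from, String to, String kind) {
        edges.computeIfAbsent(from, k -> new LinkedHashMap<>())
             .computeIfAbsent(to, k -> new LinkedHashSet<>())
             .add(kind);
        // Make sure the used component also exists as a node.
        edges.computeIfAbsent(to, k -> new LinkedHashMap<>());
    }

    // Components that `component` uses (outgoing edges).
    List<String> uses(String component) {
        List<String> result = new ArrayList<>();
        Map<String, Set<String>> out = edges.get(component);
        if (out != null) {
            result.addAll(out.keySet());
        }
        return result;
    }

    // Components that use `component` (incoming edges).
    List<String> usedBy(String component) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, Set<String>>> e : edges.entrySet()) {
            if (e.getValue().containsKey(component)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Kinds of use relation on the edge from -> to (empty if there is no edge).
    Set<String> kinds(String from, String to) {
        Map<String, Set<String>> out = edges.get(from);
        if (out == null || !out.containsKey(to)) {
            return new LinkedHashSet<>();
        }
        return out.get(to);
    }

    // The Test / Data / Sort example from the slide, entered by hand.
    static ComponentGraphSketch slideExample() {
        ComponentGraphSketch g = new ComponentGraphSketch();
        g.addUse("Test", "Data", "Inherit");
        g.addUse("Test", "Data", "Field access");
        g.addUse("Test", "Sort", "Method call");
        return g;
    }
}
```

With the slide example, `uses("Test")` yields Data and Sort, and `usedBy("Sort")` yields Test, which corresponds to the "using / used by" lists the system shows for a component.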
Similar components
- A similar component is a copy, or a slightly modified copy, of another component.
- We merge similar components into a single component.
- The merged component has all the use relations that the components had before merging.
[Figure: a component graph and the clustered graph obtained by merging similar components]

Clustering components
- We measure characteristic metrics to decide which components to merge.
  - Components are compared by the difference ratio of each metric.
- Metrics
  - Complexity: the number of methods, cyclomatic number, etc. Represents a structural characteristic.
  - Token composition: the number of appearances of each token. Represents a surface characteristic.

Ranking based on use relation: Component Rank (CR)
- Reusable components have many use relations.
  - There are many examples of their use.
  - General purpose
  - Sophisticated
- We measure use relations quantitatively, and rank components.
  - A component used by many components is important.
  - A component used by an important component is also important.

Katsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, Shinji Kusumoto: "Component Rank: Relative Significance Rank for Software Component Search", ICSE, Portland, OR, May 6.

Propagating weights
- Ad-hoc weights are assigned to each node.
- The node weights are re-defined by the incoming edge weights.
- We get new node weights.
[Figure: example graph with node and edge weights at successive propagation steps]
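The weight propagation can be sketched as a simple iteration. The sketch below makes several assumptions that the slides do not state: the initial ad-hoc weights are uniform, each node divides its weight equally among its outgoing edges, and a node without outgoing edges spreads its weight evenly over all nodes. The class `ComponentRankSketch` and the three-node graph are hypothetical, and this is not presented as the exact computation used in SPARS-J.

```java
import java.util.Arrays;

// Hypothetical sketch of weight propagation on a component graph.
class ComponentRankSketch {
    // uses[i] lists the indices of the components that component i uses.
    static double[] rank(int[][] uses, double tolerance, int maxIterations) {
        int n = uses.length;
        double[] weight = new double[n];
        Arrays.fill(weight, 1.0 / n); // assumption: uniform initial weights

        for (int iter = 0; iter < maxIterations; iter++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                if (uses[i].length == 0) {
                    // assumption: a node that uses nothing spreads its weight evenly
                    for (int j = 0; j < n; j++) {
                        next[j] += weight[i] / n;
                    }
                } else {
                    // edge weight: the node weight divided equally among outgoing edges
                    double share = weight[i] / uses[i].length;
                    for (int j : uses[i]) {
                        next[j] += share; // new node weight: sum of incoming edge weights
                    }
                }
            }
            double change = 0.0;
            for (int i = 0; i < n; i++) {
                change += Math.abs(next[i] - weight[i]);
            }
            weight = next;
            if (change < tolerance) {
                break; // next-step weights are (almost) the same as the previous ones
            }
        }
        return weight;
    }

    // Node indices sorted by descending weight.
    static Integer[] order(double[] weight) {
        Integer[] idx = new Integer[weight.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, (a, b) -> Double.compare(weight[b], weight[a]));
        return idx;
    }

    public static void main(String[] args) {
        // Hypothetical graph: 0 uses 1 and 2, 1 uses 2, 2 uses 0.
        int[][] uses = { {1, 2}, {2}, {0} };
        double[] w = rank(uses, 1e-12, 1000);
        System.out.println(Arrays.toString(w));
        System.out.println(Arrays.toString(order(w)));
    }
}
```

On this small graph the weights settle near 0.4, 0.2 and 0.4, so the component used only through one of two outgoing edges ends up last. The iteration cap is there because the plain iteration is not guaranteed to settle on every graph.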
Propagating weights
- We get a stable weight assignment: the next-step weights are the same as the previous ones.
- Component Rank: the order of nodes sorted by weight

Component retrieval part
- Searches components from the database and ranks them
- The process
  1. Search components
  2. Rank components by suitability to the user request (KR)
  3. Aggregate the two ranks (CR and KR)

Search components
- Search query
  - Words the user inputs
  - The kind of an index word, package name
- Components that contain the given query are searched for in the database.

Ranking suited to a user request: Keyword Rank (KR)
- Components which contain the words given by a user are searched.
- Components are ranked using a value calculated from the index word weights.
- Index word weight is high for
  - a word that occurs with high frequency in particular components
  - a word that represents the function of the component, such as a class name
- Components are sorted by the sum of the weights of all given words.

Calculation of KR value
- TF-IDF weighting using a full-text search engine
- Calculate the weight W_{ct} of component c with word t
  - TF_i: the frequency with which word t occurs as kind i in component c
  - IDF: the total number of components / the number of components containing word t
  - kw_i: the weight of kind i
  - $W_{ct} = \left( \sum_{i \in \mathrm{allkind}} kw_i \cdot TF_i \right) \cdot IDF$
- The KR value is the sum of W_{ct} over all given words t.
- The kinds of a word, each with its own weight kw_i
  - Class name
  - Interface name
  - Method name
  - Package name
  - Import
  - Method call
  - Field access
  - Variable type
  - Instance creation
  - Local variable access
  - Comment
  - Doc comment
  - Line comment
  - String
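The KR calculation can be written out directly from the definitions above. In this sketch the class `KeywordRankSketch`, its method names, and every numeric value (kind weights, frequencies, component counts) are hypothetical; the per-kind weights on the slide are not recoverable, so the ones used here are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the KR value: W_ct = (sum over kinds of kw_i * TF_i) * IDF.
class KeywordRankSketch {
    // tfByKind: kind of word -> frequency of word t as that kind in component c
    // kindWeight: kind of word -> weight kw_i (placeholder values, not SPARS-J's)
    static double wordWeight(Map<String, Integer> tfByKind,
                             Map<String, Double> kindWeight,
                             int totalComponents,
                             int componentsContainingWord) {
        if (componentsContainingWord == 0) {
            return 0.0; // the word occurs nowhere, so it contributes nothing
        }
        double weightedTf = 0.0;
        for (Map.Entry<String, Integer> e : tfByKind.entrySet()) {
            double kw = kindWeight.getOrDefault(e.getKey(), 0.0);
            weightedTf += kw * e.getValue();
        }
        // IDF as defined on the slide: a plain ratio, without a logarithm
        double idf = (double) totalComponents / componentsContainingWord;
        return weightedTf * idf;
    }

    // KR value of a component: the sum of W_ct over all words in the query.
    static double krValue(double[] wordWeights) {
        double sum = 0.0;
        for (double w : wordWeights) {
            sum += w;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Double> kindWeight = new HashMap<>();
        kindWeight.put("Class name", 10.0);  // placeholder weight
        kindWeight.put("Method call", 1.0);  // placeholder weight

        Map<String, Integer> tf = new HashMap<>();
        tf.put("Class name", 1);
        tf.put("Method call", 3);

        // (10 * 1 + 1 * 3) * (100 / 4) = 325
        System.out.println(wordWeight(tf, kindWeight, 100, 4));
    }
}
```

Giving a class name a larger weight than a method call reflects the idea that a word naming the component says more about its function than a word that merely appears in its body.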
Aggregate two ranks: Borda Count method
- The two ranks KR and CR are aggregated.
- Aggregation method: Borda Count method
  - Known as a voting system
  - Used for single or multiple-seat elections
  - This form of voting is extremely popular in determining awards.
- Components are ranked by both KR and CR.
  - Using KR and CR, the result favors components that are suitable for the user's request, reusable, and sophisticated.

Borda Count example
- There are voters and 5 candidates (from A to E).
- Each voter ranks the candidates.
- 1 point for last place, 2 points for second from last place, and so on, up to N points for first place
  - With 5 candidates: 1st = 5 points, 2nd = 4 points, ...
- The points from all voters are summed for each candidate, and the candidates are ranked by total points.
[Figure: worked example of point totals and the aggregated ranking]

User interface
- Receives a user's query and provides the search results through a Web browser (Microsoft Internet Explorer, Mozilla, etc.)
- The process
  1. Parse the query words and the search condition
  2. Show rank-ordered results
  3. Show analyzed information of the component
     - Components used by / using the component
     - Metrics

Analyzed information
The analyzed component information is as follows:
- Metrics
  - The number of methods and variables, LOC, cyclomatic number, etc. (metrics measurable in the component itself)
- Components used by / using the component
  - Shows lists of nodes reached by following use relations
- Components that are similar to the component
  - Shows lists of similar components
- Package browsing
  - The naming structure for Java packages is hierarchical.
  - A user can easily search lists of components in the same package as a component.
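The Borda Count aggregation of the two ranks can be sketched as follows. The class `BordaCountSketch`, the component names, and the tie-breaking rule (ties keep the order of the first ranking) are assumptions for illustration, and the sketch assumes every ranking lists the same candidates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Borda Count aggregation over several rankings.
class BordaCountSketch {
    // Each ranking lists candidates from first place to last place.
    // With N candidates, first place earns N points and last place earns 1 point.
    static Map<String, Integer> points(List<List<String>> rankings) {
        Map<String, Integer> total = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            int n = ranking.size();
            for (int pos = 0; pos < n; pos++) {
                total.merge(ranking.get(pos), n - pos, Integer::sum);
            }
        }
        return total;
    }

    // Candidates ordered by descending total points.
    // Assumption: ties keep the order of the first ranking (the sort is stable).
    static List<String> aggregate(List<List<String>> rankings) {
        Map<String, Integer> total = points(rankings);
        List<String> order = new ArrayList<>(total.keySet());
        order.sort((a, b) -> total.get(b) - total.get(a));
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical KR and CR rankings of three components.
        List<String> kr = Arrays.asList("c1", "c2", "c3");
        List<String> cr = Arrays.asList("c3", "c1", "c2");
        // c1: 3 + 2 = 5, c2: 2 + 1 = 3, c3: 1 + 3 = 4
        System.out.println(aggregate(Arrays.asList(kr, cr)));
    }
}
```

Here the two rankings act as two voters. A component that is first in one ranking but last in the other (c3) ends up below a component that is near the top in both (c1), which is the behavior wanted when combining KR and CR.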
Screenshot (top page)
Screenshot (search results)
Screenshot (source code)
Screenshot (similar components)
Screenshot (using the component)
Screenshot (used by the component)
Screenshot (package browsing)

Experiment
- Comparison with Google
- The components registered in SPARS-J were obtained from the Internet.
- Query words
  - "calculator applet"
  - "chat server client"
- We calculate the relevance ratio of the higher-ranked results.
  - SPARS-J: a result is relevant when the component is reusable source code.
  - Google: a web search engine
    - The term "java source" is added to the query words.
    - One link is followed from the result web page.

Experiment results
- Example 1: calculator applet
- Example 2: chat server client
- relevance ratio = (the number of relevant components) / (rank order)
- Using SPARS-J, suited components appear at high rank order.
[Table: hit counts, numbers of suited components, and relevance ratio by rank order for SPARS-J and Google in Examples 1 and 2]

Conclusion and Future work
- We developed the search engine SPARS-J.
  - Using SPARS-J, components that are used well can be retrieved easily.
- Future work
  - Morphological analysis of index keywords
  - Collaborative filtering
  - Investigate the best ranking method
    - The value of weights
    - Aggregation of ranks
  - Evaluation of usability