Available online at www.sciencedirect.com
ScienceDirect Procedia Technology 25 (2016) 310 – 317
Global Colloquium in Recent Advancement and Effectual Researches in Engineering Science and Technology (RAEREST 2016)
An efficient privacy-preserving search scheme with access control for cloud data centers
Tresa Mary George V, Shamna S, Jubilant J. Kizhakkethottam
Department of Computer Science and Engineering, Musaliar College of Engineering and Technology, Pathanamthitta 689653, India
Abstract
The internet and the emergence of social networks produce terabytes of data every day. In this big data scenario, outsourcing data to a cloud storage facility reduces data management and storage costs. The major challenges of this approach are providing security and ensuring the privacy of the outsourced data. Encryption can provide data security, but it makes searching the data a complex task. The proposed work presents an efficient search scheme for encrypted cloud data based on hierarchical clustering of documents. The hierarchical clustering method preserves the semantic relationship between the documents in the encrypted domain, which speeds up the search process. As a result, the proposed system has linear computational complexity during the search phase when the number of documents increases exponentially. The system also protects data privacy through access control mechanisms that give each type of user only limited access to the documents, resulting in more secure data storage in the cloud.
© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the organizing committee of RAEREST 2016
Keywords: searchable encryption; multi-keyword search; hierarchical clustering; access control
1. Introduction
A fundamental application of cloud computing is the ability to outsource data to external cloud servers, which enables scalable data storage. The cloud server can provide a huge storage space and high computational power [1].
Corresponding author. E-mail address: vtresamg@gmail.com
2212-0173 © 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the organizing committee of RAEREST 2016 doi:10.1016/j.protcy.2016.08.112
V. Tresa Mary George et al. / Procedia Technology 25 (2016) 310 – 317
Enterprises and users who own large amounts of data can therefore overcome their hardware limitations. As this approach becomes more popular, the volume of data in cloud storage facilities is growing dramatically. A major concern with using cloud computing for data storage is that the outsourced data may contain sensitive information, such as photos, emails and bank statements. If the data is stored without an efficient protection mechanism in a public cloud that many other people can access, severe privacy and confidentiality violations can result [2]. The traditional way to protect sensitive data is encryption: the documents are encrypted before they are outsourced to the cloud. However, this complicates the search operation when legitimate users need to access those documents. Many researchers have recently investigated this issue and proposed several ciphertext search schemes based on cryptographic techniques [3], [4]. These methods need extensive computation and suffer from high time complexity, so they are not suitable for a big data environment [5]. Another major drawback is that the encryption process conceals the relationship between the documents. Maintaining this relationship is important because it represents the properties of the documents.
It is also necessary to give different classes of users controlled access to the outsourced cloud data. The system must prevent unauthorized users from uploading corrupted documents to the cloud server. For example, consider a university cloud that stores the students' mark lists. Students must be prevented from uploading their own mark lists, which would overwrite the original copies. To prevent this, the system grants student users of the cloud only download privileges. A proper implementation of access control mechanisms ensures this kind of limited access for each class of cloud users.
The proposed system uses a search scheme based on multi-keyword ranked search. In addition, a hierarchical
clustering method clusters the documents based on a relevance score. The maximum size of each cluster is also limited. If a cluster exceeds this limit, it is further divided into sub-clusters until each cluster falls below the threshold. During the search phase, the system iteratively determines the most relevant cluster. Only the documents in that cluster need to be searched, which reduces the overall search time.
2. Related works
Many researchers have proposed methods for searching encrypted data in the cloud. Some of these methods and their drawbacks are discussed below.
2.1. Searchable encryption based on single keyword
In the method proposed by Song et al. [6], each word in the document is encrypted independently. A search must therefore scan the entire data collection word by word. The major drawback of this method is the high search cost of scanning entire documents. Cash et al. [7] proposed a symmetric searchable encryption scheme. It is highly efficient for large databases, but it lacks a ranking mechanism. If many documents contain the searched keyword, the user has to manually select the ones they actually want, which in turn increases the overall search time.
2.2. Searchable encryption based on multiple keywords
Cao et al. [8] proposed an architecture that performs multi-keyword search and also supports result ranking by using the k-nearest neighbor algorithm. However, the search time of this method grows exponentially as the size of the document collection grows exponentially. Sun et al. [9] proposed a new architecture. It provides better efficiency, but it ignores the relevance between the documents and therefore does not return the most relevant results.
2.3. Boolean Symmetric Searchable Encryption
Tarik Moataz and Abdullatif Shikfa [10] proposed a system for searching multiple keywords over encrypted data using Boolean Symmetric Searchable Encryption (BSSE). It uses the Gram-Schmidt process to optimize the search process. It supports arbitrary Boolean expressions, such as conjunctions and disjunctions of keywords and negations of keywords.
2.4. Fuzzy Keyword Search
The search schemes mentioned above retrieve files only on an exact keyword match, so typos and format inconsistencies prevent the required documents from being returned. J. Li et al. [11] proposed a wildcard-based technique to create efficient fuzzy keyword sets that can be used to match relevant documents. Whenever the exact-match search fails, the search result is based on the fuzzy keyword set.
3. System model and problem formulation
The proposed system uses a vector space model in which every document is represented by a vector, so each document can be seen as a point in a high-dimensional space. A clustering method classifies the documents into categories. The proposed system uses a hierarchical clustering index, i.e., a hierarchy of clusters at different levels. Each cluster has a constraint on the minimum relevance score between the documents in that cluster. Adding a new document to a cluster may break this constraint. In that case, a new cluster center is added to the system, all the cluster centers are reselected and all the documents are reassigned. The maximum cluster size is also fixed for each level. If a cluster exceeds the maximum size, it is divided into multiple sub-clusters. During a search, only the documents in the relevant clusters need to be searched, which reduces the overall search time.
During the search phase, the system computes the relevance score between the search query and the cluster centers of the first-level index. It selects the cluster center with the maximum relevance score and repeats this process for that center's children in the next-level clusters until it reaches the smallest cluster in the lowest level. If
this cluster does not contain the desired document, the system traces back to the parent of the smallest cluster. This process is repeated until the desired document is found or the root cluster is reached.
3.1. System architecture
The system architecture consists of four main entities, as shown in Fig. 1: the data owner, the data user, the cloud server and the cloud manager. The data owner is the module that collects documents, performs the encapsulation, builds the document index and outsources the encrypted documents to the cloud server. The data user is the consumer of the documents and must have the necessary authorization before accessing this data. The cloud server is the entity that provides a huge storage space and the computational resources needed for the ciphertext search. The cloud manager is responsible for access control: it blocks all unauthorized requests for the data by checking the privacy settings of each user. When the cloud server receives a request for a document, the cloud manager verifies the request. If verification succeeds, the cloud server returns the required documents.
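The request-verification flow of the cloud manager can be illustrated with a minimal sketch. It assumes a simple role-to-privilege table modelled on the university example from the introduction, in which data owners may upload and students may only download. The class names, role strings and in-memory store are hypothetical. Authentication is reduced to a registry lookup, and no cryptography is shown.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    UPLOAD = "upload"
    DOWNLOAD = "download"


# Hypothetical privilege table for the university example:
# data owners may upload and download, student users may only download.
PRIVILEGES = {
    "university": {Action.UPLOAD, Action.DOWNLOAD},
    "college": {Action.UPLOAD, Action.DOWNLOAD},
    "student": {Action.DOWNLOAD},
}


@dataclass
class Request:
    user_id: str
    role: str
    action: Action
    doc_id: str


class CloudManager:
    """Verifies each request before the cloud server acts on it."""

    def __init__(self, registered_users):
        # user_id -> role, as registered with the cloud manager
        self.registered_users = dict(registered_users)

    def verify(self, req: Request) -> bool:
        # The request must come from a registered user with the claimed role ...
        if self.registered_users.get(req.user_id) != req.role:
            return False
        # ... and that role must hold the privilege for the requested action.
        return req.action in PRIVILEGES.get(req.role, set())


class CloudServer:
    def __init__(self, manager: CloudManager):
        self.manager = manager
        self.store = {}  # doc_id -> encrypted document bytes

    def handle(self, req: Request, payload: bytes = b""):
        # Every request is checked by the cloud manager first.
        if not self.manager.verify(req):
            raise PermissionError(f"{req.user_id} may not {req.action.value} {req.doc_id}")
        if req.action is Action.UPLOAD:
            self.store[req.doc_id] = payload
            return None
        return self.store.get(req.doc_id)


if __name__ == "__main__":
    server = CloudServer(CloudManager({"u01": "university", "s42": "student"}))
    server.handle(Request("u01", "university", Action.UPLOAD, "marks-s42"), b"<ciphertext>")
    print(server.handle(Request("s42", "student", Action.DOWNLOAD, "marks-s42")))
    try:
        # A student trying to overwrite the mark list is blocked.
        server.handle(Request("s42", "student", Action.UPLOAD, "marks-s42"), b"forged")
    except PermissionError as err:
        print("blocked:", err)
```

Keeping the privilege check in a separate manager object matches the architecture above: the cloud server never decides access on its own.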
Fig. 1. System architecture
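The clustering constraints and the descend-and-backtrack search described at the start of this section can be sketched as follows. This is a minimal sketch on plaintext vectors, not the authors' QHC implementation. It makes several assumptions:

- Relevance is the inner product of length-normalized keyword vectors.
- The dynamic K-means is seeded with the first document, and the least relevant document that violates the threshold is promoted to a new center.
- The minimum-relevance threshold is raised by a hypothetical `step` at each deeper level so that oversized clusters can always be split further.

The names `build`, `search` and `is_target` (which stands in for "the desired document") and the dict-based node layout are illustrative.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def relevance(a, b):
    # Inner product of two normalized vectors (cf. Equations (1)-(3)).
    return sum(x * y for x, y in zip(a, b))


def centroid(vectors):
    return normalize([sum(col) / len(vectors) for col in zip(*vectors)])


def dynamic_kmeans(vecs, min_rel, max_iter=50):
    """k starts at 1 and grows whenever a document is less than `min_rel`
    relevant to its own center; returns the groups of member indices."""
    centers, prev = [vecs[0]], None
    for _ in range(max_iter):
        assign, worst, worst_score = [], None, min_rel
        for i, v in enumerate(vecs):
            scores = [relevance(v, c) for c in centers]
            best = max(range(len(centers)), key=scores.__getitem__)
            assign.append(best)
            if scores[best] < worst_score:
                worst, worst_score = i, scores[best]
        if worst is not None:
            centers.append(vecs[worst])  # constraint broken: add a new center
            prev = None                  # ...and reassign every document
            continue
        if assign == prev:
            break                        # k and the assignment are stable
        prev = assign
        centers = [centroid([vecs[i] for i, a in enumerate(assign) if a == j])
                   if j in assign else centers[j] for j in range(len(centers))]
    groups = [[i for i, a in enumerate(assign) if a == j] for j in range(len(centers))]
    return [g for g in groups if g]


def build(ids, vecs, max_size, min_rel, step=0.1):
    """Split any cluster larger than `max_size` into sub-clusters,
    tightening the relevance threshold by `step` at each deeper level."""
    node = {"center": centroid([vecs[i] for i in ids]), "docs": ids, "children": []}
    while len(ids) > max_size and min_rel <= 1.0:
        groups = dynamic_kmeans([vecs[i] for i in ids], min_rel)
        if len(groups) > 1:
            node["children"] = [build([ids[i] for i in g], vecs, max_size, min_rel + step, step)
                                for g in groups]
            break
        min_rel += step  # could not split at this threshold: demand tighter clusters
    return node


def search(node, query, is_target):
    """Descend into the most relevant child first; if that branch does not
    contain the target, back-track and try the next-best sibling."""
    if not node["children"]:
        return next((i for i in node["docs"] if is_target(i)), None)
    ranked = sorted(node["children"], key=lambda c: relevance(query, c["center"]), reverse=True)
    for child in ranked:
        found = search(child, query, is_target)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    # Binary keyword vectors over a hypothetical 4-word dictionary.
    raw = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1]]
    vecs = [normalize(v) for v in raw]
    root = build(list(range(len(vecs))), vecs, max_size=2, min_rel=0.5)
    print(search(root, normalize([0, 0, 1, 1]), lambda i: i == 2))  # prints 2
```

In the actual scheme, the cloud server would evaluate these scores on encrypted index vectors rather than on plaintext keyword vectors.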
4. Implementation details
4.1. MRSE-HCI architecture
The proposed system uses the Multi-keyword Ranked Search over Encrypted data based on Hierarchical Clustering Index (MRSE-HCI) scheme. In this scheme, the vector space model is adopted from Multi-keyword Ranked Search over Encrypted data (MRSE) [8], and the indexing is based on the Hierarchical Clustering Index (HCI) structure [12]. The details are as follows. Every document is indexed by a vector, and each dimension of the vector corresponds to a keyword. The value of each dimension indicates whether the keyword appears in that document. The query is represented as a vector in the same way. The document vectors are normalized to unit length, so the distance between points in the n-dimensional space reflects the relevance of the corresponding documents. During the search phase, the cloud server computes the relevance score between the query vector and each document vector as their inner product. Storing the documents in the cloud in encrypted form loses the semantic relationship between them. The proposed system therefore uses a clustering method. In the n-dimensional space, the points of highly relevant documents lie very close to each other, so clustering preserves the semantic relationship between the documents.
When the volume of data in the cloud grows dramatically, traditional search approaches become very inefficient, and their search time grows exponentially. To improve search efficiency, the system uses a hierarchical clustering method that clusters the documents by relevance score at different levels. When a cluster reaches the maximum cluster size threshold, the system partitions it into sub-clusters until the criterion is satisfied. While uploading the documents, the data owner also builds an encrypted index. The documents are encrypted with a symmetric key encryption algorithm, using random numbers and a secret key. When a data user needs a particular document, the user submits a query to the cloud server, which returns the target document. The functions of the different components are described below.
Keygen: This function generates the secret key $sk$ used to encrypt the index and the documents. It generates an (m+v+1)-bit vector, in which each element is the integer 0 or 1, and two invertible $(m+v+1) \times (m+v+1)$ matrices $M_1$ and $M_2$, whose elements are random integers.
Index: This phase generates the encrypted index using the secret key generated above. The clustering process
also takes place in this phase. The index algorithm is as follows:
1. A tokenizer and a parser extract all the keywords present in the documents.
2. The documents are transformed into a collection of Document Vectors (DV).
3. A Quality Hierarchical Clustering (QHC) method generates the Documents Classification (DC) information and the collection of Cluster Center Vectors, $CCV = \{c_1, c_2, \ldots\}$.
4. The data owner performs the dimension-expanding and vector-splitting procedures on every document vector.
a. The dimension-expanding procedure extends each vector in CCV to an (m+v+1)-bit long vector. The values in the v added dimensions (positions m+1 to m+v) are randomly generated integers, and the last dimension is set to 1.
b. The vector-splitting procedure splits every extended document vector into two (m+v+1)-bit long vectors, $d'$ and $d''$, using the (m+v+1)-bit vector generated by Keygen as the splitting indicator.
Encryption: The plain document set D is encrypted with any secure symmetric encryption algorithm, such as AES, and the encrypted documents are then outsourced to the cloud.
Trapdoor: When a user submits a query, the cloud manager analyzes it and verifies that the request comes from an authenticated user. The query keywords are analyzed with the help of the dictionary (DW) to generate a query vector (QV), which is then extended to an (m+v+1)-bit vector.
Search: When the cloud server receives the query vector, it computes the relevance scores between the query vector and the index vectors of the clusters in a hierarchical manner. It chooses the cluster with the maximum relevance score as the target cluster and searches it for the required document. If the document is not found, the server backtracks and chooses the cluster with the next highest score. This process is repeated until the target document is found.
Decryption: The data user uses this component to decrypt the returned document. The secret key is delivered to the user through a secure mechanism.
4.2. Relevance measure
The proposed system uses coordinate matching as its relevance measure. The relevance score between document $d_i$ and query $q_w$ is given by Equation (1):
$$R_{qd} = \sum_{t=1}^{m+v+1} q_{w,t} \times d_{i,t} \qquad (1)$$
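The relevance score between query $q_w$ and cluster center $cl_{l,i}$ (the i-th cluster center at level l) is given by Equation (2):
$$R_{qc} = \sum_{t=1}^{m+v+1} q_{w,t} \times cl_{l,i,t} \qquad (2)$$
The relevance score between documents $d_i$ and $d_j$ is given by Equation (3):
$$R_{dd} = \sum_{t=1}^{m+v+1} d_{i,t} \times d_{j,t} \qquad (3)$$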
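Equations (1) to (3) are plain inner products. The vector-splitting and matrix steps of Keygen, Index and Trapdoor are what allow the cloud server to evaluate Equation (1) on encrypted vectors. The sketch below illustrates that property under simplifying assumptions:

- It follows the secure inner-product idea used by MRSE [8]: the document vector is shared randomly where the indicator bit is 1, and the query where it is 0.
- It omits the dimension-expanding step (the v random dimensions and the final 1) and any random scaling of the query.
- It uses exact rational arithmetic (`fractions.Fraction`) so the equality holds exactly.

The function names and the small dictionary are hypothetical.

```python
import random
from fractions import Fraction


def to_vector(keywords, dictionary):
    # Coordinate matching: 1 if the dictionary word occurs in the document/query.
    return [1 if w in keywords else 0 for w in dictionary]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def invert(m):
    """Gauss-Jordan inverse over the rationals; returns None if m is singular."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def keygen(dim, rng):
    """Splitting indicator S with 0/1 entries plus two random invertible matrices."""
    s = [rng.randint(0, 1) for _ in range(dim)]
    mats = []
    while len(mats) < 2:
        m = [[rng.randint(-9, 9) for _ in range(dim)] for _ in range(dim)]
        inv = invert(m)
        if inv is not None:
            mats.append((m, inv))
    return s, mats


def split(vec, s, share_bit, rng):
    """Where s[j] == share_bit, vec[j] is shared randomly between the two halves;
    elsewhere both halves receive a copy of vec[j]."""
    a, b = [], []
    for x, sj in zip(vec, s):
        if sj == share_bit:
            r = rng.randint(-50, 50)
            a.append(r)
            b.append(x - r)
        else:
            a.append(x)
            b.append(x)
    return a, b


def encrypt_index(d, key, rng):
    s, ((m1, _), (m2, _)) = key
    d1, d2 = split(d, s, 1, rng)  # document vector is shared where S[j] == 1
    return mat_vec(transpose(m1), d1), mat_vec(transpose(m2), d2)


def trapdoor(q, key, rng):
    s, ((_, m1_inv), (_, m2_inv)) = key
    q1, q2 = split(q, s, 0, rng)  # query vector is shared where S[j] == 0
    return mat_vec(m1_inv, q1), mat_vec(m2_inv, q2)


def encrypted_score(index, trap):
    # (M1^T d')·(M1^-1 q') + (M2^T d'')·(M2^-1 q'') = d'·q' + d''·q'' = d·q
    return dot(index[0], trap[0]) + dot(index[1], trap[1])


if __name__ == "__main__":
    rng = random.Random(7)
    dictionary = ["cloud", "search", "privacy", "cluster", "marks"]  # hypothetical
    doc = to_vector({"cloud", "privacy", "cluster"}, dictionary)
    query = to_vector({"cloud", "cluster", "marks"}, dictionary)
    key = keygen(len(dictionary), rng)
    enc = encrypted_score(encrypt_index(doc, key, rng), trapdoor(query, key, rng))
    print(dot(doc, query), enc)  # both scores equal 2
```

Equations (2) and (3) use the same inner product with a cluster-center vector or a second document vector in place of the query.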
4.3. Quality Hierarchical Clustering Algorithm
Two of the most widely used clustering algorithms are K-means and K-medoids. In these algorithms, the value of k is fixed in advance. However, in a big data scenario the value of k cannot be predicted beforehand, so the clusters must be generated dynamically. Hence, a dynamic K-means algorithm is used. A minimum relevance threshold keeps the clusters dense and compact. During clustering, the relevance score between each document and its cluster center is computed. If this score is below the minimum threshold, a new cluster is added and all the documents are reassigned accordingly. This procedure is repeated until k reaches a stable value.
4.4. Search Algorithm
To search for a particular document, the cloud server first needs to find the cluster that best matches the query. It uses the cluster index $I_c$ and the following iterative procedure to find the top-matched cluster:
1. The cloud server computes the relevance score between the query $T_w$ and the encrypted vectors of the first-level cluster centers in the cluster index $I_c$, as described in Equation (2). It then chooses the i-th cluster center $I_{c,1,i}$, which has the highest score.
2. For each child cluster center of the selected cluster center, the cloud server computes the relevance score between $T_w$ and the child's encrypted vector, and then selects the cluster center $I_{c,2,i}$ with the top score.
3. The above procedure is repeated until it reaches the final cluster center $I_{c,l,i}$ in the last level $l$.
5. Results and analysis
5.1. Search Efficiency
The efficiency of the system was tested with a two-level clustering model. The number of operations needed for the entire search process can be computed with Equation (4). To increase search efficiency, the system uses a static dictionary of keywords that do not contribute meaningfully to the search. Terms such as 'for' and 'and' are removed from the search query, and a modified query vector is constructed. All subsequent comparisons use only the modified query vector. The following symbols are used:
- x: the size of the static dictionary
- w: the number of query keywords
- u: the number of keywords in the modified query vector
- n: the total number of documents in the collection
- k: the number of categories in the first-level cluster
- t: the average number of documents in the subsequent cluster
$$\mathrm{Operations}(\text{search process}) = x \times w + (w - u)k + (w - u - 1)t \qquad (4)$$
The number of operations required by a system without any clustering technique is given by Equation (5):
$$\mathrm{Operations}(\text{existing system}) = x \times w + (w - u)n \qquad (5)$$
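To make Equations (4) and (5) concrete, the sketch below evaluates both operation counts exactly as written. The parameter values are hypothetical. They were chosen so that n = k·t and are not taken from the paper's experiments.

```python
def ops_clustered(x, w, u, k, t):
    """Equation (4): operations for the two-level clustering model."""
    return x * w + (w - u) * k + (w - u - 1) * t


def ops_without_clustering(x, w, u, n):
    """Equation (5): operations when the query is compared with all n documents."""
    return x * w + (w - u) * n


if __name__ == "__main__":
    # Hypothetical parameters, not taken from the paper's experiments.
    x, w, u = 50, 5, 2        # static dictionary size, query keywords, modified-query keywords
    k, t, n = 20, 250, 5000   # first-level categories, avg. docs per sub-cluster, total docs
    print(ops_clustered(x, w, u, k, t))        # 810
    print(ops_without_clustering(x, w, u, n))  # 15250
```

The last term of Equation (5) scales with n. In Equation (4), that term is replaced by terms in k and t, which are much smaller than n when the collection is well clustered.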
During the search step, the existing system compares the query vector with the entire document collection, whereas the proposed system compares it only with the relevant cluster. This significantly reduces the search time.
5.2. Performance analysis
To test the performance of the proposed system, an experimental setup was built as follows. An application
simulating the activities of a university was created. The Google public cloud provided the cloud storage platform for the system. The data owners of the system are:
• The university, which owns the mark lists and certificates of all passed-out and currently studying students.
• The college, which uploads the sessional marks and other student-specific documents of all its students.
The dataset for the performance analysis was built from the types of documents mentioned above. The system was first tested with a linear increase in the number of documents, and the corresponding search times were estimated. Fig. 2 shows that the proposed system outperforms the existing system without clustering. The system was also tested with an exponential growth in the number of documents. Fig. 3 shows that the search time of the proposed system with clustering grows linearly, while the search time of the system without clustering grows exponentially.
Fig. 2. Comparison of search time with a linear growth in the document collection. [Plot: search time versus number of documents (×100); series: without clustering, with hierarchical clustering.]
Fig. 3. Comparison of search time with an exponential growth in the document collection. [Plot: search time versus number of documents; series: without clustering, with clustering.]
5.3. Security analysis
A dedicated module, the cloud manager, is added to the proposed system to verify the authenticity of incoming requests. To ensure the confidentiality and privacy of the documents stored on the cloud server, all documents are encrypted with a symmetric encryption algorithm before they are uploaded to the cloud. In addition, the cloud storage provider performs a two-level encryption on the documents and returns a public key to the cloud manager. The cloud manager manages all the keys, and only people with sufficient access rights can
decrypt the documents. Consequently, an intruder who accesses the documents directly on the cloud server cannot obtain their plaintext.
6. Conclusion and future work
This work analyzed the problem of searching and securely accessing encrypted data in the cloud. Maintaining the semantic relationship between the documents reduces the search time for a document. The proposed work is based on multi-keyword ranked search over encrypted data. Clustering the documents with a hierarchical clustering method preserves the semantic relationship between them. The experimental results show that the search time of the proposed system grows linearly when the size of the document collection grows exponentially. The system also implements a dedicated module, the cloud manager, which protects the privacy of cloud data by granting each class of users only limited access to the document collection. As future work, more secure algorithms can be developed to improve the privacy of the uploaded documents. More secure access control schemes, such as Dynamic Information Flow Tracking (DIFT) techniques [13] that can recognize advanced vulnerabilities, could also improve the overall performance of the system.
References
[1] Xian C, Lu YH, Li Z. Adaptive computation offloading for energy conservation on battery-powered systems. In Parallel and Distributed Systems, International Conference on. IEEE.
[2] Li H, Dai Y, Tian L.
[3] Sun W, Wang B, Cao N, Li M, Lou W, Hou YT, Li H. Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking. In Proceedings of the ACM SIGSAC Symposium on Information, Computer and Communications Security. ACM.
[4] Wang B.
[5] Sebastian LR, Babu S, Kizhakkethottam JJ. Challenges with big data mining: A review. In Soft Computing and Networks Security (ICSNS), International Conference on. IEEE.
[6] Song DX, Wagner D, Perrig A. Practical techniques for searches on encrypted data. In Security and Privacy (S&P) Proceedings, IEEE Symposium on. IEEE.
[7] Cash D, Jaeger J, Jarecki S, Jutla C, Krawczyk H, Rosu MC, Steiner M. Dynamic searchable encryption in very-large databases: Data structures and implementation. In Network and Distributed System Security Symposium (NDSS).
[8] Cao N, Wang C, Li M, Ren K, Lou W. Privacy-preserving multi-keyword ranked search over encrypted cloud data. IEEE Transactions on Parallel and Distributed Systems.
[9] Sun W, Wang B, Cao N, Li M, Lou W, Hou YT, Li H. Verifiable privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking. IEEE Transactions on Parallel and Distributed Systems.
[10] Moataz T, Shikfa A. Boolean symmetric searchable encryption. In Proceedings of the ACM SIGSAC Symposium on Information, Computer and Communications Security. ACM.
[11] Li J, Wang Q, Wang C, Cao N, Ren K, Lou W. Fuzzy keyword search over encrypted data in cloud computing. In Proceedings of IEEE INFOCOM Mini-Conference.
[12] Chen C, Zhu X, Shen P, Hu J, Guo S, Tari Z, Zomaya A. An efficient privacy-preserving ranked keyword search method.
[13] Dalton M, Kozyrakis C, Zeldovich N. Nemesis: Preventing authentication & access control vulnerabilities in web applications. In USENIX Security Symposium.