基於英文維基百科之文字蘊涵 (Text Entailment based on English Wikipedia) - 政大學術集成


題名 (Title): 基於英文維基百科之文字蘊涵 / Text Entailment based on English Wikipedia
作者 (Author): 林柏誠 (Lin, Po-Cheng)
貢獻者 (Contributors): 劉昭麟 (Liu, Chao-Lin, advisor); 林柏誠 (Lin, Po-Cheng)
關鍵詞 (Keywords): 自然語言處理 (Natural Language Processing)
日期 (Date): 2014
上傳時間 (Upload time): 2015-01-05 11:22:29 (UTC+8)

摘要 (Abstract):

In recent years, textual entailment has received growing attention in natural language processing. Since the Recognizing Textual Entailment (RTE) challenge began evaluating English corpora in 2005, more and more researchers have taken up related work. Starting from its ninth round, the NII Testbeds and Community for Information Access Research (NTCIR) has also organized the Recognizing Inference in Text (RITE) evaluation, which covers not only English but also Traditional Chinese, Simplified Chinese, and Japanese corpora, drawing the attention and participation of researchers across Asia.

This study builds on textual entailment techniques and uses Wikipedia to judge whether the meaning of a given statement agrees with or contradicts the facts. Based on the linguistic information in the statement, we locate related articles in Wikipedia, search them for sentences that support or oppose the claim, and use those sentences to decide the outcome.

The system is roughly divided into three stages. First, we retrieve articles related to the statement from Wikipedia; next, we extract sentences from those articles that are relevant to the statement; finally, we determine whether each relevant sentence supports or opposes the statement, using Linearly Weighted Functions (LWFs) to learn the weight of each feature and the threshold of each inference. Through these steps and a set of effective linguistic features, we aim to infer whether the statement is true.
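The abstract names Linearly Weighted Functions (LWFs) but does not spell out their form. The following is a minimal sketch of the general shape such a scorer could take, assuming a weighted sum of feature values compared against a trained threshold; the symbols w_i, f_i, and theta are illustrative and not taken from the thesis:

\[
\mathrm{score}(S, T) = \sum_{i=1}^{n} w_i \, f_i(S, T), \qquad
\text{verdict}(S, T) =
\begin{cases}
\text{support} & \text{if } \mathrm{score}(S, T) \ge \theta \\
\text{oppose} & \text{otherwise}
\end{cases}
\]

Here S is the input statement, T a related sentence, f_i(S, T) the value of the i-th linguistic feature, w_i its weight, and theta the inference threshold; under this reading, the weights and thresholds are the parameters the thesis reports training.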

Abstract (English):

In recent years, research on textual entailment has become increasingly important in natural language processing. Since the Recognizing Textual Entailment (RTE) challenge began holding evaluations on English corpora in 2005, more and more people have engaged in related research. In addition, NTCIR has run the related Recognizing Inference in Text (RITE) task since its ninth round, with corpora in Chinese, Japanese, and other languages, and the area has gradually attracted researchers in Asia. In this thesis, we build on textual entailment techniques to validate whether an input sentence is consistent with or contrary to the facts. Based on the linguistic information in the input sentence, we extract related articles from Wikipedia, then extract related sentences from those articles and recognize whether they support or oppose the input sentence; this evidence is used to validate the input sentence. Our system is roughly divided into three parts: extracting related articles from Wikipedia, extracting related sentences from the related articles, and judging whether those sentences support or oppose the input sentence. We also adopt Linearly Weighted Functions (LWFs) to tune the weight of each feature and the entailment thresholds. With this evidence and a set of useful linguistic features, we hope the system can validate whether an input sentence is true.
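To make the three-stage pipeline described in the abstracts concrete (retrieve related articles, extract related sentences, score support or opposition), here is a self-contained Python sketch. It is illustrative only: keyword overlap stands in for the thesis's Lucene-based retrieval and richer linguistic features, and the weights and threshold are hand-set rather than trained with LWFs; all function names and parameters below are hypothetical.

# Illustrative sketch of the three-stage pipeline: retrieve related Wikipedia
# articles, keep related sentences, and score support/opposition with a simple
# linear weighting. Keyword overlap stands in for the thesis's features; the
# weights and threshold are hand-set, not the LWF-trained values.
import re
from typing import Dict, List, Tuple

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve_related_articles(statement: str, articles: Dict[str, str], k: int = 3) -> List[str]:
    """Stage 1: rank article titles by word overlap with the statement."""
    query = set(tokenize(statement))
    ranked = sorted(articles, key=lambda t: len(query & set(tokenize(articles[t]))), reverse=True)
    return ranked[:k]

def extract_related_sentences(statement: str, article_text: str, min_overlap: int = 2) -> List[str]:
    """Stage 2: keep sentences sharing at least min_overlap words with the statement."""
    query = set(tokenize(statement))
    sentences = re.split(r"(?<=[.!?])\s+", article_text)
    return [s for s in sentences if len(query & set(tokenize(s))) >= min_overlap]

def weighted_score(statement: str, sentence: str, weights: Tuple[float, float]) -> float:
    """Stage 3: toy linearly weighted score over two features: overlap and negation mismatch."""
    s_tokens, t_tokens = set(tokenize(statement)), set(tokenize(sentence))
    overlap = len(s_tokens & t_tokens) / max(len(s_tokens), 1)
    negation_mismatch = float(("not" in s_tokens) != ("not" in t_tokens))
    w_overlap, w_negation = weights
    return w_overlap * overlap + w_negation * negation_mismatch

def validate(statement: str, articles: Dict[str, str],
             weights: Tuple[float, float] = (1.0, -0.8), threshold: float = 0.5) -> bool:
    """Return True if any related sentence scores at or above the threshold."""
    for title in retrieve_related_articles(statement, articles):
        for sentence in extract_related_sentences(statement, articles[title]):
            if weighted_score(statement, sentence, weights) >= threshold:
                return True
    return False

if __name__ == "__main__":
    toy_wiki = {"Taipei": "Taipei is the capital of Taiwan. It is a large city.",
                "Tokyo": "Tokyo is the capital of Japan."}
    print(validate("Taipei is the capital of Taiwan.", toy_wiki))  # True with these toy settings

In the thesis, the per-feature weights and the decision thresholds would instead come from LWF parameter training rather than being fixed by hand as they are in this sketch.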
描述 (Description): 碩士 (Master's thesis); 國立政治大學 (National Chengchi University); 資訊科學學系 (Department of Computer Science); 101753028; 103
識別碼 (Identifier): G1017530281
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/72556
資料來源 (Source): http://thesis.lib.nccu.edu.tw/record/#G1017530281
資料類型 (Type): thesis


Table of Contents:

Chapter 1  Introduction
  1.1  Background and Motivation
  1.2  Method Overview
  1.3  Main Contributions
  1.4  Thesis Organization
Chapter 2  Literature Review
  2.1  Research on Textual Entailment
  2.2  Research on the RTE and RITE Evaluations
Chapter 3  Corpora and Dictionaries
  3.1  Corpus
  3.2  English Wikipedia
  3.3  WordNet
Chapter 4  Methodology
  4.1  Extracting Related Articles and Related Sentences
    4.1.1  Extracting Related Articles
    4.1.2  Extracting Related Sentences
  4.2  Relevance Computation
    4.2.1  Related-Sentence Weights
    4.2.2  Article Weights
    4.2.3  Combined Related-Sentence Weights
  4.3  Inference Validation System
    4.3.1  Linguistic Features
    4.3.2  LWFs Formula and Parameter Training
Chapter 5  System Performance Evaluation
  5.1  Linearly Weighted Functions Parameters and Thresholds
  5.2  Experimental Results and Discussion
Chapter 6  A Small-Scale Experiment Using Information Retrieval Methods
  6.1  Method Overview
  6.2  Corpus
  6.3  Experimental Results
Chapter 7  Conclusion and Future Work
  7.1  Conclusion
  7.2  Future Work
References
Appendix: Examples of Related Articles and Related Sentences

檔案格式 (Format): application/pdf, 1074690 bytes


