基於英文維基百科之文字蘊涵 (Text Entailment based on English Wikipedia) - 政大學術集成


題名 (Title): 基於英文維基百科之文字蘊涵 / Text Entailment based on English Wikipedia
作者 (Author): 林柏誠 (Lin, Po-Cheng)
貢獻者 (Contributors): 劉昭麟 (Liu, Chao-Lin, advisor); 林柏誠 (Lin, Po-Cheng)
關鍵詞 (Keywords): 自然語言處理 (Natural Language Processing)
日期 (Date): 2014
上傳時間 (Upload time): 2015-01-05 11:22:29 (UTC+8)

摘要 (Abstract):

In recent years, textual entailment has received growing attention in natural language processing. Since the Recognizing Textual Entailment (RTE) challenge began evaluating English corpora in 2005, more and more researchers have taken up related work. Starting from its ninth round, the NII Testbeds and Community for Information Access Research (NTCIR) has also organized the Recognizing Inference in Text (RITE) evaluation, which covers not only English but also Traditional Chinese, Simplified Chinese, and Japanese corpora, drawing the attention and participation of researchers across Asia.

This study builds on textual entailment techniques and uses Wikipedia to judge whether the meaning of a given statement agrees with or contradicts the facts. Based on the linguistic information in the statement, we locate related articles in Wikipedia, search them for sentences that support or oppose the claim, and use those sentences to decide the outcome.

The system is roughly divided into three stages. First, we retrieve articles related to the statement from Wikipedia; next, we extract sentences from those articles that are relevant to the statement; finally, we determine whether each relevant sentence supports or opposes the statement, using Linearly Weighted Functions (LWFs) to learn the weight of each feature and the threshold of each inference. Through these steps and a set of effective linguistic features, we aim to infer whether the statement is true.
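The abstract names Linearly Weighted Functions (LWFs) but does not spell out their form. The following is a minimal sketch of the general shape such a scorer could take, assuming a weighted sum of feature values compared against a trained threshold; the symbols w_i, f_i, and theta are illustrative and not taken from the thesis:

\[
\mathrm{score}(S, T) = \sum_{i=1}^{n} w_i \, f_i(S, T), \qquad
\text{verdict}(S, T) =
\begin{cases}
\text{support} & \text{if } \mathrm{score}(S, T) \ge \theta \\
\text{oppose} & \text{otherwise}
\end{cases}
\]

Here S is the input statement, T a related sentence, f_i(S, T) the value of the i-th linguistic feature, w_i its weight, and theta the inference threshold; under this reading, the weights and thresholds are the parameters the thesis reports training.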

Abstract (English):

In recent years, research on textual entailment has become increasingly important in natural language processing. Since the Recognizing Textual Entailment (RTE) challenge began holding evaluations on English corpora in 2005, more and more people have engaged in related research. In addition, NTCIR has run the related Recognizing Inference in Text (RITE) task since its ninth round, with corpora in Chinese, Japanese, and other languages, and the area has gradually attracted researchers in Asia. In this thesis, we build on textual entailment techniques to validate whether an input sentence is consistent with or contrary to the facts. Based on the linguistic information in the input sentence, we extract related articles from Wikipedia, then extract related sentences from those articles and recognize whether they support or oppose the input sentence; this evidence is used to validate the input sentence. Our system is roughly divided into three parts: extracting related articles from Wikipedia, extracting related sentences from the related articles, and judging whether those sentences support or oppose the input sentence. We also adopt Linearly Weighted Functions (LWFs) to tune the weight of each feature and the entailment thresholds. With this evidence and a set of useful linguistic features, we hope the system can validate whether an input sentence is true.
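To make the three-stage pipeline described in the abstracts concrete (retrieve related articles, extract related sentences, score support or opposition), here is a self-contained Python sketch. It is illustrative only: keyword overlap stands in for the thesis's Lucene-based retrieval and richer linguistic features, and the weights and threshold are hand-set rather than trained with LWFs; all function names and parameters below are hypothetical.

# Illustrative sketch of the three-stage pipeline: retrieve related Wikipedia
# articles, keep related sentences, and score support/opposition with a simple
# linear weighting. Keyword overlap stands in for the thesis's features; the
# weights and threshold are hand-set, not the LWF-trained values.
import re
from typing import Dict, List, Tuple

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve_related_articles(statement: str, articles: Dict[str, str], k: int = 3) -> List[str]:
    """Stage 1: rank article titles by word overlap with the statement."""
    query = set(tokenize(statement))
    ranked = sorted(articles, key=lambda t: len(query & set(tokenize(articles[t]))), reverse=True)
    return ranked[:k]

def extract_related_sentences(statement: str, article_text: str, min_overlap: int = 2) -> List[str]:
    """Stage 2: keep sentences sharing at least min_overlap words with the statement."""
    query = set(tokenize(statement))
    sentences = re.split(r"(?<=[.!?])\s+", article_text)
    return [s for s in sentences if len(query & set(tokenize(s))) >= min_overlap]

def weighted_score(statement: str, sentence: str, weights: Tuple[float, float]) -> float:
    """Stage 3: toy linearly weighted score over two features: overlap and negation mismatch."""
    s_tokens, t_tokens = set(tokenize(statement)), set(tokenize(sentence))
    overlap = len(s_tokens & t_tokens) / max(len(s_tokens), 1)
    negation_mismatch = float(("not" in s_tokens) != ("not" in t_tokens))
    w_overlap, w_negation = weights
    return w_overlap * overlap + w_negation * negation_mismatch

def validate(statement: str, articles: Dict[str, str],
             weights: Tuple[float, float] = (1.0, -0.8), threshold: float = 0.5) -> bool:
    """Return True if any related sentence scores at or above the threshold."""
    for title in retrieve_related_articles(statement, articles):
        for sentence in extract_related_sentences(statement, articles[title]):
            if weighted_score(statement, sentence, weights) >= threshold:
                return True
    return False

if __name__ == "__main__":
    toy_wiki = {"Taipei": "Taipei is the capital of Taiwan. It is a large city.",
                "Tokyo": "Tokyo is the capital of Japan."}
    print(validate("Taipei is the capital of Taiwan.", toy_wiki))  # True with these toy settings

In the thesis, the per-feature weights and the decision thresholds would instead come from LWF parameter training rather than being fixed by hand as they are in this sketch.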
描述 (Description): 碩士 (Master's thesis); 國立政治大學 (National Chengchi University); 資訊科學學系 (Department of Computer Science); 101753028; 103
識別碼 (Identifier): G1017530281
URI: http://nccur.lib.nccu.edu.tw/handle/140.119/72556
資料來源 (Source): http://thesis.lib.nccu.edu.tw/record/#G1017530281
資料類型 (Type): thesis


Table of Contents:

Chapter 1  Introduction
  1.1  Background and Motivation
  1.2  Method Overview
  1.3  Main Contributions
  1.4  Thesis Organization
Chapter 2  Literature Review
  2.1  Research on Textual Entailment
  2.2  Research on the RTE and RITE Evaluations
Chapter 3  Corpora and Dictionaries
  3.1  Corpus
  3.2  English Wikipedia
  3.3  WordNet
Chapter 4  Methodology
  4.1  Extracting Related Articles and Related Sentences
    4.1.1  Extracting Related Articles
    4.1.2  Extracting Related Sentences
  4.2  Relevance Computation
    4.2.1  Related-Sentence Weights
    4.2.2  Article Weights
    4.2.3  Combined Related-Sentence Weights
  4.3  Inference Validation System
    4.3.1  Linguistic Features
    4.3.2  LWFs Formula and Parameter Training
Chapter 5  System Performance Evaluation
  5.1  Linearly Weighted Functions Parameters and Thresholds
  5.2  Experimental Results and Discussion
Chapter 6  A Small-Scale Experiment Using Information Retrieval Methods
  6.1  Method Overview
  6.2  Corpus
  6.3  Experimental Results
Chapter 7  Conclusion and Future Work
  7.1  Conclusion
  7.2  Future Work
References
Appendix: Examples of Related Articles and Related Sentences

檔案格式 (Format): application/pdf, 1074690 bytes


