Paper

Basic information

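Name 中藤 哲也
Name (kana) ナカトウ テツヤ
Name (English) NAKATOU TETSUYA
Affiliation Nakamura Gakuen University, Faculty of Nutritional Sciences, Department of Nutritional Sciences
Position Associate Professor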

Title

Vector representation of words for plagiarism detection based on string matching

Sole or joint authorship

 

Authors

Kensuke Baba
Tetsuya Nakatoh
Toshiro Minami

Author role

 

Abstract

Plagiarism detection in documents requires appropriate definition of document similarity and efficient computation of the similarity. This paper evaluates the validity of using vector representation of words for defining a document similarity in terms of the processing time and the accuracy in plagiarism detection. This paper proposes a plagiarism detection algorithm based on the score vector weighted by vector representation of words. The score vector between two documents represents the number of matches between corresponding words for every possible gap of the starting positions of the documents. The vector and its weighted version can be computed efficiently using convolutions. In this paper, two types of vector representation of words, that is, randomly generated vectors and a distributed representation generated by a neural network-based method from training data, are evaluated with the proposed algorithm. The experimental results show that using the weighted score vector instead of the normal one for the algorithm can reduce the processing time with a slight decrease of the accuracy, and that randomly generated vector representation is more suitable for the algorithm than the distributed representation in the sense of a tradeoff between the processing time and the accuracy.
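The score vector and its weighted variant described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`convolve`, `correlate`, `score_vector`, `weighted_score_vector`), the choice of random ±1 word vectors, and the plain quadratic-time convolution are all assumptions made for illustration. In practice an FFT-based convolution would replace `convolve` to obtain the efficiency the abstract refers to.

```python
import random


def convolve(x, y):
    """Plain linear convolution (quadratic time).

    Assumption: used here only for clarity; an FFT-based routine
    would be substituted to make the computation efficient.
    """
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def correlate(x, y):
    """Cross-correlation expressed as a convolution with y reversed.

    out[k] = sum_j x[j + g] * y[j], where the gap g = k - (len(y) - 1).
    """
    return convolve(x, y[::-1])


def score_vector(a, b):
    """Exact score vector: number of matching words for every gap.

    One correlation of 0/1 indicator sequences per shared word.
    """
    total = [0.0] * (len(a) + len(b) - 1)
    for w in set(a) & set(b):
        ia = [1.0 if t == w else 0.0 for t in a]
        ib = [1.0 if t == w else 0.0 for t in b]
        for k, v in enumerate(correlate(ia, ib)):
            total[k] += v
    return [round(v) for v in total]


def weighted_score_vector(a, b, dim, rng):
    """Estimated score vector using random +/-1 word vectors (hypothetical setup).

    Each word is mapped to a dim-dimensional vector; one correlation per
    dimension replaces one correlation per distinct word. Matching words
    contribute exactly 1; mismatches contribute noise with zero mean.
    """
    vocab = sorted(set(a) | set(b))
    vec = {w: [rng.choice((-1.0, 1.0)) for _ in range(dim)] for w in vocab}
    total = [0.0] * (len(a) + len(b) - 1)
    for d in range(dim):
        xa = [vec[t][d] for t in a]
        xb = [vec[t][d] for t in b]
        for k, v in enumerate(correlate(xa, xb)):
            total[k] += v / dim
    return total


a = "the quick brown fox".split()
b = "a quick brown dog".split()
print(score_vector(a, b))  # [0, 0, 0, 2, 0, 0, 0]; index 3 is gap 0
print(weighted_score_vector(a, b, 16, random.Random(0)))
```

In this sketch the exact version needs one correlation per shared word, while the weighted version needs one per vector dimension, so a small `dim` trades some accuracy for fewer convolutions. That mirrors the tradeoff the abstract reports, though the actual weighting and parameters used in the paper may differ.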

Name of journal or publication

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Publisher

Springer Verlag

Volume

10274

 

Start page

341

End page

350

Date of publication or presentation

2017

Peer reviewed

Yes

Invited

No

Language

English

Publication type

Research paper (international conference proceedings)

International or domestic journal

 

International co-authorship

 

ISSN

 

eISSN

 

DOI

10.1007/978-3-319-58524-6_28

CiNii Articles ID

 

CiNii Books ID

 

PubMed ID

 

PubMed Central article ID

 

Format

Free download

JGlobalID

 

arXiv ID

 

ORCID Put Code

 

DBLP ID