PMCN: combining pdf-modified similarity and complex network in multi-document summarization
Keywords:
TF-PDF, pdf-modified similarity, Document summarization, complex network
Abstract
This study combines the concept of degree centrality from complex networks with the Term Frequency * Proportional Document Frequency (TF*PDF) algorithm; the combined method, called TF-PDF, constructs relationship networks among sentences for writing news summaries. The TF-PDF method extends to multi-document summarization the ideas of Bun and Ishizuka (2002), who first published the TF*PDF algorithm for detecting hot topics. In their TF*PDF algorithm, Bun and Ishizuka defined the publisher of a news item as its channel; a term whose PDF weight is higher than the weights of other terms is considered hotter than those terms. This study, however, aims to produce summaries of news items rather than to detect hot topics. Because the task is to summarize daily news, TF-PDF replaces the concept of “channel” with “the date of the news event” and uses the resulting chronological ordering in a multi-document summarization algorithm. Its F-measure scores were 0.042 and 0.051 higher than those of LexRank on the d30001t and d30003t tasks, respectively.
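The abstract does not state the formulas, so the following is only a minimal sketch of the two ingredients it names. The weighting follows the commonly cited form of Bun and Ishizuka's TF*PDF (normalized term frequency per channel multiplied by the exponential of the proportional document frequency), with the date of the news event used as the channel. The sentence graph uses a weighted cosine similarity and a fixed threshold; both are illustrative assumptions and are not the paper's pdf-modified similarity. The function names `tf_pdf_weights` and `degree_centrality` and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_pdf_weights(docs_by_channel):
    """Compute TF*PDF-style term weights.

    docs_by_channel maps a channel (assumed here: a date string) to a list
    of tokenized documents, each a list of terms.
    """
    weights = defaultdict(float)
    for docs in docs_by_channel.values():
        if not docs:
            continue
        # Frequency of each term within the channel.
        term_freq = Counter(t for doc in docs for t in doc)
        # Number of documents in the channel that contain each term.
        doc_freq = Counter(t for doc in docs for t in set(doc))
        norm = math.sqrt(sum(f * f for f in term_freq.values()))
        n_docs = len(docs)
        for term, f in term_freq.items():
            weights[term] += (f / norm) * math.exp(doc_freq[term] / n_docs)
    return dict(weights)


def degree_centrality(sentences, weights, threshold=0.1):
    """Count, for each sentence, how many other sentences it is linked to.

    Two sentences are linked when the cosine similarity of their weighted
    term vectors exceeds the threshold (an assumed, illustrative rule).
    """
    vecs = [
        {t: weights.get(t, 0.0) * c for t, c in Counter(s).items()}
        for s in sentences
    ]

    def cosine(a, b):
        dot = sum(v * b.get(t, 0.0) for t, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    degrees = [0] * len(sentences)
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if cosine(vecs[i], vecs[j]) > threshold:
                degrees[i] += 1
                degrees[j] += 1
    return degrees


# Toy data, invented for illustration only.
news = {
    "day-1": [["quake", "hits", "city"], ["quake", "relief"]],
    "day-2": [["quake", "aid"]],
}
w = tf_pdf_weights(news)
sents = [["quake", "hits", "city"], ["quake", "relief"], ["sports", "score"]]
deg = degree_centrality(sents, w)
```

In a sketch like this, sentences with the highest degree would be the candidates for the summary, ordered by the date of the events they describe.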
License
Copyright (c) 2019 International Journal of Knowledge Content Development & Technology
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
- The copyright of research papers that have been or will be published in the journal shall belong to RIKCDT, and by submitting a manuscript, the copyright is considered to be transferred to RIKCDT.
- No dispute shall be raised on matters that RIKCDT has already managed concerning research papers published in previous journals.
- Once publication of the paper is confirmed, a copyright transfer agreement shall be submitted.