Personalized Text Summarization

Róbert Móro, Supervisor: Prof. Mária Bieliková


Method of Personalized Summarization

We have proposed a method of personalized text summarization based on latent semantic analysis (LSA). It consists of the following steps:

  • Pre-processing
  • Construction of a personalized terms-sentences matrix
  • Singular value decomposition (SVD)
  • Sentence selection
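The four steps can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several parts are assumptions. Pre-processing is a hypothetical lowercase tokenizer. The matrix is filled by a caller-supplied `weight` function, which here stands in for the personalized weighting. The SVD is approximated by power iteration on A^T A with deflation. Selection uses one common LSA strategy: for each of the top k right singular vectors (latent topics), take the sentence with the largest absolute component. The source does not state which selection strategy the method uses.

```python
import math
import random
import re


def preprocess(sentence):
    # Hypothetical pre-processing: lowercase and keep alphabetic tokens
    # (a fuller pipeline would also drop stop words and stem or lemmatize).
    return re.findall(r"[a-z]+", sentence.lower())


def term_sentence_matrix(sentences, weight):
    """Rows are terms, columns are sentences; weight(term, tokens) fills each cell."""
    tokenized = [preprocess(s) for s in sentences]
    terms = sorted({t for toks in tokenized for t in toks})
    return [[weight(t, toks) for toks in tokenized] for t in terms]


def top_right_singular_vectors(A, k, iters=200, seed=0):
    """Top-k right singular vectors of A via power iteration on A^T A with deflation."""
    n = len(A[0])
    B = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    vectors = []
    for _ in range(min(k, n)):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * B[i][j] * v[j] for i in range(n) for j in range(n))
        if lam <= 1e-12:
            break  # remaining singular values are (numerically) zero
        vectors.append(v)
        # Deflate: subtract the found component so the next pass finds the next one.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vectors


def select_sentences(sentences, k, weight):
    """Pick one sentence per latent topic: the one with the largest |component|."""
    A = term_sentence_matrix(sentences, weight)
    if not A:
        return []
    chosen = []
    for v in top_right_singular_vectors(A, k):
        for idx in sorted(range(len(v)), key=lambda i: -abs(v[i])):
            if idx not in chosen:
                chosen.append(idx)
                break
    return [sentences[idx] for idx in sorted(chosen)]


sentences = ["Cats chase mice.", "Mice fear cats.", "Stocks rose today."]
tf = lambda term, tokens: tokens.count(term)  # raw term frequency as a placeholder weight
summary = select_sentences(sentences, 2, tf)
```

With these three sentences and k = 2, the sketch should return the stock sentence plus one of the two cat sentences, which are equally representative of their shared topic. In the personalized method, `tf` would be replaced by the rater-based weighting described next.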

We have identified the construction of the terms-sentences matrix representing the document as a step suitable for personalizing the summarization. We have extended the conventional tf-idf weighting scheme with a linear combination of multiple raters, each of which can increase or decrease the weight of a term (see Figure 1).


Figure 1. Term weighting by a combination of raters, where $t_{ij}$ is a term, $R_i$ is a rater with its linear coefficient $\alpha_i$, and $w(t_{ij})$ is the weight assigned to the term $t_{ij}$.
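Since Figure 1 itself is not available, the sketch below assumes the weight is the plain linear combination $w(t_{ij}) = \sum \alpha \cdot R(t_{ij})$ over all raters. It also assumes the terms frequency rater computes ordinary tf-idf over sentences. The rater signature `(term, sentence_index) -> float` and the example domain terms are hypothetical.

```python
import math


def tfidf_rater(tokenized):
    """Terms frequency rater, assumed here to be plain tf-idf over sentences."""
    n = len(tokenized)
    df = {}
    for tokens in tokenized:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1

    def rate(term, j):
        if term not in df:
            return 0.0
        return tokenized[j].count(term) * math.log(n / df[term])

    return rate


def combined_weight(term, j, raters):
    """w(t_ij) as a linear combination: the sum of alpha * R(t_ij) over all raters.

    raters is a list of (alpha, rater) pairs; a negative alpha lets a rater
    decrease the weight of a term instead of increasing it.
    """
    return sum(alpha * rater(term, j) for alpha, rater in raters)


tokenized = [["lsa", "uses", "svd"], ["svd", "factorizes", "a", "matrix"], ["tf", "idf", "weights"]]
domain_terms = {"svd", "matrix"}  # hypothetical relevant domain terms
raters = [
    (1.0, tfidf_rater(tokenized)),
    (0.5, lambda term, j: 1.0 if term in domain_terms else 0.0),
]
w = combined_weight("svd", 0, raters)  # log(3/2) from tf-idf, plus 0.5 from the domain rater
```

Keeping each rater as an independent function with its own coefficient makes it easy to switch personalized raters on or off per user without touching the rest of the pipeline.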

We have designed a set of raters, which can be divided into two groups:

  • Generic raters: term frequency rater, term location rater and relevant domain terms rater
  • Personalized raters: knowledge rater, annotations rater, goals rater and interests rater

We have identified three main sources for personalizing and adapting summaries, suitable for (but not limited to) the learning domain:

  • Domain conceptualization in the form of relevant domain terms
  • Knowledge of the users
  • Annotations added by users, e.g. highlights or tags

The relevant domain terms rater uses information contained in the domain model of an adaptive system. Domain models are usually constructed manually by domain experts, who capture their knowledge of the domain in the form of important concepts (relevant domain terms) and the relationships among them. This makes domain models, where they are available, a valuable source of information for adapting summaries.
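A minimal sketch of such a rater follows. It assumes the domain model can be reduced to a mapping from relevant domain terms to an expert-assigned relevance. The relationships among concepts are ignored here, and both the mapping format and the `domain_terms_rater` name are hypothetical.

```python
from typing import Callable, Mapping

Rater = Callable[[str, int], float]  # (term, sentence index) -> score


def domain_terms_rater(domain_model: Mapping[str, float]) -> Rater:
    """Score a term by its relevance in the domain model; 0.0 if it is not a
    relevant domain term. The sentence index is unused by this rater."""
    relevance = {term.lower(): weight for term, weight in domain_model.items()}

    def rate(term: str, sentence_index: int) -> float:
        return relevance.get(term.lower(), 0.0)

    return rate


# Hypothetical domain model: relevant domain terms with expert-assigned relevance.
rate = domain_terms_rater({"SVD": 1.0, "matrix": 0.5})
```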

We have designed the knowledge rater as a personalized version of the relevant domain terms rater. It uses information captured in the user model, which records each user's level of knowledge of each concept in the domain.
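The text does not say whether well-known concepts should be emphasized or suppressed. The sketch below therefore assumes the learning-oriented choice, which favours terms the user has yet to master. It scales a term's domain relevance by (1 - knowledge), with knowledge levels assumed to lie in [0, 1]. The function name and data formats are hypothetical.

```python
from typing import Callable, Mapping

Rater = Callable[[str, int], float]  # (term, sentence index) -> score


def knowledge_rater(domain_model: Mapping[str, float],
                    user_knowledge: Mapping[str, float]) -> Rater:
    """Hypothetical personalized variant of the domain terms rater: a relevant
    domain term is weighted by how much the user has yet to learn about it.
    Keys are assumed lowercase; concepts missing from the user model count as 0."""
    def rate(term: str, sentence_index: int) -> float:
        key = term.lower()
        relevance = domain_model.get(key, 0.0)
        knowledge = user_knowledge.get(key, 0.0)
        return relevance * (1.0 - knowledge)

    return rate


rate = knowledge_rater({"svd": 1.0, "matrix": 1.0}, {"svd": 0.9})
# "svd" is almost mastered, so it gets a low score; "matrix" keeps full weight.
```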

The annotations rater takes into account the fragments of text highlighted by a particular user, augmented by the most popular highlights made by all users (the most highlighted fragments of the document).
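The sketch below combines a user's own highlights with the most popular highlights. It assumes highlights are stored as text fragments and that the "most popular" ones are the `top_n` fragments highlighted most often. A term gains one point for each relevant fragment that contains it. All names, `top_n`, and the equal weighting of personal and popular highlights are hypothetical. For simplicity, the sketch also ignores which sentence a fragment comes from.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List

Rater = Callable[[str, int], float]  # (term, sentence index) -> score


def annotations_rater(user_highlights: Iterable[str],
                      all_highlights: List[str], top_n: int = 3) -> Rater:
    """Hypothetical annotations rater: a term scores one point for every relevant
    highlighted fragment containing it. Relevant fragments are the user's own
    highlights plus the top_n fragments highlighted most often by all users."""
    popular = [frag for frag, _ in Counter(all_highlights).most_common(top_n)]
    fragments = set(user_highlights) | set(popular)
    token_sets = [set(re.findall(r"[a-z]+", f.lower())) for f in fragments]

    def rate(term: str, sentence_index: int) -> float:
        return float(sum(term.lower() in tokens for tokens in token_sets))

    return rate


rate = annotations_rater(
    user_highlights=["singular value decomposition"],
    all_highlights=["latent semantic analysis", "latent semantic analysis", "term weighting"],
    top_n=1,
)
```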

Personalized Method of Selecting Documents for Revision

We have proposed a personalized method for selecting documents for revision, based on existing approaches from the field of recommender systems. Our method considers various aspects, such as:

  • Time of reading of the documents
  • Popularity of documents
  • Relationships among concepts (especially prerequisites)
  • Suitability of a document for a particular user's goals
  • Change of knowledge of a particular user
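The text does not say how these aspects are combined into a single score. One plausible reading, taken from weighted hybrid recommenders, is a weighted sum of per-aspect scores, and the sketch below assumes exactly that. The aspect names, the weights, and the normalization of scores to [0, 1] are all hypothetical.

```python
from typing import Dict, List, Tuple


def rank_for_revision(aspect_scores: Dict[str, Dict[str, float]],
                      weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Hypothetical weighted hybrid: aspect_scores[aspect][document] is assumed to be
    normalized to [0, 1]; a document's score is the weighted sum over aspects."""
    documents = {d for scores in aspect_scores.values() for d in scores}
    totals = {
        d: sum(weights.get(aspect, 0.0) * scores.get(d, 0.0)
               for aspect, scores in aspect_scores.items())
        for d in documents
    }
    # Highest score first; ties broken by document id for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical aspect scores for two documents and hand-picked weights.
ranking = rank_for_revision(
    {"time_of_reading": {"d1": 0.2, "d2": 0.9},
     "knowledge_change": {"d1": 0.8, "d2": 0.1}},
    {"time_of_reading": 0.3, "knowledge_change": 0.7},
)
```

Here d1 scores 0.3 · 0.2 + 0.7 · 0.8 = 0.62 and d2 scores 0.34, so d1 is recommended first. The remaining aspects (popularity, prerequisites, goals) would plug in as further entries of `aspect_scores`.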

Documents are scored according to the change in a particular user's knowledge. The score is the cosine similarity between the vector of the concepts' knowledge deltas and the vector of the related concepts' weights, where each weight measures the relatedness between a concept (i.e. a relevant domain term) and the document.
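The cosine similarity can be computed as in the sketch below. It assumes both vectors are stored as mappings keyed by concept, so they are first aligned over the union of concepts, with missing entries treated as 0. The function name and the example values are hypothetical.

```python
import math
from typing import Mapping


def knowledge_change_score(deltas: Mapping[str, float],
                           relatedness: Mapping[str, float]) -> float:
    """Cosine similarity between a user's concept knowledge deltas and a document's
    concept relatedness weights, both keyed by concept (relevant domain term).
    Concepts missing from either mapping count as 0."""
    concepts = sorted(set(deltas) | set(relatedness))
    a = [deltas.get(c, 0.0) for c in concepts]
    b = [relatedness.get(c, 0.0) for c in concepts]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical session: the user's knowledge of "svd" rose, "lsa" did not change.
deltas = {"svd": 0.3, "lsa": 0.0}
doc_on_svd = {"svd": 0.8, "matrix": 0.6}
doc_on_tfidf = {"tfidf": 1.0}
```

With raw deltas, a decrease in knowledge lowers the score. The following paragraph argues that a decrease should also prompt revision, so in practice the deltas would presumably first be mapped to non-negative concept scores (for example, their absolute values) before the cosine is taken. This is an assumption, not something the text specifies.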

The idea is that if the user's knowledge of a concept increased during the session, she has learned something new and should revise it before she starts learning in the next session. Similarly, if the knowledge decreased, the user has forgotten what she had known before (or our estimate in the user model was wrong), so she should revise the concept to regain the forgotten knowledge. If the knowledge of a concept did not change, the concept does not need to be revised. This behaviour is modelled by the diagram in Figure 2.


Figure 2. Scoring of concepts (relevant domain terms) as a function of the knowledge change of a particular user (x-axis).
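The exact curve in Figure 2 is not recoverable. The sketch below assumes the simplest shape consistent with the text: a V-shaped function that is 0 when knowledge did not change and grows with the size of an increase or a decrease, capped at 1. The separate slopes `up_slope` and `down_slope` are hypothetical parameters that let the two branches differ.

```python
from typing import Dict, List


def concept_score(delta: float, up_slope: float = 1.0, down_slope: float = 1.0) -> float:
    """Hypothetical V-shaped score: 0 when knowledge did not change, growing with
    the size of an increase (newly learned) or a decrease (forgotten), capped at 1."""
    slope = up_slope if delta > 0 else down_slope
    return min(1.0, slope * abs(delta))


def concepts_to_revise(deltas: Dict[str, float]) -> List[str]:
    """Concepts whose knowledge changed, most urgent first (ties broken by name)."""
    scores = {c: concept_score(d) for c, d in deltas.items()}
    return sorted((c for c, s in scores.items() if s > 0.0),
                  key=lambda c: (-scores[c], c))


print(concepts_to_revise({"svd": 0.4, "lsa": -0.6, "tfidf": 0.0}))  # prints ['lsa', 'svd']
```

Unchanged concepts such as "tfidf" drop out entirely, which matches the statement that they need no revision.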