Personalized Text Summarization

Róbert Móro, Supervisor: Prof. Mária Bieliková

Evaluation

Experiments' Overview


We have evaluated our proposed method of personalized summarization in the domain of learning by means of a standalone summarizer integrated with the educational system ALEF (see Figure 1). We have performed two experiments, during the Functional and Logic Programming (FLP) and the Principles of Software Engineering (PSE) courses.

ALEF with summarizer component

Figure 1. Example screenshot of ALEF (in Slovak) with the integrated summarizer (1). The current user rating is shown in the top right corner (2). Students can also add feedback in the form of free text (3) or navigate to the next summary by clicking the Next button in the bottom right corner (4).

In total, 75 students have participated in these experiments. Their task has been to evaluate the quality of the presented summaries of educational texts by rating them and by answering follow-up questions. We have inquired whether the sentences selected for the summary are representative, whether the summary is suitable for revision, and whether it could help them decide on the document's relevance. We have also been interested in whether the length of the summary is suitable given the length and content of the document, and whether the summary is readable and comprehensible.

Furthermore, we have chosen a control expert group to compare their summary evaluation with that of the other students. The group has consisted of five to seven domain experts. In contrast to the other participants, they have been presented with both summary variants (in random order) for each educational text and asked to decide which variant is better or whether the two are equal.

We have gathered summaries for 303 educational texts (explanation learning objects), 2242 summary ratings and 385 summary variant comparisons from the experts. Moreover, students have answered 479 follow-up questions.

Summarization Considering RDT


Summarization considering the domain-relevant terms (RDT) has gained a higher average score on the five-point scale (3.79) than the first variant (generic summarization), which has scored 3.54 on average.

We have also computed the average score of each summary variant for each document. The second variant (summarization considering RDT) has scored higher than the first one in 48% of the cases, the same in 11% and lower in 41%. The comparison of summaries by the experts has given us similar results: the second variant has been evaluated as better in 49% of the cases, as equal in 20% and as worse in 31% (see Figure 2). Thus, our results suggest that considering the domain-relevant terms in the summarization process leads to better summaries in comparison to the baseline generic variant.
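The per-document comparison described above can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the function name `compare_variants`, the variant labels `"generic"` and `"rdt"`, and the toy ratings are all assumptions made for the example, and the toy data is not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def compare_variants(ratings):
    """Share of documents where the RDT variant scores higher, the same, or lower.

    `ratings` is an iterable of (doc_id, variant, score) tuples, where
    variant is "generic" or "rdt" (hypothetical labels) and score is a
    rating on the five-point scale.
    """
    # Collect all ratings per document and per variant.
    scores = defaultdict(lambda: defaultdict(list))
    for doc_id, variant, score in ratings:
        scores[doc_id][variant].append(score)

    outcome = {"higher": 0, "same": 0, "lower": 0}
    for by_variant in scores.values():
        # Only documents rated for both variants can be compared.
        if "generic" not in by_variant or "rdt" not in by_variant:
            continue
        diff = mean(by_variant["rdt"]) - mean(by_variant["generic"])
        if diff > 0:
            outcome["higher"] += 1
        elif diff < 0:
            outcome["lower"] += 1
        else:
            outcome["same"] += 1

    total = sum(outcome.values())
    if total == 0:
        raise ValueError("no document was rated for both variants")
    return {label: count / total for label, count in outcome.items()}


# Toy ratings for illustration only (not data from the experiments).
toy_ratings = [
    ("d1", "generic", 3), ("d1", "rdt", 4), ("d1", "rdt", 5),
    ("d2", "generic", 4), ("d2", "rdt", 4),
    ("d3", "generic", 5), ("d3", "rdt", 3),
    ("d4", "generic", 2), ("d4", "rdt", 4),
]
print(compare_variants(toy_ratings))
```

Averaging per document first, rather than pooling all ratings, keeps a frequently rated document from dominating the comparison.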

Summarization considering RDT vs. the generic variant

Figure 2. Comparison of summary variants, where A means that summary considering the domain-relevant terms was evaluated as better, B that generic summary was better and C that they were equal.

Summarization Considering Annotations


Summarization considering annotations has been evaluated (by the expert student group) as better than the generic variant in 48% of the cases. This number is similar to the result of the experiment with RDT. However, the number of cases in which the generic variant has been evaluated as better is much lower (only 24%, see Figure 3). Thus, we can conclude that considering annotations in the summarization process also leads to better summaries in comparison to the baseline generic variant.
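The percentages of the expert comparisons can be obtained by a simple tally. The sketch below assumes that each expert judgement is recorded as one of the labels used in Figures 2 and 3 (A: the personalized variant is better, B: the generic variant is better, C: they are equal); the function name `judgement_shares` and the sample input are hypothetical and do not come from the experiments.

```python
from collections import Counter


def judgement_shares(judgements):
    """Return the percentage of A, B and C judgements.

    `judgements` is an iterable of labels: "A" (personalized variant
    better), "B" (generic variant better) or "C" (equal).
    """
    counts = Counter(judgements)
    total = sum(counts[label] for label in ("A", "B", "C"))
    if total == 0:
        raise ValueError("no judgements to tally")
    return {label: 100 * counts[label] / total for label in ("A", "B", "C")}


# Illustrative input only (not data from the experiments).
print(judgement_shares(["A", "A", "B", "C"]))
```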

Summarization considering annotations vs. the generic variant

Figure 3. Comparison of summary variants, where A means that summary considering the annotations was evaluated as better, B that generic summary was better and C that they were equal.

Conclusions


Our experimental results suggest that using the domain-relevant terms, as well as annotations, in the summarization process can help select representative sentences capable of summarizing the document, even for revision. Furthermore, students' answers to the follow-up questions show that our approach is capable of generating readable and comprehensible summaries of satisfactory length.