Page 124 - Kaleidoscope Academic Conference Proceedings 2020
2020 ITU Kaleidoscope Academic Conference
the paper with a brief discussion on the overall contributions
and suggestions for future work.

2. RELATED WORK

Several researchers have worked in this area of topic and
intent identification in chat messages. Through in-depth
studies and evaluations, they have proposed several
techniques for analyzing the text exchanged between two
users and extracting the intention of the users involved. This
section presents a critical analysis of some of the most
prominent work published in the literature.

Dong et al. have studied the characteristics of chat messages
using 33,121 sample messages collected from 1,700
conversational sessions with the objective of understanding
the properties of chat messages and extracting the topic of
conversation [3]. Based on these studies, they have proposed
an indicative-term-based chat topic detection technique. For
preprocessing, it combines sessionalization of chat messages
with the extraction of features from icon text and URLs. For
classification, it uses naive Bayes, associative classification
and support vector machines (SVM) to group conversations
into different categories using a set of topic-indicative terms
identified by an experimental study on the sample data and
words predefined for each topic. Though this technique
outperforms the document-frequency-based approach, it is
capable of handling only text with complete words and
sentences. Hence, its main shortcomings are the inability to
handle abbreviated text and other meaning-bearing
components in messages such as emojis, smileys and
emoticons.

In the technique used by Zhang et al. [4], each message is
treated as a data item in a stream of messages, and
probabilistic latent semantic analysis (PLSA) is then applied
to the collected messages to discover the topic structure of
message streams by modeling the message-word
co-occurrence matrix. The main objective of this proposal is
to handle three main issues in instant messaging: useless
terms, very short messages and the use of multiple languages.
This technique is also capable of handling text only and
cannot handle messages mixed with other meaning-bearing
components.

Iqbal et al. have in [5] suggested a framework for analyzing
online messages for criminal investigations. The proposed
technique uses the whole chat log from a confiscated
computer as input and carries out topic extraction on
identified social networks by summarizing the messages to
aid the criminal investigation. This method is also restricted
to handling complete text only and cannot handle messages
mixed with different components.

The technique used by Song and Diederich in [6] first
segments messages into sentences and then converts the
sentences into tuples of the form (performative, proposition)
using a dialog act classifier. Following this, the intention of
the sender is formulated using the tuples and well-chosen
constraints of human communication. This technique is also
capable of handling only messages composed of text.

Inches and Crestani have in [7] proposed a framework for
both author and topic identification. In this framework,
Latent Dirichlet Allocation (LDA) is used for topic
identification and its hierarchical version is applied to
segmented conversation data for topic detection. This method
is also restricted to handling only text messages using
complete sentences.

Chen et al. have in [8] used semantic dependency distance
(SDD) along with PLSA to compensate for the lack of
semantic information that generally arises when PLSA alone
is used for this purpose. Though this method performs better
than techniques that use only PLSA for topic detection, it is
also unable to handle messages with abbreviations and other
image-based components.

The technique proposed by the authors in this paper differs in
many ways from existing techniques, including the
incorporation of a novel algorithm for grouping similar
messages while minimizing the drawbacks encountered with
cosine similarity in the online chatting domain. Also, the
proposed technique can handle abbreviations commonly used
in chat messages along with other meaning-bearing
components such as emojis and smileys.

3. CHARACTERISTICS OF CHAT MESSAGES

It is important to understand the characteristics of online chat
messages in order to process them effectively and identify the
intention of the users or the topic being discussed. Online chat
messages generally differ from other texts in that they have
their own unique features. This makes the processing of these
messages more difficult compared to other text processing
tasks. The general features of online text messages are
discussed below.

3.1 Message length is generally very short

The short nature of messages poses great challenges for
understanding the topic or the context being discussed, even
for a human user. Hence, understanding the messages
becomes one of the main challenges when the task is to be
automated and carried out by a machine. Lack of detail in
a message is a major issue associated with short messages. In
order to address this issue, the authors suggest identifying the
semantically rich words and grouping them together to enrich
the content, forming a larger set of semantically rich words.

3.2 Dynamic nature of the conversations

Unlike other text documents such as articles, posts,
comments, or reviews, chat messages generally do not follow
a single topic. Also, most of the time, not every message
contributes to the topic. Hence, it is first necessary to
identify different groups of messages contributing towards
different topics discussed within a single thread. This issue is
to be handled by identifying different groups of messages