Page 124 - Kaleidoscope Academic Conference Proceedings 2020

2020 ITU Kaleidoscope Academic Conference




the paper with a brief discussion on the overall contributions and suggestions for future work.

2.  RELATED WORK

Several researchers have worked in this area of topic and intent identification in chat messages. Through in-depth studies and evaluations, they have proposed several techniques for analyzing the text exchanged between two users and extracting the intention of the users involved. This section presents a critical analysis of some of the most prominent work published in the literature.

Dong et al. studied the characteristics of chat messages using 33,121 sample messages collected from 1700 conversational sessions, with the objective of understanding the properties of chat messages and extracting the topic of conversation [3]. Based on these studies, they proposed an indicative-term-based chat topic detection technique. For preprocessing, it combines sessionalization of chat messages with the extraction of features from icon text and URLs. For classification, it uses naive Bayes, associative classification and support vector machines (SVM) to group conversations into different categories, using a set of topic-indicative terms identified by an experimental study on the sample data together with words predefined for each topic. Though this technique outperforms the document-frequency-based approach, it can handle only text with complete words and sentences. Hence, its main shortcomings are the inability to handle abbreviated text and other meaning-bearing components present in a message, such as emojis, smileys and emoticons.

The technique used by Zhang et al. in [4] treats each message as a data item in a stream of messages and then applies probabilistic latent semantic analysis (PLSA) to the collected messages to discover the topic structure of message streams by modeling the message-word co-occurrence matrix information. The main objective of this proposal is to handle three main issues in instant messaging, namely useless terms, very short messages and the use of multiple languages. This technique can also handle text only and cannot handle messages mixed with other meaning-bearing components.

Iqbal et al. have in [5] suggested a framework for analyzing online messages for criminal investigations. The proposed technique uses the whole chat log from a confiscated computer as input and carries out topic extraction on identified social networks by summarizing the messages to aid the criminal investigation. This method is also restricted to handling complete text only and cannot handle messages mixed with different components.

The technique used by Song and Diederich in [6] first segments messages into sentences and then converts the sentences into tuples of the form (performative, proposition) using a dialog act classifier. Following this, the intention of the sender is formulated using the tuples and well-chosen constraints of human communication. This technique can also handle only messages composed of text.

Inches and Crestani have in [7] proposed a framework for both author and topic identification. In this framework, Latent Dirichlet Allocation (LDA) is used for topic identification and its hierarchical version is applied on segmented conversation data for topic detection. This method is also restricted to handling only text messages using complete sentences.

Chen et al. have in [8] used semantic dependency distance (SDD) along with PLSA to compensate for the lack of semantic information that generally occurs when PLSA alone is used for this purpose. Though this method performs better than techniques that use only PLSA for topic detection, it is also unable to handle messages with abbreviations and other image-based components.

The technique proposed by the authors in this paper differs in many ways from existing techniques, including the incorporation of a novel algorithm for grouping similar messages while minimizing the drawbacks encountered with cosine similarity in the online chatting domain. Also, the proposed technique can handle abbreviations commonly used in chat messages along with other meaning-bearing components such as emojis and smileys.

3.  CHARACTERISTICS OF CHAT MESSAGES

It is important to understand the characteristics of online chat messages in order to process them effectively and identify the intention of the users or the topic being discussed. Online chat messages generally differ from other texts and have their own unique features. This makes processing them more difficult than other text processing tasks. The general features of online text messages are discussed below.

3.1   Message length is generally very short

The short nature of messages poses great challenges for understanding the topic or the context being discussed, even for a human user. Hence, understanding the messages becomes one of the main challenges when the task is to be automated and carried out by a machine. Lack of detail is a major issue associated with short messages. In order to address this issue, the authors suggest identifying the semantically rich words and grouping them together, enriching the content by forming a larger set of semantically rich words.

3.2   Dynamic nature of the conversations

Unlike other text documents such as articles, posts, comments, or reviews, chat messages generally do not follow a single topic. Also, most of the time, not every message contributes to the topic. Hence, it is first necessary to identify the different groups of messages contributing towards the different topics discussed within a single thread. This issue is to be handled by identifying different groups of messages




