Text mining (also referred to as text analytics) is an artificial intelligence (AI) technology that uses natural language processing (NLP) to transform the free (unstructured) text in documents and databases into normalized, structured data suitable for analysis or to drive machine learning (ML) algorithms.
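As a rough illustration of what "transforming free text into structured data" can mean, the sketch below turns raw strings into per-document term counts. It is a minimal example under stated assumptions, not the method used in the course: the two-document corpus and the helper names `tokenize` and `term_counts` are hypothetical, and real pipelines usually add steps such as stop-word removal and stemming.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(documents):
    """Map each document id to a Counter of its token frequencies."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}


# Hypothetical two-document corpus, invented for illustration only.
docs = {
    "doc1": "Gene expression in tumor cells.",
    "doc2": "Tumor suppressor gene mutations; gene therapy.",
}

table = term_counts(docs)
print(table["doc2"]["gene"])  # 2
```

Each document is now a row of numbers (term frequencies) rather than a string, which is the kind of structured input that analysis or machine learning tools expect.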

Introduction: Text mining is a computational technique for summarizing and searching large collections of text. This introductory course uses tools that quickly parse and summarize large text corpora, and teaches indexing for search and clustering of documents by similarity.
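The two ideas named above, indexing for search and grouping by similarity, can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the three-document corpus and the function names `build_index` and `cosine_similarity` are assumptions made for this example, and the course may use different tools and similarity measures.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Inverted index: token -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical corpus, invented for illustration only.
docs = {
    "d1": "protein folding in yeast",
    "d2": "yeast protein expression",
    "d3": "patent claims for a battery",
}

index = build_index(docs)
print(sorted(index["yeast"]))  # ['d1', 'd2']

# d1 shares two terms with d2 and none with d3, so it scores higher with d2.
print(cosine_similarity(docs["d1"], docs["d2"]) > cosine_similarity(docs["d1"], docs["d3"]))  # True
```

A lookup in the index answers "which documents mention this term", while pairwise similarity scores like these are the usual input to a clustering step.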


Why learn Text Mining? Researchers can have a hard time keeping up with knowledge sources such as PubMed and the USPTO. Tools that summarize and fetch only the most relevant information are highly desirable, because a simple keyword search is no longer sufficient. Researchers will save time!


Logistics: The course is divided into 4 sessions with pre-recorded videos, handouts, reference cards, examples, data, scripts and quizzes. Enrollees can contact the instructor with questions and get help on the projects. The main topics are listed below. Homework assignments will involve running commands learned in the live lectures.


Requirements: No prior experience is required, although programming experience will be useful.


Price: $1200 for Commercial/Government enrollees and $600 for Academic researchers and students


Instructor: Shailender Nagpal


Syllabus:

  • Introduction to the concept of mining text
  • Searching vs. mining, with some examples
  • Bioinformatics text sources
  • Infrastructure for text mining: USPTO and PubMed abstract download
  • Tools for processing a text corpus and reporting statistics
  • Meta-analysis of a text corpus to cluster documents by similarity