NLP & Vision
Image Retrieval for Complex Queries
Our work on image retrieval for complex queries is among the early work in this area. We exploit the relationships among query concepts through linguistic patterns and, based on these patterns, propose textual and visual models. Images with little or no associated information are handled by a hybrid model. We also use external knowledge from existing knowledge graphs and/or our own to improve retrieval performance for abstract, ambiguous, and long complex queries.
Image Tagging
We exploit image and tag neighborhoods to quantify the relevance between an image and a concept, thereby enhancing image annotation. We also observed that, with this neighborhood-based approach, rare but relevant tags receive better relevance scores than with existing methods. We also address the void-of-information problem (the absence or scarcity of textual information accompanying an image), which degrades image annotation.
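One well-known way to turn a visual neighborhood into a tag relevance score is neighbor voting: count how many of an image's k nearest visual neighbors carry the tag, then subtract the number of votes expected by chance given the tag's overall frequency. The sketch below is a generic illustration of that idea, not the specific model described above; it assumes precomputed feature vectors and plain Euclidean distance, and the function name `neighbor_vote_relevance` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Set


def neighbor_vote_relevance(
    query: Sequence[float],
    features: List[Sequence[float]],
    tags: List[Set[str]],
    tag: str,
    k: int,
) -> float:
    """Relevance of `tag` for the query image: neighbor votes minus the chance expectation."""
    # Rank corpus images by Euclidean distance to the query's feature vector.
    order = sorted(range(len(features)), key=lambda i: math.dist(query, features[i]))
    neighbors = order[:k]
    # Votes: how many of the k nearest neighbors are labelled with the tag.
    votes = sum(1 for i in neighbors if tag in tags[i])
    # Prior: votes expected from k images drawn uniformly at random from the corpus.
    prior = k * sum(1 for t in tags if tag in t) / len(tags)
    return votes - prior


if __name__ == "__main__":
    feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    labels = [{"beach"}, {"beach", "sea"}, {"sea"}, {"city"}, {"city", "night"}, {"night"}]
    print(neighbor_vote_relevance((0.05, 0.05), feats, labels, "beach", 3))  # 1.0
    print(neighbor_vote_relevance((0.05, 0.05), feats, labels, "city", 3))   # -1.0
```

Because the prior grows with a tag's corpus frequency, a rare tag is penalized only slightly, so a few neighbor votes are enough to give it a high score; this is one way a neighborhood can favor rare but relevant tags.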
Knowledge Graph
We have built a knowledge base, the Visio-Textual KB (VTKB), which is the first KB to exploit both the textual and visual modalities. It is built automatically from reliable sources (dictionaries, Wikipedia, etc., rather than web articles) and establishes relations that are important both textually and visually. It avoids generating noisy patterns, a major problem in existing automatically built KBs. We have used VTKB to embed knowledge into an image corpus for image annotation and for image retrieval with complex queries, obtaining notable improvements in both image retrieval and image tagging.
VTKB is a generic knowledge base that holds both visual and textual (semantic) representations of concepts. Using VTKB yields better image representations, thereby improving the performance of an image search engine. It can also be used to improve image indexing, classification, and clustering.
Data Analytics
Anytime Mining for Data Streams
A real-time data stream is characterized by time-ordered data objects arriving continuously at a fast and variable rate. Mining data streams is typically constrained by the limited time available to process, and the limited memory available to store, the incoming data objects. The time available to process each arriving object depends on the stream speed, and evolving patterns must be captured within these constraints. Our models can handle any stream speed. At higher speeds, insertion and processing are deferred, so immediate mining results come at the cost of some accuracy; at lower speeds, the spare time is used to refine the information already received.
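The interplay between deferred insertion and refinement can be illustrated with a toy model. In this sketch, the "mined pattern" is just item frequency counts, and an integer `budget` stands in for the time available before the next arrival (a fast stream means a small budget). The class and method names are hypothetical, and this is not the authors' model; it only shows how a result can be available at any time while its accuracy depends on how much of the backlog has been processed.

```python
from collections import Counter, deque
from typing import Deque, Hashable


class AnytimeStreamMiner:
    """Toy anytime miner: item frequencies with deferred insertion under a time budget."""

    def __init__(self) -> None:
        self.summary: Counter = Counter()         # current model, readable at any time
        self.deferred: Deque[Hashable] = deque()  # arrived but not yet inserted

    def _process(self, budget: int) -> None:
        # Insert deferred objects oldest first, spending one unit of budget each.
        while budget > 0 and self.deferred:
            self.summary[self.deferred.popleft()] += 1
            budget -= 1

    def arrive(self, obj: Hashable, budget: int) -> None:
        """Accept `obj`; `budget` models the unit insertions possible before the next arrival."""
        self.deferred.append(obj)
        self._process(budget)  # high speed -> small budget -> the object stays deferred

    def idle(self, budget: int) -> None:
        """Spend spare time (a lull in the stream) refining the model from the backlog."""
        self._process(budget)

    def result(self) -> Counter:
        """Immediate answer; it undercounts while deferred objects remain."""
        return Counter(self.summary)


if __name__ == "__main__":
    miner = AnytimeStreamMiner()
    for item in ["a", "b", "a", "c", "a"]:
        miner.arrive(item, budget=0)  # burst: no time to insert anything
    print(miner.result())             # Counter()
    miner.idle(budget=10)             # lull: the backlog is drained
    print(miner.result())             # Counter({'a': 3, 'b': 1, 'c': 1})
```

A real stream miner would also bound the backlog's memory and let refinement do more than drain a queue (for example, recompute or merge summaries), but the same trade-off applies: answers are always available, and spare time improves them.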
Parallelization Strategies for ML/DM Algorithms
My interest also lies in redesigning sequential clustering algorithms to make them more amenable to parallelization. The main concepts exploited in this work include gridding, spatial locality, indexing structures, and data distribution strategies.
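To show why gridding and spatial locality help, the sketch below buckets points into cells of side eps. Any two points within distance eps then fall in the same or adjacent cells, so a density-based neighborhood query only needs to inspect a point's own cell and its immediate neighbors, and cells can be processed (or assigned to processes) largely independently. This is a generic illustration; the function names are hypothetical and it is not the specific redesign described above.

```python
import itertools
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def build_grid(points: List[Sequence[float]], width: float) -> Dict[Cell, List[int]]:
    """Bucket point indices into axis-aligned cells of side `width`."""
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        grid[tuple(math.floor(x / width) for x in p)].append(i)
    return dict(grid)


def adjacent_cells(cell: Cell) -> List[Cell]:
    """The cell itself plus every cell one step away in any dimension."""
    return [
        tuple(c + d for c, d in zip(cell, delta))
        for delta in itertools.product((-1, 0, 1), repeat=len(cell))
    ]


def eps_neighbors(points: List[Sequence[float]], grid: Dict[Cell, List[int]],
                  i: int, eps: float) -> List[int]:
    """Indices of points within `eps` of point i (grid must be built with width == eps)."""
    home = tuple(math.floor(x / eps) for x in points[i])
    found = []
    for cell in adjacent_cells(home):       # only local cells are ever inspected
        for j in grid.get(cell, []):
            if j != i and math.dist(points[i], points[j]) <= eps:
                found.append(j)
    return sorted(found)


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.9, 0.2), (1.5, 0.5), (3.0, 3.0)]
    g = build_grid(pts, 1.0)
    print(g)                            # {(0, 0): [0, 1], (1, 0): [2], (3, 3): [3]}
    print(eps_neighbors(pts, g, 0, 1.0))  # [1]
```

In a distributed setting, each process would own a set of cells and exchange only the boundary cells it shares with other processes, which is where spatial locality reduces communication.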
Data Distribution and Data Indexing Structures
Data distribution schemes for static and dynamic data help reduce communication cost and improve load balancing on distributed-memory systems. We have also designed and developed data indexing structures tailored to the specific requirements of a particular category of algorithms, e.g., the data access patterns, the queries involved, and the querying patterns of a specific category of clustering algorithms.
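A simple static distribution scheme aimed at load balancing is the greedy longest-processing-time heuristic: sort the work units (here, grid cells weighted by their point counts) from largest to smallest and give each one to the currently least-loaded process. This is a generic heuristic chosen for illustration, not necessarily the scheme described above; it ignores communication cost, which a real scheme would also weigh (for example, by keeping adjacent cells on the same process). The function name `distribute` and the cell sizes are hypothetical.

```python
import heapq
from typing import Dict, Hashable, List


def distribute(cell_sizes: Dict[Hashable, int], num_procs: int) -> List[List[Hashable]]:
    """Assign cells to processes, heaviest first, each to the currently least-loaded process."""
    heap = [(0, rank) for rank in range(num_procs)]  # (current load, process rank)
    heapq.heapify(heap)
    assignment: List[List[Hashable]] = [[] for _ in range(num_procs)]
    for cell, size in sorted(cell_sizes.items(), key=lambda kv: -kv[1]):
        load, rank = heapq.heappop(heap)              # least-loaded process
        assignment[rank].append(cell)
        heapq.heappush(heap, (load + size, rank))
    return assignment


if __name__ == "__main__":
    sizes = {"A": 7, "B": 5, "C": 4, "D": 3, "E": 1}
    print(distribute(sizes, 2))  # [['A', 'D'], ['B', 'C', 'E']]  (loads 10 and 10)
```

For dynamic data, the same routine could be rerun on updated cell sizes, although migrating cells between processes has its own communication cost.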
Automatic Parallelization of Clustering Algorithms
We are also working on the automatic parallelization of clustering algorithms through a domain-specific language (so far developed for clustering) and a parallelizing compiler (targeting MPI), which together automatically generate MPI code.