Students
We collaborate with local university students on research projects, offering mentorship and a chance to apply their academic knowledge to real-world problems. These partnerships bridge the gap between academic learning and practice, fostering innovation and growth for both the students and our team. Below are thesis opportunities through which students can contribute to cutting-edge work while gaining practical experience.
Thesis opportunities
Optimizing Document Chunking Strategies for Enhanced Contextual Retrieval in Compound AI Systems
Research optimal strategies for splitting documents into chunks for contextual retrieval. Analyze how chunk size, boundary definitions, and overlap techniques affect retrieval performance across different document types and query patterns. Explore the trade-offs between granularity and context retention, and the impact of aligning chunks with natural language boundaries such as sentences and paragraphs. Investigate dynamic approaches that adapt chunking to document characteristics. Finally, assess strategies that improve retrieval further by prioritizing the most relevant chunks.
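To make the parameters above concrete, here is a minimal sketch of sentence-aligned chunking with overlap. It is only an illustration of the kind of baseline a thesis might start from: the function names, the naive regex sentence splitter, and the `max_chars` and `overlap_sentences` parameters are assumptions made for this example, not part of any existing pipeline.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_sentences(text: str, max_chars: int = 200, overlap_sentences: int = 1) -> list[str]:
    """Group whole sentences into chunks of roughly max_chars characters.

    The last `overlap_sentences` sentences of each chunk are repeated at the
    start of the next one, so context is retained across chunk boundaries.
    A chunk can exceed max_chars when a single sentence, or the carried
    overlap plus one new sentence, is longer than the limit.
    """
    chunks: list[str] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward as overlap.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "A one. B two. C three. D four."
    for chunk in chunk_sentences(sample, max_chars=14, overlap_sentences=1):
        print(chunk)
```

Varying `max_chars` changes granularity, while `overlap_sentences` trades index size for context retention; both are the kind of variables whose effect on retrieval quality the thesis would measure.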
A Comparative Study of Multimodal Models and Traditional Machine Learning Approaches for Financial Document Layout Analysis
Compare the performance of multimodal AI models against traditional machine learning approaches in extracting and interpreting specific fields from financial documents. Focus on tasks such as layout analysis of financial statements, invoices, and receipts, as well as information extraction from complex financial document formats. Evaluate the models on criteria including accuracy of field extraction, ability to handle diverse document layouts, robustness to noise or poor image quality, and generalization to unseen document types. The evaluation will run on Kubernetes-based training pipelines.
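As one possible way to quantify "accuracy of field extraction", the sketch below computes a field-level exact-match score over a set of documents. The field names (`total`, `vendor`), the normalization rule, and the function names are hypothetical choices for illustration; a real study would likely add per-field and fuzzy-match metrics.

```python
def normalize(value) -> str:
    # Assumed normalization: treat missing values as empty, ignore case and outer whitespace.
    return "" if value is None else str(value).strip().lower()


def field_accuracy(predictions: list[dict], references: list[dict]) -> float:
    """Fraction of reference fields whose predicted value matches exactly after normalization."""
    total = 0
    correct = 0
    for pred, ref in zip(predictions, references):
        for field, expected in ref.items():
            total += 1
            if normalize(pred.get(field)) == normalize(expected):
                correct += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    refs = [{"total": "100.00", "vendor": "ACME"}]
    preds = [{"total": "100.00"}]  # the vendor field was not extracted
    print(field_accuracy(preds, refs))
```

Running the same metric on clean scans, degraded scans, and held-out document types gives comparable numbers for the robustness and generalization criteria.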
BM42 vs. Conventional Methods: Evaluating Next-Generation Hybrid Search Techniques for Information Retrieval
Compare the BM42 hybrid search technique against traditional methods such as BM25 and dense vector search, using real-world datasets for evaluation. Examine how BM42's combination of transformer attention weights with inverse document frequency (IDF) impacts retrieval quality, particularly in handling complex queries, domain-specific terms, and out-of-domain scenarios. Evaluate the method's performance in addressing knowledge gaps present in sparse vs. dense data distributions.
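The difference between the two scoring ideas can be sketched in a few lines. The code below implements the standard BM25 formula and, next to it, a simplified score that replaces the term-frequency component with a per-token attention weight multiplied by IDF, as described above. It is not a reference implementation of BM42: the attention weights are hand-set placeholder values rather than model output, and all function names and the toy corpus are assumptions.

```python
import math
from collections import Counter


def idf(term: str, docs: list[list[str]]) -> float:
    # BM25-style IDF with +1 inside the log so the value is never negative.
    n = sum(1 for d in docs if term in d)
    return math.log((len(docs) - n + 0.5) / (n + 0.5) + 1.0)


def bm25_score(query: list[str], doc: list[str], docs: list[list[str]],
               k1: float = 1.2, b: float = 0.75) -> float:
    """Classic BM25: IDF times a saturating, length-normalized term frequency."""
    avgdl = sum(len(d) for d in docs) / len(docs)
    tf = Counter(doc)
    score = 0.0
    for term in query:
        f = tf[term]
        score += idf(term, docs) * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score


def attention_idf_score(query: list[str], attention: dict[str, float],
                        docs: list[list[str]]) -> float:
    """Simplified attention-based score: IDF times a per-token attention weight.

    `attention` maps each document token to a weight that would come from a
    transformer in practice; here it is supplied by hand (assumption).
    """
    return sum(idf(term, docs) * attention.get(term, 0.0) for term in query)


if __name__ == "__main__":
    corpus = [["invoice", "total", "amount"], ["receipt", "total"], ["balance", "sheet"]]
    query = ["invoice", "total"]
    print(bm25_score(query, corpus[0], corpus))
    # Placeholder attention weights for the first document.
    print(attention_idf_score(query, {"invoice": 0.6, "total": 0.3, "amount": 0.1}, corpus))
```

Because the attention-based variant does not depend on repeated term counts, it may behave differently from BM25 on short documents, which is one of the effects a comparison on real-world datasets could examine.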