Thesis Opportunities

Here we have exciting opportunities to work with SoftRobot. During the internship, you will be supervised by one of our machine learning engineers and work with state-of-the-art tools in NLP and CV. This is a great way to see how AI is used on a practical level to provide insights to end-users. With us, you will collaborate as part of a team, contribute to development discussions, and get a look at the DevOps infrastructure.

Who You Are

We are looking for passionate and enthusiastic students who are constantly learning in the fields of computer science and AI. We encourage curiosity and flexibility in where you take your ideas and how you transform them into something valuable. We require that our interns have an understanding of Python. Some knowledge of SQL and of the Python frameworks/libraries used in AI (e.g. scikit-learn, PyTorch, Keras, TensorFlow, Hugging Face) is a plus.

Thesis Suggestions

1. Document-Level NLI with a Fixed Set of Hypotheses

We would like to investigate the feasibility of constructing a natural language inference (NLI) model for a fixed set of hypotheses, i.e. statements that are either true, false or not mentioned with respect to a document. Given a document, the model should provide, for each hypothesis, the relevant passages (if any exist) together with a suggested answer.
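As a rough illustration of the input and output of this task (not a proposed method), the sketch below runs each hypothesis against each passage of a document and keeps the best-scoring passage. All names here (`HypothesisResult`, `answer_hypotheses`, `toy_nli`, `min_score`) are hypothetical, and `toy_nli` is a deliberately naive word-overlap stand-in for the trained model that the thesis would actually investigate.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class HypothesisResult:
    hypothesis: str
    label: str               # "true", "false" or "not mentioned"
    passage: Optional[str]   # supporting passage, None if not mentioned


# An NLI function maps (passage, hypothesis) to (label, score in [0, 1]).
NliFn = Callable[[str, str], Tuple[str, float]]


def split_passages(document: str) -> List[str]:
    """Naive passage splitter: one passage per sentence."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def answer_hypotheses(document: str, hypotheses: List[str],
                      nli_fn: NliFn, min_score: float = 0.5) -> List[HypothesisResult]:
    """For each hypothesis, keep the best-scoring passage and its label."""
    passages = split_passages(document)
    results = []
    for hyp in hypotheses:
        best = None  # (score, label, passage)
        for passage in passages:
            label, score = nli_fn(passage, hyp)
            if label != "not mentioned" and score >= min_score:
                if best is None or score > best[0]:
                    best = (score, label, passage)
        if best is None:
            results.append(HypothesisResult(hyp, "not mentioned", None))
        else:
            results.append(HypothesisResult(hyp, best[1], best[2]))
    return results


def toy_nli(passage: str, hypothesis: str) -> Tuple[str, float]:
    """Toy stand-in for a real model: word overlap, with 'not' flipping the label."""
    def tokenize(text: str) -> set:
        return set(re.findall(r"[a-z]+", text.lower()))

    p, h = tokenize(passage), tokenize(hypothesis)
    content = h - {"the", "a", "is", "are", "not"}
    if not content:
        return "not mentioned", 0.0
    overlap = len(content & p) / len(content)
    if overlap < 0.5:
        return "not mentioned", overlap
    negated = ("not" in p) != ("not" in h)
    return ("false" if negated else "true"), overlap


if __name__ == "__main__":
    doc = "The contract is signed. The invoice is not paid."
    fixed_hypotheses = ["The contract is signed", "The invoice is paid",
                        "The loan is approved"]
    for result in answer_hypotheses(doc, fixed_hypotheses, toy_nli):
        print(result)
```

Keeping the scoring function pluggable means the same loop could later wrap a fine-tuned model without changing the output format.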

2. Visually-Rich Document Understanding with Multimodal Transformers

We would like to investigate and fine-tune a pre-trained model for one of two use cases:

  • Document Classification. An investigation of ways to fine-tune a pre-trained multimodal transformer to classify visually rich documents. A challenging aspect is that documents from different classes are sometimes clearly distinguishable both visually and semantically, and sometimes differ very little. A potential part of this project could be to find a method for determining when the algorithm is likely to be correct and when it is likely to be wrong.
  • Domain-Specific (Economic) Predictions. We are interested in seeing to what degree a pre-trained multimodal transformer can be fine-tuned to link visually rich documents to a domain-specific classification/clustering, possibly at row/sentence level. A possible extension of this project is to link the predicted results to monetary data found in the document.
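One simple baseline for flagging when a classifier is likely to be correct is to threshold its top softmax probability and abstain otherwise. The sketch below assumes the fine-tuned model exposes one logit per class; the function names, class names and the 0.8 threshold are illustrative assumptions, not part of any existing system. Softmax confidence can be poorly calibrated, so this is a starting point rather than a solution.

```python
import math
from typing import List, Optional, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_abstain(logits: List[float], class_names: List[str],
                       threshold: float = 0.8) -> Tuple[Optional[str], float]:
    """Return (label, confidence), or (None, confidence) when below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    if confidence < threshold:
        return None, confidence  # abstain: route the document to manual review
    return class_names[best], confidence


if __name__ == "__main__":
    classes = ["invoice", "receipt", "contract"]  # hypothetical document classes
    print(predict_or_abstain([4.0, 0.5, 0.1], classes))  # clearly separated logits
    print(predict_or_abstain([1.0, 0.9, 0.8], classes))  # near-tie, so it abstains
```

Sweeping the threshold on a held-out set gives a trade-off between coverage (how often the model answers) and accuracy on the answered documents.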

3. Data Augmentation for Visually-Rich Documents

We would like to investigate the feasibility of constructing synthetic financial documents with advanced language and generative models (e.g. BERT, GPT, GANs) as valuable input for other services where data is scarce.


When you apply, we ask you to attach a resume and cover letter. Please state in the cover letter which thesis suggestion you choose and why it seems interesting to you. The application review process will be conducted in late fall.