General Problem Statement: Building a Searchable Knowledge Base with Generative AI

Cash Prizes: 1st place US$250, 2nd place US$100, 3rd place US$50

Generative AI is a rapidly evolving field that has seen significant breakthroughs in recent years. Its impact spans diverse domains, from natural language processing to computer vision. Large Language Models (such as GPT-4) are trained on vast amounts of text data and can generate coherent and contextually relevant responses. To be truly effective in customer scenarios, however, they need to be grounded in the customer's own data.

Your customer has a collection of PDF documents of varied length and content (these could be FAQs or any other documents; you need to create this collection yourself). Your customer wants to be able to search across all of their documents and, eventually, to have a system that can answer questions over their data.
Currently the documents are spread across multiple storage solutions, but ideally the customer wants to have them all in one place. You need to suggest an appropriate solution for storing the initial PDF files. The customer also wants to be able to upload a new document there at any time, and the new document should be searchable within 1-2 days.

Part 1:
As a first step, the customer asks you to build a full-text search POC before implementing a solution with LLMs. Data extraction from the PDF documents should be done with managed services from the Azure AI stack. Once the text is extracted from the PDFs, you need to partition each document into sections and generate a summary for each section. You need to suggest relevant Azure AI services for this. Each search result should include the original document name, the original document link, the relevant section summary, and the full section text.
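
To make the expected shape of the Part 1 data concrete, here is a minimal sketch in Python of partitioning extracted text into sections and building one record per section with the four required fields. It assumes the text has already been extracted from the PDF. The field names, the `partition` and `build_records` helpers, the size limit, and the example link are all hypothetical, and `placeholder_summary` is only a stand-in for whichever Azure AI summarization service you choose.

```python
from dataclasses import dataclass, asdict


@dataclass
class SectionRecord:
    """One searchable unit. Field names are illustrative, not a required schema."""
    id: str
    document_name: str
    document_link: str
    section_summary: str
    section_text: str


def partition(text, max_chars=1000):
    """Group blank-line-separated paragraphs into sections of roughly max_chars."""
    sections, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            # Adding this paragraph would exceed the limit: close the section.
            sections.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        sections.append(current)
    return sections


def placeholder_summary(section):
    """Stand-in only: returns the first sentence. Replace with a real summarization call."""
    first = section.split(". ")[0].strip()
    return first if first.endswith(".") else first + "."


def build_records(document_name, document_link, text, max_chars=1000):
    """Return one dict per section, ready to be sent to a search index."""
    return [
        asdict(SectionRecord(
            id=f"{document_name}-{i}",
            document_name=document_name,
            document_link=document_link,
            section_summary=placeholder_summary(section),
            section_text=section,
        ))
        for i, section in enumerate(partition(text, max_chars))
    ]


# Example with made-up content and a made-up link.
sample = "Alpha one. More.\n\nBeta two.\n\nGamma three."
records = build_records("faq.pdf", "https://example.com/faq.pdf", sample, max_chars=20)
```

Splitting on paragraphs is only one option; splitting on headings or on the layout returned by the extraction service may give more meaningful sections.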

Part 2:
The customer has agreed to proceed with building a POC using generative AI. You now need to implement a solution that allows the customer to “chat” with their data in natural language and get relevant information from their documents. For this, you will need to implement vector search over the same document base. Think about how to implement the RAG (retrieval-augmented generation) pattern in this case.
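
As a rough illustration of the RAG pattern, the sketch below ranks sections by cosine similarity to a query vector and then assembles a grounded prompt from the top results. It is a toy under stated assumptions: the three-dimensional vectors are hand-written placeholders rather than real embeddings, the in-memory list stands in for a vector index, and the helper names and prompt wording are hypothetical. In a real solution the vectors would come from an embedding model, the ranking from the search service, and the prompt would be sent to a chat model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=2):
    """Return the k sections whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(question, sections):
    """Combine retrieved sections and the question into one grounded prompt."""
    context = "\n\n".join(
        f"[{s['document_name']}] {s['section_text']}" for s in sections
    )
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Toy index: made-up sections with placeholder vectors (not real embeddings).
index = [
    {"document_name": "faq.pdf", "section_text": "Refunds are issued within 14 days.",
     "vector": [0.9, 0.1, 0.0]},
    {"document_name": "faq.pdf", "section_text": "Support is open on weekdays.",
     "vector": [0.1, 0.9, 0.0]},
    {"document_name": "terms.pdf", "section_text": "Accounts may be closed on request.",
     "vector": [0.0, 0.2, 0.9]},
]

query_vec = [0.8, 0.2, 0.0]  # placeholder for the embedded user question
top = retrieve(query_vec, index, k=2)
prompt = build_prompt("How long do refunds take?", top)
```

Keeping the document name next to each retrieved section makes it easier for the generated answer to cite its sources.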

How do the two POCs perform compared with each other on the same document set and the same user queries?
How can documents be made searchable as soon as they are uploaded to the central storage?
What other considerations are important for the solution?
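
One possible way to approach the comparison question is to prepare a small set of queries with a known relevant section and measure how often each POC returns that section in its top results. The sketch below computes such a hit rate in Python. The metric choice, the `hit_rate_at_k` helper, the section identifiers, and the rankings are all made up for illustration; they are not measurements of either approach.

```python
def hit_rate_at_k(results, expected, k=3):
    """Fraction of queries whose expected section id appears in the top k results."""
    if not expected:
        return 0.0
    hits = sum(
        1 for query, wanted in expected.items()
        if wanted in results.get(query, [])[:k]
    )
    return hits / len(expected)


# Made-up labels: the section that should be returned for each query.
expected = {
    "refund period": "faq.pdf-2",
    "how do I get my money back": "faq.pdf-2",
}

# Made-up rankings, standing in for what each POC might return.
full_text_results = {
    "refund period": ["faq.pdf-2", "terms.pdf-1"],
    "how do I get my money back": ["terms.pdf-1", "faq.pdf-0"],
}
vector_results = {
    "refund period": ["faq.pdf-2", "faq.pdf-0"],
    "how do I get my money back": ["faq.pdf-2", "terms.pdf-1"],
}

full_text_score = hit_rate_at_k(full_text_results, expected, k=2)
vector_score = hit_rate_at_k(vector_results, expected, k=2)
```

Running both POCs against the same labelled queries keeps the comparison fair; the query set should mix exact keyword queries with paraphrased ones.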

Note: Your submission may include only Part 1, only Part 2, or both parts.

Tech Stack: Azure AI Services, Azure OpenAI, Azure AI Search
