Development Experience
Retrieval Augmented Chatbot
Hopkins Press contains thousands of proprietary books. This book content cannot be accessed by customers until they have already purchased the book. We are testing a Retrieval Augmented Generation (RAG) Chatbot on a set of our book pages. The chatbot allows users to "interrogate" the book contents with their research questions without sharing the book content itself. An example of the chatbot can be found on the page Teaching with AI.
The chatbot itself is a React app using the AI SDK to stream the chat responses to the frontend. The pages are built by Next.js according to their BookID; Next.js also handles the abstraction of the server-side functions, which connect to the OpenAI API and to our database. The database contains the contents of each book in individuals paragraphs—as well as the vector embeddings of each paragraph. When a user asks a question, the following happens:
- The question is converted into a vector embedding.
- The vector embedding is compared against the book content embeddings to find the most relevant content.
- The user question and book contents are passed to the OpenAI's Chat Completion AI along with instructions on how to respond to the user.
- The answer is streamed to the user.
JobNote
I built this hobby project as a way to help job seekers improve their application experience. The web app does not feed user data into an LLM like ChatGPT, which would generate résumés that are error prone and sloppy. Instead, the user's résumé is split up and saved to a database along with vector embeddings representing the content. When a user optimizes their résumé for a particular job, the server compares the embeddings of the job description with the embeddings of the résumé and builds a new document tailored for that particular job.
JobNote is a Next.js Typescript application using the app router. The embeddings are generated with the OpenAI API and stored in a postgres database, along with user information and résumé content. The backend of the site is an Express server. There is also a JobNote Chrome Extension that allows users to save jobs to their account as the browse the web.
Association Multisite
Hopkins Press works extensively with academic associations to produce their journals—as part of this we also maintain websites for them. Previously, the method was to "copy and paste" one association site to create another. This created two problems: bug fixes would not be reflected across all the sites and updating Drupal versions required updating each site individually.
Because the associations re-used much of the same functionality, we transitioned to using a Drupal multisite to streamline the development process. I built out the templates, coded custom modules in PHP, and wrote out the CSS that would apply to all the sites, keeping accessibility and usability front and center. Maintain the server (Linux) and Apache configuration, keeping things up to date according to the quarterly security audits. As a result of adopting this multsite structure, time spent creating a new site is under 30 minutes and Drupal updates only need to be completed once for the entirety of the sites.
Video Format Conversion Automation
FSU was maintaining a collection of over 800 videos developed from 2007-2015 through the GEOSET initative. These videos could only run through the outdated Mediasite player and cost the university more than $10,000 per month. To preserve these records, I wrote a collection of python scripts to download the collection of files, convert the slides (images) into video based on timings pulled from xml, and overlay the slides with a speaker video if necessary. The project used a Selenium bot and a number of video manipulation libraries for python.
eTextbook Retrieval Automation
This project serves to automate a number of process associated with FSU Libraries’ eTextbook program. Previously, the Libraries used part-time staff to manually compile thousands of entries. I drastically reduced time spent on this project by automating calls to an API, compiling results in the needed CSV files, and running a bot with Selenium to check access models (which were generated separately from the API). eTextbook Automation Github Repo.
Math Fun
I began developing Math Fun as a digitization of Math Fun Day with the FSU Math Department. After the event, I have continued to maintain and add to the project. The purely front-end project is built on React.js, and it uses p5.js for visualization of some of the components. Math Fun Github Repo.