AI Projects
💡 Side Project
People Analytics: Sentiment Analysis on open-ended employee survey responses
Built a sentiment analysis pipeline that uses an open-source large language model to analyze a dataset of 400 open-ended employee survey responses. While building relationships in the People Analytics space, I kept hearing the same question across teams and industries: how can we make use of the open-ended responses on employee surveys? These responses contain the richest qualitative signal in any employee dataset, but hand-coding thousands of them is prohibitively slow. Datasets also vary in size, and the smaller ones may not support fine-tuning an NLP model, so fine-tuning is not a dependable option. The challenge is finding something that works out of the box to automate sentiment coding.
GitHub Repository 🐙
The Question
How can People Analytics teams extract meaningful signal from open-ended employee survey responses without spending months on manual coding?
What I Did
Used an open-source LLM from Hugging Face to automatically code the sentiment of open-ended employee survey responses.
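A minimal sketch of how a pipeline like this could be structured, assuming the classifier is any function that maps one response to a sentiment label. The names `score_responses`, `summarize`, and `keyword_classifier`, along with the three-label set, are hypothetical and are not taken from the project repository. The keyword function is only a toy stand-in for the model call so the example runs without downloading anything.

```python
from collections import Counter
from typing import Callable, Iterable

Label = str  # assumed label set: "positive", "negative", "neutral"


def score_responses(
    responses: Iterable[str], classify: Callable[[str], Label]
) -> list[tuple[str, Label]]:
    """Label each non-empty survey response with the supplied classifier."""
    labeled = []
    for text in responses:
        cleaned = text.strip()
        if not cleaned:
            continue  # skip blank answers rather than sending them to the model
        labeled.append((cleaned, classify(cleaned)))
    return labeled


def summarize(labeled: list[tuple[str, Label]]) -> dict[Label, float]:
    """Return the share of responses that received each label."""
    counts = Counter(label for _, label in labeled)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def keyword_classifier(text: str) -> Label:
    """Toy stand-in for the LLM call (hypothetical, for illustration only)."""
    lowered = text.lower()
    if any(word in lowered for word in ("love", "great", "supportive")):
        return "positive"
    if any(word in lowered for word in ("burnout", "frustrated", "unclear")):
        return "negative"
    return "neutral"
```

Because the classifier is passed in as a function, the stand-in can be swapped for a real model call without changing the cleaning or summary steps.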
Impact & Metrics
The tool makes qualitative survey data tractable at scale — giving People Analytics teams access to the full richness of what employees actually write, not just how they rate things on a Likert scale.
🔬 Professional Work
NLP Classification Pipeline using ChatGPT API
Reduced manual text processing time by 50% (4 to 2 months) by building an NLP classification pipeline using the ChatGPT API, achieving 77% model accuracy on a language classification task. This project directly addresses whether AI can reliably categorize nuanced human language at scale, replacing months of manual coding with automated pipelines.
The Question
Can LLMs reliably classify complex human language at scale to replace manual human coding?
What I Did
Developed a structured API pipeline that feeds utterance-level text into a GPT model, and used prompt engineering to improve accuracy on the language classification task.
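The prompt-and-parse step of a pipeline like this can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the category names in `LABELS` are invented, and `send_prompt` is a hypothetical callable standing in for whatever API client sends the prompt and returns the model's reply.

```python
LABELS = ("question", "statement", "directive")  # hypothetical category set


def build_prompt(utterance: str, labels=LABELS) -> str:
    """Assemble a fixed-format classification prompt for one utterance."""
    options = ", ".join(labels)
    return (
        "Classify the utterance into exactly one category.\n"
        f"Categories: {options}\n"
        f'Utterance: "{utterance}"\n'
        "Answer with the category name only."
    )


def parse_label(raw_reply: str, labels=LABELS):
    """Map a free-text reply to a known label, or None if it does not match."""
    cleaned = raw_reply.strip().strip(".\"'").lower()
    return cleaned if cleaned in labels else None


def classify_utterances(utterances, send_prompt, labels=LABELS):
    """Classify each utterance; send_prompt takes a prompt and returns the reply text."""
    return [
        parse_label(send_prompt(build_prompt(utterance, labels)), labels)
        for utterance in utterances
    ]
```

Returning `None` for replies outside the label set keeps off-format answers visible for review instead of silently counting them as a category.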
Impact & Metrics
Achieved 77% overall model accuracy against human-coded benchmarks. This cut our manual text processing time by 50%, which shortened time to completion for research projects.
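For reference, both figures are simple ratios. The sketch below shows how they can be computed, assuming accuracy means the share of model labels that exactly match the human-coded label for the same utterance. The function names are hypothetical, and only the 4-month and 2-month values come from the project description.

```python
def accuracy(predicted, human_coded):
    """Share of predictions that exactly match the human-coded labels."""
    if len(predicted) != len(human_coded):
        raise ValueError("predicted and human_coded must have the same length")
    if not human_coded:
        raise ValueError("at least one labeled example is required")
    matches = sum(p == h for p, h in zip(predicted, human_coded))
    return matches / len(human_coded)


def percent_reduction(before, after):
    """Percentage decrease from before to after."""
    return (before - after) / before * 100


# Processing time dropped from 4 months to 2 months.
months_saved_pct = percent_reduction(4, 2)
```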
💡 Side Project
Automated Voice-Part Playlist Builder (built by AI)
Built a customized playlist app from .mp4 files for my choir. We receive our practice tracks as individual .mp4 files in a shared Google Drive folder for each voice part. I saw an opportunity to use AI to build an app that takes those files and assembles a playlist for each voice part. This makes it easier for members to practice, and we can track member usage through the app.
Launch Live App ↗
The Question
Can AI generate a working prototype of a customized playlist for our choir, so members do not need to manually download and organize individual files each week?
What I Did
Used Claude to generate an R Shiny app as a prototype for my choir director. The entire project took about 30 minutes. The prototype gave me a working model to share and collect feedback on.
Impact & Metrics
The tool makes it easier for choir members to practice. Members self-reported practicing more, which we could validate by tracking usage of the application.
🤝 Contract Work
AI Evaluation Dataset Question Developer
Designed a multiple-choice evaluation dataset for an AI company to test language models' understanding of cognitive development concepts. I created 20–25 expert-level items with carefully constructed distractor answers that targeted common conceptual confusions rather than factual recall. The dataset clearly differentiated model failures (~20% accuracy), as measured by performance on conceptually discriminative questions.
The Question
Can a language model truly *understand* cognitive development concepts — or is it pattern-matching on surface-level cues? I was brought in to design an evaluation dataset that could tell the difference.
What I Did
Designed 20–25 expert-level multiple-choice items grounded in developmental psychology, with adversarial distractor answers constructed to exploit common conceptual confusions — not factual gaps. Each question was designed to distinguish true understanding from surface-level pattern matching.
Impact & Metrics
Consistent model failure on non-obvious conceptual distinctions confirmed the dataset had high diagnostic value. The items successfully differentiated models that understood the *structure* of developmental concepts from those recombining familiar terms.
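One way to score a dataset like this is to compare a model's accuracy with the accuracy expected from random guessing and to list the items it missed. The sketch below is illustrative only. The `Item` structure, the option keys, and the `score` function are hypothetical, and the number of answer options per item is an assumption, since the project description does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    correct: str    # key of the correct option, e.g. "B"
    options: tuple  # all option keys, e.g. ("A", "B", "C", "D")


def score(items, answers):
    """Return (accuracy, chance_baseline, missed_item_ids) for one model.

    answers maps item_id to the option key the model chose; a missing
    answer counts as incorrect.
    """
    if not items:
        raise ValueError("at least one item is required")
    missed = [it.item_id for it in items if answers.get(it.item_id) != it.correct]
    model_accuracy = (len(items) - len(missed)) / len(items)
    # Expected accuracy when guessing uniformly at random on every item.
    chance = sum(1 / len(it.options) for it in items) / len(items)
    return model_accuracy, chance, missed
```

Accuracy at or below the chance baseline suggests the distractors are drawing the model toward specific wrong answers, and the missed-item list shows which conceptual distinctions it fails on.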