Nicole Burke, PhD

💡 Side Project

People Analytics: Sentiment Analysis on Open-Ended Employee Survey Responses

Built a sentiment analysis pipeline using an open-source large language model to analyze a dataset of 400 open-ended employee survey responses. While building relationships in the People Analytics space, I have found that teams across industries share a common goal: making use of the open-ended responses on employee surveys. These responses contain the richest qualitative signal in any employee dataset, but hand-coding thousands of them is prohibitively slow. Further, datasets vary in size and some are too small to train on, so fine-tuning an NLP model is not always an option. The challenge is finding something that works out of the box to automate sentiment coding.

GitHub Repository 🐙

Python Open Source LLMs NLP Data Pipelines

The Question How can People Analytics teams extract meaningful signal from open-ended employee survey responses without spending months on manual coding?

What I Did Used an open-source LLM, accessed via Hugging Face, to automate sentiment coding of open-ended employee survey responses.
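As a rough illustration only (this is not the code from the repository), a pipeline of this kind might look like the Python sketch below. It assumes the Hugging Face `transformers` library and uses its default `sentiment-analysis` pipeline as a stand-in for the open-source LLM used in the project. The function names `load_classifier`, `label_responses`, and `sentiment_counts` are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable


def load_classifier() -> Callable:
    """Load a Hugging Face sentiment pipeline.

    Assumption: `transformers` is installed; the default model is a
    stand-in, not necessarily the model used in the project.
    """
    from transformers import pipeline  # lazy import; downloads a model on first use

    return pipeline("sentiment-analysis")


def label_responses(responses: Iterable[str], classify: Callable) -> list:
    """Attach a sentiment label and score to each non-empty response."""
    texts = [r.strip() for r in responses if r and r.strip()]
    if not texts:
        return []
    results = classify(texts)  # expected: one {"label", "score"} dict per text
    return [
        {"text": text, "label": res["label"], "score": res["score"]}
        for text, res in zip(texts, results)
    ]


def sentiment_counts(labeled: list) -> Counter:
    """Count how many responses received each sentiment label."""
    return Counter(row["label"] for row in labeled)
```

Passing the classifier in as a plain callable keeps the labeling logic independent of any one model, so a different open-source model can be swapped in without touching the rest of the pipeline.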

Impact & Metrics The tool makes qualitative survey data tractable at scale — giving People Analytics teams access to the full richness of what employees actually write, not just how they rate things on a Likert scale.

🔬 Professional Work

NLP Classification Pipeline using ChatGPT API

Reduced manual text processing time by 50% (4 to 2 months) by building an NLP classification pipeline using the ChatGPT API, achieving 77% model accuracy on a language classification task. This project directly addresses whether AI can reliably categorize nuanced human language at scale, replacing months of manual coding with automated pipelines.

R NLP ChatGPT API Data Pipelines

The Question Can LLMs reliably classify complex human language at scale to replace manual human coding?

What I Did Developed a structured API pipeline to feed utterance-level text into a GPT model and used prompt engineering to improve accuracy on the language classification task.
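The original pipeline was written in R; the Python sketch below only illustrates the general shape of prompt-based classification under stated assumptions. The label set, the prompt wording, and the names `build_prompt`, `parse_label`, and `classify_utterances` are all hypothetical, and `call_model` stands for whatever function sends a prompt to the model API and returns its text reply.

```python
from typing import Callable, Optional

# Hypothetical categories; the real task's labels are not given in the source.
LABELS = ["question", "statement", "command"]


def build_prompt(utterance: str, labels: list = LABELS) -> str:
    """Build a constrained prompt that asks for exactly one label."""
    options = ", ".join(labels)
    return (
        f"Classify the utterance into exactly one of: {options}.\n"
        "Reply with the label only.\n\n"
        f"Utterance: {utterance!r}\nLabel:"
    )


def parse_label(reply: str, labels: list = LABELS) -> Optional[str]:
    """Normalize the model reply; return None if it is not a known label."""
    cleaned = reply.strip().lower().strip(".\"'")
    return cleaned if cleaned in labels else None


def classify_utterances(utterances: list, call_model: Callable[[str], str],
                        labels: list = LABELS) -> list:
    """Classify each utterance; unparseable replies come back as None."""
    return [parse_label(call_model(build_prompt(u, labels)), labels)
            for u in utterances]
```

Returning `None` for replies outside the label set makes it easy to route those utterances back to a human coder instead of silently accepting a malformed answer.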

Impact & Metrics Achieved 77% overall model accuracy against human-coded benchmarks. This reduced our manual text processing time by 50%, which shortened time to completion for research projects.
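For readers unfamiliar with how accuracy against a human-coded benchmark is computed, here is a minimal Python sketch. It is not the project's evaluation code; the function names and the toy labels in the usage note are illustrative assumptions.

```python
from collections import defaultdict


def accuracy(model_labels: list, human_labels: list) -> float:
    """Share of items where the model label matches the human code."""
    if len(model_labels) != len(human_labels):
        raise ValueError("label lists must be the same length")
    if not human_labels:
        raise ValueError("need at least one labeled item")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)


def per_label_accuracy(model_labels: list, human_labels: list) -> dict:
    """Accuracy broken out by the human-assigned label."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for m, h in zip(model_labels, human_labels):
        totals[h] += 1
        hits[h] += int(m == h)
    return {label: hits[label] / totals[label] for label in totals}
```

With toy data such as `human = ["a", "a", "b", "b"]` and `model = ["a", "b", "b", "b"]`, `accuracy(model, human)` is 0.75, and the per-label view shows that all of the errors fall on label `"a"`.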

💡 Side Project

Automated Voice-Part Playlist Builder (built by AI)

Built a customized playlist app for my choir's .mp4 practice tracks. We receive our practice tracks as individual .mp4 files in a shared Google Drive folder for each voice part. I saw an opportunity to use AI to build an app that takes those files and assembles a playlist by voice part for the choir. This makes it easier for members to practice, and we can track choir member usage via the app.

Launch Live App ↗

R Shiny App AI Prototype Engineering

The Question Can AI generate a working prototype of a customized playlist for our choir, so members do not need to manually download and organize individual files each week?

What I Did Used Claude to generate an R Shiny app as a prototype for my choir director. The entire project took ~30 minutes. The prototype gave me a working model to share for feedback.
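The app itself is an R Shiny app generated by Claude; the Python sketch below only illustrates the core grouping idea. It assumes each track path looks like `VoicePart/filename.mp4` (one folder per voice part, as described above); the function name `build_playlists` and the example file names are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_playlists(paths: list) -> dict:
    """Group .mp4 files into one sorted playlist per voice-part folder."""
    playlists = defaultdict(list)
    for raw in paths:
        path = PurePosixPath(raw)
        if path.suffix.lower() != ".mp4":
            continue  # skip sheet music, notes, and other non-track files
        playlists[path.parent.name].append(path.name)
    # Sort parts and tracks so the playlist order is stable between runs.
    return {part: sorted(tracks) for part, tracks in sorted(playlists.items())}
```

For example, `build_playlists(["Alto/02_Gloria.mp4", "Alto/01_Kyrie.mp4", "Soprano/notes.pdf"])` yields a single `"Alto"` playlist with the two tracks in filename order.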

Impact & Metrics The tool makes it easier for choir members to practice. We saw an increase in practice through self-reported measures, which we could validate by tracking the usage of the application.

🤝 Contract Work

AI Evaluation Dataset Question Developer

Designed a multiple-choice evaluation dataset for an AI company to test language models' understanding of cognitive development concepts. The dataset comprised 20-25 expert-level items with carefully constructed distractor answers that targeted common conceptual confusions rather than factual recall. It clearly differentiated model failures (i.e., ~20% accuracy), as measured by performance on conceptually discriminative questions.

AI Evaluation Dataset Prompt Engineering

The Question Can a language model truly *understand* cognitive development concepts — or is it pattern-matching on surface-level cues? I was brought in to design an evaluation dataset that could tell the difference.

What I Did Designed 20–25 expert-level multiple-choice items grounded in developmental psychology, with adversarial distractor answers constructed to exploit common conceptual confusions — not factual gaps. Each question was designed to distinguish true understanding from surface-level pattern matching.

Impact & Metrics Consistent model failure on non-obvious conceptual distinctions confirmed the dataset had high diagnostic value. The items successfully differentiated models that understood the *structure* of developmental concepts from those recombining familiar terms.
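As a sketch of how a multiple-choice evaluation like this can be scored (not the client's actual harness), the Python below compares a model's chosen options against the answer key and against the accuracy expected from random guessing. The `Item` structure, the function names, and the placeholder questions are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    prompt: str    # question text
    options: dict  # option letter -> option text
    answer: str    # letter of the correct option


def accuracy_on_items(items: list, choices: list) -> float:
    """Fraction of items where the model's chosen letter matches the key."""
    if len(items) != len(choices):
        raise ValueError("need exactly one choice per item")
    if not items:
        raise ValueError("need at least one item")
    correct = sum(choice == item.answer for item, choice in zip(items, choices))
    return correct / len(items)


def chance_accuracy(items: list) -> float:
    """Expected accuracy from uniform random guessing across the items."""
    return sum(1 / len(item.options) for item in items) / len(items)
```

Comparing `accuracy_on_items` with `chance_accuracy` is one way to judge whether distractors are actively pulling a model toward wrong answers rather than leaving it to guess.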

AI Projects