AI Projects

  1. Side Project: Employee Sentiment

  2. NLP classification pipeline

  3. Shiny app built in 25 minutes

  4. AI evaluation dataset

     • Designed a multiple-choice evaluation dataset for an AI company to test language models' understanding of cognitive development concepts. Wrote 20-25 expert-level items whose distractor answers targeted common conceptual confusions rather than factual recall; models scored ~20% accuracy on the conceptually discriminative questions, clearly differentiating their failures.

     • Improved evaluation quality for model benchmarking by drawing on domain expertise in developmental psychology to construct adversarial answer choices that distinguish true conceptual understanding from surface-level pattern matching. The questions' high diagnostic value was shown by consistent model failure on non-obvious conceptual distinctions.