I am a Staff Software Engineer on Duolingo's Cloud Operations team, working on infrastructure automation and platform engineering at scale. I championed Temporal adoption from 0->1, growing it into an org-wide workflow platform that powers 140+ production workflows across 10+ teams and establishing Duolingo as a recognized reference implementation in the Temporal ecosystem. I currently lead the Temporal infrastructure project team.
Previously, I was a Senior Backend Engineer and Scrum Master at data.ai (formerly App Annie), where I led microservice migrations and built platform integrations at scale. Before that, I was a Software Engineer at VIPKID.
I hold a Master of Software Engineering from Carnegie Mellon University, a B.S. in Information and Computing Science from China University of Mining and Technology, Beijing, and a B.S. in Economics (double major) from Peking University.
- Serving as technical lead for a Temporal infrastructure project team of 5 engineers: defining project direction, mentoring engineers on Temporal internals and infrastructure patterns, and partnering with teams across the org to design and onboard new use cases.
- Leading a zero-downtime, namespace-by-namespace ECS->EKS migration of the Temporal worker fleet that is transparent to service teams (multiple namespaces already fully on EKS); implementing KEDA-based autoscaling to handle unpredictable, bursty workflow loads across hundreds of workers.
- Leading a zero-downtime migration of datastores (RDS, DynamoDB, S3) and services from a monolithic AWS account to dedicated accounts for stronger security isolation and scalable multi-account management; using Terragrunt to abstract account-level infrastructure (VPC, subnets, peering, etc.); automating CIDR planning based on resource topology; programmatically generating migration playbooks for service teams; and continuously improving migration guides.
- Championed Temporal adoption at Duolingo from 0->1: introduced the technology, built the foundational infrastructure from scratch, fostered a cross-org community, and scaled it into an org-wide durable workflow platform powering 140+ production workflows across 10+ teams with hundreds of workers, establishing Duolingo as a recognized reference implementation in the Temporal ecosystem.
- Designed and built the Temporal Self-Service Platform (UI, gateway microservice, router workers, shared libraries, and onboarding tooling), enabling 15+ CloudOps self-service workflows (e.g., RDS resizing, upgrades, and deletion; PostgreSQL replication; Redis->Valkey migration) and saving thousands of engineering hours annually.
- Improved developer experience in the Temporal monorepo: refactored Dockerfiles, folder structure, and deployment scripts to scale with growing team adoption; shipped Agent Skills for workflow authoring; reduced CI, build, and pre-commit times by 50%; and introduced higher-level abstractions via Temporal Nexus for cross-namespace workflow communication.
- Implemented full datastore disaster recovery with Temporal workflows, designing orchestration patterns for multi-step, cross-region restoration of RDS, DynamoDB, S3, and Redis.
- Took on de facto DBRE ownership: built production-grade monitoring for all Aurora PostgreSQL clusters from scratch (custom exporters; Grafana dashboards for table size trends, slow queries, and blocked queries); automated zero-downtime DB upgrades with AI-driven Terraform PR generation; and upgraded 30+ Aurora PostgreSQL clusters from v11 to v14 in under one month with zero learner impact.
- Devised and built a new unified connection workflow to migrate 30+ platforms to microservices, with a scalable approach to completing Apple's and Google's multi-factor authentication, reducing time cost by over 50% and improving the success rate by over 30%.
- Migrated legacy services from a monolithic Django app to Flask and Go microservices, improving performance by 50%.
- Enhanced the Node.js scraping framework with Puppeteer for browser-based scraping and integrated Firebase, making data.ai the first in the industry to support a Firebase integration.
- Developed internal troubleshooting tools and monitoring dashboards, reducing MTTR from over 3 days to 1–2 days.
- Served as Scrum Master for a team of 6+, facilitating sprints, daily meetings, and retrospectives.
- Developed microservices with Flask, gRPC, Java, and Spring Boot to support the VIPKID App, VIPKID School, and internal systems.
- Built an asynchronous task system with Celery for the Dino English App and its WeChat platform.