Pedro Cruz

I contributed to open-sourcing Apple's Matryoshka Diffusion Models as a Python library. Specifically, I wrote unit tests for the tokenizer, refactored the project's dependency management, cleaned up the code style to remove ambiguity, and worked on the package documentation. GitHub Repository.
Keywords: Python, ML, Apple, Unit Testing.

I developed an NLP-powered recommendation system that finds similar titles based on a user-provided show description. I wanted to experiment with the available ML and NLP libraries, and I learned a lot in the process. In hindsight, I would change the approach to use more sophisticated methods (for example, LCA rather than PCA for dimensionality reduction). GitHub Repository.
Keywords: Python (NLTK, scikit-learn, Pandas, NumPy), ML

I contributed to Omnilingo, a listening-based language learning app: think Duolingo, but open source. I learned a lot about web development, application design, and integrating pedagogical goals into engineering. GitHub Repository.
Keywords: Web Development, HTML/CSS/JavaScript

I developed a digital dictionary to promote community language documentation in Oaxaca, Mexico; the project was supported by National Geographic. This was one of my first summer projects in college. I worked closely with language activists from Santiago Matatlán, Mexico, to support the revitalization of their native Zapotec in a community of more than 10,000 speakers. My work was very data-focused: I helped them decide what should go in their dictionary, helped them collect the data, and made decisions about how to store the database effectively, both on disk and online. Online Dictionary.
Keywords: Data Collection, Fieldwork, Language Documentation.

I created a web application for the Frente pelo Avanço dos Direitos Políticos das Mulheres (Front for the Advancement of Women's Political Rights) to hold local congresspeople accountable for proposed constitutional amendments. Using this application, the group gathered more than 900 signatures from all Brazilian states. I learned a lot about web development doing this project. In hindsight, I could have been more careful with how data was stored and handled through the forms. The app uses almost no cryptography, which I would implement nowadays to make it more secure. I would also put much more effort into top-down design to simplify the app, which is unnecessarily complex. GitHub Repository.
Keywords: Web Development, HTML/CSS/JavaScript

I worked under Prof. Vasanta Chaganti analyzing Starlink internet latency. As a member of the lab, my goal was to identify performance measures that are salient, accurate, and statistically significant for constituencies ranging from consumers making purchase decisions, to regulators, to those who diagnose network issues. Specifically, I analyzed 2022-2024 Starlink idle-latency data in Jupyter Notebooks.
Keywords: Python, Data Analysis, Pandas, NumPy.

During my sophomore year, I was part of the Swarthmore Phonetics Laboratory. I worked on UltraTrace, a tool for manually annotating 2D ultrasound tongue imaging (UTI) data. Linguists can use this software to annotate phonological data for their research. Under Prof. Jonathan Washington, I helped fix bugs and develop the software's roadmap. GitHub Repository.
Keywords: Python, Software Development, Linguistics.

I researched and designed the new visual identity for the Latin American Leadership Academy (LALA), an educational institution in Latin America worth over US$1 million. Behance.

Stone is one of the biggest fintechs in Brazil. I led the research for an AI-powered in-app payments assistant designed to help over 2.1 million active users manage their daily expenses. The product was projected to generate US$1 billion annually for the company. My work involved exploring product-market fit, conducting benchmarking analysis in Brazil and internationally, leading product ideation, prototyping, and validating the product's first version with seven clients. I also worked on the financial projections.

My project was to help the organization understand the funding opportunities available in Brazil for its products. I built a database analyzing congressional spending to prioritize healthcare-tech funding opportunities, writing Python scripts to scrape and analyze data on over 535 entities and individuals from across the internet. Ultimately, I identified key opportunities totaling US$4 million annually, which provided actionable insights to guide the organization's second-semester funding strategy.

Projects

Apple Inc. Matryoshka Diffusion Models (2024)

TV Show Recommender (2024)

Omnilingo (2023)

Zapotec Talking Dictionaries (2022)

Nheengatu-Portuguese Rule-Based Translator (2022)

Political Pressure Tool (2021)

Research

Starlink Internet Latency Measurement (2025)

UltraTrace (2022)

Design

Latin American Leadership Academy (LALA) Rebranding (2022)

Jobs

Stone Co.

ImpulsoGov