Career Companion App: Week 2 Monday Standup
Date: May 12, 2025

A Few Gains, More Errors, More Lessons Learned
This week I worked on the backend of Career Companion, an AI-powered app that helps users plan their career growth by analyzing resumes and identifying skill gaps. The goal is to build a system that can act like a smart career coach — one that understands what jobs are looking for and tells users how to grow.
I started by converting resumes from a CSV file into clean text files so the app could read them. That part went smoothly. Next, I labeled skills in each resume so the machine learning model would know what to look for. I did this semi-automatically at first, using some basic rules to mark phrases like “project manager” or “Jira” as skills.
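The rule-based pass is simple to sketch: match a small phrase list against each resume's text and record the matching character spans as skill labels. A minimal illustration, assuming a CSV with a `resume_text` column and a hypothetical `SKILL_PHRASES` list (the actual rules and column names may differ):

```python
import csv
import re
from pathlib import Path

# Hypothetical phrase list; the real rules covered many more skills.
SKILL_PHRASES = ["project manager", "jira", "python", "postgresql"]

def csv_to_text_files(csv_path, out_dir, text_column="resume_text"):
    """Write each resume row out to its own plain-text file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            (out / f"resume_{i:04d}.txt").write_text(row[text_column], encoding="utf-8")

def label_skills(text):
    """Return (start, end, phrase) spans for every rule match, case-insensitively."""
    spans = []
    for phrase in SKILL_PHRASES:
        for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), phrase))
    return sorted(spans)

print(label_skills("Experienced Project Manager, fluent in Jira."))
# → [(12, 27, 'project manager'), (39, 43, 'jira')]
```

Character spans like these are a common starting point for token-classification labels, since they can later be mapped onto tokens.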
The next big step was uploading everything to Amazon S3. That didn’t go perfectly at first. Some folders got nested inside other folders, and the upload script couldn’t find the right files. After a few tries, I used the AWS CLI directly instead of relying on the SageMaker SDK, which made things clearer.
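The fix boils down to computing each object's S3 key relative to the dataset root, so extra nesting on disk doesn't change the key layout the training job expects. I ran the AWS CLI in practice, but the same idea expressed in Python (a sketch using boto3; the bucket, prefix, and paths are placeholders):

```python
from pathlib import Path

try:
    import boto3  # third-party; only needed for the actual upload
except ImportError:
    boto3 = None

def s3_key_for(local_path, root, prefix="resumes"):
    """Build a stable S3 key from the path relative to the dataset root,
    so accidental extra nesting on disk doesn't shift the key layout."""
    rel = Path(local_path).relative_to(root)
    return f"{prefix}/{rel.as_posix()}"

def upload_dir(root, bucket, prefix="resumes"):
    """Upload every .txt file under root, keyed relative to root."""
    s3 = boto3.client("s3")
    for path in Path(root).rglob("*.txt"):
        s3.upload_file(str(path), bucket, s3_key_for(path, root, prefix))

print(s3_key_for("/data/raw/sub/a.txt", "/data/raw"))  # → resumes/sub/a.txt
```

Keying relative to a fixed root is exactly what `aws s3 sync` does for you, which is part of why the CLI behaved more predictably here.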
Once the data was safely in the cloud, I tried training a custom skill extractor using Hugging Face Transformers and BERT. This part had more hiccups. The tokenizer kept throwing errors because the word-level labels no longer lined up once words were split into subword tokens. I updated the tokenization function so that every token gets a label, assigning the ignore index (-100) to subword continuations and special tokens so they don't count toward the loss. That kept the training process stable.
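Concretely, the alignment step gives every subword token a label: the first subword of each word keeps that word's label, while special tokens and later subwords get -100 so the loss function skips them. A sketch of just that logic, isolated from the tokenizer (it assumes the `word_ids()` mapping that fast Hugging Face tokenizers provide):

```python
IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss ignores this label by default

def align_labels(word_ids, word_labels):
    """Map word-level labels onto subword tokens.

    word_ids: the per-token word index from BatchEncoding.word_ids() --
              None for special tokens like [CLS]/[SEP].
    word_labels: one label id per original word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)          # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword keeps the label
        else:
            aligned.append(IGNORE_INDEX)          # later subwords are ignored
        previous = word_id
    return aligned

# e.g. "senior developer" -> [CLS] senior develop ##er [SEP]
print(align_labels([None, 0, 1, 1, None], [1, 2]))  # → [-100, 1, 2, -100, -100]
```

With this in place, the label tensor always matches the token tensor in length, which is what the trainer was complaining about.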
Right now, the app can pull resume data from S3, use NLP to extract key phrases and entities, and store everything in a PostgreSQL database. It’s connected to AWS Lambda and Amplify, and there's a plan to deploy a machine learning model that will get better over time.
This upcoming week, I’ll be focusing on training a custom model using Amazon SageMaker. Once that’s done, the app should be able to recognize new skills automatically, not just the ones I’ve already told it about. I’ll also work on making sure the database can handle user IDs correctly, especially when connecting to production.
It’s been a bit messy at times, but I’m starting to see how all the pieces fit together. It’s kind of cool to see Python scripts and AWS services come together to build something that actually works. I still have a lot to figure out, but I feel like I’m moving in the right direction.