Senior Data Engineer / Data Curator
A job at TSMC Arizona offers an opportunity to work at the most advanced semiconductor fab in the United States. TSMC Arizona’s first fab will operate its leading-edge semiconductor process technology (N4 process), starting production in the first half of 2025. The second fab will utilize its leading-edge N3 and N2 process technology and be operational in 2028. The recently announced third fab will manufacture chips using 2nm or even more advanced process technology, with production starting by the end of the decade. America’s leading technology companies are ready to rely on TSMC Arizona for the next generations of chips that will power the digital future.
As a Senior Data Engineer in the AI Data Curation track, you will ensure that the data powering our AI models is high-quality, well-organized, and fit for use in model training and deployment. You will play a key role in designing and maintaining scalable data pipelines, ensuring that data is clean, relevant, and aligned with ethical and compliance standards.
Responsibilities:
- Design and implement data pipelines for processing, cleaning, and curating large datasets used in model training and fine-tuning.
- Automate data cleaning processes (e.g., removing noise, duplicates, irrelevant content) and ensure datasets are appropriately labeled and structured.
- Collaborate with model teams to ensure data aligns with model requirements and performance goals.
- Assess and mitigate bias in datasets, ensuring that models are trained on diverse and representative data.
- Manage data storage and retrieval strategies, ensuring scalability and data consistency across different environments.
- Conduct regular audits to ensure data integrity, privacy, and security compliance.
Minimum Qualifications/Requirements:
Education: Bachelor's degree (minimum) in Computer Science, Data Science, or a related field.
Technical Skills:
- 5+ years of experience in data engineering, data wrangling, or data curation, particularly in machine learning or AI-driven environments.
- Strong proficiency in Python (Pandas, NumPy) and SQL for data manipulation and querying.
- Familiarity with cloud-based data storage (AWS S3, Google Cloud Storage, etc.) and distributed systems for managing large datasets.
- Experience with data annotation tools and platforms for manual or semi-automated labeling.
- Experience with NLP data formats, such as JSONL, text, or embeddings, and an understanding of tokenization.
- Experience managing data pipelines with tools like Apache Kafka, Apache Airflow, or similar ETL tools.
- Strong knowledge of AI ethics, data privacy, and compliance standards (GDPR, CCPA, etc.).
- Bonus: Experience with vector databases and indexing for LLMs (e.g., FAISS, Pinecone).
Interpersonal Skills:
- Communication
- Computer proficiency
- Presentation skills
- Listening
- Teamwork
Candidates must be willing and able to work on-site at our Phoenix, Arizona facility.
As a valued member of the TSMC family, you will find that we place a significant focus on your health and well-being. When you are at your best physically, mentally, and financially, our company is at its best. We offer a comprehensive and competitive benefits program that provides the resources you need to manage your health and achieve your goals across many areas of your life. This includes a variety of medical, dental, and vision plans, so you can choose the ones that best fit your needs and your family’s. Additionally, TSMC provides income-protection programs to assist you financially should you experience an injury or illness, and a 401(k) retirement savings plan to help you secure your financial future. TSMC also offers competitive paid time-off programs and paid holidays, allowing you to recharge and spend time with your family and loved ones.
Work Location: 5088 W. Innovation Circle, Phoenix, AZ 85083
TSMC is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified applicants will receive consideration for employment without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other protected characteristic. We encourage all qualified individuals to apply, and we welcome applications from individuals with diverse backgrounds and experiences. Candidates must be able to perform the essential functions of the job with or without a reasonable accommodation. If you need a reasonable accommodation as part of this application process, please contact P_LOA@tsmc.com.