Member of Technical Staff, Data Pipeline
Boson AI
Santa Clara, CA
Boson AI is an early-stage startup building large language tools for everyone to use. Our founders (Alex Smola and Mu Li) and a team of Deep Learning, Optimization, NLP, AutoML, and Statistics scientists and engineers are working on high-quality generative AI models for language and beyond.
We are seeking machine learning engineers to join our team full-time in our Santa Clara office. In this role, you will help us build pipelines for data collection, data filtering, synthetic data generation, and data analysis, which in turn will help us build more lifelike AI models. You will work closely with other scientists and engineers to power our next generation of large multimodal models.
Responsibilities:
- Design and develop data collection pipelines to gather and preprocess diverse datasets (beyond language) from various sources (beyond web crawls)
- Design and develop data processing pipelines, including data labeling, data filtering, data cleaning, data visualization, and data auditing
- Implement machine learning models to improve the quality and diversity of data, e.g., quality classifiers, document layout models, and speech transcription models
Qualifications:
- Strong proficiency in building large-scale data processing pipelines and familiarity with distributed workloads (e.g., multiprocessing, Ray, Docker, Kubernetes)
- Proficiency in at least one programming language commonly used in machine learning, such as Python, and the ability to write clean, maintainable code
- Proficiency in at least one deep learning framework, such as PyTorch
- Proficiency in database management
- PhD or Master's degree in computer science or equivalent
- Excellent problem-solving skills and attention to detail, especially when handling data anomalies and biases to further improve data quality
- Familiarity with at least one tool for data labeling (e.g., Label Studio), data collection (e.g., VPNs, Selenium), or data processing (e.g., Hadoop, Datasketch)
- Experience in building large-scale datasets
- Hands-on experience with cloud platforms such as AWS, Azure, or GCP
- Experience in machine learning, e.g., projects in language/vision/audio
- Active GitHub contributions are a big plus
- Multilingual ability, which contributes to the language diversity crucial for robust model training
- Experience with fairness, toxicity, data privacy regulations and compliance considerations
- Seniority level: Not Applicable
- Employment type: Full-time
- Job function: Engineering and Information Technology
- Industries: Transportation, Logistics, Supply Chain and Storage