ME00573-Data Scientist 3
Job Summary
- Seeking a Data Scientist to support the development of advanced Natural Language Processing (NLP) solutions focused on the automated processing and analysis of spoken and written language data
- This role is responsible for building and improving models that accurately tokenize language data and annotate linguistic features, enabling scalable and high-quality language understanding capabilities
- The position involves evaluating model performance against human-generated annotations to continuously enhance accuracy and effectiveness
Responsibilities
- Develop and implement NLP models to automatically tokenize language data from both spoken and written sources
- Design and build automated solutions for annotating language data with part-of-speech (POS) tags and other linguistic features
- Train, test, and validate machine learning models for language processing tasks
- Evaluate and improve model performance by benchmarking against human-generated annotations for speech and text data
- Process and analyze large volumes of structured and unstructured language data
- Develop data pipelines to support ingestion, preprocessing, and transformation of linguistic datasets
- Collaborate with cross-functional teams to integrate NLP models into production systems
- Perform error analysis and model tuning to improve accuracy and robustness
- Document methodologies, model performance, and data processing workflows
Qualifications
- Must have active Top Secret/SCI clearance with Full Scope Polygraph (MD Customer)
- Master's degree with 6 years of relevant experience, Bachelor's degree with 8 years of relevant experience, or Associate's degree with 10 years of in-depth experience clearly related to the position
- Experience with Natural Language Processing (NLP) and machine learning techniques
- Strong proficiency in Python and relevant NLP/ML libraries (e.g., NLTK, spaCy, Hugging Face, or similar)
- Experience developing models for tokenization, text processing, and linguistic annotation
- Experience working with speech and/or text data
- Understanding of model evaluation techniques, including comparison to ground truth or human-labeled data
- Strong analytical and problem-solving skills
- Experience with part-of-speech tagging and linguistic modeling
- Experience working with multilingual or speech-based datasets
- Familiarity with deep learning frameworks (e.g., PyTorch, TensorFlow)
- Experience deploying NLP models in production environments
- Knowledge of data pipelines and large-scale data processing
The Pay Range For This Role Is
135,000 - 180,000 USD per year (Ft. Meade, MD)