Data Engineer

Updated: 20 days ago
Location: Princeton, New Jersey
Job Type: Full-Time

The Accelerator seeks a Data Engineer to work with team members to develop, deploy, and improve data-intensive applications and processes. As part of a small cross-functional team, this individual will participate in product design and iterative development to support the mission of powering policy-relevant research by building shared infrastructure.

 

As someone growing in their expertise, this individual usually plans and executes tasks requiring judgment, adapting standard techniques and sometimes creating new methods to solve problems. With enough experience to be confident in their abilities and a record of completed projects, they typically work independently, receiving instructions on expected outcomes, occasional technical guidance for uncommon issues, and supervisor approval before starting projects. They collaborate with others to resolve important questions and coordinate work, and may use advanced techniques.

 

A remote work arrangement within the United States may be considered for candidates with the appropriate background and experience. University-paid business travel to Princeton, NJ may be required approximately 2-4 times per year. 

 

The term of this appointment is 1 year, with the possibility of renewal based upon satisfactory performance and funding. 



Data Lake Design, Implementation and Maintenance

 

  • Using the latest cloud storage and processing techniques, help design and implement a data lake architecture that allows our project to store and process terabytes of data daily. Store data in Parquet tables using a common storage format, and enable processing with Python and Spark.
  • Help monitor performance and optimize storage and data-transfer bottlenecks for scale and cost-effectiveness. Help onboard new users and document processes on the platform.
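To illustrate the kind of layout work this duty involves, here is a minimal sketch of Hive-style date partitioning, the convention Spark uses to prune Parquet tables when scanning. The bucket, dataset name, and columns are hypothetical examples, not project specifics.

```python
# Build a Hive-style partitioned path (year=/month=/day=) so engines
# like Spark can skip irrelevant partitions when reading Parquet.
from datetime import date

def partition_path(root: str, dataset: str, event_date: date) -> str:
    """Return the storage prefix for one day's slice of a dataset."""
    return (f"{root}/{dataset}/"
            f"year={event_date.year}/month={event_date.month:02d}/"
            f"day={event_date.day:02d}")

print(partition_path("s3://lake", "events", date(2024, 3, 7)))
# s3://lake/events/year=2024/month=03/day=07
```

In practice a Spark writer would produce this layout via `partitionBy`; the helper above just makes the directory convention explicit.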

Data Science and Data Augmentation Analysis

  • With support, analyze data sets within the platform to identify useful insights and patterns through computational modeling with ML libraries such as PyTorch and TensorFlow. These insights will be incorporated into our product designs, and the conceptual model code will then be made ready for use in a production environment.
  • Help test, QA, and document data flows and model definitions to ensure that ongoing support is available.
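As a small, standard-library stand-in for the heavier ML modeling this duty describes, the sketch below flags anomalous values with a z-score; the data and threshold are hypothetical examples of an insight-extraction step.

```python
# Flag values that sit more than `threshold` standard deviations from
# the mean -- a simple pattern-detection step of the kind that a
# PyTorch/TensorFlow model would generalize in production.
from statistics import mean, stdev

def flag_outliers(values, threshold=2.0):
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

readings = [10, 11, 9, 10, 12, 10, 48]
print(flag_outliers(readings))  # [48]
```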

Cloud Based Data Processing

 

  • We plan to use services offered by our cloud-based partners to enhance and expand our data with the help of third-party data sources and insights. To support this effort, assist in creating and documenting processes, contribute insight on offerings from third-party providers, and help develop code that enables our platform to communicate with our partners through REST APIs and/or SDKs.
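A hedged sketch of the REST integration pattern this duty describes, using only the standard library: the endpoint URL, field names, and response shape are hypothetical, not those of any actual partner service.

```python
# Build an authenticated POST request for a (hypothetical) third-party
# enrichment API, and parse its JSON response.
import json
import urllib.request

API_URL = "https://api.example-partner.com/v1/enrich"  # hypothetical

def build_request(record: dict, api_key: str) -> urllib.request.Request:
    body = json.dumps({"record": record}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )

def parse_response(raw: bytes) -> dict:
    """Pull the enriched fields out of a JSON response body."""
    return json.loads(raw).get("enriched", {})

# Offline example of the parsing half:
enriched = parse_response(b'{"enriched": {"score": 0.9}}')
```

A partner SDK would replace `build_request` with its own client, but the request/parse split keeps the integration testable without network access.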

Data Ingestion Pipeline Development

  • Help create pipelines that pull data from various sources, such as websites and APIs, and transform it into our core storage format. Write these pipelines in Python, using tools such as Apache Spark and the data lake for efficient processing.
  • Assist in maintaining pipelines, ensuring that they run efficiently and on schedule. Perform data validation and data quality tasks with some supervision to ensure the accuracy of the data.
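The extract/transform/validate shape of such a pipeline can be sketched in a few lines of standard-library Python; the field names and the data-quality rule (positive, unique ids) are hypothetical examples of the checks described above.

```python
# Tiny extract -> transform -> validate pipeline over CSV input.
import csv
import io

def transform(row: dict) -> dict:
    """Normalize one raw record into the core storage format."""
    return {"id": int(row["id"]), "name": row["name"].strip().lower()}

def validate(rows):
    """Data-quality gate: every id must be positive and unique."""
    seen = set()
    for r in rows:
        assert r["id"] > 0 and r["id"] not in seen, f"bad row: {r}"
        seen.add(r["id"])
    return rows

raw = "id,name\n1, Alice \n2,Bob\n"
rows = validate([transform(r) for r in csv.DictReader(io.StringIO(raw))])
print(rows)  # [{'id': 1, 'name': 'alice'}, {'id': 2, 'name': 'bob'}]
```

At scale, the same transform/validate functions would run inside a Spark job writing to the data lake rather than over an in-memory list.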

Web-Based Crawler Development

  • Develop, with supervision, web crawlers to extract HTML from websites using tools such as Python, BeautifulSoup, Selenium, or similar. The crawlers should work on both plain HTML and AJAX-based websites. Provide ongoing support and maintenance for the crawlers, under supervision, to ensure they continue to function properly.
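A minimal extraction sketch using only the standard library's `html.parser`; in practice BeautifulSoup would handle messy real-world HTML and Selenium would render AJAX pages first. The sample markup is hypothetical.

```python
# Collect all hyperlink targets from an HTML document.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every anchor tag that has one.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

page = '<html><body><a href="/a">A</a><a href="/b">B</a></body></html>'
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/a', '/b']
```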


Essential Qualifications:

  • A combination of relevant internship or work experience and education equal to 1-3 years of relevant experience, with a record of accomplishment.
  • Proficiency in Python.
  • Experience with distributed systems.
  • Strong knowledge of data storage technologies.
  • Familiarity with relational databases and Elasticsearch.
  • Experience tuning data systems for performance and reliability.
  • Development experience with PyTorch and TensorFlow on both CPU and GPU targets.
  • Knowledge of text processing and image processing techniques.
  • Experience extracting data from APIs.
  • Education: Bachelor's degree or equivalent work-related experience.

Preferred Qualifications:

  • Experience in data lakes and data mesh architectures
  • Experience with web scraping

We at the School of Public and International Affairs believe that it is vital to cultivate an environment that embraces and promotes diversity, equity and inclusion - fundamental to the success of our education and research mission. This commitment to diversity informs our efforts in recruitment and hiring as we actively seek colleagues of exceptional ability who represent a broad range of viewpoints, experiences and value systems, and who share Princeton University's dedication to excellence.

 

Princeton University is an Equal Opportunity/Affirmative Action Employer and all qualified applicants will receive consideration for employment without regard to age, race, color, religion, sex, sexual orientation, gender identity or expression, national origin, disability status, protected veteran status, or any other characteristic protected by law. KNOW YOUR RIGHTS




Princeton University job offers are contingent upon the candidate’s successful completion of a background check, reference checks, and pre-employment screening, as applicable.