We are building a rigorous, verifiable evaluation suite of Terminal-Bench tasks designed to test the limits of large language models on multilingual software challenges. Our goal is to measure multilingual robustness across prompt language effects, non-English data processing, and comple…
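The emphasis on a verifiable suite implies that each task ships with deterministic checks rather than subjective grading. As a minimal sketch only, assuming a Terminal-Bench-style task in which an agent must process a multilingual corpus and write word counts to a JSON file, a pytest verifier for the non-English data-processing axis could look like the following (the file path, tokens, and expected counts are hypothetical, not taken from any existing task):

    # Illustrative sketch only; paths and expected values are hypothetical.
    import json
    from pathlib import Path

    OUTPUT = Path("/app/word_counts.json")  # hypothetical artifact the agent must produce

    def test_output_exists():
        assert OUTPUT.exists(), "agent did not write the expected output file"

    def test_non_english_tokens_preserved():
        # A correct solution must not drop or mangle non-ASCII tokens while counting.
        counts = json.loads(OUTPUT.read_text(encoding="utf-8"))
        assert counts.get("naïve") == 3    # Latin with diacritics
        assert counts.get("東京") == 5      # CJK
        assert counts.get("Привет") == 2   # Cyrillic

Running the same checks while varying only the language of the task prompt would help separate prompt-language effects from data-handling errors.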


Overview
LILT is building a global network of domain experts to support high-quality AI evaluation across training, benchmarking, red-teaming, and ongoing model optimization. We are seeking legal and compliance professionals to contribute expert judgment to human-in-the-loop AI evaluation…


Overview
LILT is building a global network of domain experts to support high-quality AI evaluation across training, benchmarking, red-teaming, and ongoing model monitoring. We are seeking software engineering and DevOps professionals to contribute expert judgment to human-in-the-loop AI evaluation…


Overview
LILT is building a global network of domain experts to support high-quality AI evaluation across training, benchmarking, red-teaming, and ongoing model monitoring. We are seeking finance and investment professionals to contribute expert judgment to human-in-the-loop AI evaluation…

Contract

About the Role
We are looking for data-driven Project Managers to lead our large-scale multilingual data collection and Large Language Model (LLM) evaluation. In this role, you will be the operational backbone of our AI development, orchestrating global teams of annotators and data specialists…


About LILT
AI is changing how the world communicates, and LILT is leading that transformation. We're on a mission to make the world's information accessible to everyone, regardless of the language they speak. We use cutting-edge AI, machine translation, and human-in-the-loop expertise to translate…
