Revelo’s LatAm hiring surge says a lot about where LLM work is actually going
Revelo says demand from U.S. companies is rising, and AI is driving a lot of it. The company, which connects U.S. employers with a network of more than 400,000 vetted developers in Latin America, told TechCrunch that LLM-related roles made up over 20% of its 2024 revenue.
That matters because it shows what AI hiring looks like once the headlines wear off. Most LLM work isn't happening inside frontier labs. It's in the expensive, messy layer after the base model exists: data prep, annotation, evaluation, fine-tuning, prompt and workflow design, safety checks, and product integration. Companies need people who can do that work now, during the same working day, without paying Bay Area salaries for every seat.
Latin America fits that demand well.
Why U.S. companies keep looking south
Cost is the obvious reason. It's also incomplete.
Time zone overlap is usually the stronger argument for teams actually shipping AI products. If you're iterating on model evals, fixing a broken data pipeline, or reviewing weak annotation batches, a 12-hour handoff cycle gets old fast. Teams in Brazil, Colombia, Mexico, and Argentina can join the same standup as product, infra, and ML leads in the U.S. That sounds ordinary. For applied AI work, it matters a lot. Response time often matters as much as raw technical depth.
There's also a hiring problem underneath all this. U.S. companies need more people who can work across software and ML boundaries. Not pure researchers. They need engineers who can read Python, understand a training run, clean a dataset, wire up cloud infrastructure, and notice when the model problem is really a data problem. Those people are hard to find anywhere.
Nearshoring widens the pool without forcing every hire into one local market.
LLM work changes the hiring mix
A lot of executives still talk about AI staffing as if the bottleneck is model scientists. For most companies, it isn't.
The bottleneck is post-training work and productization.
That includes:
- curating domain-specific datasets
- building annotation and review workflows
- running eval suites against changing prompts and model versions
- fine-tuning smaller models for internal tasks
- setting up retrieval pipelines and guardrails
- measuring quality with automated metrics and human review
That's where Revelo gets interesting. A talent marketplace with deep software bench strength can do well here because LLM deployments need hybrid teams. You need backend engineers who understand APIs and latency budgets, data people who can structure training corpora, and ML engineers who can run experiments without setting fire to the cloud bill.
The tooling that keeps coming up around these roles is a familiar stack: PyTorch, TensorFlow, Label Studio, Prodigy, SageMaker, Vertex AI, Airflow, Prefect, GitHub, GitLab, Terraform, Pulumi. Nothing exotic. That's the point. The money is in applied systems work.
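The eval-suite item in that list is the easiest to under-invest in. As a minimal sketch, in plain Python with a hypothetical exact-match metric and a stand-in model function, here's the shape of a harness that records results per case and per model version so regressions are traceable:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class EvalCase:
    case_id: str
    prompt: str
    expected: str

def run_eval(cases, model_fn: Callable[[str], str], model_version: str):
    """Run every case through model_fn and record pass/fail per case.

    model_fn is a stand-in for whatever client call the team actually
    uses; exact match is the simplest possible metric, shown purely
    for illustration. Real suites add fuzzier scorers and rubrics.
    """
    results = []
    for case in cases:
        output = model_fn(case.prompt).strip()
        results.append({
            "case_id": case.case_id,
            "model_version": model_version,
            "passed": output == case.expected,
        })
    passed = sum(r["passed"] for r in results)
    return {
        "model_version": model_version,
        "pass_rate": passed / len(results),
        "results": results,
    }

# Usage: a fake "model" that upcases its input, just to show the flow.
cases = [
    EvalCase("greet-1", "hello", "HELLO"),
    EvalCase("greet-2", "hola", "HOLA"),
]
report = run_eval(cases, lambda p: p.upper(), model_version="demo-0.1")
```

The point of keeping `model_version` in every result row is that when a prompt or model changes, you can diff pass rates between versions instead of arguing from memory.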
Where distributed AI teams usually break
Distributed AI teams usually fail for boring reasons. The workflow is sloppy.
If you're splitting LLM work across countries, the basics have to hold up.
Reproducible environments
Containerized dev environments are standard. Docker is the minimum. Kubernetes starts to matter when you're coordinating GPU jobs or managing multiple services around a model stack. If one engineer fine-tunes in a local setup nobody else can reproduce, the team will drift fast.
Infrastructure as code helps too. Terraform or Pulumi won't rescue a bad ML process, but they do stop cloud environments from mutating into something nobody understands.
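The drift problem that containers and IaC address can be sketched without either tool: a hypothetical CI check that compares an engineer's installed package versions against a shared lockfile and fails on any mismatch. The function and data here are illustrative, not a real tool's API:

```python
def find_drift(lockfile: dict, installed: dict) -> dict:
    """Compare pinned package versions against what's actually installed.

    Both arguments map package name -> version string. Returns the
    packages that are missing or at the wrong version, which is what
    a CI gate would fail on. A real setup would parse these from a
    lockfile and `pip freeze` output rather than hardcode them.
    """
    drift = {}
    for pkg, pinned in lockfile.items():
        actual = installed.get(pkg)
        if actual != pinned:
            drift[pkg] = {"pinned": pinned, "installed": actual}
    return drift

# Usage: one engineer's environment quietly fell behind on transformers.
lock = {"torch": "2.3.0", "transformers": "4.41.0"}
env = {"torch": "2.3.0", "transformers": "4.40.2"}
drift = find_drift(lock, env)
```

It's a toy, but it's the same contract Docker images and Terraform plans enforce at larger scale: the environment is declared once, and divergence is detected instead of discovered.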
Data pipelines with versioning
This is the part AI demos usually skip. Raw data comes in, gets cleaned, normalized, tokenized, annotated, filtered, split, versioned, and reevaluated. Then the model changes and part of the pipeline runs again.
Without lineage, teams can't answer basic questions:
- Which annotation guide produced this dataset?
- Which prompts or labels changed after the last evaluation drop?
- Did quality improve because of the model, or because reviewers got stricter?
Airflow and Prefect show up here because someone has to orchestrate those jobs. Someone also has to own the unglamorous discipline of dataset versioning.
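One low-tech way to make those questions answerable is a manifest that fingerprints every dataset build together with the annotation-guide version that produced it. A sketch, with hypothetical field names, assuming nothing beyond the standard library:

```python
import hashlib
import json

def dataset_manifest(records, guide_version, parent_id=None):
    """Fingerprint a dataset build so its lineage is auditable.

    The dataset id is a hash over the records plus the annotation-guide
    version, so rebuilding with a new guide yields a new id even when
    the raw records are unchanged. parent_id links back to the build
    this one was derived from, giving you a lineage chain.
    """
    payload = json.dumps(
        {"records": records, "guide": guide_version}, sort_keys=True
    ).encode()
    return {
        "dataset_id": hashlib.sha256(payload).hexdigest()[:16],
        "guide_version": guide_version,
        "num_records": len(records),
        "parent_id": parent_id,
    }

# Usage: same records, new annotation guide -> new dataset id, linked lineage.
v1 = dataset_manifest([{"text": "hi", "label": "greeting"}],
                      guide_version="guide-v1")
v2 = dataset_manifest([{"text": "hi", "label": "greeting"}],
                      guide_version="guide-v2",
                      parent_id=v1["dataset_id"])
```

With that in place, "which annotation guide produced this dataset?" is a lookup, not an archaeology project.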
Human review you can audit
RLHF and similar post-training workflows depend on human judgment, but "human in the loop" doesn't mean much without an audit trail. You need documented labeling instructions, reviewer calibration, dispute resolution, and privacy controls for sensitive material. Otherwise you're just scaling inconsistency.
That gets harder when teams span countries and legal jurisdictions.
Fine-tuning is one part of the job
Multi-node fine-tuning walkthroughs, like the Hugging Face Accelerate examples that circulate in this space, are a useful signal. They can also give the wrong impression about where most teams are spending time.
Yes, distributed fine-tuning matters. If you're adapting an open model for a narrow domain, a modest training stack can go a long way. Something like Trainer plus mixed precision, batched tokenization, and a sane validation loop covers plenty of practical cases.
But many teams hiring through networks like Revelo aren't training large models from scratch. They're usually doing one of three things:
- Fine-tuning smaller models for focused tasks
- Building retrieval and orchestration layers around foundation models
- Running evaluation, safety, and data quality workflows around commercial APIs
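The middle item, retrieval and orchestration, is often where most of the engineering hours go. A toy sketch of the retrieval half, scoring documents by token overlap with the query; a production layer would use embeddings and a vector store, but the shape of the code is the same. Everything here is illustrative:

```python
def retrieve(query, documents, k=2):
    """Rank documents by token overlap with the query and return top k.

    documents maps doc_id -> text. Token-overlap counting stands in for
    the embedding similarity a real retrieval layer would compute; the
    ranked snippets would then be packed into the model's prompt.
    """
    q_tokens = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        overlap = len(q_tokens & set(text.lower().split()))
        scored.append((overlap, doc_id))
    scored.sort(reverse=True)
    return [doc_id for overlap, doc_id in scored[:k] if overlap > 0]

# Usage: a support-bot corpus, one query.
docs = {
    "refunds": "our refund policy allows returns within 30 days",
    "shipping": "shipping takes five business days",
    "privacy": "we never sell customer data",
}
top = retrieve("what is the refund policy", docs, k=1)
```

Notice there's no model call anywhere in this function. That's the staffing point: this is backend and data work, and it's where the RAG and copilot roadmaps actually live.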
That matters for engineering leaders making headcount plans. If your roadmap is mostly RAG, agent scaffolding, or enterprise copilots, you probably don't need ten research engineers. You may need three strong platform engineers, two ML engineers, and a solid annotation pipeline.
That's a different org chart. It's also a cheaper one.
Cost matters, but compute matters too
Lower compensation relative to U.S. markets gives companies room to hire larger teams. That's obvious. The more interesting question is where the savings go.
Applied AI teams burn money in two places: people and compute. If nearshoring eases payroll pressure, some of that budget can move into GPUs, managed inference, data tooling, and eval infrastructure. In many cases, that's a better use of money than overpaying for a small local team that's understaffed from day one.
There is a catch. Cheap hiring doesn't fix expensive mistakes.
A bad annotation workflow still wastes money. So does over-fine-tuning a model that should've stayed behind a retrieval layer. So does pushing private data through external systems without tight access controls. Lower labor cost helps. It doesn't cover for weak technical judgment.
Why this affects model quality
There's another piece here that gets less attention than labor arbitrage: perspective.
LLMs fail in culturally specific ways. They miss local language patterns, misread informal phrasing, flatten regional context, and stumble on multilingual or code-switched inputs. Latin American engineers and reviewers can catch some of that earlier, especially for products aimed at broad consumer markets or international business users.
That doesn't happen by magic. Diverse teams don't automatically produce better models. But annotation and eval processes with broader linguistic and cultural context tend to surface blind spots sooner.
For customer support automation, coding assistants, knowledge search, and workflow copilots, that's quality control.
The trade-offs
This trend has downsides, and they shouldn't be waved away.
Salary pressure from U.S. employers can distort local markets. That's good for some workers. It's rough on local startups that can't match remote U.S.-backed pay. Retention gets harder. Top engineers have options, and they know it.
There's also a risk that AI hiring platforms reduce complex labor to interchangeable task work. That's especially touchy around annotation, red teaming, and safety review, where companies want expert judgment but treat it like low-status ops work. If firms care about AI quality, they need to pay for that work accordingly and treat it as real expertise.
Security is another obvious issue. LLM projects often involve proprietary datasets, internal documents, regulated information, or customer records. Cross-border work means data governance has to be explicit: least-privilege access, region-aware storage rules, audit logging, signed policies, and clear separation between annotation environments and production systems. If that isn't in place, the hiring model is the least of your worries.
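The access-control piece doesn't need to be exotic to be explicit. A hypothetical deny-by-default policy check, gating which role may read which dataset class and where each class may be stored; the role names, dataset classes, and regions are all made up for illustration:

```python
# Deny-by-default policy tables. Anything not listed is forbidden.
# All names here are illustrative, not a real product's schema.
ALLOWED_ACCESS = {
    ("annotator", "public_corpus"),
    ("ml_engineer", "public_corpus"),
    ("ml_engineer", "customer_records"),
}
ALLOWED_REGIONS = {
    "public_corpus": {"us-east", "sa-east"},
    "customer_records": {"us-east"},  # regulated data stays in-region
}

def can_access(role, dataset_class):
    """Least privilege: access exists only if the pair is explicitly allowed."""
    return (role, dataset_class) in ALLOWED_ACCESS

def can_store(dataset_class, region):
    """Region-aware storage: unknown dataset classes get no regions at all."""
    return region in ALLOWED_REGIONS.get(dataset_class, set())
```

The useful property is that the policy is data, so it can be audited, diffed in code review, and tested in CI, which is what "explicit data governance" looks like day to day.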
What engineering leaders should take from this
Revelo's numbers are useful because they show where companies are actually spending on applied AI.
Three questions matter.
Are you hiring for the actual bottleneck?
If your failures are in eval quality, dataset curation, and deployment plumbing, stop staffing like you're building a research lab.
Can your process handle a distributed team?
Shared hours help. They don't replace reproducibility, versioning, and documentation.
Are you treating human review like infrastructure?
You should be. Annotation, preference ranking, and quality review shape model behavior as directly as code does.
The broader shift is pretty clear. AI work is spreading beyond labs, and a lot of it looks like disciplined software engineering with ML attached. That favors talent markets that can supply strong engineers quickly, in sync with U.S. teams, at costs companies can live with.
Latin America is in a strong position there. Revelo is one of the clearer signs that the demand is already real.