How to Scrape LinkedIn Profiles and Jobs Without Getting Banned

Advanced · 35 minutes

Prerequisites

• Python 3.10+ installed
• Strong understanding of HTTP requests, cookies, and session management
• Familiarity with browser DevTools and network inspection
• A LinkedIn account (free or premium)
• Residential proxy access for production use

LinkedIn holds one of the most valuable data sets on the internet — 900M+ professional profiles, millions of job listings, and company data that powers recruiting, sales intelligence, and market research. It is also one of the hardest platforms to scrape. LinkedIn's detection system is aggressive: accounts are restricted or permanently banned for automated activity, and their legal team has a history of pursuing scrapers in court. This guide covers the technical approaches that work in 2026, the detection mechanisms you need to understand, and the infrastructure decisions that determine whether your scraping operation survives past the first week.

Understand LinkedIn's Detection System

LinkedIn's anti-scraping measures operate on three levels, and understanding each is essential before writing any code:

**Account-level detection:**
- Profile view velocity: LinkedIn tracks how many profiles you view per day. Free accounts trigger warnings at ~80-100 views/day. Premium accounts have higher thresholds but are still monitored.
- Navigation patterns: Real users do not view 50 profiles sequentially without interacting with any. LinkedIn tracks dwell time, scroll depth, and click patterns.
- Session anomalies: Logging in from 5 different IP addresses within an hour flags the account.

**Network-level detection:**
- IP reputation scoring: Datacenter IPs are flagged immediately. Residential IPs are scored based on historical behavior from that subnet.
- TLS fingerprinting: LinkedIn inspects JA3/JA4 fingerprints. Python's `requests` library has a distinctive fingerprint that does not match any browser.
- Geographic consistency: An account based in New York making API calls from a Singapore datacenter raises flags.

**API-level detection:**
- Rate limiting and payload validation: LinkedIn's internal API (Voyager) enforces both. Malformed requests or unusual parameter combinations trigger alerts.
- CSRF token rotation: LinkedIn rotates CSRF tokens and validates them against the session. Stale tokens indicate automated replay.

The consequence of getting caught is severe: account restriction (temporary or permanent), IP blocks, and potentially legal action under the CFAA.

Tip: LinkedIn's detection is more account-focused than IP-focused. Losing an aged LinkedIn account with real connections is far more costly than burning a proxy IP. Protect accounts aggressively — use conservative rate limits and realistic behavior patterns.

Choose Your Scraping Surface: Public Pages vs. Authenticated API

LinkedIn offers two scraping surfaces, each with different tradeoffs:

**Public profile pages (no login required):**
- Accessible via `linkedin.com/in/username` without authentication
- Limited data: name, headline, current position, education (often truncated)
- Lower risk: no account to lose
- Heavily rate-limited by IP

**Authenticated Voyager API (requires login):**
- Full profile data: complete work history, skills, endorsements, recommendations
- Access to job listings, company pages, search functionality
- Higher risk: account ban is possible
- More data per request, so fewer total requests needed

For most use cases, the authenticated approach provides better data efficiency: you get complete profiles in fewer requests, which actually reduces your detection footprint compared to repeatedly hitting public pages for partial data.

from curl_cffi import requests

# Public page approach (no auth needed)
def fetch_public_profile(username: str, proxy: str | None = None) -> str:
    """Fetch a public LinkedIn profile page."""
    url = f"https://www.linkedin.com/in/{username}/"
    session = requests.Session(impersonate="chrome")
    proxies = {"https": proxy, "http": proxy} if proxy else None

    response = session.get(
        url,
        proxies=proxies,
        headers={
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
        },
        timeout=15,
    )
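    # Raise on 4xx/5xx so a block or rate limit is not mistaken for profile HTML
    response.raise_for_status()
    return response.text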

Tip: If you only need basic profile data (name, headline, current company), the public page approach avoids account risk entirely. Use the authenticated API only when you need full profile details.

Authenticate and Capture Session Tokens

For the authenticated approach, you need to establish a session with LinkedIn and capture the cookies and CSRF token that authorize API requests. The critical tokens are `li_at` (session cookie) and `JSESSIONID` (used as the CSRF token in API requests). The safest authentication method is to log in manually in a real browser, then extract the cookies. Automating the login flow risks triggering security challenges (email verification, phone verification) that can flag the account.

import json
from pathlib import Path
from curl_cffi import requests

class LinkedInClient:
    BASE_URL = "https://www.linkedin.com"
    API_URL = "https://www.linkedin.com/voyager/api"

    def __init__(self, li_at: str, jsessionid: str, proxy: str | None = None):
        self.session = requests.Session(impersonate="chrome")
        self.proxy = proxy
        self.proxies = {"https": proxy, "http": proxy} if proxy else None

        # Set authentication cookies
        self.session.cookies.set("li_at", li_at, domain=".linkedin.com")
        self.session.cookies.set("JSESSIONID", jsessionid, domain=".linkedin.com")

        self.headers = {
            "csrf-token": jsessionid.strip('"'),
            "Accept": "application/vnd.linkedin.normalized+json+2.1",
            "Accept-Language": "en-US,en;q=0.9",
            "x-li-lang": "en_US",
            "x-li-track": json.dumps({"clientVersion": "1.13.8860", "osName": "web"}),
            "x-restli-protocol-version": "2.0.0",
        }

    def _get(self, endpoint: str, params: dict | None = None) -> dict:
        url = f"{self.API_URL}{endpoint}"
        response = self.session.get(
            url,
            headers=self.headers,
            params=params,
            proxies=self.proxies,
            timeout=15,
        )
        if response.status_code == 429:
            raise Exception("Rate limited — back off")
        if response.status_code == 401:
            raise Exception("Session expired — re-authenticate")
        response.raise_for_status()
        return response.json()

    @classmethod
    def from_cookie_file(cls, filepath: str, proxy: str | None = None):
        """Load credentials from a JSON file."""
        data = json.loads(Path(filepath).read_text())
        return cls(li_at=data["li_at"], jsessionid=data["JSESSIONID"], proxy=proxy)

Tip: Extract cookies from your browser using a browser extension like 'EditThisCookie' or from DevTools > Application > Cookies. Store them in a JSON file outside your code repository — never hardcode credentials.
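As a minimal sketch of the cookie file that `from_cookie_file` reads, the loader below checks that both values are present before a client is built. The `load_cookies` helper and `REQUIRED_KEYS` constant are hypothetical names introduced for illustration; only the `li_at` and `JSESSIONID` key names come from the code above, and the file is assumed to be a flat JSON object.

```python
import json
from pathlib import Path

# Key names read by LinkedInClient.from_cookie_file above
REQUIRED_KEYS = ("li_at", "JSESSIONID")


def load_cookies(filepath: str) -> dict[str, str]:
    """Read the cookie JSON file and fail early if a required value is missing."""
    data = json.loads(Path(filepath).expanduser().read_text())
    if not isinstance(data, dict):
        raise ValueError("Cookie file must contain a JSON object")
    # Treat absent keys and empty strings the same way
    missing = [key for key in REQUIRED_KEYS if not data.get(key)]
    if missing:
        raise ValueError(f"Cookie file is missing: {', '.join(missing)}")
    return {key: data[key] for key in REQUIRED_KEYS}
```

Failing at load time gives a clearer error than a `KeyError` inside the constructor or a 401 on the first request.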

Scrape LinkedIn Profiles via the Voyager API

LinkedIn's internal Voyager API returns structured JSON data that is far richer and more consistent than parsing HTML. The profile endpoint returns work experience, education, skills, and more in a single request. The key is constructing the correct API path — LinkedIn uses the profile's public identifier (the URL slug) or an internal URN.

from dataclasses import dataclass, field

@dataclass
class LinkedInProfile:
    public_id: str
    first_name: str
    last_name: str
    headline: str
    location: str
    industry: str
    summary: str
    experience: list[dict] = field(default_factory=list)
    education: list[dict] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)

def get_profile(client: LinkedInClient, public_id: str) -> LinkedInProfile:
    """Fetch a full LinkedIn profile by public ID (URL slug)."""
    endpoint = f"/identity/dash/profiles?q=memberIdentity&memberIdentity={public_id}"
    data = client._get(endpoint)

    elements = data.get("elements", [])
    if not elements:
        raise ValueError(f"Profile not found: {public_id}")

    profile_data = elements[0]

    return LinkedInProfile(
        public_id=public_id,
        first_name=profile_data.get("firstName", ""),
        last_name=profile_data.get("lastName", ""),
        headline=profile_data.get("headline", ""),
        location=profile_data.get("geoLocationName", ""),
        industry=profile_data.get("industryName", ""),
        summary=profile_data.get("summary", ""),
    )

def get_profile_experience(client: LinkedInClient, profile_urn: str) -> list[dict]:
    """Fetch work experience for a profile."""
    endpoint = f"/identity/dash/profilePositionGroups?q=viewee&profileUrn={profile_urn}"
    data = client._get(endpoint)

    positions = []
    for group in data.get("elements", []):
        for position in group.get("profilePositionInPositionGroup", {}).get("elements", []):
            pos = position.get("profilePosition", {})
            positions.append({
                "title": pos.get("title", ""),
                "company": pos.get("companyName", ""),
                "location": pos.get("locationName", ""),
                "start_date": pos.get("timePeriod", {}).get("startDate", {}),
                "end_date": pos.get("timePeriod", {}).get("endDate", {}),
                "description": pos.get("description", ""),
            })
    return positions
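
The chained `.get(..., {})` calls above raise `AttributeError` when a key is present but set to `null`, because the default only applies to absent keys. One way to tolerate that, sketched here with a hypothetical `dig` helper and `missing_paths` counter (neither is part of the original code), is to walk the path defensively and count misses so that response-format drift shows up in monitoring.

```python
from collections import Counter
from typing import Any

# How often each dotted path failed to resolve; inspect this after a run
missing_paths: Counter[str] = Counter()


def dig(data: Any, *path: str, default: Any = "") -> Any:
    """Walk nested dicts; return `default` if any step is absent, null, or not a dict."""
    current = data
    for key in path:
        if not isinstance(current, dict) or current.get(key) is None:
            missing_paths[".".join(path)] += 1
            return default
        current = current[key]
    return current
```

For example, `dig(pos, "timePeriod", "startDate", default={})` could stand in for `pos.get("timePeriod", {}).get("startDate", {})`. A sudden rise in one counter is a hint that a field was renamed or moved.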

Tip: LinkedIn's API response structure includes 'included' and 'elements' arrays with cross-references via URN identifiers. For deeply nested data, you may need to resolve these references by matching URNs across the response payload.
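The URN matching described in the tip can be sketched as a two-step lookup: index the `included` array by URN, then expand references on the entity you care about. The payload shape here is an assumption, not something the article specifies: entities are taken to carry an `entityUrn` key, and reference fields are taken to be prefixed with `*` and hold either a URN string or a list of URNs. The function names are hypothetical.

```python
from typing import Any


def index_by_urn(payload: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map each entity's URN to the entity, using the `included` array."""
    return {
        item["entityUrn"]: item
        for item in payload.get("included", [])
        if isinstance(item, dict) and "entityUrn" in item
    }


def resolve(entity: dict[str, Any], index: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Return a copy of `entity` with one level of `*field` references expanded."""
    resolved = dict(entity)
    for key, value in entity.items():
        if not key.startswith("*"):
            continue
        if isinstance(value, str):
            # Single reference: None if the target is not in this payload
            resolved[key[1:]] = index.get(value)
        elif isinstance(value, list):
            # List of references: keep only the ones that can be found
            resolved[key[1:]] = [
                index[urn] for urn in value if isinstance(urn, str) and urn in index
            ]
    return resolved
```

The sketch expands a single level only; deeper structures would need a repeated or recursive pass with a guard against reference cycles.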

Extract Job Listings

LinkedIn's job search API supports filtering by keywords, location, company, and job type. Job listings contain structured data including title, company, description, requirements, and posting metadata. The API uses offset-based pagination: `start` is the index of the first result and `count` is the page size.

from dataclasses import dataclass

@dataclass
class JobListing:
    job_id: str
    title: str
    company: str
    location: str
    posted_at: str
    description: str
    employment_type: str
    seniority_level: str
    apply_url: str

def search_jobs(
    client: LinkedInClient,
    keywords: str,
    geo_id: str = "103644278",
    start: int = 0,
    limit: int = 25,
) -> list[JobListing]:
    """Search LinkedIn job listings.

    `geo_id` is LinkedIn's numeric location identifier, as seen in the
    `geoId` parameter of a job search URL.
    """
    endpoint = "/voyagerJobsDashJobCards"
    params = {
        "decorationId": "com.linkedin.voyager.dash.deco.jobs.search.JobSearchCardsCollection-218",
        "q": "jobSearch",
        "query": f"(origin:JOB_SEARCH_PAGE_QUERY_EXPANSION,keywords:{keywords},locationUnion:(geoId:{geo_id}),selectedFilters:(sortBy:List(DD)))",
        "count": limit,
        "start": start,
    }

    data = client._get(endpoint, params=params)
    jobs = []

    for element in data.get("elements", []):
        job_card = element.get("jobCardUnion", {}).get("jobPostingCard", {})
        if not job_card:
            continue

        jobs.append(JobListing(
            job_id=job_card.get("jobPostingUrn", "").split(":")[-1],
            title=job_card.get("primaryDescription", {}).get("text", ""),
            company=job_card.get("primarySubtitle", {}).get("text", ""),
            location=job_card.get("secondarySubtitle", {}).get("text", ""),
            posted_at=job_card.get("tertiaryDescription", {}).get("text", ""),
            description="",  # Requires separate detail request
            employment_type="",
            seniority_level="",
            apply_url="",
        ))

    return jobs

def get_job_details(client: LinkedInClient, job_id: str) -> dict:
    """Fetch full details for a specific job posting."""
    endpoint = f"/jobs/jobPostings/{job_id}"
    data = client._get(endpoint)
    return {
        "title": data.get("title", ""),
        "description": data.get("description", {}).get("text", ""),
        "employment_type": data.get("formattedEmploymentStatus", ""),
        "seniority_level": data.get("formattedExperienceLevel", ""),
        "industries": data.get("formattedIndustries", ""),
        "apply_url": data.get("applyMethod", {}).get("companyApplyUrl", ""),
        "listed_at": data.get("listedAt", ""),
    }

Tip: Job listing descriptions are not included in search results — you need a separate request per job ID for the full description. Batch these requests carefully with delays to avoid rate limiting.
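Because `start` is a plain offset, walking through search results is a loop that advances the offset by the number of items received. The generator below is a sketch of that loop under stated assumptions: `paginate` and its `fetch_page(start, count)` callback are hypothetical names, and a page shorter than requested is treated as the last one.

```python
from collections.abc import Callable, Iterator
from typing import Any


def paginate(
    fetch_page: Callable[[int, int], list[Any]],
    page_size: int = 25,
    max_results: int = 100,
) -> Iterator[Any]:
    """Yield items page by page, advancing the `start` offset until results run out."""
    start = 0
    while start < max_results:
        # Never request more than the remaining budget
        count = min(page_size, max_results - start)
        page = fetch_page(start, count)
        yield from page
        if len(page) < count:  # a short page is taken to mean nothing is left
            break
        start += len(page)
```

With the functions above, a call might look like `paginate(lambda start, count: search_jobs(client, "python", start=start, limit=count))`. Note that `search_jobs` skips cards it cannot parse, so a short page is only an approximate end-of-results signal, and each page still counts against the daily search budget.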

Implement Rate Limiting and Session Management

LinkedIn's rate limits are strict and account-specific. Exceeding them does not just block requests; it flags your account. The safe approach is to stay well below the detection threshold and distribute activity across multiple accounts and sessions.

Conservative rate limits for 2026:
- Profile views: 40-60 per day per account
- Search queries: 20-30 per day per account
- Job listing views: 50-80 per day per account
- Minimum delay between requests: 3-8 seconds (randomized)
- Session duration: 2-4 hours maximum, then cool down

import time
import random
from datetime import datetime, timedelta
from dataclasses import dataclass, field

@dataclass
class AccountState:
    client: LinkedInClient
    daily_profile_views: int = 0
    daily_searches: int = 0
    session_start: datetime = field(default_factory=datetime.now)
    last_request: datetime | None = None
    cooldown_until: datetime | None = None

class RateLimitedScraper:
    MAX_PROFILE_VIEWS = 50
    MAX_SEARCHES = 25
    MIN_DELAY = 3.0
    MAX_DELAY = 8.0
    SESSION_DURATION = timedelta(hours=3)

    def __init__(self, accounts: list[AccountState]):
        self.accounts = accounts

    def _get_available_account(self, action: str) -> AccountState | None:
        now = datetime.now()
        for account in self.accounts:
            if account.cooldown_until:
                if now < account.cooldown_until:
                    continue
                # Cooldown is over: start a fresh session window, otherwise
                # the stale session_start would trigger a new cooldown forever
                account.cooldown_until = None
                account.session_start = now
            if now - account.session_start > self.SESSION_DURATION:
                account.cooldown_until = now + timedelta(hours=2)
                continue
            if action == "profile" and account.daily_profile_views >= self.MAX_PROFILE_VIEWS:
                continue
            if action == "search" and account.daily_searches >= self.MAX_SEARCHES:
                continue
            return account
        return None

    def _wait(self, account: AccountState):
        if account.last_request:
            elapsed = (datetime.now() - account.last_request).total_seconds()
            min_wait = max(0, self.MIN_DELAY - elapsed)
            delay = random.uniform(min_wait, self.MAX_DELAY)
        else:
            delay = random.uniform(1.0, 3.0)
        time.sleep(delay)
        account.last_request = datetime.now()

    def scrape_profile(self, public_id: str) -> LinkedInProfile | None:
        account = self._get_available_account("profile")
        if not account:
            print("No accounts available for profile scraping")
            return None
        self._wait(account)
        try:
            profile = get_profile(account.client, public_id)
            account.daily_profile_views += 1
            return profile
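        except Exception as e:
            print(f"Failed to scrape {public_id}: {e}")
            # _get reports HTTP 429 with a "Rate limited" message that does not
            # contain the status code, so match on the message as well
            if "429" in str(e) or "rate limited" in str(e).lower():
                account.cooldown_until = datetime.now() + timedelta(hours=4)
            return None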

Tip: Reset daily counters at midnight in the account's apparent timezone, not your server's timezone. An account 'based' in New York should not have its activity counter reset at midnight UTC.

Structure and Store Your Data

LinkedIn data is relational — profiles connect to companies, companies connect to jobs, and all of them change over time. Structure your storage to support both point-in-time snapshots and longitudinal tracking.

import json
import sqlite3
from dataclasses import asdict
from datetime import date

def init_database(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS profiles (
            public_id TEXT NOT NULL,
            scraped_at DATE NOT NULL,
            first_name TEXT,
            last_name TEXT,
            headline TEXT,
            location TEXT,
            industry TEXT,
            summary TEXT,
            raw_json TEXT,
            PRIMARY KEY (public_id, scraped_at)
        );
        CREATE TABLE IF NOT EXISTS jobs (
            job_id TEXT NOT NULL,
            scraped_at DATE NOT NULL,
            title TEXT,
            company TEXT,
            location TEXT,
            description TEXT,
            employment_type TEXT,
            seniority_level TEXT,
            raw_json TEXT,
            PRIMARY KEY (job_id, scraped_at)
        );
        CREATE INDEX IF NOT EXISTS idx_profiles_scraped ON profiles(scraped_at);
        CREATE INDEX IF NOT EXISTS idx_jobs_company ON jobs(company);
    """)
    return conn

def save_profile(conn: sqlite3.Connection, profile: LinkedInProfile):
    conn.execute(
        "INSERT OR REPLACE INTO profiles VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            profile.public_id,
            date.today().isoformat(),
            profile.first_name,
            profile.last_name,
            profile.headline,
            profile.location,
            profile.industry,
            profile.summary,
            json.dumps(asdict(profile)),
        ),
    )
    conn.commit()

def save_job(conn: sqlite3.Connection, job: JobListing):
    conn.execute(
        "INSERT OR REPLACE INTO jobs VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            job.job_id,
            date.today().isoformat(),
            job.title,
            job.company,
            job.location,
            job.description,
            job.employment_type,
            job.seniority_level,
            json.dumps(asdict(job)),
        ),
    )
    conn.commit()
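
Keying each table on `(id, scraped_at)` is what makes longitudinal tracking possible: every day's snapshot is its own row, so changes can be found by comparing consecutive rows. As an illustrative sketch (the `headline_changes` function is hypothetical, and it relies only on the `public_id`, `scraped_at`, and `headline` columns defined above), the query below lists the dates on which a profile's headline changed.

```python
import sqlite3


def headline_changes(conn: sqlite3.Connection, public_id: str) -> list[tuple[str, str, str]]:
    """Return (scraped_at, old_headline, new_headline) for each snapshot that differs from the previous one."""
    rows = conn.execute(
        "SELECT scraped_at, headline FROM profiles "
        "WHERE public_id = ? ORDER BY scraped_at",
        (public_id,),
    ).fetchall()
    changes = []
    # Pair every snapshot with the one before it; ISO dates sort chronologically
    for (_, previous), (scraped_at, current) in zip(rows, rows[1:]):
        if current != previous:
            changes.append((scraped_at, previous, current))
    return changes
```

The same pairwise comparison works for any column, and for the `jobs` table it can show when a listing's title or description was edited.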

Scale Beyond Account Limits with Real-Device Infrastructure

The DIY approach above works for small-scale data collection, such as monitoring a few hundred profiles or tracking job listings in a specific niche. At scale, the bottleneck shifts from technical implementation to operational management: maintaining multiple LinkedIn accounts, rotating sessions, handling account restrictions, and staying ahead of detection updates.

LinkedIn's detection in 2026 leans heavily on device fingerprinting. Even with perfect rate limiting and residential proxies, if your TLS fingerprint, browser canvas hash, or WebGL renderer does not match what LinkedIn expects from the claimed browser, the session gets flagged. This is where the gap between emulated browsers and real devices becomes critical.

Real-device infrastructure routes your LinkedIn requests through actual smartphones with native browser environments. Every fingerprint signal (TLS, canvas, WebGL, audio context, screen dimensions) is inherently consistent because it comes from real hardware. Services like Archonum maintain dedicated, factory-reset devices that produce authentic fingerprints without any spoofing. The result is significantly lower detection rates and longer account lifespans.

For teams that need to scrape LinkedIn at scale (thousands of profiles, daily job monitoring across multiple markets), the economics favor real-device infrastructure over managing a fleet of accounts and proxies in-house.

Tip: If your LinkedIn accounts are getting restricted despite conservative rate limits, the issue is almost certainly fingerprint-related rather than behavioral. Real-device solutions eliminate this class of detection entirely.

LinkedIn scraping is a high-reward, high-risk activity. The data is uniquely valuable for recruiting, sales intelligence, and market research, but LinkedIn's detection system is aggressive and the consequences of getting caught are real. The approach in this guide — authenticated API access with conservative rate limits, proper session management, and structured data storage — provides a reliable foundation for moderate-scale collection. When you need to scale beyond what a handful of accounts can support, real-device infrastructure removes the fingerprinting challenge that causes most account restrictions. Whatever approach you choose, treat your LinkedIn accounts as valuable assets and protect them accordingly.

FAQ

Is scraping LinkedIn legal?

The legal landscape is nuanced. In hiQ v. LinkedIn, the Ninth Circuit held in 2022 that scraping publicly accessible profile data likely does not violate the CFAA. However, LinkedIn's User Agreement prohibits scraping, which creates breach-of-contract liability; later in the same case, the district court found that hiQ had breached that agreement. Scraping non-public data (requiring login) carries additional risk. Most companies engaged in LinkedIn data collection rely on the hiQ precedent for public data and accept the contractual risk. Consult your legal team.

How do I avoid getting my LinkedIn account banned?

Keep daily profile views under 50, searches under 25, and use random delays of 3-8 seconds between requests. Never access LinkedIn from datacenter IPs while logged in. Use a single, consistent residential IP per session. Do not perform actions that a human would not do: viewing 50 profiles without clicking any links or scrolling any page is a clear signal.

Can I scrape LinkedIn without logging in?

Yes, public profile pages are accessible without authentication, but the data is limited, typically just name, headline, current position, and education. Full work history, skills, and contact information require authentication. Public page scraping is lower risk but provides less data per request.

How often does LinkedIn's Voyager API change?

LinkedIn's Voyager API is an internal API not intended for third-party use, so it changes without notice. Major structural changes happen 2-3 times per year. Minor parameter and response format changes happen more frequently. Build your scraper to handle missing fields gracefully and monitor for parsing failures.

What proxies should I use for LinkedIn scraping?

Use residential proxies in the same geographic region as your LinkedIn account's stated location. Mobile carrier proxies work well but are more expensive. Never use datacenter proxies for authenticated LinkedIn access, because they trigger immediate security challenges. For the highest reliability, real-device infrastructure provides both the IP and the fingerprint authenticity that LinkedIn's detection requires.

Why not use LinkedIn's official APIs?

LinkedIn's official Marketing and Talent APIs are limited to approved partners and provide restricted data scopes. The approval process is slow and many use cases are not covered. For most data collection needs (competitive intelligence, market research, lead generation), scraping remains the practical approach because the official APIs simply do not provide the data you need.
