Originally published on Substack

80 Days to Stay - Connecting Recent Grads to Hidden Tech Jobs

How one professor turned SEC filings into a 30,000-company database connecting international talent to funded startups before their visa deadlines expire


How one professor turned regulatory paperwork into a lifeline for international talent racing against deportation deadlines

You have 73 days left in the United States.

Not 73 days to find the perfect job, or 73 days to negotiate salary, or 73 days to weigh your options. You have 73 days to locate a company willing to navigate H-1B bureaucracy, secure an offer at your previous salary level, and complete the paperwork-or you’re gone. Your apartment lease doesn’t matter. Your relationships don’t matter. The AI model you’ve been training for six months doesn’t matter. The clock is absolute.

This is the reality for thousands of H-1B visa holders laid off in the current wave of tech industry “efficiency adjustments.” Fresh graduates on Optional Practical Training face an even tighter noose: 90 days from diploma to deportation. While you scroll through the same job boards as 47,000 other applicants-all targeting the same hundred companies with established immigration pipelines-a different kind of goldmine sits in plain sight, locked inside the bureaucratic vaults of the Securities and Exchange Commission.

The Hidden Market

Here’s what most people miss: Big Tech represents maybe three percent of the American economy. Google, Microsoft, Tesla-they’re trillion-dollar behemoths with entire legal teams devoted to H-1B sponsorship. They also receive approximately 10,000 applications per opening. You know this. Every international student knows this. The competition is gladiatorial.

Meanwhile, across the United States, thousands of startups have raised five million, fifteen million, fifty million, one hundred million dollars. They’re desperate for talent. They’re flush with venture capital. And roughly ninety percent of them-despite having the financial capacity to hire-avoid international candidates entirely.

Not because they’re opposed to sponsorship. Because they don’t understand it.

They assume it’s expensive (it’s not, relatively speaking). They assume it’s complicated (it’s mostly paperwork). They assume it requires some specialized legal infrastructure (multiple low-cost services exist). But because they’ve never done it before, because their founder’s MBA program didn’t cover immigration law, because it feels foreign and uncertain, they simply checkbox “U.S. citizens only” and move on.

The talent pool and the capital pool exist in parallel universes. This is the gap that 80 Days to Stay was built to close.

The Data Pipeline

Nik Bear Brown, an Associate Teaching Professor at Northeastern University, started with a simple question: Where is every funded startup in America?

The answer, it turns out, is public record. When you raise money-even private money from angel investors or venture capital-you must file Form D with the SEC. Company name, address, funding amount, date. It’s all there, freely available, just exceptionally tedious to compile.

Brown started downloading. Quarter by quarter, going back ten years. The XML files stacked up: fourth quarter 2024, third quarter 2024, second quarter 2024. Each file contained thousands of companies. Some filed once. Others filed repeatedly as they raised successive rounds. By the time the script finished running-removing duplicates, aggregating totals, calculating statistics-the database contained over 500,000 companies.
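
How might that aggregation step look in practice? Here is a minimal sketch, assuming a local folder of Form D XML filings already downloaded from EDGAR; the tag names (cik, entityName, totalAmountSold) follow the public Form D schema, but both the folder layout and the field names are assumptions here, not a description of Brown's actual script.

```python
# Minimal sketch: fold a folder of downloaded Form D XML filings into one
# deduplicated table. Directory layout and tag names are assumptions.
import csv
import pathlib
import xml.etree.ElementTree as ET
from collections import defaultdict

def text_of(root, tag):
    """Return the first matching element's text, ignoring XML namespaces."""
    for el in root.iter():
        if el.tag.split("}")[-1] == tag and el.text:
            return el.text.strip()
    return ""

totals = defaultdict(lambda: {"name": "", "raised": 0.0, "filings": 0})

for path in pathlib.Path("formd").rglob("*.xml"):        # assumed local folder
    root = ET.parse(path).getroot()
    cik = text_of(root, "cik")
    name = text_of(root, "entityName")
    try:
        raised = float(text_of(root, "totalAmountSold") or 0)
    except ValueError:                                    # e.g. "Indefinite"
        raised = 0.0
    entry = totals[cik or name]                           # dedupe on CIK, fall back to name
    entry["name"] = name or entry["name"]
    entry["raised"] += raised
    entry["filings"] += 1

with open("companies_raw.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["cik_or_name", "company", "total_raised", "filings"])
    for key, e in totals.items():
        writer.writerow([key, e["name"], e["raised"], e["filings"]])
```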

This is where most people would have stopped. Half a million companies is impressive. It’s also useless. A teenager’s lemonade stand that received $50,000 from an uncle appears in this dataset. So does a real estate holding company. So does a company that raised money in 2015 and shut down in 2016.

What you need-what a student with 61 days remaining needs-is signal, not noise.

The Filtering Begins

Brown built a pipeline. Step one: filter by funding threshold. If a company raised less than one million dollars, delete it. If it raised money ten years ago and nothing since, delete it. If it’s headquartered in Europe or Canada (outside the H-1B jurisdiction), delete it. If it operates in industries with security clearance requirements that exclude international hires, delete it.
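
In code, that filtering pass could be as simple as a few pandas masks. The column names below (total_raised, last_filing_year, country, industry) and the excluded-industry list are hypothetical placeholders, not the project's actual schema.

```python
# Sketch of the filtering pass; thresholds mirror the rules described above,
# but every column name here is an assumed placeholder.
import pandas as pd

df = pd.read_csv("companies_raw.csv")

df = df[df["total_raised"] >= 1_000_000]          # drop raises under $1M
df = df[df["last_filing_year"] >= 2020]           # drop companies with no recent filings
df = df[df["country"] == "US"]                    # keep H-1B-eligible jurisdictions only
excluded = {"Defense", "Aerospace"}               # illustrative clearance-heavy industries
df = df[~df["industry"].isin(excluded)]

df.to_csv("companies_filtered.csv", index=False)
print(f"{len(df)} companies remain after filtering")
```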

The list compressed. Five hundred thousand became fifty-seven thousand.

Still too broad. A name and an address aren’t enough to apply. You need a website, a careers page, contact information. You need to know if they’re actually hiring.

Step two: domain inference. For each company, the bot generated likely URLs. If the company is called “Crossbow Therapeutics,” try crossbowtherapeutics.com. Try crossbowtherapeutics.io. Try crossbow.bio. Ping each domain. Record which ones return valid responses.
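
A minimal version of that domain-inference step might look like the sketch below; the TLD list and the name normalization are assumptions, and a real run would add rate limiting and smarter name shortening (crossbow.bio-style guesses).

```python
# Sketch: guess candidate domains from a company name and keep the first
# one that answers. TLD list and normalization are assumptions.
import re
import requests

def candidate_domains(name: str):
    slug = re.sub(r"[^a-z0-9]", "", name.lower())   # "Crossbow Therapeutics" -> "crossbowtherapeutics"
    return [f"https://{slug}.{tld}" for tld in ("com", "io", "ai", "co")]

def verify_website(name: str):
    for url in candidate_domains(name):
        try:
            resp = requests.get(url, timeout=5)
            if resp.ok:
                return url                          # first domain that responds
        except requests.RequestException:
            continue
    return None

print(verify_website("Crossbow Therapeutics"))
```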

It worked. Out of 41,000 companies (after deduplication), the bot verified websites for approximately 25,000. A sixty-two percent success rate from educated guessing.

Step three: crawl everything. Visit each verified website. Follow internal links. Scrape up to fifty pages per company. Save the HTML. The bot ran for fifty-nine hours straight, keeping fifty requests in flight at a time, and ultimately collected nearly half a million web pages-an average of 16.5 pages per company.
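
A crawler with that shape (fifty requests in flight, at most fifty pages per site) could be sketched with asyncio as below. The library choices, aiohttp and BeautifulSoup, are assumptions, not necessarily what the original bot used.

```python
# Sketch: breadth-first crawl of internal links, capped at 50 pages per
# site, with a global limit of 50 concurrent requests.
import asyncio
from urllib.parse import urljoin, urlparse

import aiohttp
from bs4 import BeautifulSoup

MAX_PAGES = 50
SEM = asyncio.Semaphore(50)                     # global concurrency cap

async def fetch(session, url):
    async with SEM:
        try:
            async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
                if resp.status == 200 and "text/html" in resp.headers.get("Content-Type", ""):
                    return await resp.text()
        except Exception:
            pass
    return None

async def crawl_site(session, start_url):
    """Return {url: html} for up to MAX_PAGES internal pages of one site."""
    domain = urlparse(start_url).netloc
    queue, seen, pages = [start_url], {start_url}, {}
    while queue and len(pages) < MAX_PAGES:
        url = queue.pop(0)
        html = await fetch(session, url)
        if html is None:
            continue
        pages[url] = html
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"]).split("#")[0]
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

async def main(start_urls):
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(crawl_site(session, u) for u in start_urls))
    return dict(zip(start_urls, results))

# asyncio.run(main(["https://example.com"]))    # pass the verified URLs here
```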

Somewhere in those pages: career listings, team bios, “About Us” sections describing what the company actually does. The raw material for inference.

What the Bot Knows

The scraped data revealed patterns invisible to human job seekers manually refreshing LinkedIn.

Some companies use Greenhouse as their Applicant Tracking System. This matters enormously. Greenhouse uses specific keyword-matching algorithms. If you know a company runs on Greenhouse, you can reverse-engineer your resume to pass their filters. Other companies use Workday, Lever, or custom systems. Each has its own parsing logic. Knowing the ATS is knowing the gatekeeper’s language.

Some companies list job openings that explicitly mention visa sponsorship. Strong signal. Others have “People” pages showing employees with undergraduate degrees from India and master’s degrees from U.S. universities-the classic international student trajectory. Also a strong signal. They’ve done this before. They know the process.

The final enrichment layer: cross-reference with Department of Labor and USCIS databases. These are also public records. When a company files for H-1B sponsorship, it’s logged. How many times has this company filed in the past year? The past five years? When was their most recent filing?

A company that sponsored twelve people last year is a known quantity. A company that has never filed might be willing-they just don’t know they’re willing yet.

The Master Dataset

From 500,000 raw SEC filings to 30,000 verified, annotated, high-probability targets. Each entry now includes:

Company name and verified website

Total funding raised and most recent funding round

Industry classification

Location

Identified ATS system (if applicable)

Historical H-1B sponsorship count

Verified job listings (where available)

Team composition signals (inferred from scraped bio pages)

This is not a job board. Job boards list openings at companies you already know about. This is a targeting system for the hidden market-the funded, growing, capable-but-uninformed companies that represent the vast majority of American innovation.

The Stakes

Let’s be precise about what’s at stake here. International students don’t just “contribute to” American universities-they are, in Brown’s phrasing, “the lifeblood.” They pay full tuition. They subsidize research. They enable American students to attend schools that would otherwise lack the funding to operate at current scale.

After graduation, they staff the labs, write the code, and train the models that power the companies everyone claims to care about. Then, when the labor market contracts, they’re given two months to find a new sponsor or leave.

The standard advice-“just apply to Google”-is mathematically absurd. There are maybe one hundred companies with truly robust immigration infrastructure. There are tens of thousands of talented graduates. The equation doesn’t close.

Meanwhile, a seed-funded biotech company in Cambridge has $8 million in the bank and can’t find a qualified ML engineer. A Series A fintech startup in Austin has twelve open positions and no idea that filing Form I-129 is straightforward. The talent exists. The capital exists. The jobs exist.

What didn’t exist, until now, was a map connecting them.

How to Use It

You could spend seventy-three days sending generic applications to Microsoft and Amazon, competing with tens of thousands of identical resumes. Or you could filter the 80 Days to Stay dataset by your field (biotechnology, fintech, AI infrastructure), your location preference (California, Massachusetts, Texas), and companies with verified H-1B filing history.

You could identify a Series B company in your specialty that sponsored four people last year. You could see they use Greenhouse. You could tailor your resume accordingly, demonstrate you understand their specific technical stack (which you learned from their scraped job listings), and apply with a ninety-percent-higher chance of human review than the generic FAANG application.

This is not about gaming the system. This is about finding the companies that need you but don’t know how to reach you.

Next Steps: How Volunteers Can Enhance This Dataset

The current 30,000-company dataset is functional-people are using it right now to find jobs. But it’s version 1.0. The difference between a useful tool and an indispensable one lies in the details that only distributed human intelligence can provide.

Here’s where the project needs help, organized by skill level and time commitment:

Immediate Impact: Data Validation (No Coding Required)

Task: Website Verification

The domain inference bot achieved a 62% success rate by guessing URLs. That means approximately 38% of companies in the original 57,000-company pool still have no verified website. You can fix this manually.

What to do:

Download the unverified_companies.csv from GitHub

Pick a batch of 50 companies (filter by your city, your industry, whatever interests you)

For each company, search “[Company Name] + [City] + [Industry]” in Google

Record the actual website URL

Submit your findings via the Google Form linked in the repository

Why it matters: Every website you verify adds another company to the searchable pool. If you verify 50 companies and 30 of them have active job listings, you’ve just created 30 new opportunities that were previously invisible.

Time commitment: 2-3 hours per 50-company batch

Task: ATS System Identification

Knowing a company’s Applicant Tracking System is tactical gold. The bot can auto-detect Greenhouse (by checking if companyname.greenhouse.io returns a valid response), but other systems require human verification.

What to do:

Visit company career pages in the verified dataset

Look for telltale signs: Workday (distinctive blue interface), Lever (minimalist design), custom systems (unique branding)

Document which companies use which systems

Flag companies that appear to review applications manually (no ATS at all-these need different application strategies)

Why it matters: ATS systems parse resumes differently. A Workday-optimized resume will get filtered out by Greenhouse. This metadata transforms the dataset from a list of companies into a strategic targeting tool.

Time commitment: 30 seconds per company once you know what to look for
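
Volunteers who are comfortable with a little Python can also automate a first pass before doing the manual check. The sketch below guesses the ATS from hostnames that commonly appear in embedded job boards; the signature list is a heuristic assumption, so anything it marks "unknown" still needs the eyeball test described above.

```python
# Sketch: guess a company's ATS from hostnames found in its careers-page
# HTML. The signature list is a common-pattern heuristic, not exhaustive.
ATS_SIGNATURES = {
    "greenhouse": "greenhouse.io",
    "lever": "lever.co",
    "workday": "myworkdayjobs.com",
    "ashby": "ashbyhq.com",
    "bamboohr": "bamboohr.com",
}

def detect_ats(careers_html: str) -> str:
    html = careers_html.lower()
    for ats, signature in ATS_SIGNATURES.items():
        if signature in html:
            return ats
    return "unknown"   # custom system or manual review; needs a human look

# Example: detect_ats(open("scraped/acme/careers.html").read())
```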

Intermediate: Enrichment Scripts (Python Skills Helpful)

Task: LinkedIn Team Analysis

The original plan was to analyze company team pages to infer sponsorship likelihood. LinkedIn blocked this approach, but individual company websites often have “Team” or “About Us” pages with employee bios.

What to do:

Write a script that visits the scraped HTML for each company

Search for pages containing team member information

Extract education histories where visible (look for the international undergrad + U.S. grad school pattern)

Calculate a “sponsorship likelihood score” based on team composition

Why it matters: A company that currently employs five people with international backgrounds is qualitatively different from a company with zero. This is predictive data.

Skills needed: Python, basic HTML parsing (BeautifulSoup), regex pattern matching

Time commitment: 10-15 hours to build initial version
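
One way such a script could start, assuming the scraped HTML is already on disk: the university and degree keyword lists below are illustrative stand-ins for a fuller gazetteer, and the counting is deliberately crude.

```python
# Sketch: count bio blocks that pair an international-undergrad hint with
# a U.S. graduate-degree hint. Keyword lists are illustrative assumptions.
from bs4 import BeautifulSoup

INTL_HINTS = ["IIT", "Tsinghua", "NIT", "BITS Pilani", "Peking University"]
US_GRAD_HINTS = ["M.S.", "MS in", "Master of Science", "Ph.D.", "MBA"]

def sponsorship_signal(team_page_html: str) -> int:
    soup = BeautifulSoup(team_page_html, "html.parser")
    hits = 0
    for block in soup.find_all(["p", "li", "div"]):        # rough bio segmentation
        text = block.get_text(" ", strip=True)
        if any(u in text for u in INTL_HINTS) and any(g in text for g in US_GRAD_HINTS):
            hits += 1
    return hits    # rough count; nested markup can double-count
```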

Task: Job Listing Extraction and Classification

The bot scraped up to 50 pages per company. Somewhere in that HTML are job listings. They need to be extracted and categorized.

What to do:

Build a parser that identifies job listing pages (look for keywords: “careers,” “jobs,” “openings,” “apply”)

Extract job titles and descriptions

Classify by role type (engineering, data science, product, design, operations)

Flag listings that explicitly mention “visa sponsorship available”

Store structured job data separately from company metadata

Why it matters: Right now, users know “Company X has a website and raised $15M.” They need to know “Company X is hiring two ML engineers and one data scientist, posted 3 weeks ago.”

Skills needed: Python, HTML parsing, basic NLP for classification

Time commitment: 15-20 hours for a robust solution
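
A starting point for that parser might look like the following; the career-page hints, role buckets, and sponsorship keywords are all assumptions to be refined against real scraped pages.

```python
# Sketch: identify career pages, pull likely job titles, and flag
# sponsorship language. All keyword lists are illustrative assumptions.
from bs4 import BeautifulSoup

CAREER_HINTS = ("careers", "jobs", "openings", "apply", "join-us")
ROLE_BUCKETS = {
    "engineering": ["engineer", "developer", "sre"],
    "data": ["data scientist", "machine learning", "ml engineer", "analyst"],
    "product": ["product manager"],
    "design": ["designer", "ux"],
    "operations": ["operations", "recruiter", "people"],
}

def classify_listing(title: str) -> str:
    t = title.lower()
    for bucket, words in ROLE_BUCKETS.items():
        if any(w in t for w in words):
            return bucket
    return "other"

def extract_listings(url: str, html: str) -> list:
    if not any(hint in url.lower() for hint in CAREER_HINTS):
        return []                                     # probably not a careers page
    soup = BeautifulSoup(html, "html.parser")
    mentions_sponsorship = "visa sponsorship" in soup.get_text(" ").lower()
    listings = []
    for node in soup.find_all(["h2", "h3", "a"]):     # headings/links often hold titles
        title = node.get_text(strip=True)
        role = classify_listing(title)
        if 0 < len(title) < 80 and role != "other":
            listings.append({"url": url, "title": title, "role_type": role,
                             "mentions_sponsorship": mentions_sponsorship})
    return listings
```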

Advanced: New Data Sources and Integration

Task: Cross-Reference with H-1B Disclosure Data

The Department of Labor publishes H-1B Labor Condition Applications. USCIS publishes approval/denial statistics. Both datasets are public but require significant cleaning.

What to do:

Download DOL LCA disclosure data (available at dol.gov/agencies/eta)

Download USCIS H-1B employer data hub statistics

Clean and normalize company names (this is harder than it sounds-“Google LLC” vs “Google Inc” vs “Alphabet Inc”)

Match companies in the 80 Days dataset to their H-1B filing history

Calculate: Total approvals, most recent filing date, average salary offered, job titles sponsored

Why it matters: This converts “might sponsor” into “definitely sponsors, here’s proof.” A company that sponsored 47 people last year is a known entity. A company with zero filings requires a different approach (educating them about the process).

Skills needed: Data engineering, fuzzy matching algorithms, experience with messy government datasets

Time commitment: 25-30 hours for initial integration; ongoing maintenance as new data is released
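
The name-normalization and matching core could start as simply as the sketch below, using the standard library's difflib; the suffix list and similarity cutoff are assumptions, and a production version would likely move to a dedicated fuzzy-matching library.

```python
# Sketch: normalize company names, then fuzzy-match dataset companies
# against DOL/USCIS employer names. Suffix list and cutoff are assumptions.
import difflib
import re

SUFFIXES = r"\b(inc|llc|corp|corporation|ltd|co|company)\b\.?"

def normalize(name: str) -> str:
    name = re.sub(SUFFIXES, "", name.lower())
    return re.sub(r"[^a-z0-9 ]", "", name).strip()

def match_employers(dataset_names, lca_names, cutoff=0.9):
    """Map each dataset company to its closest LCA employer name, if any."""
    lca_norm = {normalize(n): n for n in lca_names}
    matches = {}
    for name in dataset_names:
        hit = difflib.get_close_matches(normalize(name), lca_norm.keys(), n=1, cutoff=cutoff)
        if hit:
            matches[name] = lca_norm[hit[0]]
    return matches

# Example: match_employers(["Google LLC"], ["GOOGLE INC", "ALPHABET INC"])
# -> {"Google LLC": "GOOGLE INC"}
```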

Task: Real-Time Funding Data Integration

The SEC Form D data is comprehensive, but it arrives only quarterly. Companies raise money every week. Real-time funding data exists in fragmented sources: Crunchbase, PitchBook, AngelList, news announcements.

What to do:

Set up scrapers for Crunchbase and similar platforms (note: some require API access/payment)

Monitor tech news aggregators for funding announcements

Create a weekly update pipeline that adds newly-funded companies to the dataset

Flag companies that just raised Series A or B (they’re in active hiring mode)

Why it matters: A company that raised $20M two weeks ago is desperate for talent right now. Speed matters. Real-time data creates an asymmetric advantage.

Skills needed: Web scraping, API integration, automated pipeline management

Time commitment: 20 hours initial setup, 2-3 hours/week maintenance
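
A minimal version of the monitoring piece might poll an RSS feed of funding news each week; the feed URL below is a placeholder, feedparser is just one possible library, and the regex only catches headlines phrased as "raises $X million."

```python
# Sketch: scan a funding-news RSS feed for raise announcements.
# FEED_URL is a placeholder; swap in a real funding-news feed.
import re
import feedparser

FEED_URL = "https://example.com/tech-funding.rss"
RAISE_PATTERN = re.compile(r"raises?\s+\$?[\d.]+\s*(million|billion)\b", re.I)

def new_raises(feed_url: str = FEED_URL) -> list:
    feed = feedparser.parse(feed_url)
    hits = []
    for entry in feed.entries:
        match = RAISE_PATTERN.search(entry.title)
        if match:
            hits.append({"title": entry.title, "link": entry.link,
                         "amount": match.group(0)})
    return hits

# Run weekly (cron, GitHub Actions, etc.) and append new hits to the dataset.
```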

Expert Level: Build the Search Interface

Task: Resume-to-Company Matching System

The vision: Upload your resume, get a ranked list of companies most likely to hire you based on your skills, location preferences, and visa status.

What to do:

Build a semantic search system that compares resume text to company descriptions and job listings

Weight companies by: funding recency, H-1B history, ATS compatibility, geographic proximity

Generate personalized application strategies (e.g., “This company uses Greenhouse; emphasize these keywords”)

Create a simple web interface (even a Streamlit app is better than raw CSV)

Why it matters: Right now, using the dataset requires manual filtering in Excel. That’s a barrier. A search interface democratizes access-anyone can use it, regardless of technical skill.

Skills needed: NLP/embeddings, web development, database design, UX thinking

Time commitment: 40+ hours for MVP
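
Before investing in embeddings, the matching core could be prototyped with plain TF-IDF, as in the sketch below; the column names (description, h1b_filings) and the flat H-1B boost are assumptions about the dataset's schema, and a production version might swap in sentence embeddings.

```python
# Sketch: rank companies by TF-IDF similarity to a resume, with a small
# boost for documented H-1B history. Column names are assumed placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_companies(resume_text: str, companies: pd.DataFrame, top_n: int = 20) -> pd.DataFrame:
    corpus = companies["description"].fillna("").tolist()
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(corpus + [resume_text])   # companies + resume
    similarity = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    scored = companies.copy()
    scored["score"] = similarity
    scored.loc[scored["h1b_filings"] > 0, "score"] += 0.1       # boost proven sponsors
    return scored.sort_values("score", ascending=False).head(top_n)

# Example: rank_companies(open("resume.txt").read(), pd.read_csv("companies.csv"))
```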

The Meta-Task: Documentation and Outreach

The best dataset in the world is useless if nobody knows it exists or how to use it.

Needed:

Step-by-step tutorials for non-technical users (“How to filter this CSV for biotech companies in Boston”)

YouTube walkthroughs demonstrating successful application strategies

Case studies from people who actually got jobs using this data

Translation of the README into Hindi, Mandarin, Spanish (the demographics of international students)

Outreach to university career services offices

Social media campaigns targeting international student communities

Time commitment: Variable, ongoing

The Bigger Picture

Brown didn’t set out to solve American immigration policy. He set out to solve a specific, tactical problem: his students were running out of time. The solution required no new legislation, no appeals to bureaucratic mercy, no change in public opinion. It required nineteen days of coding, sixty hours of web scraping, and a willingness to treat regulatory filings as raw material.

Now the infrastructure exists. The next phase is refinement, validation, and distribution. This is where the project transitions from one professor’s emergency intervention to a sustainable, community-maintained resource.

If you’re reading this with fifty-eight days left on your visa, you now have options beyond the same hundred companies everyone else is targeting. If you’re a startup founder who assumed sponsorship was impossibly complicated, you now have evidence that your competitors are doing it routinely.

And if you’re someone with Python skills, data engineering experience, or just a few hours to validate website URLs, you can help turn a functional tool into an indispensable one.

The clock is still running. But the map is no longer blank.

Get Involved: Repository: github.com/nikbearbrown/80-Days-to-Stay


Connect with Nik Bear Brown

Nik Bear Brown, Poet and Songwriter