How to Use Your LLM as an AI Lab

Most weeks, I have a version of the same conversation with a finance leader. They ask whether to sign with the AI-native SaaS vendor that just gave a great demo. Or brief a freelancer to build the month-end close. Or wait for the ERP vendor's next AI release. Each is a fair question for the specific workflow. None of them is the question to ask first.

The question to ask first is whether the workflow is even ready to be automated, and what good would look like if it were. You answer that in your LLM, before any vendor or freelancer gets a brief. Last week I called that the prototyping layer of the AI stack. This week is how to actually use it that way.

In the subscriber section: an interactive AI Lab artifact with six common CFO workflows, three structural steps each, and the system prompts, project layouts, and reference architectures for every cell.

Claude in Action: One May Spot Left

Claude in Action is a three-session hands-on AI training, built around your team's actual finance workflows and delivered by a CFO who uses these tools every day. The standard format is company-based: one company, one team, your real work. More about the program.

I have one May spot left in that format. If you have been thinking about it, apply now.

A few of you have asked about a different format: a mixed cohort of CFOs and fractional CFOs from different companies, learning alongside peers instead of a company-specific program. I am still figuring out the details (size, schedule, structure), but if that fits you better, sign up for the waitlist. Early waitlist subscribers will get a discount when the cohort opens.

The LLM Is Your AI Lab

Last week, we walked through the four building blocks I see in most finance organizations today.

The point I want to come back to this week is that the LLM belongs in every stack, no matter what else is running, because it plays a role nothing else can. It is the testing lab: the place where you figure out which workflows are candidates for a structured project or a built automation, and which are not.

You don’t decide upfront what to automate; you test first.

You run the work manually inside your LLM, watch what stabilizes, and only then decide whether to formalize anything. By stable I mean the inputs come in the same shape each time, the prompt stops changing, and your edits to the output shrink. The output of the lab is not a finished automation; it is a decision: this is worth automating, this isn’t, this stays manual, this graduates.
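If you want "stable" to be less of a gut call, the three signals above can be written down as a simple check. What follows is a minimal sketch in Python, not a tool I am prescribing. The `RunLog` fields, the three-run window, and the 10 percent edit threshold are all assumptions made for illustration, and you would tune them to your own workflow.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class RunLog:
    """One manual run of a workflow in the LLM (hypothetical log format)."""
    input_columns: tuple  # shape of the input, e.g. the column headers of the upload
    prompt: str           # the prompt used for this run
    llm_output: str       # what the LLM produced
    final_output: str     # what you actually used after your edits


def edit_ratio(run: RunLog) -> float:
    """Rough share of the output you changed: 0.0 means you used it as-is."""
    return 1.0 - SequenceMatcher(None, run.llm_output, run.final_output).ratio()


def is_stable(runs: list, window: int = 3, max_edit: float = 0.10) -> bool:
    """Check the three stability signals over the most recent `window` runs.

    The window size and edit threshold are assumed values, not a standard.
    """
    if len(runs) < window:
        return False  # too few runs to judge; one good run is not proof
    recent = runs[-window:]
    same_inputs = len({r.input_columns for r in recent}) == 1  # inputs keep the same shape
    same_prompt = len({r.prompt for r in recent}) == 1         # prompt stopped changing
    small_edits = all(edit_ratio(r) <= max_edit for r in recent)  # edits have shrunk
    return same_inputs and same_prompt and small_edits
```

Log each run as you go and call `is_stable(runs)` after a few weeks. A `False` is not a failure; it usually points at which of the three signals is still moving.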

One important caveat before any of this works. The lab requires an LLM you can actually put your data into. That means a corporate-grade subscription, and an internal policy that allows you to upload the kind of data you are planning to test with: real invoices, real contracts, real client information, real financials. If either is missing, the lab is a stage-zero conversation about your enterprise AI access, not a stage-one experiment with workflows. Get the access first.

Don’t pick one workflow; pick several. Try them all in the LLM, and the ones that work climb the stairs.

1. Chat in the LLM. You do the work and the LLM helps: you upload files, paste rules, and ask questions.
2. A structured project. When a workflow stabilizes in chat, you set it up as a project inside your LLM, in Claude Projects, ChatGPT Custom GPTs, or a similar space, where rules, context, and templates persist across sessions. You still drop in the inputs yourself and review the output, but you stop rebuilding the prompt every time. It is faster and more reliable, but still hands-on.
3. Built automation. You take what you learned in the project and put it into a real tool. That can be an AI-native SaaS product or a custom build by a freelance team, an internal development team, or an external development partner. The tool could be a deterministic workflow or an agent, depending on what the work needs. You step out and stay in the loop only for the edge cases.

There is a fair counterargument worth naming. The AI-native SaaS vendors have already done the validation work for thousands of customers, so why should every CFO reinvent the discovery process in their own LLM? The answer is that the lab is not really about discovering whether the work can be automated. It is about discovering whether it should be automated in your company, with your edge cases, and what good looks like for your team. Vendor demos do not tell you that. Your own data does.

Two workflows I have run through this. One graduated. The other didn’t.

Example 1: Accounts payable

A client on NetSuite was choosing between an AI-native SaaS vendor like Hyperbots or Vic.ai, a custom build, or expanding what their controller was doing in Claude. We ran the work through the LLM for four weeks.

AP is not one task. It covers invoice intake, two-way and three-way match, MSA comparison when there is no PO, GL coding, and approval routing. The Cowork setup held the MSAs, open POs refreshed weekly from NetSuite, the chart of accounts, and approval thresholds.

What the LLM caught: a $14,200 invoice against a $12,000 PO, a $250 rate where the MSA specified $225, and an MSA-scope invoice with no PO. What it missed: receipt confirmation when NetSuite data was thin, and cross-border tax.
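Once the data is structured, those three catches come down to simple comparisons. Here is a minimal Python sketch of that kind of check. The vendor name, PO number, field names, and flag wording are hypothetical; only the dollar amounts come from the example. It illustrates the rules being tested, not how the LLM reached its answers or how any vendor's product works.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Invoice:
    """Hypothetical invoice record; field names are illustrative only."""
    vendor: str
    amount: Decimal
    po_number: Optional[str] = None
    rate: Optional[Decimal] = None  # billed rate, if the invoice states one


def check_invoice(invoice: Invoice, open_pos: dict, msa_rates: dict) -> list:
    """Return a list of exception flags; an empty list means the invoice ran cleanly."""
    flags = []
    if invoice.po_number is None:
        # No PO: fall back to the MSA, if one is on file for this vendor
        if invoice.vendor in msa_rates:
            flags.append("no PO: compare against MSA scope")
        else:
            flags.append("no PO and no MSA on file")
    else:
        po_amount = open_pos.get(invoice.po_number)
        if po_amount is None:
            flags.append(f"PO {invoice.po_number} not found in open POs")
        elif invoice.amount > po_amount:
            flags.append(f"invoice {invoice.amount} exceeds PO {po_amount}")
    # Rate check applies whether or not there is a PO
    msa_rate = msa_rates.get(invoice.vendor)
    if invoice.rate is not None and msa_rate is not None and invoice.rate > msa_rate:
        flags.append(f"rate {invoice.rate} above MSA rate {msa_rate}")
    return flags


# Example data: hypothetical names, amounts taken from the case above
open_pos = {"PO-1001": Decimal("12000")}
msa_rates = {"Acme Consulting": Decimal("225")}
over_po = Invoice("Acme Consulting", Decimal("14200"), po_number="PO-1001")
print(check_invoice(over_po, open_pos, msa_rates))
# prints ['invoice 14200 exceeds PO 12000']
```

The misses are the instructive part. Nothing in this sketch can confirm receipt or handle cross-border tax, because neither input is in the data it sees.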

After three weeks, the prompt stopped changing; 80 percent of invoices ran cleanly. The client went custom because their service-based work ran against MSAs, while most SaaS tools assume POs. The project served as the brief for the build.

Example 2: CFO contract approval

This one is mine. I tried to build an LLM workflow to approve incoming contracts at a startup. I loaded my signing authority, the standard MSA, vendor categories, and the redlines we always pushed back on. The first five contracts looked clean.

Then the rule list grew. A revenue-share agreement. A UK contractor under English law. A hybrid licensing-plus-services deal. Each needed new rules.

Then the harder cases. A long-relationship vendor sent worse-than-standard terms; the right answer was to sign anyway. A twelve-month renewal came in for a service we were switching off in two quarters; the right answer was three months. The rules were not the work; the work was judgment around the rules, drawn from context the LLM did not have.

So I stopped. It stays in chat as a thinking tool. This shape shows up anywhere the work is judgment-led: vendor risk reviews, exception handling, executive comp. The lab does not graduate any of them, and that finding saved me from a freelancer brief I would have regretted.

Two traps I keep seeing.

1. Picking only one workflow to test. Run several in parallel, because the comparisons are where the learning is.
2. Treating one good run as proof. The point is repetition; one prompt that worked once is not a workflow.

If you want to start a lab this week, you do not even need to pick a workflow yet. Open your LLM and look at your chat history. The candidates are already there. Try this prompt:

‘Look at our recent conversations and identify three to five workflows where I have asked you for similar help more than once. For each, tell me whether the inputs, the rules, and the output format have been stable across our conversations, or whether they shifted each time. Rank them from most stable to least stable.’

Whatever comes back at the top of that list is your first lab candidate.

The point is not to do less automation; it is to do the right automation. The lab tells you what right looks like.

We have covered the staircase, two examples, and the prompt for finding workflow candidates in your own LLM history. What the free section cannot show you is what each step actually looks like across more than one workflow. That is what the paid artifact does.

This week’s subscriber benefit is an interactive AI Lab artifact built in Claude. Six common CFO workflows: AP automation, reconciliations, monthly board package, rolling 13-week cash flow forecast, variance commentary, and CFO contract approval. Three steps each. Eighteen cells of working content. For every workflow you see the full system prompt for the chat version, the Cowork project folder layout for the structured version, and the reference architecture for the built version with named SaaS alternatives.


Closing Thoughts

Thanks for reading. If this piece made you think about a workflow you have been tempted to automate, that is probably your first lab candidate. Try it for a few weeks and see what stabilizes. The most interesting findings are usually the ones nobody predicted. Reply and tell me what you find.

See you Tuesday.

Anna


We Want Your Feedback!

This newsletter is for you, and we want to make it as valuable as possible. Please reply to this email with your questions, comments, or topics you'd like to see covered in future issues. Your input shapes our content!

Want to dive deeper into balanced AI adoption for your finance team? Or do you want to hire an AI-powered CFO? Book a consultation!

Did you find this newsletter helpful? Forward it to a colleague who might benefit!

Until next Tuesday, keep balancing!

Anna Tiomina
AI-Powered CFO