Skip to main content

4 posts tagged with "LLM"

LLM project

View All Tags

Version 3: Back to OCR -This Time For Real

· 2 min read
Jay Pitkänen
AI Deployment Strategist

The coordinate grid was DOA. Back to square one.

Python powered OCR would be the only way forward.

I have to get this right. If you are working on a car, guessing doesn't cut it. Accuracy is everything. Get a torque spec wrong, and you snap a bolt inside an engine block.

But instead of trying to become a computer vision engineer overnight, I decided to use an established solution created by someone way smarter than me: EasyOCR.

Why reinvent the wheel, right?

Version 2: Blinded By The Coordinate

· 5 min read
Jay Pitkänen
AI Deployment Strategist

The LLM-powered vision pipeline was starting to take shape.

The JetKVM grabs the screenshot.

A quick JavaScript script sends that image over to the LLM. The LLM reads the screen and tells Python exactly where to click.

[JetKVM Laptop Screen]

▼ (WebRTC Stream)
[Your Browser Frontend] ──(Captures Frame)──> [Converts to Base64 Image String] │ ▼ (Fetch POST Request) [Ollama / Local Server]
│ (Runs Ministral 8B)

[Structured JSON Data Out]

Clean. Simple.

But there was one glaring flaw. How does the LLM describe a location? It can't just say "click the top right button." Python needs exact pixels.

The LLM needed a map.

Version 1: OpenCV, How Hard Could it Be?

· 5 min read
Jay Pitkänen
AI Deployment Strategist

My first idea was pretty simple.

  1. JetKVM forwards video to my browser.
  2. Python reads the screen and finds the buttons.
  3. Python shows content to LLM.
  4. Python tells JetKVM what to do based on LLM reasoning.

Simple, elegant.

OpenCV was the industry standard 6 years ago when I was actively coding Python, surely it would work fine here today, right?

Surely.

Fixing Enterprise Problems With a Dusty IBM Laptop

· 5 min read
Jay Pitkänen
AI Deployment Strategist

Ever owned an old Mercedes?

Let me tell you about a regular Tuesday evening owning an old Mercedes.

You're lying under the car, back against the pavement, face inches away from a hot exhaust pipe, trying to fix another "tiny thing".

It's dark. Your eyes are full of dirt and dust. The wrench you're holding is starting to feel real heavy. Just gotta tighten this one last bolt and then it's sauna and beer time.

Then your mind goes blank. What torque?

Was it 50Nm or 90Nm? You check your notes - nothing. Google? No way, this information is way too niche. ChatGPT? Forget it. You can't trust it to be accurate.

Get it wrong and the bolt might snap and cause a fuel leak.

The Mercedes Xentry Diagnosis Software could give the exact value, but it's locked away in a laptop in your office, some 900m away.

If only there was a way to access it remotely with voice commands...