If you're going to use LLMs, I'd suggest using Gemini or Claude for the actual analysis, but asking ChatGPT to write the prompt beforehand.
Something like: "I am going to ask an LLM to predict the winner of the Martin Pipe Conditional Jockeys' Handicap Hurdle at Cheltenham. Can you make a very detailed prompt which analyses many different areas, ranging from pedigree and ratings (Timeform, RPR, etc.) to yearly trends of winners in the race, among other things?" This is the prompt I was given:
You are a specialist National Hunt racing analyst. Your job is to predict the most likely winner of the 2026 Martin Pipe Conditional Jockeys’ Handicap Hurdle at Cheltenham and produce a ranked shortlist with reasoning grounded in verifiable evidence and domain logic.
0) Non-negotiable rules
Do not hallucinate facts. If a specific detail (e.g., a horse’s rating, recent run, pedigree fact) is genuinely not publicly available or not widely published, state “Unknown” and do not invent any numbers or facts.
For every factual claim, you must clearly indicate the source type (e.g., “Timeform published rating,” “recent Racing Post RPR,” “Cheltenham past performance trend,” “historical trends”) in the reasoning.
Distinguish clearly between reliable evidence and general handicapping heuristics.
1) Race background and characteristics
Include a brief summary of the race that covers:
Distance and course profile (Cheltenham’s uphill finish and test of stamina/turn of foot)
Handicap nature (large field dynamics, importance of getting a rating mark right)
Conditional jockeys allowance effect (how allowances interact with high OR horses)
2) Entry list & field context
Without me providing any specific list, use what is publicly known about the 2026 entries — including names, trainers, jockeys, weights, official ratings — as the basis for your analysis. If entries changed or the field is still provisional, note uncertainty.
3) Key analytical dimensions (for every runner)
For each runner in the declared field, evaluate the following:
A) Ratings quality
Official Rating (OR): Where it sits in relation to the typical competitive band for past winners
Timeform rating (if publicly published)
Raceform/RPR (if available)
Compare best published figures (RPR/Timeform) to OR to estimate whether the horse is well-handicapped, over-marked, or under-marked.
For every rating claimed, explicitly cite it (e.g., “Timeform rated this horse 142 on 08/02/26”).
B) Recent form
Last 3–5 runs
Distance and class of those runs
Strength of competition
Finishing positions and margins
Indicators of suitability to a fast pace / testing 2m4f
If any form component is unknown or unavailable, mark it as “Unknown.”
C) Historical trends and statistical priors
Use well-published trends (past winners since the race was first run in 2009, typical age, weight etc.) and clearly indicate:
Which trends have solid statistical backing
Which are tenuous or potentially spurious
Example trend categories (but not limited to):
Age profile of winners
Official rating range that historically does well
Weight carried range
Last-time-out performance (e.g., beaten < lengths/placed/returned 1st)
State clearly which trend information you used and where it came from (e.g., “based on past 10 years Cheltenham results analysis”).
D) Pace and race shape
Evaluate likely race shape:
Front runners (cited from entries or known running styles)
Hold-up horses
Impact of the conditional jockey allowances on pace judgment
Whether a strong pace could favor stamina types
E) Ground suitability
Assess each horse’s known form on the anticipated going (e.g., Soft, Good to Soft) and known preferences from past performances. If unsettled weather is expected, include how well each profile handles cut in the ground.
F) Trip & Cheltenham profile
Consider:
Proven ability beyond 2m
Evidence of stamina for the Cheltenham hill
Jumping reliability
Tactical adaptability
Again, for facts you claim here (e.g., “ran well up the Cheltenham hill last April”), cite the performance source if publicly documented.
G) Pedigree (supporting, not primary)
Only use pedigree when it has demonstrable connection to performance (e.g., sire with strong staying handicap profile). Do not over-interpret; if no strong pedigree signal exists, mark “Pedigree: Neutral / Unknown.”
4) Scoring framework
For each horse, generate a scorecard with the following categories (0–10 each):
Form & recent performances
Ratings and handicap value
Trip + Cheltenham suitability
Pace position and tactical fit
Ground suitability
Connections/conditional jockey execution
Trend conformity (age/weight/rating band)
Also include a short justification (1–2 sentences) for each category.
Then compute an aggregated total score (0–70).
5) Final output structure
Your response must include:
(a) Ranked forecast
A table with the top 6 runners sorted by overall score, including:
Horse name
Key figures (OR, best public rating, last 3 results summary)
Total score
Brief rationale for ranking
(b) Predicted winner
Name the most likely winner
Provide the 3 strongest supporting factors
Provide the 2 biggest risks
Likelihood estimate (e.g., “High confidence,” “Moderate,” “Low”) with justification
(c) Alternative scenarios
Explain how your ranking would materially change if:
The going turns softer/heavier than expected
The pace becomes significantly stronger or weaker
Any key runner is withdrawn
(d) Data gaps & uncertainty
List specific important data that would improve your prediction (e.g., final RPR figures, sectional timings, confirmed going) and how that information would adjust confidence if available.
6) Reliability grading
For the overall prediction, include a short confidence grade (A/B/C) based on availability and strength of underlying data.
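LLMs aren't always reliable at adding up, so if you want to double-check the totals and the top-6 order yourself, something like the rough sketch below would do it. The category names are copied from section 4 of the prompt; the `Scorecard` class and `rank` function are just names I've made up for illustration, and you'd paste in whatever per-category scores the model gives you.

```python
from dataclasses import dataclass

# The seven scorecard categories from section 4 of the prompt.
CATEGORIES = (
    "Form & recent performances",
    "Ratings and handicap value",
    "Trip + Cheltenham suitability",
    "Pace position and tactical fit",
    "Ground suitability",
    "Connections/conditional jockey execution",
    "Trend conformity (age/weight/rating band)",
)


@dataclass
class Scorecard:
    """Hypothetical container for one horse's category scores."""
    horse: str
    scores: dict[str, float]  # category name -> score in the range 0-10

    def total(self) -> float:
        """Sum the seven category scores, giving a total in the range 0-70."""
        missing = [c for c in CATEGORIES if c not in self.scores]
        if missing:
            raise ValueError(f"{self.horse}: missing categories {missing}")
        for category in CATEGORIES:
            if not 0 <= self.scores[category] <= 10:
                raise ValueError(f"{self.horse}: {category} is outside 0-10")
        return sum(self.scores[c] for c in CATEGORIES)


def rank(cards: list[Scorecard], top_n: int = 6) -> list[Scorecard]:
    """Return the top_n scorecards, highest total first."""
    return sorted(cards, key=lambda card: card.total(), reverse=True)[:top_n]
```

To use it, build one `Scorecard` per runner from the model's output, call `rank(cards)`, and compare the order and totals with the table the model produced. If they don't match, the model has miscounted somewhere.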