Technical SEO Audit of Khatabook [Fintech]

Every day, thousands, if not millions, of small businesses in India open Khatabook to check who owes them money, settle outstanding accounts or track the month's profits.

Khatabook has quietly become the “Bahi-Khata” (bookkeeping system) for many SMBs, yet its technical SEO audit tells a different story: 10,000+ pages sit at high crawl depth, 199 orphan pages have no incoming internal links, 2,000+ indexable pages are missing from the sitemap, and a few non-indexable URLs are present in it. Together, these issues confuse search engines about which pages matter most to users.

Add to that the 400+ broken URLs, 11 pages with redirect chains, 6,000 problematic URL structures, 81 thin-content pages and missing hreflang definitions, and the pattern is clear: there is still a large disconnect between what people ask on search and what Khatabook's pages give them on the SERPs. Most pages on khatabook.com are fragmented, inconsistent and often buried too deep to be reliably discovered.

In this post, we’ll explore Khatabook’s website structure, infer their top priorities for digital visibility and conduct a technical audit of their website, khatabook.com. We’ll tie these together into a story you can use as a template to run similar audits and fix the most pressing issues for your clients or stakeholders.

Contents

1 Company in Focus for Technical SEO audit: Khatabook

2 Site Structure and Tech Stack of Khatabook

3 Why do you need to know about Khatabook’s Tech Stack and Business Model?

4 Executive Summary for Khatabook’s Technical SEO Audit

5 Interesting Findings from the Technical SEO Audit

Company in Focus for Technical SEO audit: Khatabook

Khatabook is an Indian SaaS product made for India’s small and medium businesses. Backed by investors like Tital Capital and built by IIT Bombay alumni, Khatabook has turned the centuries-old “Bahi Khata” system into a digital ledger that is now used by over 50 million merchants across 13 Indian languages.

Khatabook is available on mobile (iOS and Android) and desktop. It offers shopkeepers, traders and manufacturers credit tracking, reminders, repayment nudges, formal business loans through partnerships and more.

Khatabook is developed by Adj Utility Apps Private Limited. The team uses internal tools built on Appsmith to manage lead flows and sales ops, Plotline to launch targeted in‑app journeys that drove a 120% uplift in adoption of monetized features like Bulk Reminders, and Rocketium’s tech to generate thousands of personalized festive banners from spreadsheet inputs so merchants can run campaigns across social channels within a day.

They are a funded growth-stage business with ₹13,830+ million in funding and 600 employees.

Site Structure and Tech Stack of Khatabook

Khatabook uses Webtrends analytics, jQuery-based Owl Carousel for UI interactions, New Relic for full-stack performance monitoring and serves images through Amazon S3. They also use popular CDNs like CDNjs and UNPKG to deliver front-end libraries.

Khatabook’s site structure looks like a content and feature-led model for acquiring organic traffic. It is layered on top of an ecosystem made up of a main app, GST and billing subdomains, and a desktop app, all of which support the fintech services it offers.

Within the stack, the presence of locale and category folders signals templated routing or CMS-driven dynamic paths used to spin out localised content at scale. The presence of UTM-parameterised URLs and double slashes (blog//) shows traces of experimentation, and possibly a migration, carried out without tight governance or even a simple technical site audit.

[Screenshot: Khatabook site structure, Hindi folder]

[Screenshot: Khatabook site structure at level 3, gold rate pages]
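If you want to spot these URL patterns without a crawler, a minimal Python sketch like the one below can flag them. The function name and the sample URLs in the usage note are my own illustrations, not anything from Khatabook's codebase, and the 115-character limit is simply the threshold used in the findings table of this audit.

```python
from urllib.parse import urlsplit, unquote

MAX_URL_LENGTH = 115  # threshold used in this audit's URL Structure check


def url_hygiene_issues(url: str) -> list[str]:
    """Return the URL hygiene problems found in a single URL."""
    # Decode percent-encoding first so "%20" counts as a space and
    # encoded bytes such as "%E0" are not mistaken for uppercase letters.
    path = unquote(urlsplit(url).path)
    issues = []
    if "//" in path:
        issues.append("multiple slashes")
    if " " in path:
        issues.append("space")
    if not path.isascii():
        issues.append("non-ASCII characters")
    if "_" in path:
        issues.append("underscore")
    if any(ch.isupper() for ch in path):
        issues.append("uppercase")
    if len(url) > MAX_URL_LENGTH:
        issues.append("over 115 characters")
    return issues
```

Calling `url_hygiene_issues("https://khatabook.com/blog//Some_Post/")` on that made-up URL reports multiple slashes, an underscore and uppercase letters, while a clean URL returns an empty list.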

Why do you need to know about Khatabook’s Tech Stack and Business Model?

A business that lives and dies on app installs and SaaS metrics needs web SEO and supporting journeys, such as blogs in various languages, to cater to India's diverse regional audiences. This means multiple variations of the same topic, with potential crawlability and intent-alignment issues.

Company scale and funding signal that the company can realistically implement fixes that would require development, such as page performance improvements, global hreflang implementation, sitemap refactoring, etc.

The tech stack tells you to expect image URLs on a separate host (such as s3.amazonaws.com), so you can check for issues with image metadata or Core Web Vitals when images are served. You can also check whether the page is JS-heavy and whether it renders properly, or whether dynamically generated and injected content is being missed when search engine crawlers render the page.

Khatabook developers have implemented New Relic and Webtrends on the site, which means you would have rich session-level and event-level data that you can use to correlate with SEO issues, such as core web vitals, or help with priorities, such as top pages getting the most hits from users across the web.

Executive Summary for Khatabook’s Technical SEO Audit

Khatabook’s technical SEO foundation is largely stable, but structural and on-page issues might be limiting organic growth and wasting precious crawl budget.

The main issues are around URL hygiene, discoverability and sitemap misalignment, along with a few 3XX, 4XX and 5XX issues that are manageable but non-trivial.

It is worth noting that the site passed the checks for sitemap file validity, redirect loops, duplicate content, soft 404s and lorem ipsum placeholders, but the current site structure doesn’t accurately signal which pages should be prioritised for crawling, indexing and ranking.

The immediate opportunities are internally linking important pages to reduce crawl depth, eliminating orphans and normalising URL structures. Other widespread issues worth a look afterwards are missing and duplicate titles, meta descriptions, H1s and H2s.

The initiatives mentioned above should improve crawl efficiency, rankings stability on core fintech queries, CTR, and overall trust and usability of the product experience.
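To make crawl depth and orphan pages concrete, here is a small Python sketch of how both can be computed from a link graph using a breadth-first search from the homepage. The link graph below is entirely hypothetical (the paths are made up for illustration), and the "more than 3 clicks" cut-off mirrors the definition used in this audit's findings.

```python
from collections import deque


def crawl_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the homepage; depth = minimum number of clicks."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical link graph: page -> pages it links to.
links = {
    "/": ["/blog/"],
    "/blog/": ["/blog/category/"],
    "/blog/category/": ["/blog/category/page-2/"],
    "/blog/category/page-2/": ["/blog/deep-post/"],
    "/blog/orphan-post/": ["/"],  # links out, but nothing links to it
}

depths = crawl_depths(links, "/")
deep_pages = [page for page, depth in depths.items() if depth > 3]
orphans = [page for page in links if page not in depths]
```

In this toy graph, `/blog/deep-post/` is four clicks from the homepage and `/blog/orphan-post/` is never reached, which is why adding an internal link from a shallow page fixes both kinds of problem.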

I use Screaming Frog for technical SEO audits (it’s not mandatory)

I use Screaming Frog to quickly scan for issues. If you don’t have Screaming Frog, you can use Ahrefs, SEMrush, online web tools, or manual checks to achieve similar, if not better, results.

Get the Tech SEO Audit Checklist – Free

Access the complete checklist in Excel sheet format here for free. Give credit only if you please 🙂

Let me know in the comments if this Tech SEO audit checklist helped you.

This is a One & Done Cursory Audit

Copy and use this technical SEO audit sheet of Khatabook for your own tech audits.

The purpose of this audit is to highlight sample issues identified across the site. The final deliverable, once the full audit is completed, will be a more detailed, refined, and thoroughly cleaned version of what is presented here.

For example, in the final audit, duplicate issues will be consolidated and a dependency tree will be mapped to show where solving one issue may resolve related ones. The SEO insights in the audit will help prioritise issues based on what’s most important or valuable for your organisation’s goals.

Key Findings

Sharing the findings here for people with some SEO knowledge. You can add more context, such as the impact or benefit of solving the issue, in addition to this summary, for presenting to a stakeholder, devs, or management.

Tech SEO Check	Findings
Crawlability and Indexing	The site has a healthy baseline of 11,252 indexable internal URLs with 200 status codes, but crawlability is hindered by 400+ 4XX URLs, 300+ 5XX URLs, and a few 3XX redirects.
XML Sitemaps	2,966 indexable URLs are missing from sitemaps entirely and 101 non-indexable URLs are present in sitemaps
URL Structure	URL hygiene is a critical structural weakness with 6,009 instances of problematic formatting (multiple slashes, spaces, non-ASCII characters, underscores, uppercase letters) and 57 URLs exceeding 115 characters.
Heading Structure	50 URLs are missing H1 tags, 3,037 URLs share duplicate H1s across the site and 3,095 URLs contain multiple H1s on a single page.
Page Titles	11 URLs are missing title tags entirely, 256 URLs use very short titles, and 3,770 URLs exceed 60 characters.
Meta Descriptions	3,665 URLs share duplicate meta descriptions, 10 URLs lack descriptions entirely, 74 URLs use overly short descriptions and 199 URLs exceed the recommended 160-character limit.
Content Quality	81 URLs have low or thin word count and 17 URLs have very difficult readability (Flesch score below 30).
Security and Protocol	78 URLs on the HTTPS site still reference HTTP URLs, triggering 301 redirects.
Images and Media	528 URLs with images are missing alt text attributes, and 199 URLs have images without width and height attributes in HTML.
Canonical Tags	32 URLs are missing canonical tags entirely.
Hreflang and Localization	11,219 URLs are missing the x-default hreflang declaration.
Internal Linking	11,211 URLs have excessively high external link counts, while only 22 URLs have excessively high internal link counts.
Structured Data	2,950 URLs lack any schema markup implementation.
Page Performance	PageSpeed Insights score of 50 (rated as “Average”) across top organic pages indicates performance below best-practice thresholds.
High Crawl Depth	11,250 URLs sit at high crawl depth (more than 3 clicks from the homepage).
HTML Structure	1,270 URLs contain invalid elements in the <head> section.
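The XML Sitemaps row in the table above comes down to two set comparisons between the crawl and the sitemap. The Python sketch below shows the idea. The sample sitemap, the `/old-page/` URL and the indexability flags are made up for illustration and are not taken from Khatabook's actual sitemap.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set[str]:
    """Extract every <loc> value from a sitemap XML string."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text}


# Illustrative sitemap, not Khatabook's real one.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://khatabook.com/</loc></url>
  <url><loc>https://khatabook.com/blog/pink-slip/</loc></url>
  <url><loc>https://khatabook.com/old-page/</loc></url>
</urlset>"""

# Illustrative crawl export: URL -> is it indexable?
crawl = {
    "https://khatabook.com/": True,
    "https://khatabook.com/blog/pink-slip/": True,
    "https://khatabook.com/blog/fssai-salary/": True,
    "https://khatabook.com/old-page/": False,
}

in_sitemap = sitemap_urls(sample_sitemap)
indexable = {url for url, ok in crawl.items() if ok}

missing_from_sitemap = indexable - in_sitemap
# Also catches sitemap URLs the crawl never reached.
non_indexable_in_sitemap = in_sitemap - indexable
```

With a real crawl export in place of the sample data, the sizes of these two sets should line up with the counts a crawler reports for the sitemap checks.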

Interesting Findings from the Technical SEO Audit

Copy and use this sheet for your own tech audits.

Banner Ads and UTM Parameters are causing web traffic analytics issues

Khatabook has banner ads on its blog content that encourage users to check out its desktop product. Each ad is an image with CTA text. The link attached to the image, which leads to the product page, reads like this:

https://khatabook.com/?utm_source=kb_website&utm_medium=blog&utm_campaign=khatabook-desktop_payroll-&-salary&utm_content=login_button

[Screenshot: banner ad on a Khatabook blog post with UTM parameters in its link]

Example URLs where this issue exists:

https://khatabook.com/blog/fci-salary-and-job-profile/
https://khatabook.com/blog/pink-slip/
https://khatabook.com/blog/fssai-salary/

This is an issue because:

When a user clicks this link, Google Analytics starts a new session. The original traffic source (e.g., Organic Search) is lost and replaced by the internal UTM values. This inflates session counts and prevents accurate attribution of conversions to the original traffic source.
The utm_campaign parameter appears to be dynamic, based on the specific blog post or category. This creates unique URL variations of the homepage that search engines must crawl. Each of these variations is self-canonicalised, so each one also causes duplicate page and canonicalisation issues, which are flagged separately in the technical SEO audit.

[Screenshot: self-referencing canonical on a Khatabook URL with UTM parameters]

Recommendation:

Track clicks on these ads with gtag event tracking instead of URL parameters.
If these URL variations need to exist for any reason, each of them should carry a canonical tag pointing to the actual homepage, without any parameters in the URL.
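As a rough illustration of the canonical recommendation, the Python sketch below derives the canonical target by dropping every `utm_` parameter from a URL. The function name is my own, and this is only one way to do it under the assumption that the site has no other tracking parameters. A side observation from running it on the banner URL above: the unencoded `&` inside the campaign name means a standard query parser reads `utm_campaign` as `khatabook-desktop_payroll-` and treats `-salary` as a separate, valueless parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop utm_* parameters and any fragment; keep everything else."""
    parts = urlsplit(url)
    # parse_qsl skips valueless parameters by default, which also
    # discards the stray "-salary" token in the banner URL.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query)
        if not key.lower().startswith("utm_")
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


banner_url = (
    "https://khatabook.com/?utm_source=kb_website&utm_medium=blog"
    "&utm_campaign=khatabook-desktop_payroll-&-salary&utm_content=login_button"
)

canonical = canonical_url(banner_url)
campaign = dict(parse_qsl(urlsplit(banner_url).query))["utm_campaign"]
```

Here `canonical` comes out as the bare homepage URL, which is the value the canonical tag on every one of these variations should point to.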

Empty Category Pages

Several category pages exist but have no content at all. These pages are still rendered with a header and footer and are linked from multiple pages of the website.

Example pages:

https://khatabook.com/blog/hi/category/accounting-and-inventory/
https://khatabook.com/blog/gu/category/ઈન્કમ-ટેક્સ/
https://khatabook.com/blog/ta/category/வருவாய்-வரி/

[Screenshot: empty category page on the Khatabook blog]

This is an issue because:

Crawlers and users can see these pages because these URLs are mentioned on prominent pages on the site. These are essentially soft 404 pages.
These URLs are not set to noindex or nofollow, which makes it easy to waste crawl budget.
A high volume of pages with little to no content (“Thin Content”) can signal to search engines that the overall quality of the domain is low, potentially affecting the ranking of valid pages.

Recommendation:

If the categories are intentional but currently empty, apply a noindex meta tag to prevent them from being indexed until they are populated with content.
Ensure the CMS does not automatically generate indexable URLs for tags or categories with zero associated posts.
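The noindex recommendation can be expressed as a tiny rule in whatever template renders the category page. The Python sketch below is a hypothetical helper, not Khatabook's CMS code: it assumes the template knows how many posts a category contains and emits the robots meta tag accordingly.

```python
def robots_meta(post_count: int) -> str:
    """Return the robots meta tag for a category page with `post_count` posts."""
    # Empty categories stay crawlable (follow) but out of the index (noindex).
    content = "noindex, follow" if post_count == 0 else "index, follow"
    return f'<meta name="robots" content="{content}">'
```

Because the rule depends only on the post count, the page becomes indexable again automatically once the first post is published in that category.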

Programmatic Content Injection

Khatabook’s website uses programmatic templates to dynamically inject the current date (e.g., “24th January 2026”) into page titles and headers to signal content freshness. However, the data feed for these pages is failing, resulting in the display of null or “0” values throughout the main content and historical data tables.

[Screenshot: programmatic gold rate page with the date injected into the title]

[Screenshot: programmatic gold rate page showing null and “0” values]

Example Pages:

https://khatabook.com/gold-rate-india/odisha/
https://khatabook.com/gold-rate-india/rajasthan/
https://khatabook.com/gold-rate-india/madhya-pradesh/
https://khatabook.com/gold-rate-india/west-bengal/

This is an issue because:

The title tag promises the latest market data (Today, 24th January), encouraging users to click from search results. When the page delivers null data (0), it creates a negative user experience in which users immediately return to Google (known as pogo-sticking), signalling that the page is irrelevant.
A page consisting of “0” values offers no utility. Search engines may classify it as low-quality automated content, and users may see the website as untrustworthy, which matters a great deal in the financial domain.
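One possible guard, sketched in Python below, is to validate the data feed before the template renders and to keep the page out of the index while the feed is broken. This is an assumption on my part about how such a safeguard could look, not a description of Khatabook's pipeline. The function names and the sample rate values in the usage note are invented.

```python
from typing import Optional, Sequence


def feed_is_usable(rates: Sequence[Optional[float]]) -> bool:
    """True only when there is data and every value is a positive number."""
    return len(rates) > 0 and all(r is not None and r > 0 for r in rates)


def robots_for_rate_page(rates: Sequence[Optional[float]]) -> str:
    """Keep a broken page out of the index until the feed recovers."""
    return "index, follow" if feed_is_usable(rates) else "noindex, follow"
```

A feed such as `[0, 0, None]` yields `noindex, follow`, while a feed of made-up values like `[7150.0, 7120.5]` yields `index, follow`. The same check could also be used to hide the dated title until real data is available.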

Redirect Chains and Mixed Status Codes

The website contains URLs that pass through redirect hops before reaching the final destination. There are instances where a 301 (Permanent) redirect is followed by a 302 (Temporary) redirect.

Example pages:

https://khatabook.com/blog/gst-calculators-essential-tools-for-gst-payments/
https://khatabook.com/blog/gratuity-calculator-online/
https://khatabook.com/blog/how-to-calculate-hra-online-hra-calculation/

[Screenshot: redirect hops on a Khatabook URL, a 301 followed by a 302]

If you look at the hops closely, you’ll see that the first hop is a simple 301 redirect. The next one happens because the URL has no trailing slash, but it is a 302 redirect, which looks wrong for what is a permanent structural rule.

This is an issue because:

Each redirect adds a round-trip network request (RTT), increasing the Time to First Byte (TTFB) and slowing down the experience for users, particularly on mobile networks.
Using a 302 (Temporary) redirect for a structural change (adding a trailing slash) sends a conflicting signal to search engines compared to the initial 301 (Permanent) redirect.

This also uncovers another core structural issue:

Any URL on the site without a trailing slash gets a 302 redirect.

The website enforces a URL structure that requires a trailing slash (e.g., changing /page to /page/). However, this site-wide enforcement is implemented using a 302 Found (Temporary) redirect status code instead of a 301 Moved Permanently status code.
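To check your own redirect exports for the same two patterns, a sketch like the following Python function can help. It takes the hops of one request as (URL, status code) pairs. If you use the `requests` library, these can be collected from `response.history` plus the final response. The function name and the hop sequence in the usage note are illustrative and are not the actual targets of the Khatabook URLs listed above.

```python
def audit_redirect_hops(hops: list[tuple[str, int]]) -> list[str]:
    """hops: (url, status) for every response in order, ending with the final page."""
    issues = []
    redirects = [status for _, status in hops if 300 <= status < 400]
    if len(redirects) > 1:
        issues.append(f"redirect chain with {len(redirects)} hops")
    # Compare each hop with the next one to spot the trailing-slash rule.
    for (url, status), (next_url, _) in zip(hops, hops[1:]):
        if next_url == url + "/" and status in (302, 307):
            issues.append(f"trailing-slash redirect is temporary ({status})")
    return issues
```

For an illustrative sequence such as `[("https://khatabook.com/blog/old-post/", 301), ("https://khatabook.com/blog/new-post", 302), ("https://khatabook.com/blog/new-post/", 200)]`, the function reports both a two-hop chain and a temporary trailing-slash redirect. Pointing the first 301 directly at the slash-terminated URL, and switching the site-wide rule to a 301, would clear both.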
