VERITAS is a production-ready full-stack web app that classifies uploaded face images as REAL or AI GENERATED, returns confidence scores, and performs EXIF metadata analysis.
- Frontend: React + Vite + TailwindCSS (localhost:3000)
- Backend: Node.js + Express (localhost:5000)
- ML Engine: Python + FastAPI (localhost:8000)
- Model: EfficientNet-B0 (timm, fine-tuned)
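The EXIF analysis can be illustrated in Python with Pillow. This is a minimal sketch, not the actual ml_engine code; the function name `extract_exif` is an assumption for illustration:

```python
from PIL import Image, ExifTags

def extract_exif(path):
    """Return a dict of human-readable EXIF tags, empty if none exist.

    AI-generated images typically carry no camera EXIF (Make, Model,
    DateTimeOriginal), so an empty result is a weak generation hint.
    """
    img = Image.open(path)
    exif = img.getexif()
    return {ExifTags.TAGS.get(tag_id, tag_id): str(value)
            for tag_id, value in exif.items()}
```

Note that missing EXIF alone is not proof of generation: screenshots and social-media re-encodes also strip metadata, which is why the verdict combines it with the classifier's score.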
STEP 1 — PRIMARY DATASET (Real + Fake Western Faces)
Download from Kaggle: https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces
- This gives you 70k real + 70k fake faces at 256x256
- After download, copy:
- 7000 images from /real folder → /VeritasDetector/dataset/real/
- 8000 images from /fake folder → /VeritasDetector/dataset/fake/
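The manual copy above can be scripted. A minimal sketch, assuming nothing beyond the folder layout described here (the helper `copy_subset` is not part of the repo):

```python
import shutil
from pathlib import Path

def copy_subset(src_dir, dest_dir, limit):
    """Copy up to `limit` image files from src_dir (recursively) into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    exts = {".jpg", ".jpeg", ".png"}
    copied = 0
    for f in sorted(Path(src_dir).rglob("*")):
        if copied >= limit:
            break
        if f.is_file() and f.suffix.lower() in exts:
            shutil.copy2(f, dest / f.name)
            copied += 1
    return copied

# e.g. copy_subset("downloads/140k/real", "dataset/real", 7000)
```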
STEP 2 — INDIAN REAL FACES (Critical for Indian accuracy)
Download Dataset A from Kaggle: https://www.kaggle.com/datasets/havingfun/indian-celebrities-faces
- Copy 2000 images → /VeritasDetector/dataset/real/
Download Dataset B from Kaggle: https://www.kaggle.com/datasets/jangedoo/utkface-new
- Run this command after download to extract Indian faces:
  python ml_engine/dataset_prep.py --source /path/to/UTKFace --dest dataset/real
- This auto-filters race=3 (Indian) and copies ~2000 faces
Download Dataset C from Kaggle: https://www.kaggle.com/datasets/sushilyadav1998/bollywood-celeb-localized-face-dataset
- Copy 1000 images → /VeritasDetector/dataset/real/
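UTKFace filenames encode labels as [age]_[gender]_[race]_[timestamp].jpg, with race code 3 = Indian. The filter dataset_prep.py applies can be sketched as follows (a simplified stand-in for illustration, not the actual script):

```python
import shutil
from pathlib import Path

def filter_indian_faces(source, dest, race_code="3"):
    """Copy UTKFace images whose race field equals race_code (3 = Indian)."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = 0
    for f in Path(source).glob("*.jpg"):
        parts = f.name.split("_")
        # Expect [age, gender, race, timestamp]; skip malformed names.
        if len(parts) >= 4 and parts[2] == race_code:
            shutil.copy2(f, dest_dir / f.name)
            copied += 1
    return copied
```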
STEP 3 — INDIAN FAKE FACES (Critical for Indian fake detection)
Go to Google Colab: https://colab.research.google.com
Create a new notebook and run this code to generate 2000 Indian AI fake faces using Stable Diffusion:
--- COLAB CODE START ---
!pip install diffusers transformers accelerate torch -q

from diffusers import StableDiffusionPipeline
import torch, os
from google.colab import files

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    safety_checker=None,
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()

os.makedirs("indian_fakes", exist_ok=True)

prompts = [
    "realistic portrait photo of indian man face, professional headshot, sharp focus, 4k",
    "realistic portrait photo of indian woman face, professional headshot, sharp focus, 4k",
    "closeup face photo south asian man natural lighting realistic photography",
    "closeup face photo south asian woman natural lighting realistic photography",
    "indian person professional linkedin profile photo realistic face sharp",
    "realistic face photo of young indian man plain background",
    "realistic face photo of young indian woman plain background",
    "south asian man face photo natural sunlight outdoor realistic",
]

negative = "cartoon, anime, painting, illustration, blur, deformed, ugly, watermark, text"

count = 0
target = 2000

for i in range(250):
    if count >= target:
        break  # stop the outer loop too once the target is reached
    for prompt in prompts:
        if count >= target:
            break
        try:
            img = pipe(
                prompt,
                negative_prompt=negative,
                num_inference_steps=25,
                guidance_scale=7.5,
                width=512,
                height=512,
            ).images[0]
            img = img.resize((256, 256))
            img.save(f"indian_fakes/sd_indian_fake_{count:05d}.jpg", quality=95)
            count += 1
            if count % 50 == 0:
                print(f"Generated {count}/{target}")
        except Exception as e:
            print(f"Error: {e}")
            continue

print(f"Done! Generated {count} images")

import shutil
shutil.make_archive("indian_fakes", "zip", "indian_fakes")
files.download("indian_fakes.zip")
--- COLAB CODE END ---
After download, unzip and copy all images → /VeritasDetector/dataset/fake/
FINAL DATASET COUNT CHECK:
dataset/real/ should have ~12,000 images total
dataset/fake/ should have ~10,000 images total
Run: python ml_engine/dataset_prep.py --check to verify counts
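The --check pass amounts to counting image files per class folder. A minimal sketch of equivalent logic (not the actual dataset_prep.py code):

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def count_images(folder):
    """Count image files directly inside `folder`."""
    return sum(1 for f in Path(folder).iterdir()
               if f.is_file() and f.suffix.lower() in IMAGE_EXTS)

def check_dataset(root="dataset"):
    """Report per-class image counts; expect roughly real=12000, fake=10000."""
    counts = {cls: count_images(Path(root) / cls) for cls in ("real", "fake")}
    print(counts)
    return counts
```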
- Create a Kaggle account and install the Kaggle CLI:
  pip3 install kaggle
- Download your Kaggle API token (kaggle.json) from Kaggle Account settings.
- Place the token:
  mkdir -p ~/.kaggle
  mv /path/to/kaggle.json ~/.kaggle/kaggle.json
  chmod 600 ~/.kaggle/kaggle.json
- From project root (VeritasDetector), create temp download folders:
  mkdir -p downloads/kaggle
- Download the 140k dataset:
  kaggle datasets download -d xhlulu/140k-real-and-fake-faces -p downloads/kaggle
  unzip -o downloads/kaggle/140k-real-and-fake-faces.zip -d downloads/kaggle/140k
- Copy the required subset into the dataset folders (choose any 7000 real + 8000 fake):
  find downloads/kaggle/140k -type f \( -iname "*.jpg" -o -iname "*.jpeg" -o -iname "*.png" \) | rg '/real/' | head -n 7000 | while read -r f; do cp "$f" dataset/real/; done
  find downloads/kaggle/140k -type f \( -iname "*.jpg" -o -iname "*.jpeg" -o -iname "*.png" \) | rg '/fake/' | head -n 8000 | while read -r f; do cp "$f" dataset/fake/; done
- Download the Indian Celebrities dataset:
  kaggle datasets download -d havingfun/indian-celebrities-faces -p downloads/kaggle
  unzip -o downloads/kaggle/indian-celebrities-faces.zip -d downloads/kaggle/indian_celeb
  find downloads/kaggle/indian_celeb -type f \( -iname "*.jpg" -o -iname "*.jpeg" -o -iname "*.png" \) | head -n 2000 | while read -r f; do cp "$f" dataset/real/; done
- Download UTKFace and extract Indian faces:
  kaggle datasets download -d jangedoo/utkface-new -p downloads/kaggle
  unzip -o downloads/kaggle/utkface-new.zip -d downloads/kaggle/utkface
  python ml_engine/dataset_prep.py --source downloads/kaggle/utkface --dest dataset/real
- Download the Bollywood localized faces and copy 1000 images:
  kaggle datasets download -d sushilyadav1998/bollywood-celeb-localized-face-dataset -p downloads/kaggle
  unzip -o downloads/kaggle/bollywood-celeb-localized-face-dataset.zip -d downloads/kaggle/bollywood
  find downloads/kaggle/bollywood -type f \( -iname "*.jpg" -o -iname "*.jpeg" -o -iname "*.png" \) | head -n 1000 | while read -r f; do cp "$f" dataset/real/; done
- Generate Indian fakes in Colab using the provided notebook block, then unzip and copy the outputs:
  unzip -o /path/to/indian_fakes.zip -d downloads/indian_fakes
  find downloads/indian_fakes -type f \( -iname "*.jpg" -o -iname "*.jpeg" -o -iname "*.png" \) | while read -r f; do cp "$f" dataset/fake/; done
- Verify final counts:
python ml_engine/dataset_prep.py --check
From project root:
- Run setup: chmod +x setup.sh && ./setup.sh
- Fill dataset/real and dataset/fake first.
- Train the model: cd ml_engine && python train.py
- Start the ML API (requires ml_engine/model.pth): cd ml_engine && python api.py
- Start the Node backend: cd server && node server.js
- Start the frontend: cd frontend && npm run dev
Open http://localhost:3000.
- Upload a known real smartphone photo.
- Upload a known AI-generated face.
- Confirm verdict, confidence score, and metadata panel update correctly.
- Stop ML server and test upload again to verify graceful offline handling.
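The offline-handling check can be automated with a small poller. A sketch, assuming the health routes used elsewhere in this guide (/health on the ML engine, /api/health on the backend):

```python
import urllib.request
import urllib.error

SERVICES = {
    "ml_engine": "http://localhost:8000/health",
    "backend": "http://localhost:5000/api/health",
}

def is_up(url, timeout=2):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def check_services():
    """Probe each service; a False entry means that tier is offline."""
    return {name: is_up(url) for name, url in SERVICES.items()}
```

Run it once with all services up (both True), then stop the ML server and re-run to confirm the frontend's offline state corresponds to ml_engine: False.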
Expected accuracy:
- Western faces: 90-95%
- Indian/South Asian faces: 85-90%

Accuracy depends heavily on dataset balance, image quality, and training quality.
- Error: model.pth not found
  - Run cd ml_engine && python train.py first.
- ML engine offline in frontend
  - Start the ML API: cd ml_engine && python api.py
- Backend offline in frontend
  - Start the backend: cd server && node server.js
- Training exits early with dataset errors
  - Ensure dataset/real and dataset/fake contain images.
  - Run python ml_engine/dataset_prep.py --check
- Kaggle CLI unauthorized
  - Verify ~/.kaggle/kaggle.json exists and its permission is 600.
- Slow training
  - Confirm MPS is available on Mac (torch.backends.mps.is_available()).
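The MPS check can be folded into a device picker like the one a training script would typically use. A hedged sketch, not the repo's train.py; it falls back to CPU when torch is unavailable:

```python
def pick_device():
    """Choose the fastest available torch device: cuda > mps > cpu."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Apple Silicon GPU backend; guard with getattr for older torch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```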
This repo includes a Blueprint file: render.yaml.
- Render deploys from a Git provider repo (GitHub/GitLab/Bitbucket).
- The ML service needs ml_engine/model.pth at runtime. ml_engine/model.pth is ignored by .gitignore, so for deployment either:
  - upload the model to a public URL and add startup download logic, or
  - force-add the model file to git for this deployment:
    git add -f ml_engine/model.pth
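The startup-download option can be sketched like this; the helper name `ensure_model` and the MODEL_URL environment variable are placeholders you would wire into api.py, with the URL pointing at wherever you host model.pth:

```python
import os
import urllib.request

def ensure_model(path="ml_engine/model.pth", url=None):
    """Download model weights to `path` if they are not already present."""
    url = url or os.environ.get("MODEL_URL")
    if os.path.exists(path):
        return path
    if not url:
        raise RuntimeError("model.pth missing and MODEL_URL not set")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    urllib.request.urlretrieve(url, path)
    return path
```

Calling this once before loading the model makes the ML service self-provisioning on a fresh Render instance.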
git init
git add .
git add -f ml_engine/model.pth
git commit -m "VERITAS ready for Render deployment"
git branch -M main
git remote add origin https://github.com/<your-user>/<your-repo>.git
git push -u origin main

Open:
https://dashboard.render.com/blueprint/new?repo=https://github.com/<your-user>/<your-repo>
- ML_ENGINE_URL (for veritas-server): set to your ML service URL after it is created, for example:
  https://veritas-ml-engine.onrender.com
- CORS_ORIGINS (for veritas-server): set to your frontend URL, for example:
  https://veritas-frontend.onrender.com
- VITE_API_BASE_URL (for veritas-frontend): set to the backend API URL:
  https://veritas-server.onrender.com/api
- Deploy veritas-ml-engine and verify /health.
- Set ML_ENGINE_URL in veritas-server, then deploy the backend.
- Set VITE_API_BASE_URL and CORS_ORIGINS, then deploy the frontend.
- Backend health: https://<server>.onrender.com/api/health
- ML health: https://<ml>.onrender.com/health
- Frontend: https://<frontend>.onrender.com