
THVD Dataset
High-end 4K video dataset for stress-testing and training models.
About
We provide a comprehensive talking-head video dataset with over 50,000 videos, totaling more than 600 hours of footage and featuring 23,841 unique identities from around the world.
Who Can Use It
Intended users and their use cases:
- Data Scientists: Training machine learning models for video-based AI applications.
- Researchers: Studying human behavior, facial analysis, or video AI advancements.
- Businesses: Developing facial recognition systems, video analytics, or AI-driven media applications.
Distribution
Format, size, and structure of the dataset:
Data Volume:
| Property | Value |
| --- | --- |
| Total Size | 2.5 TB |
| Total Videos | 47,573 |
| Identities Covered | 20,841 |
| Resolution | about 52% 4K (2160p), about 46% Full HD (1080p) |
| Formats | MP4 |

- Full-length videos with visible mouth movements in every frame.
- Minimum face size of 400 pixels.
- Video durations range from 20 seconds to 5 minutes.
- Faces have not been cropped out: videos are full-frame and include backgrounds.
Usage
This dataset is suited to a variety of applications, such as those listed under Who Can Use It.
Coverage
Scope and coverage of the dataset:
- Geographic Coverage: Worldwide
- Time Range: The time range and file size of each video are noted in the CSV file.
- Demographics: Includes information about age, gender, and ethnicity, along with each video's format, resolution, and file size.
Languages:
| Language | Videos |
| --- | --- |
| English | 23,839 |
| Polish | 1,818 |
| Arabic | 1,691 |
| Dutch | 1,668 |
| Japanese | 1,433 |
| Portuguese | 1,359 |
| German | 1,281 |
| Turkish | 1,245 |
| Hindi | 1,194 |
| Indonesian | 1,182 |
| Romanian | 1,144 |
| French | 1,107 |
| Swedish | 1,059 |
| Greek | 1,006 |
| Italian | 1,006 |
| Tagalog | 924 |
| Spanish | 688 |
| Czech | 590 |
| Norwegian | 586 |
| Chinese (cn) | 444 |
| Chinese (tw) | 241 |
| Bulgarian | 340 |
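Since per-video metadata is kept in a CSV file, a subset of videos can be selected before downloading anything. The sketch below is only an illustration: the column names (`video_id`, `language`, `resolution`, `duration_s`) and the sample rows are hypothetical and may not match the actual CSV layout.

```python
import csv
import io

# Stand-in for the metadata CSV. The column names and rows are hypothetical;
# check the real CSV header before adapting this.
SAMPLE_CSV = """video_id,language,resolution,duration_s
a001,English,2160p,45
a002,Polish,1080p,120
a003,English,1080p,300
a004,English,2160p,20
"""

def select_videos(csv_file, language, resolution):
    """Return the rows whose language and resolution match the given values."""
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if row["language"] == language and row["resolution"] == resolution
    ]

selected = select_videos(io.StringIO(SAMPLE_CSV), "English", "2160p")
print([row["video_id"] for row in selected])
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle, for example `open("metadata.csv", newline="")`, where the file name is again a placeholder.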
Statistics
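Gender
| Gender | Videos |
| --- | --- |
| Male | 31,830 |
| Female | 15,509 |
| Others | 234 |

Age
| Age | Videos |
| --- | --- |
| 20-29 | 23,904 |
| 30-39 | 17,003 |
| 40-49 | 3,561 |
| Others | 3,105 |

Race
| Race | Videos |
| --- | --- |
| White | 33,280 |
| Asian | 9,123 |
| Black | 3,556 |
| Indian | 1,380 |

Resolution
| Resolution | Videos |
| --- | --- |
| 2160p | 24,856 |
| 1440p | 296 |
| 1080p | 21,964 |
| 720p | 457 |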
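To turn these counts into percentage shares, a few lines of Python are enough. The sketch below uses the resolution counts listed above; the helper name `shares` is illustrative, and the same function works for the gender, age, and race counts.

```python
# Per-resolution video counts, copied from the Resolution statistics above.
resolution_counts = {"2160p": 24856, "1440p": 296, "1080p": 21964, "720p": 457}

def shares(counts):
    """Return each category's share of the total, in percent (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

total_videos = sum(resolution_counts.values())
resolution_shares = shares(resolution_counts)

print(total_videos)        # 47573
print(resolution_shares)   # 2160p is about 52.2%, 1080p about 46.2%
```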
Additional Notes
- Ensure ethical usage and compliance with applicable privacy regulations.
- The dataset's quality and scale make it valuable for high-performance AI training.
- Preprocessing such as cropping or downsampling may be needed, depending on the use case.
- The dataset is not yet complete and expands daily; please contact us for the most up-to-date CSV file.
- The dataset is split into 20 GB zipped files and hosted on a private server, with the option to upload to the cloud if needed.
- To verify the dataset's quality, please contact us for the full CSV file. We'd be happy to provide example videos selected by the potential buyer.
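As one possible way to do the downsampling mentioned above, the sketch below builds an ffmpeg command that rescales a 4K video to 1080p. It assumes ffmpeg is installed; the file names are placeholders, and the command is only printed here, not executed.

```python
import shlex

def downscale_command(src, dst, height=1080):
    """Build an ffmpeg command that rescales a video to the given height.

    The filter scale=-2:<height> keeps the aspect ratio and picks a width
    that is divisible by 2, which common H.264 encoder settings require.
    """
    return [
        "ffmpeg",
        "-i", src,                    # input file
        "-vf", f"scale=-2:{height}",  # resize, preserving aspect ratio
        "-c:a", "copy",               # keep the audio stream unchanged
        dst,                          # output file
    ]

# Placeholder file names, for illustration only.
cmd = downscale_command("input_4k.mp4", "output_1080p.mp4")
print(shlex.join(cmd))
```

To actually run the conversion, the list can be passed to `subprocess.run(cmd, check=True)`.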