Data Science

Semi-Structured Data

Data that has some organizational structure, such as tags, keys, or nesting, but does not conform to a rigid, predefined schema like a relational database table. Examples include JSON, XML, and HTML.

Why It Matters

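Semi-structured data is the format returned by most web APIs and used for much web content. Before it can be used to train ML models, it usually has to be parsed and normalized. Typical steps are flattening nested fields into columns and handling fields that are missing from some records.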
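A common normalization step is flattening nested objects into a fixed set of columns. Below is a minimal sketch using only the Python standard library. The helper names `flatten` and `to_rows` and the sample records are hypothetical, and real pipelines often rely on a dataframe library for this instead.

```python
import json
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level using dotted key names.

    Lists are left as-is; expanding them into separate rows is a separate step.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the dotted path along.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def to_rows(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten every record and fill columns a record lacks with None."""
    flat_records = [flatten(r) for r in records]
    # The union of all keys, in first-seen order, gives a fixed column set.
    columns: list[str] = []
    for r in flat_records:
        for k in r:
            if k not in columns:
                columns.append(k)
    return [{c: r.get(c) for c in columns} for r in flat_records]


# Hypothetical records whose fields differ from one another.
raw = '[{"user": {"name": "John", "age": 41}}, {"user": {"name": "Ana"}, "country": "PT"}]'
rows = to_rows(json.loads(raw))
print(rows)
```

Each record ends up with the same three columns (`user.name`, `user.age`, `country`). Fields absent from a record become `None`, which a later step can impute or drop.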

Example

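A JSON API response: {"user": {"name": "John", "orders": [{"id": 1, "amount": 29.99}]}}. It has structure (named keys, a nested user object, and an array of orders), but it is flexible. Fields can vary between records; another response might omit "orders" entirely or include extra fields.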
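The example response can be turned into training-ready rows by parsing it and expanding the nested orders array into one row per order. This minimal sketch uses Python's standard `json` module. The function name `explode_orders` and the output column names are hypothetical choices. It assumes each record has the shape shown above, with "orders" possibly missing.

```python
import json
from typing import Any

response = '{"user": {"name": "John", "orders": [{"id": 1, "amount": 29.99}]}}'


def explode_orders(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one flat row per order; records without orders yield no rows."""
    user = record.get("user", {})
    # "orders" may be absent in other records, so default to an empty list.
    return [
        {"name": user.get("name"), "order_id": o.get("id"), "amount": o.get("amount")}
        for o in user.get("orders", [])
    ]


print(explode_orders(json.loads(response)))
print(explode_orders({"user": {"name": "Ana"}}))  # a record with no orders
```

Using `.get` with defaults is one way to tolerate records that vary. Whether a user with no orders should produce zero rows or a single row of nulls depends on what the model needs.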

Think of it like...

A form letter with blanks: there is a template (the structure), but what fills the blanks varies, and some letters include more information than others.

Related Terms