2 | Notion

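General-purpose data models: relational, document, graph
• An application can be built as a stack of layers, and each layer exposes a data model to the layer above it
  ◦ e.g. application-specific objects, JSON/XML, database-level structures, hardware-friendly representations
• relational db
  ◦ rooted in business data processing
  ◦ sometimes requires a translation layer
  ◦ impedance mismatch: the mismatch between the application's data model and the database's data model, which is what forces the translation layer
    ▪ ORM (object-relational mapping): a framework that reduces the impedance mismatch
        • Hibernate, ActiveRecord
  ◦ the introduction of JSON/XML features eased the constraints around one-to-many relationships
  ◦ normalization: remove duplication
    ▪ use machine-readable IDs
        • easier to manage items that share the same name
        • several items can be updated at once
          ◦ even when a column is added, rows can be updated based on a condition
        • helps with localization (translation)
        • the ID carries no human meaning, so it rarely has to change
• document db
  ◦ NoSQL
    ▪ scalability: easier to scale than an rdb, with good write throughput
    ▪ offers specialized query operations
    ▪ fewer restrictions than an rdb, with an established open-source ecosystem
    ▪ MapReduce queries: offered by MongoDB and CouchDB
        • relatively restrictive: the map and reduce functions must be free of side effects
        • MongoDB provides them as JavaScript functions
  ◦ JSON/XML is strong for one-to-many relationships
    ▪ MongoDB, RethinkDB, CouchDB, Espresso
    ▪ reduces the impedance mismatch
  ◦ locality: everything related to a task can be fetched at once, without joins
    ▪ faster when the whole document is needed, because no join is required
    ▪ a disadvantage when only part of the data is needed
        • it helps to keep documents small
  ◦ weak at many-to-one relationships; the data ends up denormalized
    ▪ has to be resolved in application code
    ▪ offers no protection once the data becomes more complex
    ▪ repeats the problems that early (pre-relational, hierarchical) databases ran into
• network model (CODASYL model)
  ◦ tried to solve the many-to-one problem by allowing a record to have several parents
  ◦ finding a specific parent value requires searching too wide a range
  ◦ has no notion of an edge as in a graph db, so the ordering of items has to be handled carefully
• relational model
  ◦ a collection of tuples
  ◦ requires a sophisticated query optimizer
    ▪ hand-written access paths are specific to the data
    ▪ as the data grows, data-specific access paths end up losing to a general-purpose solution
  ◦ good fault tolerance
  ◦ good concurrency handling
  ◦ schema flexibility / schemaless (document model)
    ▪ schema-on-read: like dynamic typing, writes are fairly unconstrained, but errors surface at runtime
    ▪ when a new field is added, migrating all existing data is hard, so it is usually handled in application code
    ▪ freedom of data format: a single column can hold two or more data types
• graph-like db
  ◦ the best fit when there are many many-to-many relationships
  ◦ structure: two relational tables, one for vertices and one for edges
    ▪ vertex: identifier, incoming/outgoing edges, properties
        • vertices are connected by edges, with no restriction on which vertex types may be connected
    ▪ edge: identifier, start/end vertices, label, properties
        • holds a reference to both of its vertices
        • the label defines the relationship between the two vertices
  ◦ query
    ▪ can also be implemented in SQL
    ▪ Cypher: relationships are written as (vertex) -[:label]-> (vertex)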

`CREATE
  (NAmerica:Location {name:'North America', type:'continent'}),
  (USA:Location {name:'United States', type:'country' }),
  (Idaho:Location {name:'Idaho', type:'state' }),
  (Lucy:Person {name:'Lucy' }),
  (Idaho) -[:WITHIN]-> (USA) -[:WITHIN]-> (NAmerica),
  (Lucy) -[:BORN_IN]-> (Idaho)

MATCH
  (person) -[:BORN_IN]-> () -[:WITHIN*0..]-> (us:Location {name:'United States'}),
  (person) -[:LIVES_IN]-> () -[:WITHIN*0..]-> (eu:Location {name:'Europe'})
RETURN person.name`
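The note that a property graph can be stored as two relational tables, and that the same query is possible in SQL, can be sketched with Python's built-in `sqlite3`. This is only an illustration under assumptions: the table and column names (`vertices`, `edges`, `tail_vertex`, `head_vertex`), the integer IDs, and the helper `born_in` are made up here, and only the `BORN_IN` half of the Cypher query is covered because the sample data contains no `LIVES_IN` edge. The variable-length `WITHIN*0..` step becomes a recursive common table expression.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertices (
        vertex_id  INTEGER PRIMARY KEY,
        label      TEXT,
        properties TEXT              -- JSON-encoded property map
    );
    CREATE TABLE edges (
        edge_id     INTEGER PRIMARY KEY,
        tail_vertex INTEGER REFERENCES vertices (vertex_id),  -- start vertex
        head_vertex INTEGER REFERENCES vertices (vertex_id),  -- end vertex
        label       TEXT,
        properties  TEXT
    );
""")

# Same data as the Cypher CREATE statement; the numeric IDs are arbitrary.
vertices = [
    (1, "Location", {"name": "North America", "type": "continent"}),
    (2, "Location", {"name": "United States", "type": "country"}),
    (3, "Location", {"name": "Idaho", "type": "state"}),
    (4, "Person", {"name": "Lucy"}),
]
edges = [  # (edge_id, tail_vertex, head_vertex, label)
    (1, 3, 2, "WITHIN"),
    (2, 2, 1, "WITHIN"),
    (3, 4, 3, "BORN_IN"),
]
conn.executemany(
    "INSERT INTO vertices VALUES (?, ?, ?)",
    [(vid, label, json.dumps(props)) for vid, label, props in vertices],
)
conn.executemany("INSERT INTO edges VALUES (?, ?, ?, ?, NULL)", edges)


def born_in(conn, place_name):
    """Names of people born in `place_name` or in any location WITHIN it."""
    # Properties are decoded in Python so the sketch does not rely on
    # SQLite's optional JSON functions.
    name_of = {
        vid: json.loads(props)["name"]
        for vid, props in conn.execute("SELECT vertex_id, properties FROM vertices")
    }
    targets = [vid for vid, name in name_of.items() if name == place_name]
    if not targets:
        return []
    rows = conn.execute(
        """
        WITH RECURSIVE in_place(vertex_id) AS (
            SELECT ?                      -- zero hops: the place itself
            UNION
            SELECT e.tail_vertex          -- one more WITHIN hop inward
            FROM edges e
            JOIN in_place p ON e.head_vertex = p.vertex_id
            WHERE e.label = 'WITHIN'
        )
        SELECT e.tail_vertex
        FROM edges e
        JOIN in_place p ON e.head_vertex = p.vertex_id
        WHERE e.label = 'BORN_IN'
        """,
        (targets[0],),
    ).fetchall()
    return sorted(name_of[row[0]] for row in rows)


print(born_in(conn, "United States"))
```

Compared with the one-line `-[:WITHIN*0..]->` pattern in Cypher, the recursive CTE is noticeably longer, which is the usual argument for a dedicated graph query language.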

    ▪ RDF: used together with XML when the vertices you need are referenced by external URLs

<http://example.com/page#label>

        • the part after # is used as the edge label
        • an advantage when integrating with the web
    ▪ SPARQL: similar in form to Cypher.

SELECT ?personName WHERE {
  ?person :name ?personName.
  ?person :bornIn / :within* / :name "United States".
  ?person :livesIn / :within* / :name "Europe".
}
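To make the property path in the SPARQL query more concrete, here is a rough Python sketch that stores the graph as (subject, predicate, object) triples and evaluates `:bornIn / :within* / :name` by hand. The triple contents mirror the Cypher sample data; the identifiers (`lucy`, `idaho`, and so on) and the helper names are assumptions for illustration, not part of any RDF library.

```python
# Each fact is a (subject, predicate, object) triple.
TRIPLES = {
    ("lucy", "name", "Lucy"),
    ("lucy", "bornIn", "idaho"),
    ("idaho", "name", "Idaho"),
    ("idaho", "within", "usa"),
    ("usa", "name", "United States"),
    ("usa", "within", "namerica"),
    ("namerica", "name", "North America"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a stored triple."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def within_closure(starts):
    """Nodes reachable from `starts` via zero or more 'within' steps (:within*)."""
    seen = set(starts)
    frontier = list(starts)
    while frontier:
        node = frontier.pop()
        for parent in objects(node, "within"):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen


def names_born_in(place_name):
    """Evaluate ?person :bornIn / :within* / :name place_name, return the names."""
    people = {s for s, p, _ in TRIPLES if p == "bornIn"}
    result = set()
    for person in people:
        reachable = within_closure(objects(person, "bornIn"))
        if any(place_name in objects(loc, "name") for loc in reachable):
            result |= objects(person, "name")
    return sorted(result)


print(names_born_in("United States"))
```

The `:livesIn` condition would be one more call of the same shape, with the two result sets intersected.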

    ▪ Datalog: an early query language that can be used for graph-like data
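Datalog describes data as facts of the form predicate(subject, object) and derives new facts from rules, applying them repeatedly until nothing new appears. The sketch below imitates that evaluation style in plain Python rather than in real Datalog syntax; the predicate names, the rule `within_recursive`, and the sample facts are assumptions chosen to match the earlier examples.

```python
# Facts, written as sets of (subject, object) pairs per predicate.
NAME = {
    ("namerica", "North America"),
    ("usa", "United States"),
    ("idaho", "Idaho"),
    ("lucy", "Lucy"),
}
WITHIN = {("idaho", "usa"), ("usa", "namerica")}
BORN_IN = {("lucy", "idaho")}


def within_recursive(name_facts, within_facts):
    """Derive (location, place_name) pairs by applying two rules to a fixpoint.

    Rule 1: within_recursive(Loc, Name) :- name(Loc, Name).
    Rule 2: within_recursive(Loc, Name) :- within(Loc, Via),
                                           within_recursive(Via, Name).
    """
    derived = set(name_facts)  # Rule 1
    while True:
        new_facts = {
            (loc, place)
            for loc, via in within_facts
            for known, place in derived
            if known == via
        } - derived  # Rule 2, keeping only facts not seen before
        if not new_facts:
            return derived
        derived |= new_facts


def people_born_in(place_name):
    """Names of people whose birthplace lies in `place_name` (directly or nested)."""
    derived = within_recursive(NAME, WITHIN)
    return sorted(
        person_name
        for person, loc in BORN_IN
        if (loc, place_name) in derived
        for pid, person_name in NAME
        if pid == person
    )


print(people_born_in("United States"))
```

The point of the rule-based style is reuse: once `within_recursive` is derived, other rules can build on it without repeating the traversal logic.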