This document provides an overview of several document database technologies including MongoDB, CouchDB, and RavenDB. It discusses key architectural considerations for using document databases such as their schema-free model, eventual consistency, ability to model object aggregates, scaling out through sharding and replication, need for queries to use indexes, and ongoing administration requirements. It also presents two case studies where document databases were used for a survey system and a CRM.
5. MongoDB
•Dominant player in document databases
•Runs on nearly all platforms
•Strongly Consistent in default configuration
•Indexes are similar to traditional SQL indexes in nature
•Stores data in customized Binary JSON (BSON) format that allows typing
•Limited support for cross-collection querying in latest release
•Client APIs available in many languages
•Must use a third party provider like SOLR for advanced search capabilities
6. CouchDB
•Stores documents in plain JSON format
•Eventually consistent
•Indexes are map-reduce views defined in JavaScript
•Clients in many languages
•Runs on Linux, OSX and Windows
•CouchDB-Lucene provides a Lucene integration for search
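To make the map-reduce index idea concrete, here is a minimal sketch that emulates it in plain Python. CouchDB itself defines views in JavaScript, so none of this is CouchDB's API; the document fields and the function names `map_doc`, `reduce_values`, and `build_index` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical documents; the field names are illustrative only.
docs = [
    {"_id": "a1", "type": "answer", "survey": "s1", "score": 4},
    {"_id": "a2", "type": "answer", "survey": "s1", "score": 2},
    {"_id": "a3", "type": "answer", "survey": "s2", "score": 5},
    {"_id": "u1", "type": "user", "name": "Pat"},
]


def map_doc(doc):
    """Map step: emit (key, value) pairs for documents the index cares about."""
    if doc.get("type") == "answer":
        yield doc["survey"], doc["score"]


def reduce_values(values):
    """Reduce step: combine all values emitted under one key."""
    return sum(values)


def build_index(documents):
    """Run the map step over every document, then reduce per key."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in sorted(grouped.items())}


index = build_index(docs)
print(index)
```

The map function decides which documents participate and under which key, so a query against the index never has to inspect documents that emitted nothing.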
7. RavenDB
•Stores documents in plain JSON format
•Eventually consistent
•Indexes are built on Lucene. Lucene search is native to RavenDB.
•Server only runs on Windows
•.NET, Java, and HTTP Clients
•Limited support for cross-collection querying
8. Other Players
•Azure DocumentDB
• Very new product from Microsoft
•RethinkDB
• Open source project that integrates push notifications into the database
•Cloudant
• IBM proprietary implementation of CouchDB
•DynamoDB
• Mixed model key value and document database
10. How do document databases work?
•Stores related data in a single document
•Usually uses JSON format for documents
•Enables the storage of complex object graphs together, instead of normalizing data out into tables
•Stores documents in collections of the same type
•Allows querying within collections
•Does not typically allow querying across collections
•Offers high availability at the cost of consistency
11. Consideration: Schema Free
PROS
Easy to add properties
Simple migrations
Tolerant of differing data
CONS
Have to account for properties being missing
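The con above usually shows up in application code as defensive reads. A minimal sketch, assuming documents are loaded as plain Python dicts; the field names (`name`, `nickname`) and the `display_name` helper are hypothetical and not part of any database API.

```python
def display_name(doc: dict) -> str:
    """Return a display name, tolerating documents saved before 'nickname' existed."""
    # Older documents may lack 'nickname'; fall back to 'name', then a placeholder.
    return doc.get("nickname") or doc.get("name") or "(unknown)"


old_doc = {"name": "Ada"}                          # written before the property was added
new_doc = {"name": "Ada", "nickname": "Countess"}  # written after

print(display_name(old_doc))
print(display_name(new_doc))
```

Every reader of the collection has to apply the same fallback, which is the cost that replaces a schema migration.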
12. ACID
Atomicity
◦ Each transaction is all or nothing
Consistency
◦ Any transaction brings the database from one valid state to another
Isolation
◦ System ensures that transactions executed concurrently bring the database to the same state as if they had been executed serially
Durability
◦ Once a transaction is committed, it remains so even in the event of power loss, etc.
13. ACID in Document Databases
•Traditional transaction support is not available in any document database
•Document databases do support something like transactions within the scope of a document
•This makes document databases generally inappropriate for a wide variety of applications
16. Requirements
•An administration area is used to define ‘Surveys’.
• Surveys have Questions
• Questions have answers
•Surveys can be administered in sets called workflows
•When a survey changes, this change can only apply to surveys moving forward
• Because of this, each user must receive a survey ‘instance’ to track the version of the survey he/she got
17. A Traditional SQL Schema
•With various other requirements not described here, this schema came out to 83 tables
•For one of our heaviest usage clients, the average user would have 119 answers in the ‘Saved Answer’ table
•With over 200,000 users after two years of use, the ‘Saved Answer’ table had 24,014,330 rows
•This table was both read and write heavy, so it was extremely difficult to define effective SQL indexes
•The hardware cost for these SQL servers was astronomical
•This sucked
18. Designing Documents
•An aggregate is a collection of objects that can be treated as one
•An aggregate root is the object that contains all other objects inside of it
•When designing document schema, find your aggregates and create documents around them
•If you have an entity, it should be persisted as its own document because you will likely have to store references to it
19. Survey System Design
•A combination SQL and Document DB design was used
•Survey Templates (one type of entity) were put into the SQL Database
•When a survey was assigned to a user as part of a workflow (another entity, and also an aggregate), its data at that time was put into the document database
•The user’s responses were saved as part of the workflow document
•Reading a user’s application data became as simple as making one request for her workflow document
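The design above can be sketched as a workflow document that snapshots each survey at assignment time and stores the responses alongside it. This is only an illustration: the field names, the `workflow/<user>` id format, and the `assign_workflow` helper are assumptions, not the schema used in the actual system.

```python
import copy
import json

# Hypothetical survey template, as it might be loaded from the SQL side.
template = {
    "survey_id": 7,
    "version": 3,
    "questions": [
        {"id": "q1", "text": "Favorite color?", "answers": ["Red", "Blue"]},
    ],
}


def assign_workflow(user_id: str, templates: list) -> dict:
    """Build a workflow document holding a snapshot of each survey at assignment time."""
    return {
        "_id": f"workflow/{user_id}",
        "user_id": user_id,
        # Deep copy so later template edits do not affect this survey instance.
        "surveys": [dict(copy.deepcopy(t), responses={}) for t in templates],
    }


workflow = assign_workflow("user-42", [template])
template["questions"][0]["text"] = "Favourite colour?"  # later edit to the template
workflow["surveys"][0]["responses"]["q1"] = "Blue"      # answer saved in the same document

print(json.dumps(workflow, indent=2))
```

Because the snapshot and the responses live in one aggregate, loading a user's data is a single read by document id.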
20. Consideration: Models Aggregates Well
PROS
Improves performance by reducing lookups
Allows for easy persistence of object oriented designs
CONS
none
21. Sharding
•Sharding is the practice of distributing data across multiple servers
•All major document database providers support sharding natively
•Document Databases are ideal for sharding because document data is self-contained (less need to worry about a query having to run on two servers)
•Sharding is usually accomplished by selecting a shard key for a collection, and allowing the collection to be distributed to different nodes based on that key
•Tenant Id and geographic regions are typical choices for shard keys
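Routing by shard key can be sketched as follows. This is a simplified illustration using a stable hash of a tenant id; the node names and the `shard_for` helper are hypothetical, and real products typically use range-based or hashed chunks with automatic rebalancing rather than a plain modulo.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical node names


def shard_for(shard_key: str) -> str:
    """Map a shard key (here a tenant id) to a node using a stable hash."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


docs = [
    {"_id": 1, "tenant_id": "acme"},
    {"_id": 2, "tenant_id": "acme"},
    {"_id": 3, "tenant_id": "globex"},
]

# Documents sharing a tenant id always land on the same node.
placement = {d["_id"]: shard_for(d["tenant_id"]) for d in docs}
print(placement)
```

Keeping one tenant's documents together is what lets a tenant-scoped query run on a single server.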
22. Replication
•All major document database providers support replication
•In most replication setups, a primary node takes all write operations, and a secondary node asynchronously replicates these write operations
•In the event of a failure of the primary, the secondary begins to take write operations
•MongoDB can be configured to allow reads from secondaries as a performance optimization, resulting in eventual instead of strong consistency
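The consistency trade-off of reading from a secondary can be shown with a toy in-memory model. This is not any database's replication protocol; `ReplicaPair` and its explicit `replicate` step are assumptions that stand in for asynchronous replication.

```python
from collections import deque


class ReplicaPair:
    """Toy primary/secondary pair with an explicit replication queue."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = deque()  # writes not yet applied on the secondary

    def write(self, key, value):
        # All writes go to the primary and are queued for the secondary.
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Stands in for the asynchronous replication step.
        while self.pending:
            key, value = self.pending.popleft()
            self.secondary[key] = value

    def read(self, key, from_secondary=False):
        node = self.secondary if from_secondary else self.primary
        return node.get(key)


pair = ReplicaPair()
pair.write("deal-1", "open")
stale = pair.read("deal-1", from_secondary=True)      # secondary has not caught up yet
fresh = pair.read("deal-1")                           # primary reflects the write
pair.replicate()
caught_up = pair.read("deal-1", from_secondary=True)  # now consistent
print(stale, fresh, caught_up)
```

Reads from the primary see the write immediately; reads from the secondary see it only after replication runs, which is the eventual consistency described above.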
24. Survey System: End Result
•Each user is associated with about 20 documents
•Documents are distributed across multiple databases using sharding
•Master/Master replication is used to ensure extremely high availability
•There have been no database performance issues in the year and a half the app has been in production
•Because there is no schema migration concern, deploying updates has been drastically simplified
•Hardware cost is reasonable (but not cheap)
26. Indexes
•All document databases support some form of indexing to improve query performance
•Some document databases do not allow querying without an index
•In general, you shouldn’t query without an index anyway
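Why unindexed queries are discouraged can be seen in a small in-memory sketch. The `deals` data and the `scan` and `build_index` helpers are hypothetical; real document databases maintain their indexes on disk and update them as documents change.

```python
from collections import defaultdict

deals = [
    {"_id": "d1", "stage": "won"},
    {"_id": "d2", "stage": "open"},
    {"_id": "d3", "stage": "won"},
]


def scan(collection, field, value):
    """Unindexed query: inspects every document in the collection."""
    return [d["_id"] for d in collection if d.get(field) == value]


def build_index(collection, field):
    """Index: maps each field value to the ids of documents holding it."""
    index = defaultdict(list)
    for d in collection:
        if field in d:
            index[d[field]].append(d["_id"])
    return index


stage_index = build_index(deals, "stage")
print(scan(deals, "stage", "won"))  # cost grows with collection size
print(stage_index["won"])           # single lookup once the index exists
```

Both calls return the same ids, but the scan touches every document on every query while the index pays that cost once at build time.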
31. CRM Requirements
•Track customers and basic information about them
•Track contacts and basic information about them
•Track sales deals and where they are in the pipeline
•Track orders generated from sales deals
•Track user tasks
32. Customers and Their Deals
•Customers and Deals are both entities, which is to say that they have distinct identity
•For this reason, Deals and Customers should be two separate collections
•There is no native support for cross-collection querying in most Document Databases
• The cross-collection querying support in RavenDB doesn’t perform well
33. Consideration: One document per interaction
PROS
Improves performance
Encourages modeling aggregates well
CONS
Not actually achievable in most cases
34. Searching Deals by Customer Name
•The deal document must contain a denormalized customer object with the customer’s ID and name
•We have a choice to make with this denormalization
• Allow the denormalization to just be wrong in the event the customer name is changed
• Maintain the denormalization when the customer name is changed
35. Denormalization Considerations
•Is stale data acceptable? This is the best option in all cases where it is possible.
•If stale data is unacceptable, how many documents are likely to need updating when a change is made? How often are changes going to be made?
•Using an event bus to move denormalization updates to a background process can be very beneficial if failure of an update isn’t critical for the user to know
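Maintaining a denormalized customer name through a background process can be sketched like this. The in-memory `events` queue stands in for an event bus, and the document shapes and function names are assumptions made for illustration.

```python
from collections import deque

customers = {"c1": {"name": "Acme"}}
deals = {
    "d1": {"title": "Renewal", "customer": {"id": "c1", "name": "Acme"}},
    "d2": {"title": "Upsell", "customer": {"id": "c1", "name": "Acme"}},
}
events = deque()  # stands in for an event bus or queue


def rename_customer(customer_id, new_name):
    """Update the customer document and publish an event; deals are not touched here."""
    customers[customer_id]["name"] = new_name
    events.append(("customer_renamed", customer_id, new_name))


def process_events():
    """Background worker: propagate renames into each deal's denormalized copy."""
    while events:
        kind, customer_id, new_name = events.popleft()
        if kind == "customer_renamed":
            for deal in deals.values():
                if deal["customer"]["id"] == customer_id:
                    deal["customer"]["name"] = new_name


rename_customer("c1", "Acme Corp")
stale_name = deals["d1"]["customer"]["name"]    # still the old name until the worker runs
process_events()
updated_name = deals["d1"]["customer"]["name"]  # denormalized copy now matches
print(stale_name, updated_name)
```

Between the rename and the worker run, searches by customer name see the old value, so this approach only fits cases where brief staleness is acceptable.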
36. Consideration: Models Relationships Poorly
PROS
None
CONS
Stale (out of date) data must be accepted in the system
Large amounts of boilerplate code must be written to maintain denormalizations
In certain circumstances a queuing/eventing system is unavoidable
39. Consideration Recap
•Schema Free
•Non-ACID
•Models Aggregates Well
•Scales out well
•All queries must be indexed
•Eventual Consistency
•One document per interaction
•Models relationships poorly
•Requires administration
40. …nerds like us are allowed to be unironically enthusiastic about stuff… Nerds are allowed to love stuff, like jump-up-and-down-in-the-chair-can’t-control-yourself love it.
-John Green