Building a complete social networking platform presents many challenges at scale. Socialite is a reference architecture and open source Java implementation of a scalable social feed service built on DropWizard and MongoDB. We'll provide an architectural overview of the platform, explaining how you can store an infinite timeline of data while optimizing indexing and sharding configuration for access to the most recent window of data. We'll also dive into the details of storing a social user graph in MongoDB.
Building a Scalable Inbox System with MongoDB and Javaantoinegirbal
Many user-facing applications present some kind of news feed/inbox system. You can think of Facebook, Twitter, or Gmail as different types of inboxes where the user can see data of interest, sorted by time, popularity, or other parameter. A scalable inbox is a difficult problem to solve: for millions of users, varied data from many sources must be sorted and presented within milliseconds. Different strategies can be used: scatter-gather, fan-out writes, and so on. This session presents an actual application developed by 10gen in Java, using MongoDB. This application is open source and is intended to show the reference implementation of several strategies to tackle this common challenge. The presentation also introduces many MongoDB concepts.
MongoDB’s basic unit of storage is a document. Documents can represent rich, schema-free data structures, meaning that we have several viable alternatives to the normalized, relational model. In this talk, we’ll discuss the tradeoff of various data modeling strategies in MongoDB.
This talk will introduce the philosophy and features of the open source, NoSQL MongoDB. We’ll discuss the benefits of the document-based data model that MongoDB offers by walking through how one can build a simple app to store books. We’ll cover inserting, updating, and querying the database of books.
The document discusses schema design in MongoDB. It explains that MongoDB uses documents rather than tables and rows. Documents can be normalized, with links between separate documents, or denormalized by embedding related data. Embedding is recommended for fast queries but normalized schemas also work well. The document also covers indexing strategies, profiling database performance, and provides examples of schema designs for events, seating, and a feed reader application.
The document discusses schema design basics for MongoDB, including terms, considerations for schema design, and examples of modeling different types of data structures like trees, single table inheritance, and many-to-many relationships. It provides examples of creating indexes, evolving schemas, and performing queries and updates. Key topics covered include embedding data versus normalization, indexing, and techniques for modeling one-to-many and many-to-many relationships.
MongoDB Schema Design: Four Real-World ExamplesMike Friedman
This document discusses different schema designs for common use cases in MongoDB. It presents four cases: (1) modeling a message inbox, (2) retaining historical data within limits, (3) storing variable attributes efficiently, and (4) looking up users by multiple identities. For each case, it analyzes different modeling approaches, considering factors like query performance, write performance, and whether indexes can be used. The goal is to help designers choose an optimal schema based on their application's access patterns and scale requirements.
NoSQL databases only unfold their entire strength when also embracing the their concepts regarding usage and schema design. These slides give some overview of features and concepts of MongoDB.
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen...MongoDB
In this session, we'll examine schema design insights and trade-offs using real world examples. We'll look at three example applications: building an email inbox, selecting a shard key for a large scale web application, and using MongoDB to store user profiles. From these examples you should leave the session with an idea of the advantages and disadvantages of various approaches to modeling your data in MongoDB. Attendees should be well versed in basic schema design and familiar with concepts in the morning's basic schema design talk. No beginner topics will be covered in this session.
This talk examines four real-world use cases for MongoDB document-based data modeling. We examine the implications of several possible solutions for each problem.
Building Your First MongoDB App ~ Metadata Cataloghungarianhc
These are the slides I used for a MongoDB webinar about creating your first application with MongoDB. They start with a general MongoDB overview, continuing onto how to model data for a metadata catalog. At this point in the presentation, I break to do a live demonstration. Afterwards, I touch on scaling your application with MongoDB.
This document discusses different design options for modeling messaging inboxes in MongoDB. It describes three main approaches: fan out on read, fan out on write, and fan out on write with bucketing. Fan out on read involves storing a single document per message with all recipients, requiring a scatter-gather query to read an inbox. Fan out on write stores one document per recipient but still involves random I/O to read an inbox. Bucketed fan out on write stores inbox messages in arrays within "inbox" documents for each user, allowing an entire inbox to be read with one or two documents. This provides the best read performance while also distributing writes across shards. The document concludes that bucketed fan out on write is typically the better
The document discusses schema design considerations for modeling data in MongoDB. It notes that while MongoDB is schemaless, applications are still responsible for schema design. It compares relational and MongoDB schema designs, highlighting that MongoDB uses embedded documents, has no joins, and requires duplicating or precomputing data. The document provides recommendations like combining related objects, optimizing for specific use cases, and doing aggregation work during writes rather than reads.
Webinar: Back to Basics: Thinking in DocumentsMongoDB
New applications, users and inputs demand new types of data, like unstructured, semi-structured and polymorphic data. Adopting MongoDB means adopting to a new, document-based data model.
While most developers have internalized the rules of thumb for designing schemas for relational databases, these rules don't apply to MongoDB. Documents can represent rich data structures, providing lots of viable alternatives to the standard, normalized, relational model. In addition, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense.
In this session, Buzz Moschetti explores how you can take advantage of MongoDB's document model to build modern applications.
Media owners are turning to MongoDB to drive social interaction with their published content. The way customers consume information has changed and passive communication is no longer enough. They want to comment, share and engage with publishers and their community through a range of media types and via multiple channels whenever and wherever they are. There are serious challenges with taking this semi-structured and unstructured data and making it work in a traditional relational database. This webinar looks at how MongoDB’s schemaless design and document orientation gives organisation’s like the Guardian the flexibility to aggregate social content and scale out.
The Fine Art of Schema Design in MongoDB: Dos and Don'tsMatias Cascallares
Schema design in MongoDB can be an art. Different trade offs should be considered when designing how to store your data. In this presentation we are going to cover some common scenarios, recommended practices and don'ts to avoid based on previous experiences
Learn Learn how to build your mobile back-end with MongoDBMarakana Inc.
Will Shulman, from MongoLab, shows us how to to persist our mobile app data in the cloud using a super-scalable and amazingly developer-friendly MongoDB back-end.
Back to Basics Webinar 3: Schema Design Thinking in DocumentsMongoDB
This is the third webinar of a Back to Basics series that will introduce you to the MongoDB database. This webinar will explain the architecture of document databases.
Dev Jumpstart: Schema Design Best PracticesMongoDB
New to MongoDB? We’ll discuss the tradeoff of various data modeling strategies in MongoDB. This talk will jumpstart your knowledge of how to work with documents, evolve your schema, and common schema design patterns. MongoDB’s basic unit of storage is a document. No prior knowledge of MongoDB is assumed.
MongoDB Basics - An introduction of mongo for beginners.
Covered basic of Indexing, Replicaset, Covered queries, Backup tools and Why we need mongo and some use cases.
This document provides an introduction and overview of MongoDB. It discusses how MongoDB is a document-oriented database that is open source, high performance, and horizontally scalable. It provides examples of using MongoDB with the mongo shell to create, query, update and index data. Key points covered include how MongoDB uses documents rather than tables, how data can be embedded or referenced between collections, and how to perform queries, sorting, pagination and more. Official drivers are available for connecting applications to MongoDB databases from many programming languages.
Mobile 2: What's My Place in the Universe? Using Geo-Indexing to Solve Existe...MongoDB
Second in the mobile series. Our mobile application takes shape as we build out a schema to support geo-specific data and friend finding. Learn how to use MongoDB geo-indexing for real-time proximity matching and gather analytics with the aggregation framework.
Building an Activity Feed with CassandraMark Dunphy
Mark Dunphy discusses the evolution of the Activity Feed system at Behance from MongoDB to Cassandra. The initial MongoDB implementation worked well for the smaller user base in 2011 but struggled with node failures, disk fragmentation, and downtime as the system scaled. Cassandra was chosen to replace it due to its scalability, performance, and ease of maintenance. Dunphy details the process of learning Cassandra, modeling the data, and developing a strategy for writes and reads that allowed the system to handle thousands of writes and reads per second while serving the Activity Feed to users.
Developers love MongoDB because its flexible document model enhances their productivity. But did you know that MongoDB supports rich queries and lets you accomplish some of the same things you currently do with SQL statements? And that MongoDB's powerful aggregation framework makes it possible to perform real-time analytics for dashboards and reports?
Attend this webinar for an introduction to the MongoDB aggregation framework and a walk through of what you can do with it. We'll also demo using it to analyze U.S. census data.
MongoGraph - MongoDB Meets the Semantic WebDATAVERSITY
AllegroGraph is a fully ACID and highly scalable RDF triplestore that can be programmed with compiled, server side JavaScript. This allows programmers to easily manipulate individual triples and create their own intelligent graph or reasoning algorithms. However, one wish that has been expressed by many programmers is to work on the level of objects instead of individual triples, where an object would be defined as all the triples with the same subject. So we created a MongoDB interface where programmers can add, delete and modify JSON objects directly into MongoDB like substores. This gives us the best of both worlds. We get the beautiful simplicity of the MongoDB interface for working with objects and we get the all the properties of an advanced triplestore, in this case joins through SPARQL queries, automatic indexing of all attributes/values, ACID properties all packaged to deliver a simple entry into the world of the Semantic Web.
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB
The popularity of dedicated graph technologies has risen greatly in recent years, at least partly fuelled by the explosion in social media and similar systems, where a friend network or recommendation engine is often a critical component when delivering a successful application. MongoDB 3.4 introduces a new Aggregation Framework graph operator, $graphLookup, to enable some of these types of use cases to be built easily on top of MongoDB. We will see how semantic relationships can be modelled inside MongoDB today, how the new $graphLookup operator can help simplify this in 3.4, and how $graphLookup can be used to leverage these relationships and build a commercially focused news article recommendation system.
MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDBMongoDB
Presented by: John Barry, Senior Platform Developer, Apervita
Experience level: Deep dive
Graph databases are natural representation for many forms of data - from social network relationships, to music genres, to medical vocabulary data. Apervita’s challenge is importing and housing vast medical ontologies, and allowing efficient traversal of the graph retrieving both parent and child nodes across a variety of relationships.
In this session, we will discuss the characteristics of graph databases, an implementation strategy for MongoDB, and an approach for implementing transitive closure for efficient traversal of the graph with limited database trips.
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesMongoDB
This is the fourth webinar of a Back to Basics series that will introduce you to the MongoDB database. This webinar will introduce you to the aggregation framework.
Back to Basics Webinar 2: Your First MongoDB ApplicationMongoDB
The document provides instructions for installing and using MongoDB to build a simple blogging application. It demonstrates how to install MongoDB, connect to it using the mongo shell, insert and query sample data like users and blog posts, update documents to add comments, and more. The goal is to illustrate how to model and interact with data in MongoDB for a basic blogging use case.
Back to Basics Webinar 1: Introduction to NoSQLMongoDB
This is the first webinar of a Back to Basics series that will introduce you to the MongoDB database, what it is, why you would use it, and what you would use it for.
Using MongoDB as a high performance graph databaseChris Clarke
This document provides a summary of a presentation on using MongoDB as a high performance graph database. The presentation discusses:
1) The speaker's company Talis Education using MongoDB for 8 months to store and query graph data, and implementing a software wrapper on top of MongoDB to improve scalability and performance.
2) Common graph data models using nodes and edges or resources and properties. Basic graph concepts like undirected vs directed graphs are explained.
3) Storing graph data as RDF triples, with the standard subject-predicate-object structure. Benefits of RDF such as reusing common schemas and easy data merging are highlighted.
4) SPARQL, the query language for RDF,
This document outlines the topics covered in an Edureka course on MongoDB. The course contains 8 modules that cover MongoDB fundamentals, CRUD operations, schema design, administration, scaling, indexing and aggregation, application integration, and additional concepts and case studies. Each module contains multiple topics that will be taught through online instructor-led classes, recordings, quizzes, assignments, and support.
The document introduces MongoDB as a scalable, high-performance, open source, schema-free, document-oriented database. It discusses MongoDB's philosophy of flexibility and scalability over relational semantics. The main features covered are document storage, querying, indexing, replication, MapReduce and auto-sharding. Concepts like collections, documents and cursors are mapped to relational database terms. Examples uses include data warehousing and debugging.
MongoDB is an open-source, document-oriented database that provides high performance and horizontal scalability. It uses a document-model where data is organized in flexible, JSON-like documents rather than rigidly defined rows and tables. Documents can contain multiple types of nested objects and arrays. MongoDB is best suited for applications that need to store large amounts of unstructured or semi-structured data and benefit from horizontal scalability and high performance.
The document provides an overview of the activity feeds architecture. It discusses the fundamental entities of connections and activities. Connections express relationships between entities and are implemented as a directed graph. Activities form a log of actions by entities. To populate feeds, activities are copied and distributed to relevant entities and then aggregated. The aggregation process involves selecting connections, classifying activities, scoring them, pruning duplicates, and sorting the results into a merged newsfeed.
View all the MongoDB World 2016 Poster Sessions slides in one place!
Table of Contents:
1: BigData DB Infrastructure for Modeling the Fly Brain
2: Taming the WiredTiger Cache
3: Sharding with MongoDB 3.2 Kick the tires and pop the hood!
4: Scaling Proactive Anomaly Detection
5: MongoTx: Transactions with Sharding and Queries
6: MongoDB: It’s Not Too Late To Shard
7: DLIFLC usage of MongoDB
Back to Basics Webinar 5: Introduction to the Aggregation FrameworkMongoDB
The document provides information about an upcoming webinar on the MongoDB aggregation framework. Key details include:
- The webinar will introduce the aggregation framework and provide an overview of its capabilities for analytics.
- Examples will use a real-world vehicle testing dataset to demonstrate aggregation pipeline stages like $match, $project, and $group.
- Attendees will learn how the aggregation framework provides a simpler way to perform analytics compared to other tools like Spark and Hadoop.
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauMongoDB
Pairing your real-time operational data stored in a modern database like MongoDB with first-class business intelligence platforms like Tableau enables new insights to be discovered faster than ever before.
Many leading organizations already use MongoDB in conjunction with Tableau including a top American investment bank and the world’s largest airline. With the Connector for BI 2.0, it’s never been easier to streamline the connection process between these two systems.
In this webinar, we will create a live connection from Tableau Desktop to a MongoDB cluster using the Connector for BI. Once we have Tableau Desktop and MongoDB connected, we will demonstrate the visual power of Tableau to explore the agile data storage of MongoDB.
You’ll walk away knowing:
- How to configure MongoDB with Tableau using the updated connector
- Best practices for working with documents in a BI environment
- How leading companies are using big data visualization strategies to transform their businesses
The document discusses MongoDB's Aggregation Framework, which allows users to perform ad-hoc queries and reshape data in MongoDB. It describes the key components of the aggregation pipeline including $match, $project, $group, $sort operators. It provides examples of how to filter, reshape, and summarize document data using the aggregation framework. The document also covers usage and limitations of aggregation as well as how it can be used to enable more flexible data analysis and reporting compared to MapReduce.
The document introduces a workshop on big data tools and MongoDB, discusses how MediaGlu uses big data for advertising by tracking user paths across different channels, and outlines an agenda covering MongoDB fundamentals, running MongoDB, and labs on shell commands, aggregation, replication, and sharding.
Extensible RESTful Applications with Apache TinkerPopVarun Ganesh
This document discusses building a graph database and domain-specific language (DSL) for analyzing Slack data. It defines entities like messages, users, and channels as graph nodes and their relationships as edges. A REST API is created to ingest and query the graph using TinkerPop and remote traversals. Custom traversal sources and classes define shorthand traversals and business logic to build the DSL, adding structure and meaning to queries over the Slack data graph.
MongoDB .local Houston 2019: Best Practices for Working with IoT and Time-ser...MongoDB
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
Remaining Agile with Billions of Documents: Appboy and Creative MongoDB SchemasMongoDB
In this talk, Appboy co-founder and CIO Jon Hyman will discuss various schemas that Appboy has evolved to use on MongoDB, remaining agile as Appboy has grown to massive scale. Jon will discuss topics such as random sampling of documents, multivariate testing and multi-arm bandit optimization of such tests, field tokenization, and how Appboy stores multi-dimensional data on an individual user basis to be able to quickly optimize for the best time to deliver messages to end users. Appboy is the global leader in Marketing Automation for Apps, helping clients such as Urban Outfitters, Shutterfly, Kixeye, PicsArt, USA Today Sports, and iHeartRadio increase engagement through automated messaging. Each month, Appboy collects tens of billions of data points from hundreds of millions of monthly active users.
This document summarizes an agenda for a Pinterest Engineering meeting. It includes discussions on mobile growth and monetization, deploying and shipping code. Specific topics that will be covered include scaling user education on mobile, growth strategies like user education, monetization through data, and how Pinterest deploys and ships code. Speakers will discuss mobile features, how user growth is driven through education, monetizing user data, and ensuring smooth code deployment.
This document summarizes a presentation about MongoDB given by Sean Laurent of StudyBlue, Inc. It introduces MongoDB and its features, including its document-oriented data model, flexible schemas, querying and updating capabilities, indexing, drivers, replication and sharding for scalability. Useful MongoDB tools are also discussed like the mongo shell, mongostat, mongotop and MMS monitoring. The document concludes with MongoDB's strengths like horizontal scaling and rapidly evolving schemas, and weaknesses like not being a drop-in replacement for SQL and limitations around transactions.
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB
Presented by Austin Zellner, Solutions Architect, MongoDB
Schema design is as much art as it is science, but it is central to understanding how to get the most out of MongoDB. Attendees will walk away with an understanding of how to approach schema design, what influences it, and the science behind the art. After this session, attendees will be ready to design new schemas, as well as re-evaluate existing schemas with a new mental model.
Christian Kvalheim gave an introduction to NoSQL and MongoDB. Some key points:
1) MongoDB is a scalable, high-performance, open source NoSQL database that uses a document-oriented model.
2) It supports indexing, replication, auto-sharding for horizontal scaling, and querying.
3) Documents are stored in JSON-like records which can contain various data types including nested objects and arrays.
S2Graph is a large-scale graph database built on Hbase that provides storage and graph APIs to enable real-time breadth-first search on large, constantly changing social graphs. It models social network data, such as users, messages, posts, and relationships between them as vertices and edges in a graph. This allows querying the graph through steps that traverse edges between vertices to retrieve related data in real-time with low latency. Some examples demonstrated include messaging, newsfeed, recommendation, and search applications.
Building Highly Flexible, High Performance Query EnginesMapR Technologies
The document discusses Apache Drill, an open source SQL query engine for analysis of data in Hadoop. It provides an overview of Drill, highlighting its ability to handle flexible schemas, analyze semi-structured and nested data from NoSQL sources, and integrate with existing business intelligence tools through a familiar SQL interface. The document also notes that traditional SQL approaches do not always work well for new big data applications and data models, and that Drill aims to address these challenges.
Traackr evaluated several NoSQL database options to store its heterogeneous, unstructured web data. Document databases were the best fit due to their flexibility to store variable length text like tweets and blog posts without predefined schemas. MongoDB was selected due to its maturity, adoption, and support for ad-hoc queries and batch processing needed by Traackr in early 2010.
Buzz Moschetti presents on using MongoDB and Hadoop together for success with big data projects. He outlines a real-time directed content system that uses MongoDB for operational data and recommendations, Hadoop for batch analytics, and integrates the two with real-time updates. The system dynamically updates user profiles and recommendations based on user clicks and periodic re-analysis of all data in Hadoop. It provides both real-time and long-term analytics capabilities through this integrated architecture.
Webinar: Building Your First Application with MongoDBMongoDB
This document discusses building a location-based check-in app using MongoDB. It begins with an introduction to MongoDB's key features like being document-oriented, open source, high performance, and horizontally scalable. It then demonstrates modeling the app's entities of users, places, and check-ins as MongoDB collections. It shows examples of queries, indexes, and updates used in the app. The document concludes by discussing more advanced features like aggregation and deployment options for MongoDB.
MongoDB .local London 2019: Best Practices for Working with IoT and Time-seri...MongoDB
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
• The challenges involved with managing time-series data in IoT applications
• Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
• How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
• At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
How sitecore depends on mongo db for scalability and performance, and what it...Antonios Giannopoulos
Percona Live 2017 - How sitecore depends on mongo db for scalability and performance, and what it can teach you by Antonios Giannopoulos and Grant Killian
This document discusses using Elasticsearch for social media analytics and provides examples of common tasks. It introduces Elasticsearch basics like installation, indexing documents, and searching. It also covers more advanced topics like mapping types, facets for aggregations, analyzers, nested and parent/child relations between documents. The document concludes with recommendations on data design, suggesting indexing strategies for different use cases like per user, single index, or partitioning by time range.
HBaseCon 2015: S2Graph - A Large-scale Graph Database with HBaseHBaseCon
As the operator of the dominant messenger application in South Korea, KakaoTalk has more than 170 million users, and our ever-growing graph has more than 10B edges and 200M vertices. This scale presents several technical challenges for storing and querying the graph data, but we have resolved them by creating a new distributed graph database with HBase. Here you'll learn the methodology and architecture we used to solve the problems, compare it another famous graph database, Titan, and explore the HBase issues we encountered.
This document discusses MongoDB and the needs of Rivera Group, an IT services company. It notes that Rivera Group has been using MongoDB since 2012 to store large, multi-dimensional datasets with heavy read/write and audit requirements. The document outlines some of the challenges Rivera Group faces around indexing, aggregation, and flexibility in querying datasets.
Similar to Socialite, the Open Source Status Feed (20)
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
This presentation discusses migrating data from other data stores to MongoDB Atlas. It begins by explaining why MongoDB and Atlas are good choices for data management. Several preparation steps are covered, including sizing the target Atlas cluster, increasing the source oplog, and testing connectivity. Live migration, mongomirror, and dump/restore options are presented for migrating between replicasets or sharded clusters. Post-migration steps like monitoring and backups are also discussed. Finally, migrating from other data stores like AWS DocumentDB, Azure CosmosDB, DynamoDB, and relational databases are briefly covered.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combined traditional batch approaches with streaming technologies to provide continues alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
MongoDB Kubernetes operator is ready for prime-time. Learn about how MongoDB can be used with most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
The document discusses guidelines for ordering fields in compound indexes to optimize query performance. It recommends the E-S-R approach: placing equality fields first, followed by sort fields, and range fields last. This allows indexes to leverage equality matches, provide non-blocking sorts, and minimize scanning. Examples show how indexes ordered by these guidelines can support queries more efficiently by narrowing the search bounds.
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
The document describes a methodology for data modeling with MongoDB. It begins by recognizing the differences between document and tabular databases, then outlines a three step methodology: 1) describe the workload by listing queries, 2) identify and model relationships between entities, and 3) apply relevant patterns when modeling for MongoDB. The document uses examples around modeling a coffee shop franchise to illustrate modeling approaches and techniques.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
aux Core Data, appréciée par des centaines de milliers de développeurs. Apprenez ce qui rend Realm spécial et comment il peut être utilisé pour créer de meilleures applications plus rapidement.
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
Il n’a jamais été aussi facile de commander en ligne et de se faire livrer en moins de 48h très souvent gratuitement. Cette simplicité d’usage cache un marché complexe de plus de 8000 milliards de $.
La data est bien connu du monde de la Supply Chain (itinéraires, informations sur les marchandises, douanes,…), mais la valeur de ces données opérationnelles reste peu exploitée. En alliant expertise métier et Data Science, Upply redéfinit les fondamentaux de la Supply Chain en proposant à chacun des acteurs de surmonter la volatilité et l’inefficacité du marché.
It's your unstructured data: How to get your GenAI app to production (and spe...Zilliz
So you've successfully built a GenAI app POC for your company -- now comes the hard part: bringing it to production. Aparavi addresses the challenges of AI projects while addressing data privacy and PII. Our Service for RAG helps AI developers and data scientists to scale their app to 1000s to millions of users using corporate unstructured data. Aparavi’s AI Data Loader cleans, prepares and then loads only the relevant unstructured data for each AI project/app, enabling you to operationalize the creation of GenAI apps easily and accurately while giving you the time to focus on what you really want to do - building a great AI application with useful and relevant context. All within your environment and never having to share private corporate data with anyone - not even Aparavi.
Keynote : AI & Future Of Offensive SecurityPriyanka Aash
In the presentation, the focus is on the transformative impact of artificial intelligence (AI) in cybersecurity, particularly in the context of malware generation and adversarial attacks. AI promises to revolutionize the field by enabling scalable solutions to historically challenging problems such as continuous threat simulation, autonomous attack path generation, and the creation of sophisticated attack payloads. The discussions underscore how AI-powered tools like AI-based penetration testing can outpace traditional methods, enhancing security posture by efficiently identifying and mitigating vulnerabilities across complex attack surfaces. The use of AI in red teaming further amplifies these capabilities, allowing organizations to validate security controls effectively against diverse adversarial scenarios. These advancements not only streamline testing processes but also bolster defense strategies, ensuring readiness against evolving cyber threats.
Keynote : Presentation on SASE TechnologyPriyanka Aash
Secure Access Service Edge (SASE) solutions are revolutionizing enterprise networks by integrating SD-WAN with comprehensive security services. Traditionally, enterprises managed multiple point solutions for network and security needs, leading to complexity and resource-intensive operations. SASE, as defined by Gartner, consolidates these functions into a unified cloud-based service, offering SD-WAN capabilities alongside advanced security features like secure web gateways, CASB, and remote browser isolation. This convergence not only simplifies management but also enhances security posture and application performance across global networks and cloud environments. Discover how adopting SASE can streamline operations and fortify your enterprise's digital transformation strategy.
Top 12 AI Technology Trends For 2024.pdfMarrie Morris
Technology has become an irreplaceable component of our daily lives. The role of AI in technology revolutionizes our lives for the betterment of the future. In this article, we will learn about the top 12 AI technology trends for 2024.
Increase Quality with User Access Policies - July 2024Peter Caitens
⭐️ Increase Quality with User Access Policies ⭐️, presented by Peter Caitens and Adam Best of Salesforce. View the slides from this session to hear all about “User Access Policies” and how they can help you onboard users faster with greater quality.
Self-Healing Test Automation Framework - HealeniumKnoldus Inc.
Revolutionize your test automation with Healenium's self-healing framework. Automate test maintenance, reduce flakes, and increase efficiency. Learn how to build a robust test automation foundation. Discover the power of self-healing tests. Transform your testing experience.
Choosing the Best Outlook OST to PST Converter: Key Features and Considerationswebbyacad software
When looking for a good software utility to convert Outlook OST files to PST format, it is important to find one that is easy to use and has useful features. WebbyAcad OST to PST Converter Tool is a great choice because it is simple to use for anyone, whether you are tech-savvy or not. It can smoothly change your files to PST while keeping all your data safe and secure. Plus, it can handle large amounts of data and convert multiple files at once, which can save you a lot of time. It even comes with 24*7 technical support assistance and a free trial, so you can try it out before making a decision. Whether you need to recover, move, or back up your data, Webbyacad OST to PST Converter is a reliable option that gives you all the support you need to manage your Outlook data effectively.
Welcome to Cyberbiosecurity. Because regular cybersecurity wasn't complicated...Snarky Security
How wonderful it is that in our modern age, every bit of our biological data can be digitized, stored, and potentially pilfered by cyber thieves! Isn't it just splendid to think that while scientists are busy pushing the boundaries of biotechnology, hackers could be plotting the next big bio-data heist? This delightful scenario is brought to you by the ever-expanding digital landscape of biology and biotechnology, where the integration of computer science, engineering, and data science transforms our understanding and manipulation of biological systems.
While the fusion of technology and biology offers immense benefits, it also necessitates a careful consideration of the ethical, security, and associated social implications. But let's be honest, in the grand scheme of things, what's a little risk compared to potential scientific achievements? After all, progress in biotechnology waits for no one, and we're just along for the ride in this thrilling, slightly terrifying, adventure.
So, as we continue to navigate this complex landscape, let's not forget the importance of robust data protection measures and collaborative international efforts to safeguard sensitive biological information. After all, what could possibly go wrong?
-------------------------
This document provides a comprehensive analysis of the security implications biological data use. The analysis explores various aspects of biological data security, including the vulnerabilities associated with data access, the potential for misuse by state and non-state actors, and the implications for national and transnational security. Key aspects considered include the impact of technological advancements on data security, the role of international policies in data governance, and the strategies for mitigating risks associated with unauthorized data access.
This view offers valuable insights for security professionals, policymakers, and industry leaders across various sectors, highlighting the importance of robust data protection measures and collaborative international efforts to safeguard sensitive biological information. The analysis serves as a crucial resource for understanding the complex dynamics at the intersection of biotechnology and security, providing actionable recommendations to enhance biosecurity in an digital and interconnected world.
The evolving landscape of biology and biotechnology, significantly influenced by advancements in computer science, engineering, and data science, is reshaping our understanding and manipulation of biological systems. The integration of these disciplines has led to the development of fields such as computational biology and synthetic biology, which utilize computational power and engineering principles to solve complex biological problems and innovate new biotechnological applications. This interdisciplinary approach has not only accelerated research and development but also introduced new capabilities such as gene editing and biomanufact
Finetuning GenAI For Hacking and DefendingPriyanka Aash
Generative AI, particularly through the lens of large language models (LLMs), represents a transformative leap in artificial intelligence. With advancements that have fundamentally altered our approach to AI, understanding and leveraging these technologies is crucial for innovators and practitioners alike. This comprehensive exploration delves into the intricacies of GenAI, from its foundational principles and historical evolution to its practical applications in security and beyond.
Cracking AI Black Box - Strategies for Customer-centric Enterprise ExcellenceQuentin Reul
The democratization of Generative AI is ushering in a new era of innovation for enterprises. Discover how you can harness this powerful technology to deliver unparalleled customer value and securing a formidable competitive advantage in today's competitive market. In this session, you will learn how to:
- Identify high-impact customer needs with precision
- Harness the power of large language models to address specific customer needs effectively
- Implement AI responsibly to build trust and foster strong customer relationships
Whether you're at the early stages of your AI journey or looking to optimize existing initiatives, this session will provide you with actionable insights and strategies needed to leverage AI as a powerful catalyst for customer-driven enterprise success.
2. Solutions Engineering
• Identify Popular Use Cases
– Directly from MongoDB Users
– Addressing "limitations"
• Go beyond documentation and blogs
• Create open source project
• Run it!
8. Pluggable Services
• Major components each have an interface
– see com.mongodb.socialite.services
• Configuration selects implementation to use
• ServiceManager organizes :
– Default implementations
– Lifecycle
– Binding configuration
– Wiring dependencies
– see com.mongodb.socialite.ServiceManager
9. Simple Interface
https://github.com/10gen-labs/socialite
GET /users/{user_id} Get a User by their ID
DELETE /users/{user_id} Remove a user by their ID
POST /users/{user_id}/posts Send a message from this user
GET /users/{user_id}/followers Get a list of followers of a user
GET /users/{user_id}/followers_count Get the number of followers of a user
GET /users/{user_id}/following Get the list of users this user is following
GET /users/{user_id}/following count Get the number of users this user follows
GET /users/{user_id}/posts Get the messages sent by a user
GET /users/{user_id}/timeline Get the timeline for this user
PUT /users/{user_id} Create a new user
PUT /users/{user_id}/following/{target} Follow a user
DELETE /users/{user_id}/following/{target} Unfollow a user
11. Operational Testing
Real life validation of our choices.
Most important criteria?
User facing latency
Linear scaling of resources
12. Scaling Goals
• Realistic real-life-scale workload
– compared to Twitter, etc.
• Understanding of HW required
– containing costs
• Confirm architecture scales linearly
– without loss of responsiveness
14. Operational Testing
• All hosts in AWS
• Each service used its own DB, cluster or shards
• All benchmarks through `mongos` (sharded config)
• Used MMS monitoring for measuring throughput
• Used internal benchmarks for measuring latency
• Based volume tested on real life social metrics
17. Socialite Content Service
• System of record for all user content
• Initially very simple (no search)
• Mainly designed to support feed
– Lookup/indexed by _id and userid
– Time based anchors/pagination
18. Social Data Ages Fast
• Half life of most content is 1 day !
• Popular content usually < 1 month
• Access to old data is rare
19. Content Service
• Index by userId, _id
• Shard by userId (or userId, _id)
• Supports “user data” as pass-through
{
"_id" : ObjectId("52aaaa14a0ee0d44323e623a"),
"_a" : "user1",
"_m" : "this is a post”,
"_d" : {
"geohash" : "6gkzwgjzn820"
}
}
35. Operational comparison
• Updates of embedded arrays
– grow non-linearly with number of indexed array elements
• Updating edge collection => inserts
– grows close to linearly with existing number of edges/user
38. Finding Followers
Consider our single follower collection :
> db.followers.find({from : "djw"}, {_id:0, to:1})
{
"to" : "jsr"
}
Using index :
{
"v" : 1,
"key" : { "from" : 1, "to" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "from_1_to_1"
}
Covered index
when searching on
"from" for all
followers
Specify only if
multiple edges
cannot exist
39. Finding Following
What about who a user is following?
Can use a reverse covered index :
{
"v" : 1,
"key" : { "from" : 1, "to" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "from_1_to_1"
}
{
"v" : 1,
"key" : { "to" : 1, "from" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "to_1_from_1"
}
Notice the flipped
field order here
40. Finding Following
Wait ! There is an issue with the reverse index…..
SHARDING !
{
"v" : 1,
"key" : { "from" : 1, "to" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "from_1_to_1"
}
{
"v" : 1,
"key" : { "to" : 1, "from" : 1 },
"unique" : true,
"ns" : "socialite.followers",
"name" : "to_1_from_1"
}
If we shard this collection
by "from", looking up
followers for a specific
user is "targeted" to a
shard
To find who the user is
following however, it must
scatter-gather the query to
all shards
42. Dual Edge Collections
When "following" queries are common
– Not always the case
– Consider overhead carefully
Can use dual collections storing
– One for each direction
– Edges are duplicated reversed
– Can be sharded independently
43. Edge Query Rate Comparison
Number of shards
vs
Number of queries
Followers collection
with forward and
reverse indexes
Two collections,
followers, following
one index each
1 10,000 10,000
3 90,000 30,000
6 360,000 60,000
12 1,440,000 120,000
45. Feed Service
• Two main functions :
– Aggregating “followed” content for a user
– Forwarding user’s content to “followers”
• Common implementation models :
– Fanout on read
• Query content of all followed users on fly
– Fanout on write
• Add to “cache” of each user’s timeline for every post
• Various storage models for the timeline
47. Fanout On Read
Pros
Simple implementation
No extra storage for timelines
Cons
– Timeline reads (typically) hit all shards
– Often involves reading more data than required
– May require additional indexing on Content
49. Fanout On Write
Pros
Timeline can be single document read
Dormant users easily excluded
Working set minimized
Cons
– Fanout for large follower lists can be expensive
– Additional storage for materialized timelines
50. Fanout On Write
• Three different approaches
– Time buckets
– Size buckets
– Cache
• Each has different pros & cons
51. Timeline Buckets - Time
Upsert to time range buckets for each user
> db.timed_buckets.find().pretty()
{
"_id" : {"_u" : "jsr", "_t" : 516935},
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"},
{"_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian"}
]
}
{
"_id" : {"_u" : "ian", "_t" : 516935},
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"}
]
}
{
"_id" : {"_u" : "jsr", "_t" : 516934 },
"_c" : [
{"_id" : ObjectId("...da7"), "_a" : "ian", "_m" : "earlier from ian"}
]
}
53. Timeline - Cache
Store a limited cache, fall back to "fanout on read"
– Create single cache doc on demand with upsert
– Limit size of cache with $slice
– Timeout docs with TTL for inactive users
> db.timeline_cache.find().pretty()
{
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"},
{"_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian"},
{"_id" : ObjectId("...da7"), "_a" : "ian", "_m" : "earlier from ian"}
],
"_u" : "jsr"
}
{
"_c" : [
{"_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz"}
],
"_u" : "ian"
}
54. Embedding vs Linking Content
Embedded content for direct access
– Great when it is small, predictable in size
Link to content, store only metadata
– Read only desired content on demand
– Further stabilizes cache document sizes
> db.timeline_cache.findOne({”_id" : "jsr"})
{
"_c" : [
{"_id" : ObjectId("...dc1”)},
{"_id" : ObjectId("...dd2”)},
{"_id" : ObjectId("...da7”)}
],
”_id" : "jsr"
}
55. Socialite Feed Service
• Implemented four models as plugins
– FanoutOnRead
– FanoutOnWrite – Buckets (size)
– FanoutOnWrite – Buckets (time)
– FanoutOnWrite - Cache
• Switchable by config
• Store content by reference or value
• Benchmark-able back to back
60. Benchmarking the Feed
• Results
– over two weeks
– ran load with one million users
– ran load with ten million users
– used avg send rate 1K/s; 2K/s; reads 10K-20k/s
– 22 AWS c3.2xlarge servers (7.5GB RAM)
– 18 across six shards (3 content, 3 user graph)
– 4 mongos and app machines
– 2 c2x4xlarge servers (30GB RAM)
– timeline feed cache (six shards)
62. Socialite
https://github.com/10gen-labs/socialite
• Real Working Implementation
– Implements All Components
– Configurable models and options
• Built-in benchmarking
• Questions?
– I will be at "Ask The Experts" this afternoon!
https://github.com/10gen-labs/socialite
News/Social Status Feed: popular and common
Internal goals: implement different schema options, builtin benchmarking for comparison
External goals: low latency from end-user perspective, linear scaling from operational perspective
News/Social Status Feed: popular and common
Internal goals: implement different schema options, builtin benchmarking for comparison
External goals: low latency from end-user perspective, linear scaling from operational perspective
image at https://dropwizard.github.io/dropwizard of the hat
add REST API calls
How to test, show how growing documents are very painful to update.
Add the MTV or appmetrics mtools plot showing what happens to outliers.
actual performance – show how inserting million users was easy – no point even trying to update embedded documents...
side-point of
Variants?
Should you embed the messages/content into "cache"/buckets/etc. or just store references?
WHICH ONE DID WE IMPLEMENT IN SOCIALITE???
All work with Async Service(? or mention later)
And we did benchmark them! -> Asya
examining latency of reading content by fanout type - note two types of latency – for sender and for recipient.
scaling throughput... THIS WILL NOT SCALE LINEARLY(!)
*RERUN WITH SEVERAL SHARDS* replace with new screenshot
MongoDB as a cache
Storage amplification on a feed service – Justin Bieber makes a single post and we need to write it to 2 million timelines.... ???
Cache only for active users.
Number of updates across all cache / number of documents updated
MongoDB as a cache
Storage amplification on a feed service – Justin Bieber makes a single post and we need to write it to 2 million timelines.... ???
Cache only for active users.
MongoDB as a cache
Storage amplification on a feed service – Justin Bieber makes a single post and we need to write it to 2 million timelines.... ???
Cache only for active users.