This document provides an overview of the Python programming language. It discusses Python's history and evolution, its key features like being object-oriented, open source, portable, having dynamic typing and built-in types/tools. It also covers Python's use for numeric processing with libraries like NumPy and SciPy. The document explains how to use Python interactively from the command line and as scripts. It describes Python's basic data types like integers, floats, strings, lists, tuples and dictionaries as well as common operations on these types.
This document provides an introduction to NumPy, the fundamental package for scientific computing with Python. It discusses what NumPy is, why it is useful compared to regular Python lists, how to define arrays of different dimensions, and how to initialize, manipulate, and perform operations on NumPy arrays. Some key capabilities of NumPy include N-dimensional arrays, broadcasting functions, integration with C/C++ and Fortran code, and tools for linear algebra and Fourier transforms.
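As a rough sketch of the array and broadcasting capabilities mentioned above (illustrative values only, not taken from the document):

```python
import numpy as np

# A 3x3 array built from a range, reshaped into two dimensions.
a = np.arange(9).reshape(3, 3)

# Broadcasting: the 1-D array [10, 20, 30] is "stretched" across each row
# of the 2-D array, with no explicit loop and no copying of data.
row = np.array([10, 20, 30])
b = a + row

print(a.ndim, a.shape)   # number of dimensions and shape of the 2-D array
print(b[0])              # first row after the broadcast addition
```

The same addition on plain Python lists would require an explicit nested loop; broadcasting is one of the main reasons NumPy arrays outperform lists for numeric work.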
This document discusses data visualization tools in Python. It introduces Matplotlib as the first and still standard Python visualization tool. It also covers Seaborn which builds on Matplotlib, Bokeh for interactive visualizations, HoloViews as a higher-level wrapper for Bokeh, and Datashader for big data visualization. Additional tools discussed include Folium for maps, and yt for volumetric data visualization. The document concludes that Python is well-suited for data science and visualization with many options available.
NumPy is a Python package that provides multidimensional array and matrix objects as well as tools to work with these objects. It was created to handle large, multi-dimensional arrays and matrices efficiently. NumPy arrays enable fast operations on large datasets and facilitate scientific computing using Python. NumPy also contains functions for Fourier transforms, random number generation and linear algebra operations.
This document discusses functions and methods in Python. It defines functions and methods, and explains the differences between them. It provides examples of defining and calling functions, returning values from functions, and passing arguments to functions. It also covers topics like local and global variables, function decorators, generators, modules, and lambda functions.
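A minimal sketch of the function topics listed above (default arguments, return values, lambdas, and generators; the names and values are invented for illustration):

```python
def greet(name, greeting="Hello"):
    """Return a greeting; 'greeting' is an optional keyword argument."""
    return f"{greeting}, {name}!"

# A lambda is a small anonymous function, handy as a one-off callable.
square = lambda x: x * x

# A generator yields values lazily instead of building a full list up front.
def countdown(n):
    while n > 0:
        yield n
        n -= 1

print(greet("Ada"))            # Hello, Ada!
print(square(4))               # 16
print(list(countdown(3)))      # [3, 2, 1]
```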
This presentation is a great resource for complete beginners who want to learn Python 3. The course includes a brief history of Python and an introduction to its basic syntax.
Introduction to Python Pandas for Data Analytics (Phoenix)
Pandas is an open-source, BSD-licensed Python library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Python with Pandas is used in a wide range of academic and commercial fields, including finance, economics, statistics, analytics, and medicine.
Python Pandas is a powerful library for data analysis and manipulation. It provides rich data structures and methods for loading, cleaning, transforming, and modeling data. Pandas allows users to easily work with labeled data and columns in tabular structures called Series and DataFrames. These structures enable fast and flexible operations like slicing, selecting subsets of data, and performing calculations. Descriptive statistics functions in Pandas allow analyzing and summarizing data in DataFrames.
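A small sketch of the DataFrame operations described above, using an invented table (the column names and values are illustrative, not from the document):

```python
import pandas as pd

# A small labeled table (DataFrame) built from a dictionary of columns.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Pune"],
    "temp": [4.0, 19.5, 28.0],
})

# Boolean filtering selects a subset of rows.
warm = df[df["temp"] > 10]

# Descriptive statistics operate column-wise.
print(df["temp"].mean())
print(list(warm["city"]))
```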
Machine learning algorithms can adapt and learn from experience. The three main machine learning methods are supervised learning (using labeled training data), unsupervised learning (using unlabeled data), and semi-supervised learning (using some labeled and some unlabeled data). Supervised learning includes classification and regression tasks, while unsupervised learning includes cluster analysis.
The document provides an agenda for a Pandas workshop covering data wrangling, visualization, and statistical modeling using Pandas. The agenda includes introductions to Pandas fundamentals like Series and DataFrames, data importing and exploration, missing data handling, reshaping data through pivoting and stacking, merging datasets, and grouping and computation. Later sections cover plotting and visualization, as well as statistical modeling techniques like linear models, time series analysis and Bayesian models. The workshop aims to simplify learning and teach how to use Pandas for data preparation, analysis and modeling.
Abstract: This PDSG workshop introduces the basics of Python libraries used in machine learning. Libraries covered are NumPy, Pandas, and Matplotlib.

Level: Fundamental
Requirements: One should have some knowledge of programming and some statistics.
The document provides an introduction to Python programming. It discusses key concepts like variables, data types, operators, and sequential data types. Python is presented as an interpreted programming language that uses indentation to indicate blocks of code. Comments and documentation are included to explain the code. Various data types are covered, including numbers, strings, booleans, and lists. Operators for arithmetic, comparison, assignment and more are also summarized.
This document contains a presentation by Abhijeet Anand on NumPy. It introduces NumPy as a Python library for working with arrays, which aims to provide array objects that are faster than traditional Python lists. NumPy arrays benefit from being stored continuously in memory, unlike lists. The presentation covers 1D, 2D and 3D arrays in NumPy and basic array properties and operations like shape, size, dtype, copying, sorting, addition, subtraction and more.
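The array properties and operations listed above can be sketched as follows (array values invented for illustration):

```python
import numpy as np

arr = np.array([[3, 1], [2, 4]], dtype=np.int64)

print(arr.shape)   # (2, 2)
print(arr.size)    # 4
print(arr.dtype)   # int64

# copy() gives an independent array; sorting the copy in place
# leaves the original unchanged.
c = arr.copy()
c.sort(axis=1)     # sort each row
print(c[0])        # [1 3]

# Arithmetic is element-wise.
print(arr - c)
```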
This document provides an introduction and overview of NumPy, a Python library used for numerical computing. It discusses NumPy's origins and capabilities, how to install NumPy on Linux, key NumPy concepts like the ndarray object, and how NumPy can be used with Matplotlib for plotting. Examples are given of common NumPy operations and functions for arrays, as well as plotting simple graphs with Matplotlib.
Pandas is an open source Python library that provides data structures and data analysis tools for working with tabular data. It allows users to easily perform operations on different types of data such as tabular, time series, and matrix data. Pandas provides data structures like Series for 1D data and DataFrame for 2D data. It has tools for data cleaning, transformation, manipulation, and visualization of data.
Provides an introductory level understanding of the Python Programming Language and language features. Serves as a guide for beginners and a reference to Python basics and language use cases.
This presentation introduces naive Bayesian classification. It begins with an overview of Bayes' theorem and defines a naive Bayes classifier as one that assumes conditional independence between predictor variables given the class. The document provides examples of text classification using naive Bayes and discusses its advantages of simplicity and accuracy, as well as its limitation of assuming independence. It concludes that naive Bayes is a commonly used and effective classification technique.
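To make the conditional-independence assumption concrete, here is a toy multinomial naive Bayes text classifier in plain Python with Laplace smoothing. The tiny corpus is invented for illustration; real systems would use a library such as scikit-learn:

```python
from collections import Counter
import math

# Tiny labeled corpus: class -> documents (an illustrative toy dataset).
train = {
    "spam": ["win money now", "win a prize"],
    "ham":  ["meeting at noon", "lunch at noon"],
}

def train_nb(data):
    """Per-class word counts and priors for a multinomial naive Bayes."""
    counts = {c: Counter(w for doc in docs for w in doc.split())
              for c, docs in data.items()}
    total_docs = sum(len(docs) for docs in data.values())
    priors = {c: len(docs) / total_docs for c, docs in data.items()}
    vocab = {w for cnt in counts.values() for w in cnt}
    return counts, priors, vocab

def classify(text, counts, priors, vocab):
    """Pick the class with the highest log-posterior (Laplace smoothing).

    Naive assumption: words are conditionally independent given the class,
    so the likelihood is just a product of per-word probabilities.
    """
    best, best_score = None, -math.inf
    for c in counts:
        total = sum(counts[c].values())
        score = math.log(priors[c]) + sum(
            math.log((counts[c][w] + 1) / (total + len(vocab)))
            for w in text.split())
        if score > best_score:
            best, best_score = c, score
    return best

counts, priors, vocab = train_nb(train)
print(classify("win a prize now", counts, priors, vocab))  # spam
```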
This document discusses machine learning with Python. It provides an overview of Python, highlighting that it is easy to learn, has a vast community and documentation, and is versatile. It then defines machine learning and discusses popular Python libraries for machine learning like NumPy, SciPy, Matplotlib, Pandas, and OpenCV. It provides examples of operations that can be performed with OpenCV, like reading and manipulating images. Overall the document serves as an introduction to machine learning with Python and the main libraries used.
This document provides an overview of effective big data visualization. It discusses information visualization and data visualization, including common chart types like histograms, scatter plots, and dashboards. It covers visualization goals, considerations, processes, basics, and guidelines. Examples of good visualization are provided. Tools for creating infographics are listed, as are resources for learning more about data visualization and references. Overall, the document serves as a comprehensive introduction to big data visualization.
The document introduces Scipy, Numpy and related tools for scientific computing in Python. It provides links to documentation and tutorials for Scipy and Numpy for numerical operations, Matplotlib for data visualization, and IPython for an interactive coding environment. It also includes short examples and explanations of Numpy arrays, plotting, data analysis workflows, and accessing help documentation.
This document provides a summary of the history and capabilities of SciPy. It discusses how SciPy was founded in 2001 by Travis Oliphant with packages for optimization, sparse matrices, interpolation, integration, special functions, and more. It highlights key contributors to the SciPy community and ecosystem. It describes why Python is well-suited for technical computing due to its syntax, built-in array support, and ability to support multiple programming styles. It outlines NumPy's array-oriented approach and benefits for technical problems. Finally, it discusses new projects like Blaze and Numba that aim to further improve the SciPy software stack.
Scientific Computing with Python Webinar, 9/18/2009: Curve Fitting (Enthought, Inc.)
This webinar will provide an overview of the tools that SciPy and NumPy provide for regression analysis including linear and non-linear least-squares and a brief look at handling other error metrics. We will also demonstrate simple GUI tools that can make some problems easier and provide a quick overview of the new Scikits package statsmodels whose API is maturing in a separate package but should be incorporated into SciPy in the future.
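For the simplest case the webinar covers, fitting a straight line by linear least squares, the closed-form solution can be written without SciPy at all (the data points below are invented; `scipy.optimize` and `numpy.polyfit` handle this and the non-linear cases far more generally):

```python
# Ordinary least squares for y = a*x + b, closed form.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.1, 7.0]     # roughly y = 2x + 1 with noise

n = len(xs)
mx = sum(xs) / n              # mean of x
my = sum(ys) / n              # mean of y

# Slope: covariance of x and y divided by variance of x.
a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))
b = my - a * mx               # intercept passes through the means

print(round(a, 3), round(b, 3))
```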
This document provides an overview of NoSQL databases and summarizes key information about several NoSQL databases, including HBase, Redis, Cassandra, MongoDB, and Memcached. It discusses concepts like horizontal scalability, the CAP theorem, eventual consistency, and data models used by different NoSQL databases like key-value, document, columnar, and graph structures.
Making your code faster: Cython and parallel processing in the Jupyter notebook (PyData)
This document discusses using Cython and parallel processing in Jupyter notebooks to make code faster. It describes using Euler's method to approximate the function y=x^2 for a million points and determine the minimum step size such that the result is within 1e-5 of the correct answer. The author aims to show how to optimize this problem using Cython and parallel processing, noting that he has no financial interests in the tools and is not a computer scientist.
This document provides an overview of the Natural Language Toolkit (NLTK), a Python library for natural language processing. It discusses NLTK's modules for common NLP tasks like tokenization, part-of-speech tagging, parsing, and classification. It also describes how NLTK can be used to analyze text corpora, frequency distributions, collocations and concordances. Key functions of NLTK include tokenizing text, accessing annotated corpora, analyzing word frequencies, part-of-speech tagging, and shallow parsing.
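The tokenization and frequency-distribution ideas can be sketched in plain Python; NLTK's `FreqDist` behaves much like `collections.Counter`, and its `word_tokenize` is a more careful version of the naive regex split used here (the sample sentence is invented):

```python
import re
from collections import Counter

text = "The quick brown fox jumps over the lazy dog. The dog sleeps."

# Naive tokenization: lowercase, then keep runs of letters.
# (NLTK's word_tokenize handles punctuation and contractions properly.)
tokens = re.findall(r"[a-z]+", text.lower())

# Frequency distribution, analogous to nltk.FreqDist(tokens).
freq = Counter(tokens)
print(freq.most_common(2))   # [('the', 3), ('dog', 2)]
```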
Effective Numerical Computation in NumPy and SciPy (Kimikazu Kato)
This document provides an overview of effective numerical computation in NumPy and SciPy. It discusses how Python can be used for numerical computation tasks like differential equations, simulations, and machine learning. While Python is initially slower than languages like C, libraries like NumPy and SciPy allow Python code to achieve sufficient speed through techniques like broadcasting, indexing, and using sparse matrix representations. The document provides examples of how to efficiently perform tasks like applying functions element-wise to sparse matrices and calculating norms. It also presents a case study for efficiently computing a formula that appears in a machine learning paper using different sparse matrix representations in SciPy.
Data Analytics with Pandas and NumPy - Python (Chetan Khatri)
This document discusses opportunities in data analytics. It notes that industries like finance, marketing, telecommunications, education, research, and healthcare are pursuing opportunities in data analytics. Indian IT outsourcing firms see opportunities in U.S. healthcare reform. The document outlines the data analytics life cycle and what metrics a CEO may want to understand about user engagement and retention for a mobile game. It proposes hands-on examples using data about weed use, the Titanic sinking, and an online community.
This document provides an agenda and overview for a Python training course. The agenda covers key Python topics like dictionaries, conditional statements, loops, functions, modules, input/output, error handling, object-oriented programming and more. The introduction section explains that Python is an interpreted, interactive and object-oriented language well-suited for beginners. It also outlines features like rapid development, automatic memory management and support for procedural and object-oriented programming. The document concludes by explaining Python's core data types including numbers, strings, lists, tuples and dictionaries.
The document provides instructions for running Python code in both interactive and script modes. It explains that in interactive mode, code is executed immediately after being typed, while scripts run entire files of code. Steps are given to start an interactive session in the terminal or IDLE and run script files with Python filename.py. Code examples are also provided to demonstrate basic Python operations in the interactive interpreter like arithmetic, variables, functions, strings and control flow.
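The same statements one might type at the `>>>` prompt in interactive mode can be saved to a file and run with `python filename.py`. A small illustrative script (values invented):

```python
# Arithmetic and assignment.
x = 7 * 6

# Strings and lists.
name = "Python"
langs = ["C", "Java"]
langs.append(name)

# A small function.
def double(n):
    return 2 * n

# Control flow: iterate over the list.
for lang in langs:
    print(lang)

print(x, double(x))   # 42 84
```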
- Python is an interpreted, object-oriented programming language that is beginner friendly and open source. It was created in the 1990s and named after Monty Python.
- Python is very suitable for natural language processing tasks due to its built-in string and list datatypes as well as libraries like NLTK. It also has strong numeric processing capabilities useful for machine learning.
- Python code is organized using functions, classes, modules, and packages to improve structure. It is interpreted at runtime rather than requiring a separate compilation step.
Introduction to Python, 01-08-2023 (DRVaibhavmeshram1)
The Python language is used in engineering.
Story adapted from Stephen Covey (2004), “The Seven Habits of Highly Effective People” (Simon & Schuster).
“Management is doing things right, leadership is doing the right things”
(Warren Bennis and Peter Drucker)
The Sponsor:
Champions and advocates for the change at their level in the organization.
A Sponsor is the person who won’t let the change initiative die from lack of attention, and who is willing to use their political capital to make the change happen.
The Role model:
The behaviors and attitudes they demonstrate are looked to by everyone else. Hence, they must be willing to go first.
Employees watch leaders for consistency between words and actions to see if they should believe the change is really going to happen.
The decision maker:
Leaders usually control resources such as people, budgets, and equipment, and thus have the authority to make decisions (as per their span of control) that affect the initiative.
During change, leaders must leverage their decision-making authority and choose the options that will support the initiative.
The Decision-Maker is decisive and sets priorities that support change.
Python is a general purpose programming language that can be used for both programming and scripting. It is an interpreted language, meaning code is executed line by line by the Python interpreter. Python code is written in plain text files with a .py extension. Key features of Python include being object-oriented, using indentation for code blocks rather than brackets, and having a large standard library. Python code can be used for tasks like system scripting, web development, data analysis, and more.
Python can be used as both an interpreted, interactive language and a scripting language. It supports common data types like integers, floats, strings, lists, and tuples. Tuples are immutable ordered sequences while lists are mutable. Strings support common sequence operations. Python code is indented with whitespace instead of braces. Variables are dynamically typed and assigned with '=' and objects can be sliced and tested for membership with operators like '+' and 'in'.
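A brief sketch of the mutability, slicing, and membership points above (values invented for illustration):

```python
point = (3, 4)            # tuple: immutable ordered sequence
nums = [1, 2, 3]          # list: mutable ordered sequence

nums[0] = 10              # lists can be modified in place
# point[0] = 0            # would raise TypeError: tuples are immutable

s = "hello"
print(s[1:4])             # slicing works on any sequence -> 'ell'

print(3 in nums, 3 in point)   # membership testing with 'in'
print(nums + [4])              # '+' concatenates sequences
```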
This presentation provides the information on python including the topics Python features, applications, variables and operators in python, control statements, numbers, strings, print formatting, list and list comprehension, dictionaries, tuples, files, sets, boolean, mehtods and functions, lambda expressions and a sample project using Python.
This document provides an overview of the Python programming language tutorial presented over multiple pages. It covers:
1) An introduction to Python, its features, and why it is useful including that it is easy to use, portable, object oriented, and has many standard libraries.
2) An explanation of the different parts of the tutorial covering basic concepts like variables, data types, control structures, functions and exceptions as well as data structures and files.
3) Hands-on examples of using Python's basic types like numbers, strings, lists, tuples and dictionaries along with operations on each and how to use the interactive shell and IDE interfaces.
This document provides an introduction to the Python language and discusses Python data types. It covers how to install Python, interact with the Python interpreter through command line and IDLE modes, and learn basic Python parts like data types, operators, functions, and control structures. The document discusses numeric, string, and other data types in Python and how to manipulate them using built-in functions and operators. It also introduces Python library modules and the arcpy package for geoprocessing in ArcGIS.
This document provides an introduction to the Python programming language. It discusses installing Python and interacting with it through command line and IDLE modes. It covers basic Python data types like numbers, strings, lists, and booleans. It demonstrates how to perform operations and call functions on these data types. It also discusses Python modules, getting input from users, and assigning values to variables.
This document provides an introduction to the Python programming language. It discusses installing Python and interacting with it through command line and IDLE modes. It covers basic Python data types like numbers, strings, lists, and booleans. It demonstrates how to perform operations and call functions on these data types. It also discusses Python modules, getting input from users, and commonly used string and list methods.
The document provides an overview of the basics of the Python programming language. It discusses that Python is an interpreted, interactive, object-oriented scripting language. It also covers Python's history and describes it as being easy to learn and read, easy to maintain, portable, and extensible. The document then details Python's core data types including numbers, strings, lists, tuples, and dictionaries. It provides examples of how to define and manipulate variables of each data type in Python.
This document provides an overview of the Python programming language. It discusses Python's history, how to install and run Python, basic data types like integers, floats, strings, lists and tuples. It also covers topics like functions, modules, files, and classes in Python.
Programming: Python quick intro for schools (Dan Bowen)
This document provides an introduction to computing and programming concepts such as what a computer program is, binary and machine code, assembly code, interpreters and compilers, structured programs using sequences, branches, loops and modules. It discusses programming concepts like variables, strings, arithmetic operations, conditional statements, loops, functions, modules and file input/output. The key points are that a computer program is a set of instructions, programming involves different levels of representation from binary to assembly to high-level languages, and programming uses basic constructs like sequences, branches, loops to structure programs.
Spark Streaming allows processing live data streams using small batch sizes to provide low latency results. It provides a simple API to implement complex stream processing algorithms across hundreds of nodes. Spark SQL allows querying structured data using SQL or the Hive query language and integrates with Spark's batch and interactive processing. MLlib provides machine learning algorithms and pipelines to easily apply ML to large datasets. GraphX extends Spark with an API for graph-parallel computation on property graphs.
Spark is an open-source cluster computing framework that uses in-memory processing to allow data sharing across jobs for faster iterative queries and interactive analytics, it uses Resilient Distributed Datasets (RDDs) that can survive failures through lineage tracking and supports programming in Scala, Java, and Python for batch, streaming, and machine learning workloads.
Graph databases store data in graph structures with nodes, edges, and properties. Neo4j is a popular open-source graph database that uses a property graph model. It has a core API for programmatic access, indexes for fast lookups, and Cypher for graph querying. Neo4j provides high availability through master-slave replication and scales horizontally by sharding graphs across instances through techniques like cache sharding and domain-specific sharding.
This document discusses information retrieval techniques. It begins by defining information retrieval as selecting the most relevant documents from a large collection based on a query. It then discusses some key aspects of information retrieval including document representation, indexing, query representation, and ranking models. The document also covers specific techniques used in information retrieval systems like parsing documents, tokenization, removing stop words, normalization, stemming, and lemmatization.
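The parsing, tokenization, and stop-word steps can be sketched as a minimal inverted index, the core indexing structure behind document retrieval (the two documents and stop-word list are invented for illustration):

```python
# A minimal inverted index: token -> set of document ids.
docs = {
    0: "Information retrieval selects relevant documents",
    1: "An index maps terms to documents",
}
stop_words = {"an", "to", "the"}

index = {}
for doc_id, text in docs.items():
    for token in text.lower().split():       # parsing + tokenization
        if token not in stop_words:          # stop-word removal
            index.setdefault(token, set()).add(doc_id)

# A query term is answered by a simple lookup.
print(sorted(index["documents"]))   # [0, 1]
print(sorted(index["index"]))       # [1]
```

Real systems add normalization, stemming or lemmatization, and a ranking model on top of this lookup.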
The document provides an overview of various machine learning algorithms and methods. It begins with an introduction to predictive modeling and supervised vs. unsupervised learning. It then describes several supervised learning algorithms in detail including linear regression, K-nearest neighbors (KNN), decision trees, random forest, logistic regression, support vector machines (SVM), and naive Bayes. It also briefly discusses unsupervised learning techniques like clustering and dimensionality reduction methods.
This document provides an overview of natural language processing (NLP). It discusses topics like natural language understanding, text categorization, syntactic analysis including parsing and part-of-speech tagging, semantic analysis, and pragmatic analysis. It also covers corpus-based statistical approaches to NLP, measuring performance, and supervised learning methods. The document outlines challenges in NLP like ambiguity and knowledge representation.
This document provides an overview of recommender systems for e-commerce. It discusses various recommender approaches including collaborative filtering algorithms like nearest neighbor methods, item-based collaborative filtering, and matrix factorization. It also covers content-based recommendation, classification techniques, addressing challenges like data sparsity and scalability, and hybrid recommendation approaches.
Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. It implements Google's MapReduce programming model and the Hadoop Distributed File System (HDFS) for reliable data storage. Key components include a JobTracker that coordinates jobs, TaskTrackers that run tasks on worker nodes, and a NameNode that manages the HDFS namespace and DataNodes that store application data. The framework provides fault tolerance, parallelization, and scalability.
This document provides an overview of the statistical programming language R. It discusses key R concepts like data types, vectors, matrices, data frames, lists, and functions. It also covers important R tools for data analysis like statistical functions, linear regression, multiple regression, and file input/output. The goal of R is to provide a large integrated collection of tools for data analysis and statistical computing.
The document provides an overview of functional programming, including its key features, history, differences from imperative programming, and examples using Lisp and Scheme. Some of the main points covered include:
- Functional programming is based on evaluating mathematical functions rather than modifying state through assignments.
- It uses recursion instead of loops and treats functions as first-class objects.
- Lisp was the first functional language in 1960 and introduced many core concepts like lists and first-class functions. Scheme was developed in 1975 as a simpler dialect of Lisp.
- Functional programs are more focused on what to compute rather than how to compute it, making them more modular and easier to reason about mathematically.
Self-Healing Test Automation Framework - Healenium (Knoldus Inc.)
Revolutionize your test automation with Healenium's self-healing framework. Automate test maintenance, reduce flakes, and increase efficiency. Learn how to build a robust test automation foundation. Discover the power of self-healing tests. Transform your testing experience.
Redefining Cybersecurity with AI Capabilities (Priyanka Aash)
In this comprehensive overview of Cisco's latest innovations in cybersecurity, the focus is squarely on resilience and adaptation in the face of evolving threats. The discussion covers the imperative of tackling Mal information, the increasing sophistication of insider attacks, and the expanding attack surfaces in a hybrid work environment. Emphasizing a shift towards integrated platforms over fragmented tools, Cisco introduces its Security Cloud, designed to provide end-to-end visibility and robust protection across user interactions, cloud environments, and breaches. AI emerges as a pivotal tool, from enhancing user experiences to predicting and defending against cyber threats. The blog underscores Cisco's commitment to simplifying security stacks while ensuring efficacy and economic feasibility, making a compelling case for their platform approach in safeguarding digital landscapes.
Retrieval Augmented Generation Evaluation with Ragas (Zilliz)
Retrieval Augmented Generation (RAG) enhances chatbots by incorporating custom data in the prompt. Using large language models (LLMs) as judge has gained prominence in modern RAG systems. This talk will demo Ragas, an open-source automation tool for RAG evaluations. Christy will talk about and demo evaluating a RAG pipeline using Milvus and RAG metrics like context F1-score and answer correctness.
Top 12 AI Technology Trends For 2024 (Marrie Morris)
Technology has become an irreplaceable component of our daily lives. The role of AI in technology revolutionizes our lives for the betterment of the future. In this article, we will learn about the top 12 AI technology trends for 2024.
Cracking AI Black Box - Strategies for Customer-centric Enterprise Excellence (Quentin Reul)
The democratization of Generative AI is ushering in a new era of innovation for enterprises. Discover how you can harness this powerful technology to deliver unparalleled customer value and secure a formidable competitive advantage in today's competitive market. In this session, you will learn how to:
- Identify high-impact customer needs with precision
- Harness the power of large language models to address specific customer needs effectively
- Implement AI responsibly to build trust and foster strong customer relationships
Whether you're at the early stages of your AI journey or looking to optimize existing initiatives, this session will provide you with actionable insights and strategies needed to leverage AI as a powerful catalyst for customer-driven enterprise success.
"Hands-on development experience using wasm Blazor", Furdak Vladyslav.pptxFwdays
I will share my personal experience of full-time development on wasm Blazor
What difficulties our team faced: life hacks with Blazor app routing, whether it is necessary to write JavaScript, which technology stack and architectural patterns we chose
What conclusions we made and what mistakes we committed
Generative AI technology is a fascinating field that focuses on creating comp...Nohoax Kanont
Generative AI technology is a fascinating field that focuses on creating computer models capable of generating new, original content. It leverages the power of large language models, neural networks, and machine learning to produce content that can mimic human creativity. This technology has seen a surge in innovation and adoption since the introduction of ChatGPT in 2022, leading to significant productivity benefits across various industries. With its ability to generate text, images, video, and audio, generative AI is transforming how we interact with technology and the types of tasks that can be automated.
This PDF delves into the aspects of information security from a forensic perspective, focusing on privacy leaks. It provides insights into the methods and tools used in forensic investigations to uncover and mitigate privacy breaches in mobile and cloud environments.
Keynote : AI & Future Of Offensive SecurityPriyanka Aash
In the presentation, the focus is on the transformative impact of artificial intelligence (AI) in cybersecurity, particularly in the context of malware generation and adversarial attacks. AI promises to revolutionize the field by enabling scalable solutions to historically challenging problems such as continuous threat simulation, autonomous attack path generation, and the creation of sophisticated attack payloads. The discussions underscore how AI-powered tools like AI-based penetration testing can outpace traditional methods, enhancing security posture by efficiently identifying and mitigating vulnerabilities across complex attack surfaces. The use of AI in red teaming further amplifies these capabilities, allowing organizations to validate security controls effectively against diverse adversarial scenarios. These advancements not only streamline testing processes but also bolster defense strategies, ensuring readiness against evolving cyber threats.
The History of Embeddings & Multimodal EmbeddingsZilliz
Frank Liu will walk through the history of embeddings and how we got to the cool embedding models used today. He'll end with a demo on how multimodal RAG is used.
The Challenge of Interpretability in Generative AI Models.pdfSara Kroft
Navigating the intricacies of generative AI models reveals a pressing challenge: interpretability. Our blog delves into the complexities of understanding how these advanced models make decisions, shedding light on the mechanisms behind their outputs. Explore the latest research, practical implications, and ethical considerations, as we unravel the opaque processes that drive generative AI. Join us in this insightful journey to demystify the black box of artificial intelligence.
Dive into the complexities of generative AI with our blog on interpretability. Find out why making AI models understandable is key to trust and ethical use and discover current efforts to tackle this big challenge.
3. Object-oriented - Structure supports polymorphism, operation overloading and
multiple inheritance
Open source
Coherent - easy to read, write and maintain
Portable
Runs on virtually every major platform
Dynamic typing
Built-in types and tools
3
4. Library utilities
Third party utilities
Numeric, NumPy, SciPy
Strong numeric processing capabilities: matrix operations
Suitable for probability and machine learning code
Automatic memory management
Linkable to components written in other languages
Linking to fast compiled code useful for computation intensive problems
Good for code steering and merging multiple programs written in different languages
Python/C integration quite common
4
5. Easy to use
Rapid turnaround - no intermediate compile and link steps as in C or C++
Programs compiled automatically to an intermediate form called byte-code,
which the interpreter then reads
Development speed of an interpreter without the performance loss inherent in
purely interpreted languages
Intuitive structure and syntax
Multi-purpose (Web, GUI, Scripting)
5
7. Python also an interpreter
Interpreter reads other Python programs and commands and executes them
Python programs are byte-compiled automatically before being executed by the interpreter
This hidden compilation step makes Python faster than a purely interpreted language
Type python into command line
$ sign - start of a terminal command line
# sign - a comment
>>> - start of a Python interpreter's command line
7
8. /usr/local/bin/python
#! /usr/bin/env python
interactive use
Python 3.0 (#1, Sep 24 2008, 20:40:45) [GCC 2.95.1 19990816 (release)] on sunos5
Copyright (c) 1995-2008 Corporation for National Research Initiatives.
All Rights Reserved.
Copyright (c) 1991-2008 Stichting Mathematisch Centrum, Amsterdam.
All Rights Reserved.
>>>
python -c command [arg] ...
python -i script
read script first, then interactive
8
9. >>> print 'Hello world'
Hello world
# Relevant output is displayed on subsequent lines without the >>> symbol
>>> x = [0,1,2]
>>> x
[0,1,2]
>>> 2+3
5
Ctrl-D exits the interpreter
$
9
10. Python scripts written in text files with the suffix .py
Scripts can be read into the interpreter in several ways
$ python script.py
▪ # Executes the script and returns to the terminal
$ python -i script.py
▪ # Flag -i keeps interpreter open after script is executed
$ python #starts command line interpreter
>>> execfile('script.py')
▪ #execfile command reads script and executes them immediately as though it had been typed into the interpreter directly
$ python
>>> import script # DO NOT add the .py suffix. Script is a module here
▪ #Import command runs the script, displays any un-stored outputs and creates a lower level (or context) within the program
10
11. Assume script.py with following lines
print 'Hello world'
x = [0,1,2]
$ python script.py
Hello world
$ # Script is executed and interpreter is immediately closed. x is lost.
$ python -i script.py
Hello world
>>> x
[0,1,2]
x is stored and can be called later and interpreter is left open
11
12. >>> execfile('script.py')
Hello world
>>> x
▪ [0,1,2]
>>> # Identical to calling the script from terminal with command python -i script.py
$ python
>>> import script
Hello world
>>> x
Traceback (most recent call last):
File "<stdin>", line 1, in ?
NameError: name 'x' is not defined
When script.py loaded in this way, x is not defined on the top level
12
13. To make use of x, let Python know which module it came from - provide context
>>> script.x
[0,1,2]
>>>
If script.py contains multiple stored quantities, to promote x (and only x) to the top level context
$ python
>>> from script import x
▪ Hello world
>>> x
▪ [0,1,2]
>>>
To promote all quantities in script.py to the top level context, type from script import * into the
interpreter. Of course, if that's what you want, you might as well type python -i script.py into the
terminal
13
14. Whitespace is meaningful in Python: especially indentation and placement of
newlines
Use a newline to end a line of code
Not a semicolon like in C++ or Java
Use when must go to next line prematurely
No braces { } to mark blocks of code in Python…
Use consistent indentation instead. The first line that returns to a lesser indentation is
considered outside of the block.
Often a colon appears at the start of a new block
14
15. Comments starting with # – rest of line is ignored
Can include a “documentation string” as the first line of any new function or class
Development environment, debugger and other tools use it
def my_function(x, y):
"""This is the docstring. This
function does xyz"""
#The code would go here...
15
16. Dynamic Typing - Python determines the data types automatically
Strong Typing - Python's not casual about types, it enforces them after it figures them out
Can not just append an integer to a string
First assignment to a variable creates it
Variable types need not be declared
Python figures out the variable types on its own
Must first convert the integer to a string explicitly
x = "the answer is " # Decides x is string
y = 23 # Decides y is integer
print x + y # Python will complain about this
16
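The strong-typing rule above can be sketched as follows, in modern Python 3 syntax (the deck itself uses Python 2's print statement):

```python
x = "the answer is "   # Python decides x is a string
y = 23                 # and y is an integer

# Strong typing: mixing str and int raises TypeError instead of guessing
try:
    result = x + y
except TypeError:
    result = x + str(y)  # convert explicitly first
```

Here `result` ends up as the string "the answer is 23" only after the explicit conversion.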
17. You can print a string to the screen using "print."
Use the % string operator in combination with the print command to format output text
>>> print "%s xyz %d" % ("abc", 34)
abc xyz 34
"print" automatically adds a newline to the end of the string
A comma-separated list of items is printed with a space between them
>>> print "abc"
abc
>>> print "abc", "def"
abc def
17
18. Integers - 0, 1, 1234, -56
Dividing an integer by integer returns only the integer part - 7/2 yields 3
Arbitrarily long integers - must end in l or L - 999999999999999999999L
Floating point numbers - 0., 1.0, 1e10, 3.14e-2, 6.99E4
Division works normally - 7./2. = 3.5
18
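The division rules above changed between Python versions; a minimal sketch in Python 3 syntax, where `//` reproduces the old truncating behaviour:

```python
import math

# Python 2's 7/2 truncated to 3; in Python 3, // is floor division
assert 7 // 2 == 3                 # integer part only
assert 7 / 2 == 3.5                # Python 3: / is true division
assert 7. / 2. == 3.5              # floats divide normally in either version
assert math.isclose(6.4 - 2, 4.4)  # mixing float and int yields a float
```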
19. Operations having floats and integers yields floats
6.4 - 2 = 4.4
Octal constants start with leading 0 - 0177, -01234
Hex constants start with leading 0x or 0X - 0x9ff, 0X7AE
Complex numbers must end in j or J
3+4j, 3.0+4.0j, 2J
Typing imaginary part first will return the complex number in the order
Re+ImJ
19
20. Arithmetic operations: a+b, a-b, a*b, a/b
Exponentiation: a**b
Other elementary functions are included in packages like NumPy and
SciPy
Comparison operators - a < b, a > b, a <= b, a >= b
Logical operators are words (and, or, not) not symbols (&&, ||, !)
20
21. Equality tests - a == b, a != b
Bitwise or: a | b
Bitwise exclusive or: a ^ b
Bitwise and: a & b
Shift a left or right by b bits: a << b, a >> b
21
22. Enclosed in single or double quotation marks
Unmatched ones can occur within the string - "matt's"
Use triple double-quotes for multi-line strings or strings that contain
both ' and " inside of them - """a'b"c"""
Triple quotes also allow extending strings over multiple lines
without backslashes
'abc', "ABC"
22
23. Concatenation using + sign
>>> 'abc'+'def'
'abcdef'
Adjacent string literals are concatenated automatically - word = 'Help' 'a'
Strings repeated using *
>>> 'abc'*3
'abcabcabc'
Built-in methods for formatting
>>> "hello".upper()
'HELLO'
23
24. Indexing starts at 0 to end at len-1
>>> s = 'string'
>>> s[1]
't'
s[i:j] fetches elements i (inclusive) through j (not inclusive)
>>> s[1:4]
'tri'
24
25. s[:j] fetches all elements up to, but not including j
>>> s[:3]
'str'
s[i:] fetches all elements from i onward (inclusive)
>>> s[2:]
'ring'
25
26. s[i:j:k] extracts every kth element starting with index i (inclusive) and
ending with index j (not inclusive)
>>> s[0:5:2]
'srn'
Negative indexes - s[-1] means the last element (s[len(s)-1])
>>> s[-1]
'g'
>>> s[-2]
'n'
26
27. Contained in square brackets []
Members - numbers, strings, nested sub-lists or nothing
L1 = [0,1,2,3], L2 = ['zero', 'one'], L3 = [0,1,[2,3],'three',['four,one']], L4 = []
Indexing works same as strings
Mutable: individual elements can be reassigned in place
Members can grow and shrink in place
>>> L1 = [0,1,2,3]
>>> L1[0] = 4
>>> L1[0]
4
27
30. append(x)
extend(L) - append all items in list (like Tcl lappend)
insert(i,x)
remove(x)
pop([i]), pop() - create stack (LIFO) with pop(), or queue (FIFO) with pop(0)
index(x) - return the index for value x
count(x) - how many times x appears in list
sort() - sort items in place
reverse() - reverse list
30
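The stack/queue use of append() and pop() above can be sketched as:

```python
stack = [1, 2, 3]
stack.append(4)       # push onto the end
top = stack.pop()     # LIFO: removes the 4 just pushed

queue = [1, 2, 3]
queue.append(4)       # enqueue at the end
first = queue.pop(0)  # FIFO: removes the 1 from the front
```

After this, `stack` is back to [1, 2, 3] and `queue` is [2, 3, 4].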
31. filter(function, sequence)
def f(x): return x%2 != 0 and x%3 != 0
filter(f, range(2,25))
map(function, sequence)
call function for each item
return list of return values
reduce(function, sequence)
return a single value
call binary function on the first two items
then on the result and next item
iterate
31
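The filter/map/reduce trio above, sketched in Python 3 syntax (where reduce has moved to functools and filter/map return iterators):

```python
from functools import reduce  # reduce moved to functools in Python 3

def f(x):
    # keep numbers divisible by neither 2 nor 3
    return x % 2 != 0 and x % 3 != 0

kept = list(filter(f, range(2, 25)))
squares = list(map(lambda x: x ** 2, [1, 2, 3, 4]))
total = reduce(lambda a, b: a + b, [1, 2, 3, 4])  # ((1+2)+3)+4
```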
32. Create lists without map(), filter(), lambda
= expression followed by a for clause + zero or more for or if clauses
>>> vec = [2,4,6]
>>> [3*x for x in vec]
[6, 12, 18]
>>> [{x: x**2} for x in vec]
[{2: 4}, {4: 16}, {6: 36}]
32
33. Cross products
>>> vec1 = [2,4,6]
>>> vec2 = [4,3,-9]
>>> [x*y for x in vec1 for y in vec2]
[8,6,-18, 16,12,-36, 24,18,-54]
>>> [x+y for x in vec1 for y in vec2]
[6,5,-7,8,7,-5,10,9,-3]
>>> [vec1[i]*vec2[i] for i in
range(len(vec1))]
[8,12,-54]
can also use if
>>> [3*x for x in vec if x > 3]
[12, 18]
>>> [3*x for x in vec if x < 2]
[]
33
34. Remove by index
Remove slices from list (rather than by assigning an empty list)
>>> a = [-1,1,66.6,333,333,1234.5]
>>> del a[0]
>>> a
[1,66.6,333,333,1234.5]
>>> del a[2:4]
>>> a
[1,66.6,1234.5]
34
35. Contained in parentheses ()
Members - numbers, strings, nested
sub-tuples or nothing
t1 = (0,1,2,3), t2 = ('zero', 'one'),t3 =
(0,1,(2,3), 'three', ('four,one')), t4 = ()
If not nesting tuples, can omit
parentheses - t1 = 0,1,2,3 is same as t1
= (0,1,2,3)
Indexing works as with strings
Immutable: individual elements
cannot be reassigned in place
Concatenation
>>> t1 = (0,1,2,3); t2 = (4,5,6)
>>> t1+t2
(0,1,2,3,4,5,6)
Repetition
>>> t1*2
(0,1,2,3,0,1,2,3)
Length: len(t1) (also works for lists
and strings)
35
36. Like Tcl or awk associative arrays
Indexed by keys
Keys are any immutable type - tuples
But not lists (mutable!)
Uses 'key: value' notation
>>> tel = {'hgs' : 7042, 'lennox': 7018}
>>> tel['cs'] = 7000
>>> tel
36
37. No particular order
Delete elements with del
>>> del tel['foo']
keys() method returns an unsorted list of keys
>>> tel.keys()
['cs', 'lennox', 'hgs']
Use has_key() to check for existence
>>> tel.has_key('foo')
0
37
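A minimal sketch of the dictionary operations above in Python 3 syntax, where `in` replaces the now-removed has_key():

```python
tel = {'hgs': 7042, 'lennox': 7018}
tel['cs'] = 7000              # add a new key: value pair
del tel['lennox']             # delete an entry by key

assert 'cs' in tel            # membership test: Python 3's has_key()
assert 'foo' not in tel
assert sorted(tel.keys()) == ['cs', 'hgs']
```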
38. Speed – Array operations are much faster than on each list member
Loading in array capabilities
>>> from numpy import *
Create
>>> vec = array([1,2,3])
Create 3x3 matrix
>>> mat = array([[1,2,3],[4,5,6],[7,8,9]])
Initialize a dummy array
zeros((m,n), 'typecode')
Creates an m x n array of zeros
Can be integers, floats, double precision floats depending on type code
38
39. Similarities
Mutable: can have elements reassigned in place
indexed and sliced identically
len command works identically
Sort and reverse attributes
Differences
For arrays the + and * signs do not refer to concatenation or repetition
>>> ar1 = array([2,4,6])
>>> ar1+2 # Adding a constant to an array adds it to each member
▪ [4,6,8,]
>>> ar1*2
▪ [4,8,12,]
39
40. Adding two arrays - adding two vectors
>>> ar1 = array([2,4,6]); ar2 = array([1,2,3])
>>> ar1+ar2
▪ [3,6,9,]
Multiply two arrays – multiplies term by
term
>>> ar1*ar2
▪ [2,8,18,]
Division
>>> ar1/ar2
▪ [2,2,2,]
A function acting on an array acts on each
term in the array
>>> ar2**2
▪ [1,4,9,]
>>> ar3 = (pi/4)*arange(3) # like range, but
an array
>>> sin(ar3)
▪ [ 0. , 0.70710678, 1. ,]
40
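The element-by-element arithmetic above, sketched with the modern `import numpy as np` convention (assuming NumPy is installed; the deck's `from numpy import *` style also works):

```python
import numpy as np

ar1 = np.array([2, 4, 6])
ar2 = np.array([1, 2, 3])

added = ar1 + ar2                            # element-by-element: [3 6 9]
scaled = ar1 * 2                             # scalar applies to each member
ratio = ar1 / ar2                            # termwise division
sines = np.sin((np.pi / 4) * np.arange(3))   # a ufunc acts on each term
```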
41. Mutable types - dictionaries, lists, arrays - can have individual items reassigned in place
Immutable types - numbers, strings, tuples - cannot
>>> L = [0,2,3]
>>> L[0] = 1
>>> L
▪ [1,2,3]
>>> s = 'string'
>>> s[3] = 'o'
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: object does not support item assignment
41
42. Handle name assignments differently
If you assign a name to an immutable item, then set a second name equal to the first, changing the
value of the first name will not change that of the second
However, for mutable items, changing the value of the first name will change that of the second
>>> a = 2
>>> b = a # a and b are both numbers, and are thus immutable
>>> a = 3
>>> b
2
42
43. Even though we set b equal to a, changing the value of a
does not change the value of b.
However, for mutable types, this property does not hold.
>>> La = [0,1,2]
>>> Lb = La # La and Lb name the same (mutable) list
>>> La[:] = [1,2,3] # modify the list in place
>>> Lb
[1,2,3]
43
44. Setting Lb equal to La means that modifying the list La in place also changes
Lb
To circumvent this property, we would make use of the function
copy.copy()
>>> import copy
>>> La = [0,1,2]
>>> Lb = copy.copy(La)
Now changing the value of La will not change the value of Lb
44
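The aliasing-versus-copying distinction above can be sketched as:

```python
import copy

La = [0, 1, 2]
Lb = La               # Lb is another name for the same list object
La[0] = 99            # in-place change is visible through both names

Lc = copy.copy(La)    # shallow copy: a new, independent list
La[0] = 0             # Lc keeps the old value; La and Lb see the change
```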
45. Can check for sequence membership with in and not in
>>> if (4 in vec):
... print '4 is'
Chained comparisons - a less than b AND b equals c:
a < b == c
And, or are short-circuit operators
evaluated from left to right
stop evaluation as soon as outcome clear
45
46. Can assign comparison to variable
>>> s1,s2,s3='', 'foo', 'bar'
>>> non_null = s1 or s2 or s3
>>> non_null
'foo'
Unlike C, no assignment within expression
46
48. >>> if condition:
... action
...
Subsequent indented lines are assumed to be part of the if statement
The same is true for most other types of python statements
A statement typed into an interpreter ends once an empty line is entered and a
statement in a script ends once an un-indented line appears
The same is true for defining functions
48
49. Can be combined with else if (elif) and else statements
if condition1: # if condition1 is true, execute action1
▪ action1
elif condition2: # if condition1 is not true, but condition2 is, execute
▪ action2 # action2
else: # if neither condition1 nor condition2 is true, execute
▪ action3 # action3
49
50. Combined using and & or
if condition1 and condition2:
action1
if condition1 or condition2:
action2
Condition operations - <, <=, >, >=, ==, !=, in
>>> x = 2; y = 3; L = [0,1,2]
>>> if (1<x<=3 and 4>y>=2) or (1==1 or 0!=1) or 1 in L:
... print 'Hello world'
...
Hello world
50
51. >>> while condition:
... action
...
>>>
>>> x = 1
>>> while x < 4:
... print x**2
... x = x+1
...
1 # only the squares of 1, 2, and 3 are printed, because
4 # once x = 4, the condition is false
9
51
52. for item i in sequence s:
action on item i
>>> for i in range(1,7):
... print i, i**2, i**3, i**4
...
1 1 1 1
2 4 8 16
3 9 27 81
4 16 64 256
5 25 125 625
6 36 216 1296
52
53. >>> L = [0,1,2,3] # or, equivalently, range(4)
>>> for i in range(len(L)):
... L[i] = L[i]**2
...
>>> L
[0,1,4,9]
>>>
# Compact Version
>>> L = arange(4)
>>> L = L**2
>>> L
[0,1,4,9,]
53
54. >>> L = [0,1,2,3] # or, equivalently, range(4)
>>> for i in range(len(L)):
... j = i/2.
... if j - int(j) == 0.0:
... L[i] = L[i]+1
... else: L[i] = -i**2
...
>>> L
[1,-1,3,-9]
>>>
54
55. def func(args):
return values
A function call must end with
parentheses
Functions may be simple one-to-
one mappings
>>> def f1(x):
... return x*(x-1)
...
>>> f1(3)
6
They may contain multiple input
and/or output variables
>>> def f2(x,y):
... return x+y,x-y
...
>>> f2(3,2)
(5,1)
55
56. Do not need to contain arguments at
all
>>> def f3():
... print 'Hello world'
...
>>> f3()
Hello world
Can set arguments to default values in
function definitions
>>> def f4(x,a=1):
... return a*x**2
...
>>>
Function is called with only one
argument, the default value of 1 is
assumed for the second argument
>>> f4(2)
4
However can change the second
argument from its default value
>>> f4(2,a=2) # f4(2,2) would also
work
8
56
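The default-argument behaviour above, as a runnable sketch:

```python
def f4(x, a=1):
    # 'a' falls back to 1 when the caller omits it
    return a * x ** 2

one_arg = f4(2)        # default a=1 is assumed
keyword = f4(2, a=2)   # override the default by keyword
positional = f4(2, 2)  # or positionally
```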
58. Anything calculated inside a function but
not specified as an output quantity (either
with return or global) will be deleted once
the function stops running
>>> def f5(x,y):
... a = x+y
... b = x-y
... return a**2,b**2
...
>>> f5(3,2)
(25,1)
If we try to call a or b, we get an error
message
>>> a
Traceback (most recent call last):
File "<stdin>", line 1, in ?
NameError: name 'a' is not defined
58
59. Anonymous functions
may not work in older versions
def make_incrementor(n):
return lambda x: x + n
f = make_incrementor(42)
f(0)
f(1)
59
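The make_incrementor example above, runnable as written; the returned lambda closes over n:

```python
def make_incrementor(n):
    # the anonymous function remembers n from the enclosing scope
    return lambda x: x + n

f = make_incrementor(42)
```

Calling `f(0)` gives 42 and `f(1)` gives 43.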
60. >>> help(execfile) # don't include parentheses when using the function name as
an argument
WARP has a similar library function called doc()
>>> from warp import *
>>> doc(execfile)
The main difference between help() and doc() is that doc() prints the relevant
documentation onto the interpreter screen.
60
61. Scoping hierarchy, in decreasing order of breadth
Built-in (Python)
Predefined names (len, open, execfile, etc.) and types
Global (module)
Names assigned at the top level of a module, or directly in the interpreter
Names declared global in a function
Local (function)
Names assigned inside a function definition or loop
61
62. >>> a = 2 # a is assigned in the interpreter, so it's global
>>> def f(x): # x is in the function's argument list, so it's local
... y = x+a # y is only assigned inside the function, so it's local
... return y
...
>>>
62
63. If a module file is read into the interpreter via execfile, any quantities defined in
the top level of the module file will be promoted to the top level of the program
script.py:
print 'Hello world'
x = [0,1,2]
>>> execfile('script.py')
Hello world
>>> x
[0,1,2]
63
64. If we had imported script.py instead, the list x would not be defined on the top
level
To call x, we would need to explicitly tell Python its scope, or context.
>>> import script
Hello world
>>> script.x
[0,1,2]
If we had tried to call x without a context flag, an error message would have
appeared
64
65. Modules may well contain sub-modules
For a file named module.py which, in its definition, imports a sub-module named submodule, which in
turn contains some quantity named x.
>>> import module
If we load the module this way, we would type the following to call x:
>>> module.submodule.x
We can also import the sub-module without importing other quantities defined in module.py:
>>> from module import submodule
In this case, we would type the following to call x:
>>> submodule.x
We would also call x this way if we had read in module.py with execfile()
65
66. Can use the same names in different scopes
>>> a = 2
>>> def f5(x,y):
... a = x+y # this a has no knowledge of the global a, and vice-versa
... b = x-y
... return a**2,b**2
...
>>> a
2
66
67. The local a is deleted as soon as the function stops running
>>> x = 5
>>> import script # same script as before
Hello world
>>> x
5
>>> script.x # script.x and x are defined in different scopes, and
[0,1,2] # are thus different
67
68. Changing a global name used in a function definition changes the function
>>> a = 2
>>> def f(x):
... return x+a # this function is, effectively, f(x) = x+2
...
>>> f(4)
6
68
69. >>> a = 1
>>> f(4) # since we set a=1, f(x) = x+1 now
5
Unlike some other languages, Python function arguments are not modified by default:
>>> x = 4
>>> f(x)
5
>>> x
4
69
70. Collection of functions and variables, typically in scripts
Definitions can be imported
File name is module name + .py
Create module fibo.py
def fib(n): # write Fib. series up to n
...
def fib2(n): # return Fib. series up to n
70
71. Import module
import fibo
Use modules via "name space"
>>> fibo.fib(1000)
>>> fibo.__name__
▪ 'fibo'
can give it a local name
>>> fib = fibo.fib
>>> fib(500)
71
72. Function definition + executable statements
Executed only when module is imported
Modules have private symbol tables
Avoids name clash for global variables
Accessible as module.globalname
Can import into name space
>>> from fibo import fib, fib2
>>> fib(500)
Can import all names defined by module
>>> from fibo import *
72
73. Current directory
List of directories specified in PYTHONPATH environment variable
Uses installation-default if not defined - .:/usr/local/lib/python
Uses sys.path
>>> import sys
>>> sys.path
['', 'C:\\PROGRA~1\\Python2.2', 'C:\\Program Files\\Python2.2\\DLLs', 'C:\\Program
Files\\Python2.2\\lib', 'C:\\Program Files\\Python2.2\\lib\\lib-tk', 'C:\\Program
Files\\Python2.2', 'C:\\Program Files\\Python2.2\\lib\\site-packages']
73
74. Include byte-compiled version of module if there exists fibo.pyc in same directory
as fibo.py
Only if creation time of fibo.pyc matches fibo.py
Automatically write compiled file, if possible
Platform independent
Doesn't run any faster, but loads faster
Can distribute only the .pyc file to hide source
74
77. Mixture of C++ and Modula-3
Multiple base classes
Derived class can override any methods of its base class(es)
Method can call the method of a base class with the same name
Objects have private data
C++ terms
all class members are public
all member functions are virtual
no constructors or destructors (not needed)
77
78. Classes (and data types) are objects
Built-in types cannot be used as base classes by user
Arithmetic operators, subscripting can be redefined for class instances
(like C++, unlike Java)
78
80. Mapping from name to object
built-in names (abs())
global names in module
local names in function invocation
Attributes = any following a dot
z.real, z.imag
Attributes read-only or writable
module attributes are writeable
80
81. Scope = textual region of Python program where a namespace is directly
accessible (without dot)
innermost scope (first) = local names
middle scope = current module's global names
outermost scope (last) = built-in names
Assignments always affect innermost scope
don't copy, just create name bindings to objects
Global indicates name is in global scope
81
82. obj.name references (plus module!):
class MyClass:
"A simple example class"
i = 123
def f(self):
return 'hello world'
>>> MyClass.i
123
MyClass.f is method object
82
83. Class instantiation
>>> x = MyClass()
>>> x.f()
'hello world'
Creates new instance of class
note x = MyClass vs. x = MyClass()
__init__() special method for initialization of object
def __init__(self,realpart,imagpart):
self.r = realpart
self.i = imagpart
83
84. Attribute references
Data attributes (C++/Java data members)
created dynamically
x.counter = 1
while x.counter < 10:
x.counter = x.counter * 2
print x.counter
del x.counter
84
85. Called immediately
x.f()
can be referenced
xf = x.f
while 1:
print xf()
object is passed as first argument of function 'self'
x.f() is equivalent to MyClass.f(x)
85
86. Data attributes override method attributes with the same name
No real hiding - not usable to implement pure abstract data types
Clients (users) of an object can add data attributes
First argument of method usually called self
'self' has no special meaning (cf. Java)
86
91. No real support, but textual replacement (name mangling)
__var is replaced by _classname__var
Prevents only accidental modification, not true protection
91
92. Packages
Numeric – good for numerical algebra, trigonometry, etc. CAUTION: no longer supported
NumPy – similar to Numeric, but handles arrays slightly differently and has a few other built-in commands and
functions
SciPy – useful for numerical integration, ODE solutions, interpolations, etc.; based on NumPy
Websites
http://docs.python.org - online version of built-in Python function documentation
http://laurent.pointal.org/python/pqrc - Python Quick Reference Card
http://rgruet.free.fr - long version of Python Quick Reference Card
http://mail.python.org - extensive Python forum
https://docs.python.org/2/tutorial/
92
94. Open-source add-on modules to Python
Common mathematical and numerical routines in pre-compiled, fast functions
Functionality similar to commercial software like MatLab
NumPy (Numeric Python) package - provides basic routines to manipulate large
arrays and matrices of numeric data
SciPy (Scientific Python) package – extends functionality of NumPy with a
substantial collection of useful algorithms, like minimization, Fourier
transformation, regression, and other applied mathematical techniques
94
95. Linear algebra – matrix inversion, decompositions...
Fast Fourier transforms
Masked arrays
Random number generation
Rich set of numerical data types
Fast
Loops mostly unnecessary
Operate on entire array
95
96. ndarray
A fast built-in N-dimensional homogeneous array object containing elements of same type
Elements can be C-structure or simple data-types
Fast algorithms on machine data-types (int, float...)
Universal functions
Element-by-element function objects
>>> a = numpy.array([20,30,40,50])
>>> b = 10*numpy.sin(a)
>>> b
array([ 9.12945351, -9.88031624, 7.4511316, -2.62374854])
96
97. Can be any arbitrary structure specified using data-
type
Every dimension is accessed by stepping (striding) a
fixed number of bytes through memory
If memory is contiguous, then the strides form a pre-
computed indexing formula
Indexing starts at 0
Dimensions are called axes
97
99. Create
>>> a = array([0,1,2,3])
array([0, 1, 2, 3])
Check type
>>> type(a)
<type 'numpy.ndarray'>
Numeric type of element
>>> a.dtype
dtype('int32')
Bytes per element
>>> a.itemsize
4
Shape - returns a tuple listing the
length of array along each dimension
>>> a.shape
(4,)
>>> shape(a)
(4,)
99
100. Size – reports entire number of
elements in an array
>>> a.size
4
>>> size(a)
4
GET / SET elements
>>> a[1,3]
13
>>> a[1,3] = -1
array([[ 0, 1, 2, 3], [10,11,12,-1]])
Address first row using index
>>> a[1]
array([10, 11, 12, -1])
100
101. Memory usage - returns number of
bytes
>>> a.nbytes
12
Dimensions
>>> a.ndim
1
Create a copy of the array
>>> b = a.copy()
array([0, 1, 2, 3])
Convert numpy array to a python list
>>> a.tolist()
[0, 1, 2, 3]
101
102. For 1D arrays, list works similarly but
slow
>>> list(a)
[0, 1, 2, 3]
Indexing
>>> a[0]
0
>>> a[1] = 10
>>> a
[10, 1, 2, 3]
Test if values are present in an array
>>> a = np.array([[1, 2, 3], [4, 5, 6]],
float)
>>> 2 in a
True
102
103. Fill - set all values in an array
>>> a.fill(0)
[0, 0, 0, 0]
This also works, but slow
>>> a[:] = 1
[1, 1, 1, 1]
dtype property tells type of values
stored by the array
>>> a.dtype
dtype('int32')
Assigning a float to int32 array
truncates decimal part
>>> a[0] = 10.6
[10, 1, 2, 3]
103
104. Fill behaves same
>>> a.fill(-4.8)
[-4, -4, -4, -4]
Arrays can be reshaped using tuples that specify new dimensions
Turn a ten-element one-dimensional array into a two-dimensional one whose
first axis has five elements and second axis has two
104
107. Memory model allows simple indexing (integers and slices) to be a view of the same data
>>> b = a[:,::2]
>>> b[0,1] = 100
>>> print a
[[ 1. 2. 100.]
[ 4. 5. 6.]]
>>> c = a[:,::2].copy()
>>> c[1,0] = 500
>>> print a
[[ 1. 2. 100.]
[ 4. 5. 6.]]
107
108. References to memory in array
Changing values in a slice also changes array
>>> a = array((0,1,2,3,4))
Create a slice of a
>>> b = a[2:4]
>>> b[0] = 10
>>> a # changing b changes a
array([ 0, 1, 10, 3, 4])
108
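The view-versus-copy behaviour above, sketched with the modern `import numpy as np` convention (assuming NumPy is installed):

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = a[2:4]         # simple slicing returns a *view* of a's memory
b[0] = 10          # writing through the view changes a as well

c = a[2:4].copy()  # an explicit copy is independent
c[0] = 500         # a is unaffected by this
```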
109. INDEXING BY POSITION
>>> a = arange(0,80,10)
fancy indexing
>>> y = a[[1, 2, -3]]
>>> print y
[10 20 50]
using take
>>> y = take(a,[1,2,-3])
>>> print y
[10 20 50]
INDEXING WITH BOOLEANS
>>> mask = array([0,1,1,0,0,1,0,0],dtype = bool)
fancy indexing
>>> y = a[mask]
>>> print y
[10 20 50]
using compress
>>> y = compress(mask, a)
>>> print y
[10 20 50]
109
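Both indexing styles above can be sketched as follows (assuming NumPy is installed; `take` and `compress` also exist as shown, but the bracket forms are the common idiom):

```python
import numpy as np

a = np.arange(0, 80, 10)                 # [0 10 20 30 40 50 60 70]
by_position = a[[1, 2, -3]]              # fancy indexing with an index list
mask = np.array([0, 1, 1, 0, 0, 1, 0, 0], dtype=bool)
by_mask = a[mask]                        # boolean mask selects True slots
```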
111. 21 built-in static data-type objects
Provides details of how to interpret the memory for an item
An instance of a single dtype class
Every object has a type attribute which provides the Python object returned
when an element is selected from the array
111
112. New dynamic data-type objects created to handle
Alteration of the byte-order
Change in the element size for string, unicode and void built-ins
Addition of fields
Change of the type object (C-structure arrays)
Creation is flexible
New user-defined built-in data-types can be added
but must be done in C and involves filling a function-pointer table
112
113. An item can include fields of different data types
A field is described by a data-type object and a byte offset
allows nested records
Array construction command interprets tuple elements as field entries
>>> dt = N.dtype("i4,f8,a5")
>>> print dt.fields
{'f1': (dtype('<i4'), 0), 'f2': (dtype('<f8'), 4), 'f3': (dtype('|S5'), 12)}
>>> a = N.array([(1,2.0,"Hello"), (2,3.0,"World")], dtype=dt)
>>> print a['f3']
['Hello' 'World']
113
114. >>> a = array([[1,2,3], [4,5,6]], float)
>>> sum(a) #summing all array values
21
Keyword axis used to sum along the 0th axis
>>> sum(a, axis=0)
array([5., 7., 9.])
Keyword axis to sum along the last axis
>>> sum(a, axis=-1)
array([6., 15.])
>>> a.sum() # Sum array method
21
114
115. Axis argument to sum along a specific axis
>>> a.sum(axis=0)
array([5., 7., 9.])
Product along columns
>>> a.prod(axis=0)
array([ 4., 10., 18.])
Functional form
>>> prod(a, axis=0)
array([ 4., 10., 18.])
115
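The axis-wise reductions above, as a runnable sketch (assuming NumPy is installed):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], float)

total = a.sum()             # all elements
col_sums = a.sum(axis=0)    # down the 0th axis (per column)
row_sums = a.sum(axis=-1)   # along the last axis (per row)
col_prods = a.prod(axis=0)  # per-column products
```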
116. Min
>>> a = array([2.,3.,0.,1.])
>>> a.min(axis=0)
0
Use amin() instead of Python's built-in min() for fast operations on multi-dimensional arrays
>>> amin(a, axis=0)
0
ArgMin - Find index of minimum value
>>> a.argmin(axis=0)
2
functional form
>>> argmin(a, axis=0)
2
117. Max
>>> a = array([2.,1.,0.,3.])
>>> a.max(axis=0)
3.0
functional form
>>> amax(a, axis=0)
3.0
ArgMax - Find index of maximum value
>>> a.argmax(axis=0)
3
functional form
>>> argmax(a, axis=0)
3
118. >>> a = array([[1,2,3], [4,5,6]], float)
Mean value of each column
>>> a.mean(axis=0)
array([ 2.5, 3.5, 4.5])
>>> mean(a, axis=0)
array([ 2.5, 3.5, 4.5])
>>> average(a, axis=0)
array([ 2.5, 3.5, 4.5])
Weighted average
>>> average(a, weights=[1,2],
... axis=0)
array([ 3., 4., 5.])
120. Clip - limit values to a range
>>> a = array([[1,2,3], [4,5,6]], float)
set values < 3 equal to 3 and values > 5 equal to 5
clip returns a new array, so assign the result
>>> a = a.clip(3,5)
>>> a
array([[ 3., 3., 3.], [ 4., 5., 5.]])
Round values in an array. Rounds to even, so 1.5 and 2.5 round to 2
>>> a = array([1.35, 2.5, 1.5])
>>> a.round()
array([ 1., 2., 2.])
121. Round to first decimal place
>>> a.round(decimals=1)
array([ 1.4, 2.5, 1.5])
Peak to peak - calculate max – min for array along columns
>>> a = array([[1,2,3], [4,5,6]], float)
>>> a.ptp(axis=0)
array([ 3., 3., 3.])
Max – min for entire array
>>> a.ptp(axis=None)
5.0
122. a.dtype – Numerical type of array elements - float32, uint8
a.shape – shape of the array. (m,n,o,...)
a.size – number of elements in the array
a.itemsize – bytes used by a single element in the array
a.nbytes – bytes used by entire array (data only)
a.ndim – dimensions in the array
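A quick sketch of these attributes on a small array (using an explicit `import numpy as np` rather than the star-import style of these slides):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
print(a.dtype)     # float64 -- numerical type of the elements
print(a.shape)     # (2, 3)
print(a.size)      # 6 elements
print(a.itemsize)  # 8 bytes per float64 element
print(a.nbytes)    # 48 = size * itemsize (data only)
print(a.ndim)      # 2 dimensions
```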
123. a.flat – iterator to step through array as if it is 1D
a.flatten() – returns a 1D copy of a multi-dimensional array
a.ravel() – same as flatten(), but returns a view if possible
a.resize(new_size) – change the size/shape of an array in-place
a.swapaxes(axis1, axis2) – swap order of two axes in an array
a.transpose(*axes) – swap order of any number of array axes
a.T – a.transpose()
a.squeeze() – remove any length=1 dimensions from an array
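A short sketch of the shape-manipulation methods (again with an explicit `import numpy as np`):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.flatten())        # [1 2 3 4 5 6] -- always a copy
print(a.ravel())          # [1 2 3 4 5 6] -- a view when possible
print(a.T.shape)          # (3, 2), same as a.transpose()

b = np.array([[[1, 2, 3]]])   # shape (1, 1, 3)
print(b.squeeze().shape)  # (3,) -- length-1 dimensions removed
```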
124. a.copy() – copy of the array
a.fill(value) – fill array with a scalar value
a.tolist() – convert array into nested lists of values
a.tostring() – raw copy of array memory into a python string
a.astype(dtype) – return array coerced to given dtype
a.byteswap(False) – convert byte order
a.real – return real part of the array
a.imag – return imaginary part of the array
a.conjugate() – return complex conjugate of the array
a.conj()– return the complex conjugate of an array
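A minimal sketch of the conversion and complex-number methods (values here are made up for illustration):

```python
import numpy as np

a = np.array([1, 2, 3])
print(a.astype(np.float64))   # [1. 2. 3.] -- coerced to float64
print(a.tolist())             # [1, 2, 3] -- plain Python list

c = np.array([1 + 2j, 3 - 4j])
print(c.real)                 # [1. 3.]
print(c.imag)                 # [ 2. -4.]
print(c.conj())               # [1.-2.j 3.+4.j] -- complex conjugate
```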
125. a.dump(file) – store binary array data to file
a.dumps() – returns the binary pickle of the array as a string
a.tofile(fid, sep="", format="%s") – binary (default) or formatted ASCII output to file
a.nonzero() – return indices for all non-zero elements in a
a.sort(axis=-1) – in-place sort of array elements along axis
a.argsort(axis=-1) – return indices for element sort order along axis
a.searchsorted(b) – return index where elements from b would go in a
a.clip(low, high) – limit values in array to the specified range
a.round(decimals=0) – round to the specified number of digits
a.cumsum(axis=None) – cumulative sum of elements along axis
a.cumprod(axis=None) – cumulative product of elements along axis
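A sketch of the sorting and cumulative methods on a small example array:

```python
import numpy as np

a = np.array([3, 1, 2])
print(a.argsort())            # [1 2 0] -- indices that would sort a
print(np.sort(a))             # [1 2 3] -- sorted copy, a unchanged
print(a.cumsum())             # [3 4 6] -- running sum
print(a.cumprod())            # [3 3 6] -- running product

sorted_a = np.array([1, 2, 3])
print(sorted_a.searchsorted(2.5))  # 2 -- insertion index keeping order

a.sort()                      # in-place sort
print(a)                      # [1 2 3]
```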
126. These methods reduce size of the array by 1 dimension by carrying out an
operation along the specified axis
If axis is None, the operation is carried out across the entire array
a.sum(axis=None) – sum up values along axis
a.prod(axis=None) – product of all values along axis
a.min(axis=None)– minimum value along axis.
a.max(axis=None) – maximum value along axis
a.argmin(axis=None) – index of the minimum value along axis
127. a.argmax(axis=None) – index of the maximum value along axis
a.ptp(axis=None) – a.max(axis) – a.min(axis)
a.mean(axis=None) – mean value along axis
a.std(axis=None) – standard deviation along axis
a.var(axis=None) – variance along axis
a.any(axis=None) –True if any value along axis is non-zero
a.all(axis=None) –True if all values along axis are non-zero
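A quick sketch of these statistical and logical reductions on a 2x3 array:

```python
import numpy as np

a = np.array([[1., 2., 3.], [4., 5., 6.]])
print(a.mean())              # 3.5 -- mean over the whole array
print(a.std(axis=0))         # [1.5 1.5 1.5] -- per-column std deviation
print(a.var(axis=0))         # [2.25 2.25 2.25] -- per-column variance
print(a.any())               # True -- at least one non-zero value
print((a > 4).all(axis=1))   # [False False] -- per-row "all > 4?" test
```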
128. Simple array math
>>> a = array([1,2,3,4])
>>> b = array([2,3,4,5])
>>> a + b
array([3, 5, 7, 9])
NumPy defined constants
pi = 3.14159265359
e = 2.71828182846
>>> x = arange(11.) # Create array from 0 to 10
Multiply entire array by scalar value
>>> a = (2*pi)/10
>>> a
0.62831853071795862
>>> a*x
array([ 0.,0.628,…,6.283])
In-place operations
>>> x *= a
>>> x
array([ 0.,0.628,…,6.283])
Apply functions to array
>>> y = sin(x)
130. Objects that rapidly evaluate a function element-by-element over an array
Core piece is a 1-d loop written in C that performs the operation over the largest
dimension of the array
For 1-d arrays - much faster than list comprehension
>>> type(N.exp)
<type 'numpy.ufunc'>
>>> x = array([1,2,3,4,5])
>>> print N.exp(x)
[ 2.71828183 7.3890561 20.08553692 54.59815003 148.4131591 ]
>>> print [math.exp(val) for val in x]
[2.7182818284590451, 7.3890560989306504,20.085536923187668,
54.598150033144236,148.4131591025766]
131. Mathematic, comparative, logical, and bitwise operators with two
arguments (binary operators) have special methods that operate on
arrays
op.reduce(a,axis=0)
op.accumulate(a,axis=0)
op.outer(a,b)
op.reduceat(a,indices)
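A sketch of these four ufunc methods applied to `add` and `multiply`:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
print(np.add.reduce(a))            # 10 -- same as a.sum()
print(np.add.accumulate(a))        # [ 1  3  6 10] -- same as a.cumsum()
print(np.multiply.outer(a, a))     # 4x4 multiplication table
print(np.add.reduceat(a, [0, 2]))  # [3 7] -- sums of a[0:2] and a[2:]
```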
132. a + b add(a, b)
a - b subtract(a, b)
a % b remainder(a, b)
a * b multiply(a, b)
a / b divide(a, b)
a ** b power(a, b)
Multiply by a scalar
>>> a = array((1,2))
>>> a*3
array([3, 6])
Element by element addition
>>> a = array([1,2])
>>> b = array([3,4])
>>> a + b
array([4, 6])
Addition using operator function
>>> add(a, b)
array([4, 6])
In-place operation - overwrites the contents of a and saves array creation overhead
>>> add(a, b, a) # a += b
array([4, 6])
>>> a
array([4, 6])
136. Describes how NumPy treats arrays with different shapes during arithmetic
operations
Smaller array is broadcast across the larger array so that they have compatible
shapes
All arrays are promoted to the same number of dimensions
(figure: a 4x3 array added to a length-4 array – shapes (4,3) and (4,) – trailing dimensions 3 and 4 mismatch!)
137. Two dimensions are compatible when
they are equal, or
one of them is 1
If the conditions are not met, an exception is thrown:
ValueError: operands could not be broadcast together
Size of the resulting array is the maximum size along each dimension of the input
arrays
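The two rules above can be sketched with a compatible and an incompatible pair of shapes:

```python
import numpy as np

a = np.ones((4, 3))
b = np.arange(3)        # shape (3,): trailing dimensions 3 == 3, compatible
print((a + b).shape)    # (4, 3)

c = np.arange(4)        # shape (4,): trailing dimensions 3 != 4, incompatible
try:
    a + c
except ValueError as e:
    print("broadcast failed:", e)

print((a + c[:, None]).shape)  # (4, 3) -- reshaping c to (4, 1) makes it compatible
```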
139. >>> x = [1,2,3,4];
>>> y = [[10],[20],[30]]
>>> print N.add(x,y)
[[11 12 13 14]
[21 22 23 24]
[31 32 33 34]]
x has shape (4,). The ufunc sees it as having shape (1,4). y has shape (3,1).
The ufunc result has shape (3,4).
>>> x = array(x)
>>> y = array(y)
>>> print x+y
[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]
>>> a = array((0,10,20,30))
>>> b = array((0,1,2))
>>> y = a[:, None] + b
(figure: a[:, None] has shape (4,1) and broadcasts against b, shape (3,), giving the 4x3 result [[0 1 2], [10 11 12], [20 21 22], [30 31 32]])
140. Optimization
Data fitting
Interpolation
Spatial analysis
Clustering
Signal and image processing
Sparse matrices
Statistics
Integration
Determining a function's maxima or minima
Eigenvectors for large sparse matrices
Testing if two distributions are the same...
141. Fftpack - Fast Fourier transform
Integrate - integration routines
Interpolate - Routines and classes for interpolation objects that can be used with
discrete numeric data
Linalg - linear algebra routines - inverse, determinant, solving a linear system of
equations, computing norms and pseudo/generalized inverses, eigenvalue /
eigenvector decomposition, singular value decomposition, LU decomposition,
Cholesky decomposition, QR decomposition, Schur decomposition
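A minimal sketch of a few of the linalg routines listed above (assuming SciPy is installed; the matrix and right-hand side here are made up for illustration):

```python
import numpy as np
from scipy import linalg

A = np.array([[3., 1.], [1., 2.]])
b = np.array([9., 8.])

x = linalg.solve(A, b)    # solve the linear system A @ x = b
print(x)                  # [2. 3.]
print(linalg.det(A))      # 5.0 -- determinant
print(linalg.inv(A) @ b)  # same solution via the inverse (slower, less stable)
```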
Optimize - linear regression, finding a function's minimum and maximum
values, determining the root of a function, finding where two functions
intersect...
Signal - convolution, correlation, finite Fourier transforms, B-spline smoothing,
filtering
Sparse - sparse matrices
Stats - functions for statistical distributions
IO data - input and output
Special - definitions of many special math functions (Bessel, gamma, error functions, ...)
Weave - C/C++ integration
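A minimal sketch of root finding and minimization with the Optimize module (brentq and minimize_scalar; the functions here are illustrative):

```python
from scipy import optimize

# Root of f(x) = x**2 - 2 bracketed in [0, 2], i.e. sqrt(2)
root = optimize.brentq(lambda x: x**2 - 2, 0, 2)
print(root)   # ~1.41421356...

# Minimum of f(x) = (x - 1)**2 + 1, located at x = 1
res = optimize.minimize_scalar(lambda x: (x - 1)**2 + 1)
print(res.x)  # ~1.0
```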
143. 2D graphics library for Python that produces publication quality figures
Graphics in R are often easier to construct and require less code
Graphics in matplotlib may look nicer, but require more code
144. Much larger community (SciPy + Python).
Better and cleaner graphics
Sits atop a programming language rather than being one
Dictionary data type is very useful (Python)
Memory mapped files and extensive sparse matrix packages
145. Documentation, Examples and cookbook http://www.numpy.org
Documentation, Examples and cookbook http://www.scipy.org
http://stackoverflow.com/
http://matplotlib.org/
http://www.scipy.org/install.html