Peipei Zhou

Pittsburgh, Pennsylvania, United States
4K followers 500+ connections

About

I am a tenure-track assistant professor in the ECE Department at the University of Pittsburgh. I…

Experience & Education

  • University of Pittsburgh

Publications

  • CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture

    Association for Computing Machinery

    We design an end-to-end deep learning acceleration framework on the AMD Versal ACAP.

  • (PhD Dissertation) Modeling and Optimization for Customized Computing: Performance, Energy and Cost Perspective

    UCLA Electronic Theses and Dissertations

    This dissertation investigates design targets, modeling, and optimization for field-programmable gate array (FPGA) customized computing at the chip, node, and cluster levels. FPGAs have gained popularity in the acceleration of a wide range of applications with 10x-100x performance/energy efficiency over general-purpose processors. The design choices of FPGA accelerators for different targets at different levels are enormous. To guide designers toward the best design choices, modeling is indispensable.

  • (IEEE TCAD Donald O. Pederson Best Paper Award) Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks

    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

    With the recent advancement of multilayer convolutional neural networks (CNNs) and fully connected networks (FCNs), deep learning has achieved amazing success in many areas, especially in visual content understanding and classification. To improve the performance and energy efficiency of the computation-demanding CNN, the FPGA-based acceleration emerges as one of the most attractive alternatives. In this paper, we design and implement Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN and FCN on FPGAs. First, we propose a uniformed convolutional matrix-multiplication representation for both computation-bound convolutional layers and communication-bound FCN layers. Based on this representation, we optimize the accelerator microarchitecture and maximize the underlying FPGA computing and bandwidth resource utilization based on a revised roofline model. Moreover, we design an automation flow to directly compile high-level network definitions to the final FPGA accelerator. As a case study, we integrate Caffeine into the industry-standard software deep learning framework Caffe. We evaluate Caffeine and its integration with Caffe by implementing VGG16 and AlexNet networks on multiple FPGA platforms. Caffeine achieves a peak performance of 1460 giga fixed point operations per second on a medium-sized Xilinx KU060 FPGA board; to our knowledge, this is the best published result. It achieves more than 100× speedup on FCN layers over prior FPGA accelerators. An end-to-end evaluation with Caffe integration shows up to 29× and 150× performance and energy gains over Caffe on a 12-core Xeon server, and 5.7× better energy efficiency over the GPU implementation. Performance projections for a system with a high-end FPGA (Virtex7 690t) show even higher gains.

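    As an illustration of the uniformed convolution-as-matrix-multiplication view described above (my own NumPy sketch with made-up layer sizes, not the paper's implementation), the snippet below lowers a small convolutional layer to a single GEMM via im2col; a fully connected layer is already a GEMM, which is what lets one accelerator design serve both layer types.

    import numpy as np

    def im2col(x, kh, kw):
        """Unroll a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix."""
        c, h, w = x.shape
        out_h, out_w = h - kh + 1, w - kw + 1
        cols = np.zeros((c * kh * kw, out_h * out_w))
        for i in range(out_h):
            for j in range(out_w):
                cols[:, i * out_w + j] = x[:, i:i + kh, j:j + kw].ravel()
        return cols

    # Toy layer: 3 input channels, 8 filters, 3x3 kernel, 6x6 input (no padding/stride).
    x = np.random.rand(3, 6, 6)
    w = np.random.rand(8, 3, 3, 3)          # (filters, channels, kh, kw)
    y = w.reshape(8, -1) @ im2col(x, 3, 3)  # convolution as one (8x27)*(27x16) GEMM
    y = y.reshape(8, 4, 4)                  # back to (filters, out_h, out_w)
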
  • (Best Paper Nominee) SODA: Stencil with Optimized Dataflow Architecture

    2018 International Conference On Computer Aided Design

    Stencil computation is one of the most important kernels in many application domains such as image processing, solving partial differential equations, and cellular automata. Many of the stencil kernels are complex, usually consist of multiple stages or iterations, and are often computation-bounded. Such kernels are often off-loaded to FPGAs to take advantage of the efficiency of dedicated hardware. However, implementing such complex kernels efficiently is not trivial, due to complicated data dependencies, difficulties of programming FPGAs with RTL, as well as large design space.
    In this paper we present SODA, an automated framework for implementing Stencil algorithms with Optimized Dataflow Architecture on FPGAs. The SODA microarchitecture minimizes the on-chip reuse buffer size required by full data reuse and provides flexible and scalable fine-grained parallelism. The SODA automation framework takes high-level user input and generates efficient, high-frequency dataflow implementations. This significantly reduces the difficulty of programming FPGAs efficiently for stencil algorithms. The SODA design-space exploration framework models the resource constraints and searches for the performance-optimized configuration with accurate models for post-synthesis resource utilization and on-board execution throughput. Experimental results from on-board execution using a wide range of benchmarks show up to 3.28x speedup over a 24-thread CPU, and our fully automated framework achieves better performance compared with manually designed state-of-the-art FPGA accelerators.

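    As a rough illustration of the line-buffer reuse that stencil dataflow architectures rely on (a generic Python model, not SODA's generated hardware), a 3x3 stencil over a row-major pixel stream needs only about two image rows plus a few pixels of on-chip buffering for full data reuse:

    from collections import deque
    import numpy as np

    def stencil_3x3_stream(img):
        """3x3 mean filter over a pixel stream using a ~2-row reuse buffer."""
        h, w = img.shape
        window = deque(maxlen=2 * w + 3)          # reuse buffer: two lines + 3 pixels
        out = np.zeros((h - 2, w - 2))
        for idx, pix in enumerate(img.ravel()):   # pixels arrive in raster order
            window.append(pix)
            i, j = divmod(idx, w)
            if i >= 2 and j >= 2:                 # buffer now spans rows i-2..i
                win = list(window)
                taps = [win[r * w + c] for r in range(3) for c in range(3)]
                out[i - 2, j - 2] = sum(taps) / 9.0
        return out

    img = np.arange(36, dtype=float).reshape(6, 6)
    ref = np.array([[img[i:i + 3, j:j + 3].mean() for j in range(4)] for i in range(4)])
    print(np.allclose(stencil_3x3_stream(img), ref))   # True
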
  • Latte: Locality Aware Transformation for High-Level Synthesis

    2018 IEEE International Symposium on Field-Programmable Custom Computing Machines

    First-author paper
    Modern FPGA chips feature abundant reconfigurable resources such as LUTs, FFs, BRAMs and DSPs. High-level synthesis (HLS) further advances users' productivity in designing accelerators and scaling out designs quickly via fine-grain and coarse-grain pipelining and duplication to utilize on-chip resources. However, current HLS tools fail to consider data locality in the scaled-out design; this leads to a long critical path which results in a low operating frequency. In this paper we summarize the timing degradation problems to four common collective communication and computation patterns in HLS-based accelerator design: scatter, gather, broadcast and reduce. These widely used patterns scale poorly in one-to-all or all-to-one data movements between the off-chip communication interface and on-chip storage, or inside the computation logic. Therefore, we propose the Latte microarchitecture featuring pipelined transfer controllers (PTC) along data paths in these patterns. Furthermore, we implement an automated framework to apply our Latte implementation in HLS with minimal user effort. Our experiments show that Latte-optimized designs greatly improve the timing of baseline HLS designs by 1.50x with only 3.2% LUT overhead on average, and 2.66x with 2.7% overhead at maximum.

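    To give a feel for the pipelined-transfer idea in patterns like broadcast (a toy functional model written for illustration, not Latte's PTC hardware), data can be forwarded hop by hop through a register chain so that each wire only spans neighboring processing elements, at the cost of one cycle of latency per hop:

    def daisy_chain_broadcast(value, n_pes):
        """Broadcast via a pipelined chain: one hop per cycle instead of a
        single source fanning out to all PEs over long wires."""
        regs = [None] * n_pes
        cycles = 0
        while any(r is None for r in regs):
            nxt = regs[:]                 # each cycle, every PE forwards what it
            nxt[0] = value                # latched in the previous cycle
            for i in range(1, n_pes):
                if regs[i - 1] is not None:
                    nxt[i] = regs[i - 1]
            regs = nxt
            cycles += 1
        return cycles                     # n_pes cycles, but short point-to-point wires

    print(daisy_chain_broadcast(42, 8))   # 8
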
  • ST-Accel: A High-Level Programming Platform for Streaming Applications on FPGA

    2018 IEEE International Symposium on Field-Programmable Custom Computing Machines

    In recent years we have witnessed the emergence of the FPGA in many high-performance systems. This is due to the FPGA's high reconfigurability and improved user-friendly programming environment. OpenCL, supported by major FPGA vendors, is a high-level programming platform that liberates hardware developers from having to deal with the complex and error-prone HDL development. While OpenCL exposes a GPU-like programming model, which is well-suited for compute-intensive tasks, in many state-of-the-art systems that deploy FPGAs we observe that the workloads are streaming-like, which is communication-intensive. This mismatch leads to low throughput and high end-to-end latency.
    In this paper, we propose ST-Accel, a new high-level programming platform for streaming applications on FPGA. It has the following advantages: (i) ST-Accel adopts the multiprocessing programming model to capture the inherent pipeline-level parallelism of streaming applications while reducing the end-to-end latency. (ii) A message-passing-based host/FPGA communication model is used to avoid the coherency issue of shared memory, thus enabling host/FPGA communication during kernel execution. (iii) ST-Accel provides a high-level abstraction for I/O devices to support direct I/O device access, which eliminates host-CPU overhead and reduces I/O latency. (iv) ST-Accel enables the decoupled access/execute architecture to maximize the utilization of I/O devices. (v) The host/FPGA communication interface is redesigned to cater to the demands of both latency-critical and throughput-critical scenarios. The experimental results on the Amazon AWS cloud and a local machine show that ST-Accel can achieve 1.6X-166X the throughput and one-third the latency of OpenCL for typical streaming workloads.

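    The multiprocessing, pipeline-parallel style described above can be sketched in plain Python (a loose analogy, not ST-Accel's host/FPGA API): two stages run as separate processes and stream records through a bounded queue, which also provides natural backpressure.

    import multiprocessing as mp

    def producer(out_q, n):
        for i in range(n):
            out_q.put(i)                 # stage 1: ingest/parse one record
        out_q.put(None)                  # end-of-stream marker

    def consumer(in_q, result_q):
        total = 0
        while True:
            item = in_q.get()
            if item is None:
                break
            total += item * item         # stage 2: transform/aggregate
        result_q.put(total)

    if __name__ == "__main__":
        q, res = mp.Queue(maxsize=64), mp.Queue()
        p1 = mp.Process(target=producer, args=(q, 1000))
        p2 = mp.Process(target=consumer, args=(q, res))
        p1.start(); p2.start()
        p1.join()
        print(res.get())                 # 332833500 = sum of i*i for i < 1000
        p2.join()
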
  • (Best Paper Nominee) Doppio: I/O-Aware Performance Analysis, Modeling and Optimization for In-Memory Computing Framework

    2018 IEEE International Symposium on Performance Analysis of Systems and Software

    First-author paper
    In conventional Hadoop MapReduce applications, I/O used to play a heavy role in the overall system performance. More recently, a study from the Apache Spark community—the state-of-the-art in-memory cluster computing framework—reports that I/O is no longer the bottleneck and has a marginal performance impact on applications like SQL processing. However, we observe that simply replacing HDDs with SSDs in a Spark cluster can have over 10x performance improvement for certain stages in large-scale production-quality genome processing. Therefore, one key question arises: How does I/O quantitatively impact the performance of today’s big data applications developed using in-memory cluster computing frameworks like Apache Spark? In this paper we select an important yet complex application—the Spark-based Genome Analysis ToolKit (GATK4)—to guide our modeling. We first use different combinations of HDDs and SSDs to measure the I/O impact on GATK4 and change the CPU core number to discover the relation between computation and I/O access. Combining this with Spark's underlying implementation, we further analyze the inherent cause of the above observations and build our model based on the analysis. Although built upon GATK4, our model maintains generality to other applications. Experimental results show that we can achieve a performance prediction error rate within 10% for typical Spark applications of both iterative and shuffle-heavy algorithms. Finally, we further extend our model to a broader area: optimal configuration selection in the public cloud. In Google Cloud, our model enables us to save 38% to 57% of the cost for genome sequencing compared with its recommended default configurations. Currently, more and more companies are adopting cloud computing for specific workloads. Our proposed model can have a huge impact on their choices, while also enabling them to significantly reduce their costs.

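    A toy stage-time model in the same spirit as the I/O-aware analysis above (my own simplification with made-up numbers, not Doppio's actual model): when compute and I/O overlap, stage time is roughly the larger of the two terms, so faster storage only speeds up stages that are I/O-bound.

    def stage_time(bytes_io, bandwidth_bps, compute_s, overlap=True):
        """Estimate one stage's runtime from its I/O volume and compute time."""
        io_s = bytes_io / bandwidth_bps
        return max(io_s, compute_s) if overlap else io_s + compute_s

    GB = 1e9
    # Hypothetical stage: 200 GB of disk traffic plus 500 s of CPU work.
    hdd = stage_time(200 * GB, 150e6, 500)   # ~150 MB/s HDD
    ssd = stage_time(200 * GB, 2e9, 500)     # ~2 GB/s NVMe SSD
    print(f"HDD {hdd:.0f} s, SSD {ssd:.0f} s, speedup {hdd / ssd:.1f}x")   # 2.7x, capped by compute
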
  • Bandwidth Optimization Through On-Chip Memory Restructuring for HLS

    54th Annual Design Automation Conference

    High-level synthesis (HLS) is getting increasing attention from both academia and industry for high-quality and high-productivity designs. However, when inferring primitive-type arrays in HLS designs into on-chip memory buffers, commercial HLS tools fail to effectively organize FPGAs’ on-chip BRAM building blocks to realize high-bandwidth data communication; this often leads to suboptimal quality of results. This paper addresses this issue via automated on-chip buffer restructuring. Specifically, we present three buffer restructuring approaches and develop an analytical model for each approach to capture its impact on performance and resource consumption. With the proposed model, we formulate the process of identifying the optimal design choice into an integer non-linear programming (INLP) problem and demonstrate that it can be solved efficiently with the help of a one-time C-to-HDL (hardware description language) synthesis. The experimental results show that our automated source-to-source code transformation tool improves the performance of a broad class of HLS designs by 4.8x on average.

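    As a schematic of the kind of on-chip buffer restructuring at issue (a Python model of cyclic banking, not the paper's source-to-source transformation), splitting one logical array across several banks lets a group of consecutive elements be fetched in the same cycle; HLS tools expose similar restructuring through array-partitioning directives.

    def cyclic_partition(data, banks):
        """Split one logical buffer into `banks` physical banks, round-robin."""
        return [data[b::banks] for b in range(banks)]

    def read_burst(bank_data, banks, start):
        """Elements start..start+banks-1 land in distinct banks -> one parallel read."""
        return [bank_data[(start + k) % banks][(start + k) // banks] for k in range(banks)]

    data = list(range(16))
    bank_data = cyclic_partition(data, 4)   # bank b holds data[b], data[b+4], ...
    print(bank_data[1])                     # [1, 5, 9, 13]
    print(read_burst(bank_data, 4, 6))      # [6, 7, 8, 9], one element per bank
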
  • Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks

    36th International Conference on Computer-Aided Design

    With the recent advancement of multilayer convolutional neural networks (CNN), deep learning has achieved amazing success in many areas, especially in visual content understanding and classification. To improve the performance and energy-efficiency of the computation-demanding CNN, the FPGA-based acceleration emerges as one of the most attractive alternatives. In this paper we design and implement Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN on FPGAs. First, we propose a uniformed convolutional matrix-multiplication representation for both computation-intensive convolutional layers and communication-intensive fully connected (FCN) layers. Second, we design Caffeine with the goal to maximize the underlying FPGA computing and bandwidth resource utilization, with a key focus on the bandwidth optimization by the memory access reorganization not studied in prior work. Moreover, we implement Caffeine in portable high-level synthesis and provide various hardware/software definable parameters for user configurations. Finally, we also integrate Caffeine into the industry-standard software deep learning framework Caffe. We evaluate Caffeine and its integration with Caffe by implementing VGG16 and AlexNet networks on multiple FPGA platforms. Caffeine achieves a peak performance of 365 GOPS on the Xilinx KU060 FPGA and 636 GOPS on the Virtex7 690t FPGA. This is the best published result to the best of our knowledge. We achieve more than 100x speedup on FCN layers over previous FPGA accelerators. An end-to-end evaluation with Caffe integration shows up to 7.3x and 43.5x performance and energy gains over Caffe on a 12-core Xeon server, and 1.5x better energy-efficiency over the GPU implementation on a medium-sized FPGA (KU060). Performance projections to a system with a high-end FPGA (Virtex7 690t) show even higher gains.

  • Energy Efficiency of Full Pipelining: A Case Study for Matrix Multiplication

    24th IEEE International Symposium on Field-Programmable Custom Computing Machines

    First-author paper
    Customized pipeline designs that minimize the pipeline initiation interval (II) maximize the throughput of FPGA accelerators designed with high-level synthesis (HLS). What is the impact of minimizing II on energy efficiency? Using a matrix-multiply accelerator, we show that matrix multiplies with II>1 can sometimes reduce dynamic energy below II=1 due to interconnect savings, but II=1 always achieves energy close to the minimum. We also identify sources of inefficient mapping in the commercial tool flow.

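    For context on initiation interval (II): a pipelined loop takes roughly depth + (iterations - 1) * II cycles, so for long loops throughput scales almost inversely with II. A small arithmetic sketch of this standard HLS estimate (not the paper's measured results):

    def pipeline_cycles(iterations, depth, ii):
        """Fill the pipeline once (depth), then retire one iteration every II cycles."""
        return depth + (iterations - 1) * ii

    n, depth = 1_000_000, 10
    for ii in (1, 2, 4):
        print(f"II={ii}: {pipeline_cycles(n, depth, ii):,} cycles")
    # II=1: 1,000,009   II=2: 2,000,008   II=4: 4,000,006
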
  • A Fully Pipelined and Dynamically Composable Architecture of CGRA

    2014 FCCM

Courses

  • Advanced Computer Architecture

    CS251A

  • Algorithms

    CS280

  • Arithmetic Algorithm and Processor

    CS252A

  • Data Science and Data Analytics

    CS249

  • Database Systems

    CS143

  • Design of VLSI Circuits and Systems (section 1)

    EE216A

  • Domain Specific Computing

    CS259

  • Machine Learning Algorithm

    CS260

  • Object-Oriented Programming in C++

    CS32

  • Parallel Computer Architecture

    CS251B

  • Parallel and Distributed Computing

    CS133

  • Programming Languages

    CS131

  • Special Topics in Circuits & Embedded System

    EE209AS

  • Special Topics in Signals & Systems

    EE239AS

Projects

  • Sorting Algorithm in OpenCL on heterogeneous platform: CPU/GPU/FPGA

    Using OpenCL, implemented sorting algorithms on heterogeneous platforms, including:
    a 32-core Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz server,
    an NVIDIA Tesla K10 GPU, and
    a Xilinx Zynq 7000 FPGA.
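
    The profile does not name the algorithm; bitonic sort is a common choice for OpenCL targets because its compare-exchange network is data-independent, so every stage maps to independent work-items. A host-side Python model of that network (an assumed example, not the project's kernels):

    def bitonic_sort(a):
        """In-place bitonic sorting network; requires a power-of-two length."""
        n = len(a)
        assert n and n & (n - 1) == 0
        k = 2
        while k <= n:                      # size of bitonic subsequences
            j = k // 2
            while j > 0:                   # compare-exchange distance
                for i in range(n):
                    partner = i ^ j
                    if partner > i:
                        ascending = (i & k) == 0
                        if (a[i] > a[partner]) == ascending:
                            a[i], a[partner] = a[partner], a[i]
                j //= 2
            k *= 2
        return a

    print(bitonic_sort([7, 3, 9, 1, 6, 0, 5, 2]))   # [0, 1, 2, 3, 5, 6, 7, 9]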

  • CS-BWAMEM: A Cloud-Scale Sequence Aligner for DNA Sequencing

    Goal: Create a new tool to handle the ever-increasing scale of human genome data (300+GB per individual)
    Duty: The main architect of CS-BWAMEM
    Results:
    (1) Developed over 80% of the code on top of Spark and HDFS/Tachyon
    (2) Maintained the GitHub repository
    Highlights:
    (1) ~25K-line code base: ~15K lines of Scala plus 2-3K lines of C/C++ libraries;
    (2) Used Parquet to reduce disk I/O overhead;
    (3) Bypassed the Spark broadcast path to improve software scalability;
    (4) Used native execution (C/C++) and hardware accelerators to replace slower Java code;
    (5) Finishes a 300GB task on a 300+-core cluster, providing a 12x speedup over the best existing tool

    CS-BWAMEM available at: https://github.com/ytchen0323/cloud-scale-bwamem

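    A minimal sketch of the distribution pattern (assuming PySpark purely for illustration; CS-BWAMEM itself is Scala on Spark, and align_read below is a hypothetical stand-in for the BWA-MEM alignment kernel, with an illustrative HDFS path):

    from pyspark.sql import SparkSession

    def align_read(read_line):
        # Hypothetical stand-in for aligning one read against the reference genome.
        return read_line.upper()

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("toy-aligner").getOrCreate()
        reads = spark.sparkContext.textFile("hdfs:///data/sample.fastq")      # illustrative path
        reads.map(align_read).saveAsTextFile("hdfs:///data/sample.aligned")   # parallel map over reads
        spark.stop()
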
  • Movie Search Website and Database System

    RDBMS with an index structure implemented as a B+ tree, developed in C++
    Designed a movie-database online search website based on MySQL and PHP
    Optimized the search algorithm, improving performance by 100x-1000x over a linear-search baseline

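    A toy illustration of why an index beats a linear scan (Python's bisect over a sorted key list standing in for the B+ tree; not the project's C++ code):

    import bisect

    titles = sorted(f"movie_{i:06d}" for i in range(1_000_000))   # sorted index

    def indexed_lookup(key):
        """O(log n): binary search over the sorted keys, like a B+-tree descent."""
        pos = bisect.bisect_left(titles, key)
        return pos if pos < len(titles) and titles[pos] == key else -1

    def linear_lookup(key):
        """O(n): the scan the index replaces."""
        return next((i for i, t in enumerate(titles) if t == key), -1)

    print(indexed_lookup("movie_765432"))   # 765432, ~20 comparisons
    print(linear_lookup("movie_765432"))    # 765432, ~765k comparisons
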
  • Object-Oriented C++ projects

    Maze video game with 7 different characters in C++; optimized the path-finding algorithm and improved runtime performance
    Designed a database system based on a binary tree in C++ and optimized the search algorithm

  • Accelerator-Rich Architectures Exploration

    Goal: Develop a prototyping flow to enable rapid design space exploration for accelerator-rich architectures
    Product: ARAPrototyper, including an automated synthesis flow, system software stack, and user APIs
    Highlight: Users can evaluate their designs and applications on a real silicon prototype (Xilinx Zynq SoC)

Honors & Awards

  • IEEE Transactions on Computer-Aided Design Donald O. Pederson Best Paper Award

    IEEE Council on EDA

    Caffeine: Towards Uniformed Representation and Acceleration for Deep Convolutional Neural Networks won the Donald O. Pederson Best Paper Award, which is given annually to recognize the best paper published in the IEEE Transactions on CAD in the two calendar years preceding the award.

  • UCLA Samueli School of Engineering Outstanding Ph.D. Researcher award

    UCLA Samueli School of Engineering

    UCLA recently graduated its first Centennial Class (1919-2019). Peipei Zhou, a graduating Ph.D. from the VAST Lab, received the 2019 Computer Science Department Outstanding Ph.D. Researcher award as a member of this first Centennial Class. Her dissertation title is "Modeling and Optimization for Customized Computing: Performance, Energy and Cost Perspective".

  • Best Paper Nominee at ICCAD'18

    2018 International Conference On Computer Aided Design

    SODA: Stencil with Optimized Dataflow Architecture received a Best Paper nomination at the 2018 International Conference on Computer-Aided Design (ICCAD'18).

  • Phi Tau Phi Scholarship

    Phi Tau Phi Scholastic Honor Society of America

    The West America Chapter of the Phi Tau Phi Scholastic Honor Society offers four or more awards each year to undergraduate and graduate students in recognition of their academic achievements and scholarly contributions. In addition to accomplished scholars, students who are talented in areas other than academics, such as those who have demonstrated exceptional leadership in society, special talents in fine arts, or strong commitment to Chinese heritage and culture, are encouraged to apply.

  • Best Paper Nominee at ISPASS'18

    2018 IEEE International Symposium on Performance Analysis of Systems and Software

    Doppio: I/O-Aware Performance Analysis, Modeling and Optimization for In-Memory Computing Framework received a Best Paper nomination at the 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS'18).

Languages

  • English

  • Chinese


Recommendations received

4 people have recommended Peipei
