Hot Chips Tutorial, Part I:
RISC-V Overview and ISA Design
Krste Asanovic
Prof. EECS, UC Berkeley;
Chairman of the Board, RISC-V Foundation;
Co-Founder and Chief Architect, SiFive Inc.
Stanford, CA
August 18, 2019
@risc_v

Why Instruction Set Architecture matters
• Why can’t Intel sell mobile chips?
  - 99% of mobile phones/tablets based on ARM v7/v8 ISA
• Why can’t ARM partners sell servers?
  - 99% of laptops/desktops/servers based on AMD64 ISA (over 95% built by Intel)
• How can IBM still sell mainframes?
  - IBM 360, oldest surviving ISA (50 years)
ISA is most important interface in computer system, where software meets hardware

Open Interfaces Work for Software!
Field        Open Standard      Free, Open Implement.   Proprietary Implement.
Networking   Ethernet, TCP/IP   Many                    Many
OS           Posix              Linux, FreeBSD          M/S Windows
Compilers    C                  gcc, LLVM               Intel icc, ARMcc
Databases    SQL                MySQL, PostgreSQL       Oracle 12C, M/S DB2
Graphics     OpenGL             Mesa3D                  M/S DirectX
ISA          ?                  -----------             x86, ARM, IBM360
• Why not successful free & open standards and free & open implementations, like other fields?

Companies and their ISAs Come and Go
Proprietary ISA fortunes tied to business fortunes and whims
• Digital Equipment Corporation
  - PDP-11, VAX, Alpha
• Intel
  - i960, i860, Itanium
• MIPS
  - Sold to Imagination, then bought by Wave AI startup, now opening R6?
• SPARC
  - Was opened by Sun, acquired by Oracle, now closed down
• ARM
  - Sold to Softbank at 40% premium
  - Now 25% sold off to Abu Dhabi investment fund

Today, many ISAs on one SoC
• Applications processor (usually ARM)
• Graphics processors
• Image processors
• Radio DSPs
• Audio DSPs
• Security processors
• Power-management processor
More than a dozen ISAs on some SoCs, each with unique software stack
Why?
• Apps processor ISA too big, inflexible for accelerators
• IP bought from different places, each proprietary ISA
• Engineers build home-grown ISA cores
[Figure: NVIDIA Tegra SoC]

Do we need all these different ISAs?
Must they be proprietary?
Must they keep disappearing?
What if there was one stable free and open ISA everyone could use for everything?

RISC-V Background
• In 2010, after many years and many research projects using MIPS, SPARC, and x86, time for architecture group at UC Berkeley to choose ISA for next set of projects
• Obvious choices: x86 and ARM
  - x86 impossible: too complex, IP issues
  - ARM mostly impossible: complex, no 64-bit in 2010, IP issues
• So we started “3-month project” during summer 2010 to develop clean-slate ISA
  - Principal designers: Andrew Waterman, Yunsup Lee, David Patterson, Krste Asanovic
• Four years later, May 2014, released frozen base user spec
  - many tapeouts and several research publications along the way
• Name RISC-V (pronounced “risk-five”) represents fifth major Berkeley RISC ISA
[Figure: chip photos] RISC-I (1981); RISC-II (1983); SOAR aka RISC-III (1984); SPUR aka RISC-IV (1988); First RISC-V (Raven-1, 28nm FDSOI, 2011)

Hot Chips 2014

RISC-V Foundation (2015- )
• RISC-V is the open-source hardware Instruction Set Architecture (ISA)
• Frozen base user spec released in 2014, contributed, ratified, and openly published by the RISC-V Foundation
The RISC-V Foundation is a non-profit entity serving members and the industry
Our mission is to accelerate RISC-V adoption with shared benefit to the entire community of stakeholders.
• Drive progression of ratified specs, compliance suite, and other technical deliverables
• Grow the overall ecosystem / membership, promoting diversity while preventing fragmentation
• Deepen community engagement and visibility

More than 250 RISC-V Members in 28 Countries Around the World
[Figure: RISC-V Foundation Growth History, September 2015 to May 2019; quarterly membership counts, Q3 2015 through Q2 2019, vertical axis 0 to 300]
Membership by category, May 2019:
• 13 Universities
• 29 Consulting; Research
• 23 Development Tools; SW and Cloud
• 104 Individual RISC-V developers and advocates
• 51 Machine Learning/AI; Commercial Chip Vendors; FPGA; Broad Market; Networking; Application Processors, Graphics
• 45 Semiconductor IP; IP and Design Services; Foundry Services

Calista Redmond, CEO, RISC-V Foundation
Previously, Vice-President of IBM Z Ecosystem division; President of OpenPOWER Foundation.

Programs increase member value & engagement
Technical Deliverables
• Guard against fragmentation
• Manage and progress technical deliverables through work groups and development team
• Process and initiate technical work groups
• Develop and manage member sandbox portal
Compliance & Certification
• Develop self serve testing and compliance certification suite
• Provide visibility to additional compliance certification and verification options
Visibility
• Drive constant drumbeat of member and foundation visibility through multiple media
• Engage in industry events and host Foundation events
• Cultivate strategic visibility through industry forums, analysts, and media
Learning & Talent
• Develop multi-level learning modules
• Connect universities, professors, and course material
• Develop badge and skill certification
• Match talent via online and event forums
Advocacy & Outreach
• Establish technical advocate program
• Engage geographic and domain specific engineers via advocate-led formal and informal opportunities
• Establish alliances with other organizations
Marketplace
• Provide online marketplace of providers and products
• Offer RFP matching to members

RISC-V Ecosystem
Software
• Open-source software: Gcc, binutils, glibc, Linux, BSD, LLVM, QEMU, FreeRTOS, ZephyrOS, LiteOS, SylixOS, ...
• Commercial software: Lauterbach, Segger, IAR, Micrium, ExpressLogic, Ashling, Imperas, AntMicro, ...
Foundation
• ISA specification
• Golden Model
• Compliance
Hardware
• Open-source cores: Rocket, BOOM, RI5CY, Ariane, PicoRV32, Piccolo, SCR1, Swerv, Hummingbird, ...
• Commercial core providers: Andes, Bluespec, Cloudbear, Codasip, Cortus, C-Sky, Nuclei, SiFive, Syntacore, ...
• Inhouse cores: Nvidia, WDC, Alibaba, others

Why is RISC-V so popular?
• Engineers sometimes “don’t see forest for the trees”
• The movement is not happening because some benchmark ran 10% faster, or some implementation was 30% lower power
• The movement is happening because new business model changes everything
  - Pick ISA first, then pick vendor or build own core
  - Add your own extension without getting permission
• Implementation features/PPA will follow
  - Whatever is broken/missing in RISC-V will get fixed

Modest RISC-V Project Goal
Become the industry-standard ISA for all computing devices

Industry Adoption Status
• Large companies adopting RISC-V for deeply embedded controllers in their SoCs (“minion cores”)
  - 2016 NVIDIA announced all future GPUs will use RISC-V
  - 2017 Western Digital announced transition of all billion cores/year to RISC-V
  - Others waiting in the wings
• CTOs across entire worldwide value chain of IC suppliers, system providers, service providers, are evaluating RISC-V strategies

RISC-V: An Everyday Design Choice
• For embedded/IoT, RISC-V is already strong competitor, and other areas adopting RISC-V also
• Production ramp starting, expect “millions” of SoCs to ship with RISC-V cores in 2019
• SiFive announced 100 RISC-V IP design wins
• Andes announced 21 wins in 2018, 60 in 2019
• Message: You won’t get fired for choosing RISC-V!

Replacing 2nd-tier ISAs
• Smaller proprietary-ISA soft-core IP companies switching to RISC-V standard to access larger market:
  - Andes
  - Codasip
  - Cortus
  - C-Sky
  - others to announce
If you’re a softcore IP provider, you should have a RISC-V product in development

Startups
• Many startups choosing RISC-V for new products
• Esperanto announced 4,096-core 7nm RISC-V chip, with high-end OoO cores
• Fadu SSD controller announcement
• Kendryte AI microcontroller, $3 chip with two RISC-V cores from open-source Rocket codebase
• Most are stealthy so will not be visible for a while
We haven’t had to tell startups about RISC-V; they find out pretty quickly when shopping for processor IP

Commercial Ecosystem Providers
• Mainstream commercial ecosystem support rapidly appearing
  - Lauterbach, Micrium, Segger, IAR, Express Logic, Imperas, UltraSOC, AntMicro, ...
• Demand driving supply in commercial ecosystem

Government Adoption
• India has adopted RISC-V
• US DARPA mandated RISC-V in recent security call for proposals
• Israel Innovation Authority creating GenPro incubator around RISC-V
• Shanghai Municipal Govt supporting RISC-V companies
• Other governments at various stages of investigation
If your country wishes to control security of its own information infrastructure, and promote its indigenous semiconductor industry, support RISC-V

RISC-V in Education
• Books available now! In multiple languages
• RISC-V spreading quickly throughout curricula of top schools

RISC-V: Completing the Innovation Cycle
[Figure: cycle linking Research, Industry, and Education]
Open ecosystem is key to keeping the virtuous cycle going

RISC-V ISA Tutorial

What’s Different about RISC-V?
• Simple
  - Far smaller than other commercial ISAs
• Clean-slate design
  - Clear separation between user and privileged ISA
  - Avoids µarchitecture or technology-dependent features
• Modular ISA designed for extensibility/specialization
  - Small standard base ISA, with multiple standard extensions
  - Sparse & variable-length instruction encoding for vast opcode space
• Stable
  - Base and first standard extensions are frozen
  - Additions via optional extensions, not new versions
• Community designed
  - Developed with leading industry/academic experts and software developers

RISC-V Base Plus Standard Extensions
• Four base integer ISAs
  - RV32E, RV32I, RV64I, RV128I
  - RV32E is 16-register subset of RV32I
  - Only 50 hardware instructions needed for base
• Standard extensions
  - M: Integer multiply/divide
  - A: Atomic memory operations (AMOs + LR/SC)
  - F: Single-precision floating-point
  - D: Double-precision floating-point
  - G = IMAFD, “General-purpose” ISA
  - Q: Quad-precision floating-point
• Above use standard RISC encoding in fixed 32-bit instruction word
• Frozen in 2014, ratified 2019, supported forever after

RISC-V ISA String Conventions
• RV32I
  - 32-bit address space, only basic integer instructions
• RV64IMAFDC (aka RV64GC)
  - 64-bit address space with integer multiply/divide, atomics, single and double precision floating-point, and compressed instructions
  - This is what current standard Linux distributions assume
• RV32EC (RV32E not ratified yet)
  - 32-bit address space with 16 integer registers, basic integer operations, and compressed instructions
• RV128IMAFDQC (RV128 not ratified yet)
  - 128-bit address space with integer multiply/divide, atomics, single/double/quad FP, and compressed instructions
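
The naming convention above can be made concrete with a small parser. This is only a sketch: `parse_isa_string` and `KNOWN_EXTENSIONS` are hypothetical names, and the code handles just the single-letter extensions named on these slides, expanding G to IMAFD.

```python
import re

# Single-letter extensions named in the slides; G is shorthand for IMAFD.
KNOWN_EXTENSIONS = "IEMAFDQC"


def parse_isa_string(isa):
    """Split a string such as 'RV64GC' into (xlen, extension letters).

    Hypothetical helper for illustration only. Real ISA strings can also
    carry multi-letter and versioned extensions, which this sketch rejects.
    """
    match = re.fullmatch(r"RV(32|64|128)([A-Z]+)", isa.upper())
    if match is None:
        raise ValueError(f"not a simple RISC-V ISA string: {isa!r}")
    xlen = int(match.group(1))
    letters = match.group(2).replace("G", "IMAFD")
    unknown = [c for c in letters if c not in KNOWN_EXTENSIONS]
    if unknown:
        raise ValueError(f"unsupported extension letters: {unknown}")
    return xlen, letters


if __name__ == "__main__":
    for name in ("RV32I", "RV64GC", "RV32EC", "RV128IMAFDQC"):
        print(name, "->", parse_isa_string(name))
```

Under these assumptions RV64GC and RV64IMAFDC parse to the same result, which matches the "aka" on the slide.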

RISC-V Processor Unprivileged State
• XLEN = address width (32, 64, 128)
• XLEN-bit program counter (pc)
• 32 XLEN-bit integer registers (x0-x31)
  - x0 always 0
  - RV32E variant has 16 registers (x0-x15)
• Optional 32 IEEE floating-point (FP) registers (f0-f31)
  - FLEN = floating-point width (extensions F=32, D=64, Q=128)
• FP status register (fcsr), used for FP rounding mode & exception reporting

RISC-V Standard Base ISA Details
• 32-bit fixed-width, naturally aligned instructions
• 31 integer registers x1-x31, plus x0 zero register
• rd/rs1/rs2 in fixed location, no implicit registers
• Immediate field always sign-extended (from instr[31])
• Floating-point adds f0-f31 registers plus FP CSR, also fused mul-add four-register format
• Designed to support PIC and dynamic linking
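
The fixed register-field positions and the sign extension from instr[31] can be illustrated with a small decoder for the I-type format (the format used by ADDI and loads). This is a sketch rather than a complete decoder; `decode_i_type` is a hypothetical name, and the field positions follow the base ISA encoding.

```python
def decode_i_type(instr):
    """Extract the fields of a 32-bit I-type instruction word.

    Returns (opcode, rd, funct3, rs1, imm), with imm as a signed Python int.
    """
    opcode = instr & 0x7F          # bits 6:0
    rd = (instr >> 7) & 0x1F       # bits 11:7, same position in every format with rd
    funct3 = (instr >> 12) & 0x7   # bits 14:12
    rs1 = (instr >> 15) & 0x1F     # bits 19:15, same position in every format with rs1
    imm = (instr >> 20) & 0xFFF    # bits 31:20 hold the 12-bit immediate
    if instr & 0x8000_0000:        # instr[31] is always the immediate's sign bit
        imm -= 0x1000
    return opcode, rd, funct3, rs1, imm


if __name__ == "__main__":
    # 0xFFF00093 encodes "addi x1, x0, -1"
    print(decode_i_type(0xFFF00093))
```

Because the sign bit sits in the same place for every format, hardware can start sign extension before it knows which format it is decoding.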

RV32I Base Unprivileged Instructions
[Figure: RV32I base unprivileged instruction listing]

“M” Integer Multiply-Divide Extension
• MUL returns lower XLEN bits of 2*XLEN-bit multiply product
• MULH returns upper XLEN bits of signed product
• MULHU returns upper XLEN bits of unsigned product
• MULHSU returns upper XLEN bits of signed*unsigned product
• Implementation can fuse MUL + MULH{S}{U} for single microarch multiply
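
As a rough reference model of the four multiply variants, the sketch below treats register values as XLEN-bit unsigned integers and relies on Python's arbitrary-precision arithmetic to form the full 2*XLEN-bit product. The helper names are illustrative, not taken from any specification or tool.

```python
def to_signed(value, xlen):
    """Reinterpret the low xlen bits of value as a two's-complement integer."""
    value &= (1 << xlen) - 1
    return value - (1 << xlen) if value >> (xlen - 1) else value


def mul(a, b, xlen=32):
    """Lower XLEN bits of the product (the same for signed and unsigned)."""
    return (a * b) & ((1 << xlen) - 1)


def mulh(a, b, xlen=32):
    """Upper XLEN bits of the signed x signed product."""
    product = to_signed(a, xlen) * to_signed(b, xlen)
    return (product >> xlen) & ((1 << xlen) - 1)


def mulhu(a, b, xlen=32):
    """Upper XLEN bits of the unsigned x unsigned product."""
    mask = (1 << xlen) - 1
    return ((a & mask) * (b & mask)) >> xlen


def mulhsu(a, b, xlen=32):
    """Upper XLEN bits of the signed(a) x unsigned(b) product."""
    mask = (1 << xlen) - 1
    product = to_signed(a, xlen) * (b & mask)
    return (product >> xlen) & mask


if __name__ == "__main__":
    a = b = 0xFFFF_FFFF
    print(hex(mul(a, b)), hex(mulh(a, b)), hex(mulhu(a, b)), hex(mulhsu(a, b)))
```

All four results come from the same full-width product, which is why a MULH-family instruction followed by MUL on the same operands is a natural fusion candidate.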

RISC-V Memory Model
• RISC-V has a base weak memory model (RVWMO)
  - Multi-copy atomic
    - store becomes visible to all other threads at same point
  - Similar to revised ARM v8 memory model
• Optional TSO extension defined (RVTSO)
  - Strictly upwards-compatible with RVWMO
  - Similar to x86 memory model
• Complete axiomatic and operational formal models available

“A”: Atomic Operations Extension
Two classes:
• Atomic Memory Operations (AMO)
  - Fetch-and-op, op = ADD, OR, XOR, MAX, MIN, MAXU, MINU
• Load-Reserved/Store Conditional
  - With forward-progress guarantee for short sequences
All atomic operations can be annotated with two bits (Acquire/Release) to implement release consistency or sequential consistency

Floating-Point Extensions “F”, “D”, “Q”
• FP extensions add set of 32 FP registers f0-f31, width is FLEN
  - F = 32-bit single-precision IEEE FP (FLEN=32)
  - D = 64-bit double-precision IEEE FP (FLEN=64)
  - Q = 128-bit quad-precision IEEE FP (FLEN=128)
  - Q implies D, D implies F
• Non-destructive fused multiply-adds supported
  - New instruction format with three sources and one destination
• Narrower FP results are “NaN-boxed” to wider FP regs
  - Result 1-extended to full FLEN width to avoid implementation-defined behavior, e.g., on RV64ID system, 32-bit FP result widened to FLEN=64 by filling upper 32 bits with all “1”s.
  - Narrower results treated as NaN if incorrectly used as source to wider FP instruction
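
NaN-boxing is easy to check with bit patterns. The sketch below assumes FLEN=64 and uses Python's `struct` module to view the same bits as single or double precision; `nan_box_single` and `read_as_double` are hypothetical helper names.

```python
import math
import struct


def nan_box_single(value, flen=64):
    """Return the FLEN-bit register image holding a single-precision value.

    The 32-bit result occupies the low bits and every upper bit is set to 1.
    """
    bits32 = struct.unpack("<I", struct.pack("<f", value))[0]
    upper_ones = ((1 << flen) - 1) ^ 0xFFFF_FFFF
    return upper_ones | bits32


def read_as_double(reg64):
    """Reinterpret a 64-bit register image as a double-precision value."""
    return struct.unpack("<d", struct.pack("<Q", reg64))[0]


if __name__ == "__main__":
    boxed = nan_box_single(1.0)
    print(hex(boxed))
    # Misusing the boxed single as a double-precision source yields a NaN.
    print(math.isnan(read_as_double(boxed)))
```

With the upper 32 bits all ones, the double-precision exponent field is all ones and the fraction is nonzero, so any boxed single reads back as a NaN when treated as a double.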

Variable-Length Encoding
• Extensions can use any multiple of 16 bits as instruction length
• Branches/Jumps target 16-bit boundaries even in fixed 32-bit base
  - Consumes 1 extra bit of jump/branch address
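
A fetch unit can tell the instruction length from the lowest bits of the first 16-bit parcel. The bit patterns below come from the RISC-V unprivileged ISA manual rather than from this slide, and the sketch covers only the 16-bit and 32-bit cases; `instruction_length` is a hypothetical name.

```python
def instruction_length(first_parcel):
    """Return the instruction length in bytes from its lowest 16-bit parcel.

    Assumes the standard length encoding: low two bits other than 0b11 mean
    a 16-bit instruction; low bits 0b11 with bits 4:2 other than 0b111 mean
    a 32-bit instruction. Longer formats are left out of this sketch.
    """
    if (first_parcel & 0b11) != 0b11:
        return 2   # compressed 16-bit encoding
    if (first_parcel & 0b11100) != 0b11100:
        return 4   # standard 32-bit encoding
    raise NotImplementedError("longer encodings are not handled in this sketch")


if __name__ == "__main__":
    print(instruction_length(0x0001))   # c.nop
    print(instruction_length(0x0093))   # low half of 0xFFF00093 (addi x1, x0, -1)
```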

“C”: Compressed Instruction Extension
• Compressed code important for:
  - low-end embedded to save static code space
  - high-end commercial workloads to reduce cache footprint
• C extension adds 16-bit compressed instructions
  - 2-address forms with all 32 registers
  - 2-address forms with most frequent 8 registers
• 1 compressed instruction expands to 1 base instruction
• Assembly lang. programmer & compiler oblivious (assembler and linker perform compression in current tool chains)
• RVC-to-RVI decoder only 700 gates (2% of small core)
• All original 32-bit instructions retain encoding but now can be 16-bit aligned
• 50%-60% of instructions compress, giving 25%-30% smaller code

RV32I / RV64I / RV128I + M, A, F, D, Q, C
[Figure: instruction listing annotated with instruction counts: 14 privileged; 8 for M; 34 for F, D, Q; 46 for C; 11 for A]

RV32I / RV64I / RV128I + M, A, F, D, Q, C
[Figure: instruction listing annotated with additional instruction counts: 12 for 64I/128I; 4 for 64M/128M; 11 for 64A/128A; 6 for 64{F, D, Q}/128{F, D, Q}]

RV32I / RV64I / RV128I + M, A, F, D, Q, C
RISC-V “Green Card”

Simplicity breeds Contempt
• How can simple ISA compete with industry monsters?
• How do we measure ISA quality?
  - Static code bytes for program
  - Dynamic code bytes fetched for execution
  - Microarchitectural work generated for execution

SPECint2006 compressed code size with save/restore optimization (relative to “standard” RVC)
[Figure: bar chart of relative code size for 32-bit and 64-bit addresses; 64-bit series include RV64C, RV64, X86-64, ARMv8, MIPS64]
• RISC-V now smallest ISA for 32- and 64-bit addresses
• All results with same GCC compiler and options

Dynamic Bytes Fetched
• RV64GC is lowest overall in dynamic bytes fetched
  - Despite current lack of support for vector operations

Converting Instructions to Microops
• Microops are measure of microarchitectural work performed
• Multiple microinstructions from one macroinstruction
• Or one microinstruction from multiple macroinstructions

RISC-V Macro-Op Fusion Examples
• “Load effective address LEA”: &(array[offset])
    slli rd, rs1, {1,2,3}
    add  rd, rd, rs2
• “indexed load”: M[rs1+rs2]
    add  rd, rs1, rs2
    ld   rd, 0(rd)
• “clear upper word”: // rd = rs1 & 0xffff_ffff
    slli rd, rs1, 32
    srli rd, rd, 32
• Can all be fused simply in decode stage
  - Many are expressible with 2-byte compressed instructions, so effectively just adds new 4-byte instructions
• RISC-V approach: use macroop fusion, don’t grow ISA
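
The fusion idioms above can be modelled in a few lines. The sketch first checks that two of the pairs compute what the slide claims on 64-bit registers, then shows a toy decode-stage matcher. The tuple layout `(opcode, rd, rs1, operand)` and the fused op names are invented for illustration and do not describe any real core.

```python
MASK64 = (1 << 64) - 1  # model RV64 registers as 64-bit unsigned values


def slli(x, shamt):
    return (x << shamt) & MASK64


def srli(x, shamt):
    return (x & MASK64) >> shamt


def add(a, b):
    return (a + b) & MASK64


def lea_pair(rs1, rs2, shamt):
    """slli rd, rs1, shamt ; add rd, rd, rs2"""
    rd = slli(rs1, shamt)
    return add(rd, rs2)


def clear_upper_word_pair(rs1):
    """slli rd, rs1, 32 ; srli rd, rd, 32"""
    rd = slli(rs1, 32)
    return srli(rd, 32)


def try_fuse(first, second):
    """Return a fused macro-op tuple for a recognized pair, else None.

    Each instruction is an (opcode, rd, rs1, operand) tuple, where operand
    is a shift amount for shifts and the rs2 register name for add.
    """
    op1, rd1, src1, x1 = first
    op2, rd2, src2, x2 = second
    if rd1 != rd2 or src2 != rd1:
        return None  # the second instruction must consume and overwrite rd
    if (op1, op2) == ("slli", "srli") and x1 == 32 and x2 == 32:
        return ("fused_clear_upper_word", rd1, src1)
    if (op1, op2) == ("slli", "add") and x1 in (1, 2, 3):
        return ("fused_lea", rd1, src1, x2, x1)
    return None


if __name__ == "__main__":
    print(hex(clear_upper_word_pair(0x1234_5678_9ABC_DEF0)))
    print(hex(lea_pair(3, 0x1000, 3)))
    print(try_fuse(("slli", "x5", "x6", 2), ("add", "x5", "x5", "x7")))
```

The matcher requires the second instruction to overwrite the intermediate rd, which is what lets a decoder discard the intermediate value safely.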

RISC-V Competitive µarch Effort after Fusion
[Details in UCB 2016 TR and 4th RISC-V workshop talk by Chris Celio]

Dave Ditzel, Esperanto
RISC-V wasn’t even on the shopping list of alternatives, but the more Esperanto’s engineers looked at it, the more they realized it was more than a toy or just a teaching tool. “We assumed that RISC-V would probably lose 30% to 40% in compiler efficiency [versus Arm or MIPS or SPARC] because it’s so simple,” says Ditzel. “But our compiler guys benchmarked it, and darned if it wasn’t within 1%.”
[Article by Jim Turley, EE Journal, December 13, 2017]

Fragmentation versus Diversity
Fragmentation: Same thing done different ways
Diversity: Solving different problems

RISC-V Encoding Terminology
Standard: defined by the Foundation
Reserved: Foundation might eventually use this space for future standard extensions
Custom: Space for implementer-specific extensions, never claimed by Foundation

RISC-V Custom Extension
[Figure labels: Libraries; Standard RV32IMAFD Software]

RISC-V Custom Extension Example 2
[Figure labels: Standard RV32IMA Software; Custom SW Libraries; Non-Conforming Extension; encoding regions Base RV32I, M, A, Reserved, Custom, Custom]

RISC-V Privileged Architecture
• Three privilege modes
  - User (U-mode)
  - Supervisor (S-mode)
  - Machine (M-mode)
• Supported combinations of modes:
  - M (simple embedded systems)
  - M, U (embedded systems with protection)
  - M, S, U (systems running Unix-style operating systems)
• Hypervisors run in modified S mode (HS) (in progress)
  - Prioritizes support for Type-2 Hypervisors like KVM
  - Can also support Type-1 Hypervisors in same model

Simple Embedded Systems (M-mode only)
• No address translation/protection
  - “Mbare” bare-metal mode
  - Trap bad physical addresses precisely
• All code inherently trusted
• Low implementation cost
  - 2^7 bits of architectural state (in addition to user ISA)
  - + 2^7 more bits for timers
  - + 2^7 more for basic performance counters

Secure Embedded Systems (M, U modes)
• M-mode runs secure boot and runtime monitor
• Embedded code runs in U-mode
• Physical memory protection (PMP) on U-mode accesses
• Interrupt handling can be delegated to U-mode code
  - User-level interrupt support
• Provides arbitrary number of isolated subsystems
• Ongoing work to define trusted execution environments
[Figure: U-mode process 1 and U-mode process 2, each behind PMP, above the M-mode monitor; Device 1 interrupts, Device 2 interrupts, other interrupts]

Virtual Memory Architectures (M, S, U modes)
• Designed to support current Unix-style operating systems
• Sv32 (RV32)
  - Demand-paged 32-bit virtual-address spaces
  - 2-level page table
  - 4 KiB pages, 4 MiB megapages
• Sv39 (RV64)
  - Demand-paged 39-bit virtual-address spaces
  - 3-level page table
  - 4 KiB pages, 2 MiB megapages, 1 GiB gigapages
• Sv48, Sv57, Sv64 (RV64)
  - Sv39 + 1/2/3 more page-table levels
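
The Sv39 numbers above fit together arithmetically: a 39-bit virtual address minus a 12-bit page offset leaves 27 bits, or 9 index bits for each of the 3 page-table levels. The sketch below splits a virtual address that way and derives the page sizes; the function names are illustrative, and a real page-table walk (PTE formats, permissions) is out of scope.

```python
PAGE_OFFSET_BITS = 12                                # 4 KiB base pages
LEVELS = 3                                           # Sv39: 3-level page table
VA_BITS = 39
VPN_BITS = (VA_BITS - PAGE_OFFSET_BITS) // LEVELS    # 9 index bits per level


def split_sv39(va):
    """Return ([vpn0, vpn1, vpn2], page_offset) for a 39-bit virtual address.

    vpn[0] indexes the last-level table; vpn[2] indexes the root table.
    """
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = [
        (va >> (PAGE_OFFSET_BITS + VPN_BITS * level)) & ((1 << VPN_BITS) - 1)
        for level in range(LEVELS)
    ]
    return vpn, offset


def page_size(level):
    """Bytes mapped by a leaf entry found at the given level (0 = base page)."""
    return 1 << (PAGE_OFFSET_BITS + VPN_BITS * level)


if __name__ == "__main__":
    print(split_sv39(0x1234_5678))
    print([page_size(level) for level in range(LEVELS)])
```

Each extra level in Sv48 and beyond adds another 9 index bits under the same scheme.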

S-Mode runs on top of M-mode
• M-mode runs secure boot and monitor
• S-mode runs OS
• U-mode runs application on top of OS or M-mode
[Figure: U-mode app on S-mode OS, and U-mode system process, each behind PMP, above the M-mode monitor; Device 1 interrupts, Device 2 interrupts, other interrupts]

Hypervisor
• Supports Type-2 hypervisors (e.g., KVM) as well as Type-1 hypervisor (e.g., Xen)
• Current draft spec implemented in QEMU with early KVM port
• Supports recursive virtualization

RISC-V and Security
Security is one of biggest challenges in contemporary computer architecture, so which to trust?
• Simple free ISA with open implementations and publicly scrutinized security systems
• Complex proprietary ISAs with NDA-only security systems
RISC-V already the center of security architecture research
• Small set of hardware primitives support everything from embedded security to remote cloud enclaves

Foundation ISA Standards Development
• Unprivileged base and initial extensions now ratified
  - RV32IMFDC, RV64IMFDC
  - “A” extension has one minor issue to resolve (LR/SC progress)
  - User ISA stable since 2014 release
• Run/halt debug spec ratified
• Memory model ratified
• Privileged spec 1.11 ratified
• Formal model available (SAIL)
• Vector specification 0.7.1 and tools released June 2019
  - Largest single extension to date
  - Target of advanced implementation work
• Other new ISA modules in advanced development:
  - Fast interrupts, DSP, Bit manipulation, Hypervisor, ...
• Member-driven ISA roadmap

RISC-V Vector Extension Overview
• vl: Vector length CSR sets number of elements active in each instruction
• vtype: sets width of element in each vector register (e.g., 32-bit, ...)
[Figure: 32 vector registers, v0 through v31, each holding elements [0] through [VLMAX-1]]
• Maximum vector length (VLMAX) depends on implementation, number of vector registers used, and type of each element.
• Unit-stride, strided, scatter-gather, structure load/store instructions
• Rich set of integer, fixed-point, and floating-point instructions
• Vector-vector, vector-scalar, and vector-immediate instructions
• Multiple vector registers can be combined to form longer vectors to reduce instruction bandwidth or support mixed-precision operations (e.g., 16b*16b -> 32b multiply-accumulate)
• Designed for extension with custom datatypes and widths
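
The role of vl and VLMAX is easiest to see in a strip-mined loop. The sketch below is a simplified software model, not the draft specification: it assumes VLMAX = VLEN * LMUL / SEW (register bits, registers grouped, element bits) and that each loop iteration sets vl to the smaller of the remaining element count and VLMAX. All function and parameter names are hypothetical.

```python
def vlmax(vlen_bits, sew_bits, lmul=1):
    """Elements per vector operand for a given register width, element
    width, and number of grouped registers (assumed formula, see lead-in)."""
    return (vlen_bits * lmul) // sew_bits


def strip_mine_lengths(n, limit):
    """Return the vl chosen on each loop iteration when processing n elements."""
    lengths = []
    remaining = n
    while remaining > 0:
        vl = min(remaining, limit)   # models the value a vsetvl-style op returns
        lengths.append(vl)
        remaining -= vl
    return lengths


def vector_add(a, b, vlen_bits=128, sew_bits=32, lmul=1):
    """Element-wise add of two equal-length lists, one 'vector instruction'
    (inner generator) per strip of vl elements."""
    limit = vlmax(vlen_bits, sew_bits, lmul)
    mask = (1 << sew_bits) - 1
    out = []
    start = 0
    for vl in strip_mine_lengths(len(a), limit):
        out.extend((a[start + i] + b[start + i]) & mask for i in range(vl))
        start += vl
    return out


if __name__ == "__main__":
    print(vlmax(128, 32), vlmax(128, 16, lmul=2))
    print(strip_mine_lengths(10, 4))
    print(vector_add(list(range(10)), [1] * 10))
```

The same loop runs unchanged on a machine with wider registers; only the number of iterations differs, which is the point of making vector length an implementation parameter.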

Join and Engage!
Drive technical priorities in 20 focus areas
• Opcode Space Mgmt Standing Committee
• Software Standing Committee
• V Extension (Vector Ops) Task Group
• Cryptographic Extension Task Group
• Base ISA Ratification Task Group
• Debug Specification Task Group
• Privileged ISA Spec Task Group
• Fast Interrupts Spec Task Group
• UNIX-Class Platform Spec Task Group
• Memory Model Spec Task Group
• Formal Specification Task Group
• Trusted Execution Env Spec Task Group
• Processor Trace Spec Task Group
• Compliance Task Group
• B Extension (Bit Manipulation) Task Group
• J Extension (Dynam. Translated Lang) Task Group
• P Extension (Packed-SIMD Inst) Task Group
• Security Committee and new Safety Task Group
