Notes on the User-Level ISA Specification v2.1 from https://riscv.org/specifications/,
as interpreted with my limited English


==========================
RV32I Base Instruction Set
==========================
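There are 32 registers.
The first register (x0) is the zero register.
It always reads as zero, and writes to it are ignored.
The remaining registers, the 2nd through the 32nd (x1 to x31), are general-purpose registers that hold integer values.
This document uses the term XLEN for the register width in bits (32 for RV32, 64 for RV64).
In addition, there is one more register, the PC (program counter), which holds the address of the current instruction.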
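
As a rough illustration of the register state described above, here is a minimal C++14 sketch. The `RegisterFile` type and its member names are assumptions made for this example and do not come from the specification.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the RV32I integer register state (names are illustrative).
struct RegisterFile
{
  uint32_t  x[32] = {};  // slot 0 is never used to hold a value
  uint32_t  pc = 0;      // address of the current instruction

  // Reading register 0 always yields zero.
  uint32_t  read(unsigned  i) const
  {
    assert(i < 32);
    return (i == 0)? 0u : x[i];
  }

  // Writes to register 0 are discarded.
  void  write(unsigned  i, uint32_t  value)
  {
    assert(i < 32);
    if(i != 0)
    {
      x[i] = value;
    }
  }
};
```

Routing every access through `read` and `write` keeps the zero-register rule in one place, so instruction handlers do not need to special-case it.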


========================
Base Instruction Formats
========================
Instructions are 32 bits wide in both RV32 and RV64.
Instructions must be aligned on a four-byte boundary.


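<- high bits - low bits ->
The numbers in the top row are the field widths in bits
・opcode is the operation code. It selects the broad class of instruction
・funct3 is a 3-bit function code. It selects the specific instruction within the class
・funct7 is a 7-bit function code. It selects the specific instruction within the class
・rd specifies the destination register
・rs1 specifies the first source register
・rs2 specifies the second source register
・imm is an immediate. The numbers in brackets give the range of immediate bits the field holds
  imm[11:0], for example, holds the 12 bits from bit 0 through bit 11 of the immediate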


R-type
+--------+-----+-----+--------+----+--------+
|   7    |  5  |  5  |    3   |  5 |    7   |
+--------+-----+-----+--------+----+--------+
| funct7 | rs2 | rs1 | funct3 | rd | opcode |
+--------+-----+-----+--------+----+--------+

I-type
+-----------+-----+--------+----+--------+
|     12    |  5  |   3    |  5 |   7    |
+-----------+-----+--------+----+--------+
| imm[11:0] | rs1 | funct3 | rd | opcode |
+-----------+-----+--------+----+--------+

S-type
+-----------+-----+-----+--------+----------+--------+
|     7     |  5  |  5  |    3   |     5    |    7   |
+-----------+-----+-----+--------+----------+--------+
| imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
+-----------+-----+-----+--------+----------+--------+

SB-type
+---------+-----------+-----+-----+--------+----------+---------+--------+
|    1    |     6     |  5  |  5  |   3    |     4    |    1    |   7    |
+---------+-----------+-----+-----+--------+----------+---------+--------+
| imm[12] | imm[10:5] | rs2 | rs1 | funct3 | imm[4:1] | imm[11] | opcode |
+---------+-----------+-----+-----+--------+----------+---------+--------+

U-type
+------------+----+--------+
|     20     |  5 |    7   |
+------------+----+--------+
| imm[31:12] | rd | opcode |
+------------+----+--------+

UJ-type
+---------+-----------+---------+------------+----+--------+
|    1    |     10    |    1    |      8     |  5 |   7    |
+---------+-----------+---------+------------+----+--------+
| imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode |
+---------+-----------+---------+------------+----+--------+
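
The fixed fields in the format tables above sit at the same bit positions in every format, which follows from adding up the field widths from the low end. A minimal C++14 sketch of pulling them out of an instruction word is shown below; `Fields`, `bits` and `split_fields` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the fixed fields of a 32-bit instruction word.
struct Fields
{
  uint32_t  opcode;  // inst[6:0]
  uint32_t  rd;      // inst[11:7]
  uint32_t  funct3;  // inst[14:12]
  uint32_t  rs1;     // inst[19:15]
  uint32_t  rs2;     // inst[24:20]
  uint32_t  funct7;  // inst[31:25]
};

// Extract `width` bits of `inst`, starting at bit `lo`.
inline uint32_t  bits(uint32_t  inst, unsigned  lo, unsigned  width)
{
  assert(width < 32 && lo + width <= 32);
  return (inst >> lo) & ((1u << width) - 1u);
}

// Split an instruction word into all six fields. Which fields are
// meaningful depends on the format selected by the opcode.
inline Fields  split_fields(uint32_t  inst)
{
  return Fields{bits(inst,  0, 7), bits(inst,  7, 5), bits(inst, 12, 3),
                bits(inst, 15, 5), bits(inst, 20, 5), bits(inst, 25, 7)};
}
```

For example, the word 0x002081B3 splits into opcode 0110011, rd 3, funct3 000, rs1 1, rs2 2 and funct7 0000000, which is the R-type layout shown above.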


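==========================
Immediate Types and Layout
==========================
All immediates are sign-extended.
The sign bit is always the leftmost bit of the instruction (inst[31]).

*Below each table, an RV32 decoding example is given in C++14*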


<- high bits - low bits ->
I-immediate
+--------------+-------------+-------------+----------+
|      21      |      6      |      4      |    1     |
+--------------+-------------+-------------+----------+
| — inst[31] — | inst[30:25] | inst[24:21] | inst[20] |
+--------------+-------------+-------------+----------+
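// Hypothetical function name; inst is the 32-bit instruction word (needs <cstdint>)
int32_t  decode_i_immediate(uint32_t  inst)
{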
  uint32_t  imm = 0;
//                 28   24   20   16   12    8    4    0
  imm |= (inst&0b0000'0000'0001'0000'0000'0000'0000'0000)>>20;
  imm |= (inst&0b0000'0001'1110'0000'0000'0000'0000'0000)>>20;
  imm |= (inst&0b0111'1110'0000'0000'0000'0000'0000'0000)>>20;

    if(inst&0b1000'0000'0000'0000'0000'0000'0000'0000)
    {
      imm |= 0b1111'1111'1111'1111'1111'1000'0000'0000;
    }


  return static_cast<int32_t>(imm);
}


S-immediate
+--------------+-------------+------------+---------+
|      21      |      6      |      4     |    1    |
+--------------+-------------+------------+---------+
| — inst[31] — | inst[30:25] | inst[11:8] | inst[7] |
+--------------+-------------+------------+---------+
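// Hypothetical function name; inst is the 32-bit instruction word (needs <cstdint>)
int32_t  decode_s_immediate(uint32_t  inst)
{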
  uint32_t  imm = 0;
//                 28   24   20   16   12    8    4    0
  imm |= (inst&0b0000'0000'0000'0000'0000'0000'1000'0000)>> 7;
  imm |= (inst&0b0000'0000'0000'0000'0000'1111'0000'0000)>> 7;
  imm |= (inst&0b0111'1110'0000'0000'0000'0000'0000'0000)>>20;

    if(inst&0b1000'0000'0000'0000'0000'0000'0000'0000)
    {
      imm |= 0b1111'1111'1111'1111'1111'1000'0000'0000;
    }


  return static_cast<int32_t>(imm);
}


B-immediate
+--------------+---------+-------------+------------+---+
|      20      |    1    |      6      |     4      | 1 |
+--------------+---------+-------------+------------+---+
| — inst[31] — | inst[7] | inst[30:25] | inst[11:8] | 0 |
+--------------+---------+-------------+------------+---+
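// Hypothetical function name; inst is the 32-bit instruction word (needs <cstdint>)
int32_t  decode_b_immediate(uint32_t  inst)
{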
  uint32_t  imm = 0;
//                 28   24   20   16   12    8    4    0
  imm |= (inst&0b0000'0000'0000'0000'0000'1111'0000'0000)>> 7;
  imm |= (inst&0b0111'1110'0000'0000'0000'0000'0000'0000)>>20;
  imm |= (inst&0b0000'0000'0000'0000'0000'0000'1000'0000)<< 4;

    if(inst&0b1000'0000'0000'0000'0000'0000'0000'0000)
    {
      imm |= 0b1111'1111'1111'1111'1111'0000'0000'0000;
    }


  return static_cast<int32_t>(imm);
}


U-immediate
+----------+-------------+-------------+-------+
|     1    |     11      |      8      |  12   |
+----------+-------------+-------------+-------+
| inst[31] | inst[30:20] | inst[19:12] | — 0 — |
+----------+-------------+-------------+-------+
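// Hypothetical function name; inst is the 32-bit instruction word (needs <cstdint>)
int32_t  decode_u_immediate(uint32_t  inst)
{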
  uint32_t  imm = 0;
//                 28   24   20   16   12    8    4    0
  imm |= (inst&0b0000'0000'0000'1111'1111'0000'0000'0000);
  imm |= (inst&0b0111'1111'1111'0000'0000'0000'0000'0000);
  imm |= (inst&0b1000'0000'0000'0000'0000'0000'0000'0000);


  return static_cast<int32_t>(imm);
}


J-immediate
+--------------+-------------+----------+-------------+-------------+---+
|      12      |      8      |     1    |      6      |      4      | 1 |
+--------------+-------------+----------+-------------+-------------+---+
| — inst[31] — | inst[19:12] | inst[20] | inst[30:25] | inst[24:21] | 0 |
+--------------+-------------+----------+-------------+-------------+---+
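// Hypothetical function name; inst is the 32-bit instruction word (needs <cstdint>)
int32_t  decode_j_immediate(uint32_t  inst)
{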
  uint32_t  imm = 0;
//                 28   24   20   16   12    8    4    0
  imm |= (inst&0b0000'0001'1110'0000'0000'0000'0000'0000)>>20;
  imm |= (inst&0b0111'1110'0000'0000'0000'0000'0000'0000)>>20;
  imm |= (inst&0b0000'0000'0001'0000'0000'0000'0000'0000)>> 9;
  imm |= (inst&0b0000'0000'0000'1111'1111'0000'0000'0000)    ;

    if(inst&0b1000'0000'0000'0000'0000'0000'0000'0000)
    {
      imm |= 0b1111'1111'1111'0000'0000'0000'0000'0000;  // inst[31] fills imm[31:20], including bit 20
    }


  return static_cast<int32_t>(imm);
}
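
As an alternative to the mask-and-OR decoders above, the sign extension can be factored into one helper. The sketch below is only an illustration under that assumption; `sign_extend`, `i_immediate` and `s_immediate` are hypothetical names, and the final conversion to `int32_t` relies on two's-complement behaviour, as the decoders above already do.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `width` bits of `value` (1 <= width <= 32).
inline int32_t  sign_extend(uint32_t  value, unsigned  width)
{
  assert(width >= 1 && width <= 32);
  const uint32_t  sign = 1u << (width - 1);   // position of the sign bit
  const uint32_t  mask = (sign << 1) - 1u;    // low `width` bits set
  const uint32_t  low  = value & mask;
  // Flipping the sign bit and subtracting it again propagates it upward.
  return static_cast<int32_t>((low ^ sign) - sign);
}

// I-immediate: inst[31:20] holds imm[11:0].
inline int32_t  i_immediate(uint32_t  inst)
{
  return sign_extend(inst >> 20, 12);
}

// S-immediate: inst[31:25] holds imm[11:5], inst[11:7] holds imm[4:0].
inline int32_t  s_immediate(uint32_t  inst)
{
  return sign_extend(((inst >> 25) << 5) | ((inst >> 7) & 0x1Fu), 12);
}
```

With this helper, the word 0xFFF00093 (an ADDI with all immediate bits set) yields an I-immediate of -1, and 0xFE20AE23 (a store with offset field 0xFFC) yields an S-immediate of -4.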


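==================================
Integer Computational Instructions
==================================
Most integer computational instructions operate on XLEN-bit values held in the integer registers.
They are encoded in the I-type format for register-immediate operations and in the R-type format for register-register operations.
In both cases the destination is register rd.
No integer computational instruction causes an arithmetic exception.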
  Most integer computational instructions operate on XLEN bits of values held in the integer register
  file. Integer computational instructions are either encoded as register-immediate operations using
  the I-type format or as register-register operations using the R-type format. The destination is
  register rd for both register-immediate and register-register instructions. No integer computational
  instructions cause arithmetic exceptions.


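-------------------------------
Register-Immediate Instructions
-------------------------------
ADDI adds the sign-extended 12-bit immediate to register rs1.
Arithmetic overflow is ignored; the result is simply the low XLEN bits.
SLTI (set less than immediate) writes 1 to rd if rs1 is less than the sign-extended immediate, with both treated as signed numbers;
otherwise it writes 0.
SLTIU compares the values as unsigned numbers (the immediate is first sign-extended, then treated as unsigned).
ANDI, ORI, and XORI perform bitwise AND, OR, and XOR
on rs1 and the sign-extended 12-bit immediate, and place the result in rd.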
  ADDI adds the sign-extended 12-bit immediate to register rs1. Arithmetic overflow is ignored and
  the result is simply the low XLEN bits of the result. ADDI rd, rs1, 0 is used to implement the MV
  rd, rs1 assembler pseudo-instruction.
  SLTI (set less than immediate) places the value 1 in register rd if register rs1 is less than the sign-extended
  immediate when both are treated as signed numbers, else 0 is written to rd. SLTIU is
  similar but compares the values as unsigned numbers (i.e., the immediate is first sign-extended to
  XLEN bits then treated as an unsigned number). Note, SLTIU rd, rs1, 1 sets rd to 1 if rs1 equals
  zero, otherwise sets rd to 0 (assembler pseudo-op SEQZ rd, rs).
  ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and XOR on register rs1
  and the sign-extended 12-bit immediate and place the result in rd. Note, XORI rd, rs1, -1 performs
  a bitwise logical inversion of register rs1 (assembler pseudo-instruction NOT rd, rs).


I-type instruction format
      +--------------+-------+--------+-------+---------+
      |       imm    |  rs1  | funct3 |   rd  |  opcode |
      +--------------+-------+--------+-------+---------+
 ADDI | bbbbbbbbbbbb | bbbbb |   000  | bbbbb | 0010011 |
 SLTI | bbbbbbbbbbbb | bbbbb |   010  | bbbbb | 0010011 |
SLTIU | bbbbbbbbbbbb | bbbbb |   011  | bbbbb | 0010011 |
 XORI | bbbbbbbbbbbb | bbbbb |   100  | bbbbb | 0010011 |
  ORI | bbbbbbbbbbbb | bbbbb |   110  | bbbbb | 0010011 |
 ANDI | bbbbbbbbbbbb | bbbbb |   111  | bbbbb | 0010011 |
      +--------------+-------+--------+-------+---------+

      +---------+-------+-------+--------+-------+---------+
      |   imm   | shamt |  rs1  | funct3 |   rd  |  opcode |
      +---------+-------+-------+--------+-------+---------+
 SLLI | 0000000 | bbbbb | bbbbb |   001  | bbbbb | 0010011 |
 SRLI | 0000000 | bbbbb | bbbbb |   101  | bbbbb | 0010011 |
 SRAI | 0100000 | bbbbb | bbbbb |   101  | bbbbb | 0010011 |
      +---------+-------+-------+--------+-------+---------+
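Shifts by a constant are encoded as a special case of the I-type format.
The value to shift is in rs1, and the shift amount is the lower 5 bits of the I-immediate field.
SLLI is a logical left shift (vacated low bits are filled with zeros).
SRLI is a logical right shift (vacated high bits are filled with zeros).
SRAI is an arithmetic right shift (the original sign bit is copied into the vacated high bits).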
  Shifts by a constant are encoded as a specialization of the I-type format. The operand to be shifted
  is in rs1, and the shift amount is encoded in the lower 5 bits of the I-immediate field. The right
  shift type is encoded in a high bit of the I-immediate. SLLI is a logical left shift (zeros are shifted
  into the lower bits); SRLI is a logical right shift (zeros are shifted into the upper bits); and SRAI
  is an arithmetic right shift (the original sign bit is copied into the vacated upper bits).
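
The register-immediate operations above, including the constant shifts, can be sketched as a single function for RV32. This is a minimal illustration, not a reference implementation: `exec_op_imm` is a hypothetical name, `imm` is assumed to be the already sign-extended I-immediate, and the signed comparison relies on two's-complement conversion.

```cpp
#include <cassert>
#include <cstdint>

// Compute the rd value of an OP-IMM instruction (opcode 0010011) on RV32.
inline uint32_t  exec_op_imm(uint32_t  funct3, uint32_t  rs1, int32_t  imm)
{
  assert(funct3 < 8);
  const uint32_t  uimm  = static_cast<uint32_t>(imm);
  const uint32_t  shamt = uimm & 0x1Fu;            // low 5 bits of the immediate
  const bool      arith = (uimm & 0x400u) != 0;    // imm[11:5] = 0100000 selects SRAI

  switch(funct3)
  {
    case 0: return rs1 + uimm;                                    // ADDI (wraps on overflow)
    case 1: return rs1 << shamt;                                  // SLLI
    case 2: return (static_cast<int32_t>(rs1) < imm)? 1u : 0u;    // SLTI
    case 3: return (rs1 < uimm)? 1u : 0u;                         // SLTIU
    case 4: return rs1 ^ uimm;                                    // XORI
    case 5:                                                       // SRLI / SRAI
    {
      uint32_t  result = rs1 >> shamt;
      if(arith && (rs1 & 0x80000000u))
      {
        result |= ~(0xFFFFFFFFu >> shamt);  // copy the sign bit into the vacated bits
      }
      return result;
    }
    case 6: return rs1 | uimm;                                    // ORI
    default: return rs1 & uimm;                                   // ANDI (funct3 = 111)
  }
}
```

The arithmetic shift is built from an unsigned shift plus an explicit fill so that the result does not depend on how the compiler shifts negative signed values.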


U-type instruction format
      +----------------------+-------+---------+
      |          imm         |  rd   |  opcode |
      +----------------------+-------+---------+
  LUI | bbbbbbbbbbbbbbbbbbbb | bbbbb | 0110111 |
AUIPC | bbbbbbbbbbbbbbbbbbbb | bbbbb | 0010111 |
      +----------------------+-------+---------+

LUI (load upper immediate) is used to build 32-bit constants.
It places the U-immediate in the top 20 bits of rd and fills the lowest 12 bits with zeros.
  LUI (load upper immediate) is used to build 32-bit constants and uses the U-type format. LUI
  places the U-immediate value in the top 20 bits of the destination register rd, filling in the lowest
  12 bits with zeros.

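AUIPC (add upper immediate to pc) is used to build PC-relative addresses.
AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling the lowest 12 bits with zeros.
This offset is added to the PC, and the result is placed in rd.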
  AUIPC (add upper immediate to pc) is used to build pc-relative addresses and uses the U-type
  format. AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with
  zeros, adds this offset to the pc, then places the result in register rd.

The AUIPC instruction supports two-instruction sequences that access arbitrary offsets from the PC, for both control-flow transfers and data accesses.
The combination of an AUIPC and the 12-bit immediate in a JALR can transfer control to any 32-bit PC-relative address,
while an AUIPC plus the 12-bit immediate offset in regular load or store instructions can access any
32-bit PC-relative data address.
The current PC can be obtained by setting the U-immediate to 0.
Although a JAL +4 instruction could also be used to obtain the PC,
it might cause pipeline breaks in simpler microarchitectures
or pollute the BTB structures in more complex microarchitectures.
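
To make the LUI and AUIPC pairing concrete, the sketch below computes both results and shows one way to split a 32-bit value into an upper 20-bit part and a signed 12-bit part. The split is an assumption added for illustration (it mirrors what assemblers commonly do), and `u_immediate`, `exec_lui`, `exec_auipc`, `HiLo` and `split_hi_lo` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// U-immediate: inst[31:12] stays in bits 31:12, the low 12 bits are zero.
inline uint32_t  u_immediate(uint32_t  inst)
{
  return inst & 0xFFFFF000u;
}

// LUI writes the U-immediate to rd.
inline uint32_t  exec_lui(uint32_t  inst)
{
  return u_immediate(inst);
}

// AUIPC writes pc + U-immediate to rd.
inline uint32_t  exec_auipc(uint32_t  inst, uint32_t  pc)
{
  return pc + u_immediate(inst);
}

// Upper and lower parts such that (hi20 << 12) + lo12 == value (mod 2^32).
struct HiLo
{
  uint32_t  hi20;  // goes into the U-immediate of LUI or AUIPC
  int32_t   lo12;  // goes into the 12-bit immediate of ADDI, JALR, a load or a store
};

inline HiLo  split_hi_lo(uint32_t  value)
{
  // The 12-bit immediate is sign-extended, so the upper part is rounded
  // up by 0x800 to compensate when the lower part comes out negative.
  const uint32_t  hi20 = (value + 0x800u) >> 12;
  const int32_t   lo12 = static_cast<int32_t>(value - (hi20 << 12));
  assert(lo12 >= -2048 && lo12 <= 2047);
  return HiLo{hi20 & 0xFFFFFu, lo12};
}
```

For the value 0x12345FFF this gives an upper part of 0x12346 and a lower part of -1, because 0x12346000 - 1 = 0x12345FFF.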


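------------------------------
Register-Register Instructions
------------------------------
RV32I defines several arithmetic R-type operations.
All of them read registers rs1 and rs2 as source operands and write the result to register rd.
The funct7 and funct3 fields select the type of operation.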
  RV32I defines several arithmetic R-type operations. All operations read the rs1 and rs2 registers
  as source operands and write the result into register rd. The funct7 and funct3 fields select the
  type of operation.


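ADD and SUB perform addition and subtraction respectively.
Overflow is ignored, and the low XLEN bits of the result are written to rd.
SLT and SLTU perform signed and unsigned comparisons respectively,
writing 1 to rd if rs1 < rs2 and 0 otherwise.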
  ADD and SUB perform addition and subtraction respectively. Overflows are ignored and the low
  XLEN bits of results are written to the destination. SLT and SLTU perform signed and unsigned
  compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1
  if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudo-op SNEZ rd, rs). AND, OR,
  and XOR perform bitwise logical operations.
  SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in
  register rs1 by the shift amount held in the lower 5 bits of register rs2.


R-type instruction format
     +---------+-------+-------+--------+-------+---------+
     |  funct7 |  rs2  |  rs1  | funct3 |  rd   |  opcode |
     +---------+-------+-------+--------+-------+---------+
 ADD | 0000000 | bbbbb | bbbbb |   000  | bbbbb | 0110011 |
 SUB | 0100000 | bbbbb | bbbbb |   000  | bbbbb | 0110011 |
 SLT | 0000000 | bbbbb | bbbbb |   010  | bbbbb | 0110011 |
SLTU | 0000000 | bbbbb | bbbbb |   011  | bbbbb | 0110011 |
 XOR | 0000000 | bbbbb | bbbbb |   100  | bbbbb | 0110011 |
 SRL | 0000000 | bbbbb | bbbbb |   101  | bbbbb | 0110011 |
 SLL | 0000000 | bbbbb | bbbbb |   001  | bbbbb | 0110011 |
 SRA | 0100000 | bbbbb | bbbbb |   101  | bbbbb | 0110011 |
  OR | 0000000 | bbbbb | bbbbb |   110  | bbbbb | 0110011 |
 AND | 0000000 | bbbbb | bbbbb |   111  | bbbbb | 0110011 |
     +---------+-------+-------+--------+-------+---------+
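
The table above maps directly onto a two-level selection on funct3 and funct7. The following C++14 sketch shows one way to compute the rd value on RV32; `exec_op` is a hypothetical name, and only the funct7 values listed in the table are assumed to occur.

```cpp
#include <cassert>
#include <cstdint>

// Compute the rd value of an OP instruction (opcode 0110011) on RV32.
inline uint32_t  exec_op(uint32_t  funct7, uint32_t  funct3, uint32_t  rs1, uint32_t  rs2)
{
  assert(funct3 < 8);
  const bool      alt   = (funct7 == 0x20u);  // 0100000 selects SUB and SRA
  const uint32_t  shamt = rs2 & 0x1Fu;        // shifts use the low 5 bits of rs2

  switch(funct3)
  {
    case 0: return alt? (rs1 - rs2) : (rs1 + rs2);                // SUB / ADD (wraps)
    case 1: return rs1 << shamt;                                  // SLL
    case 2: return (static_cast<int32_t>(rs1) <
                    static_cast<int32_t>(rs2))? 1u : 0u;          // SLT
    case 3: return (rs1 < rs2)? 1u : 0u;                          // SLTU
    case 4: return rs1 ^ rs2;                                     // XOR
    case 5:                                                       // SRL / SRA
    {
      uint32_t  result = rs1 >> shamt;
      if(alt && (rs1 & 0x80000000u))
      {
        result |= ~(0xFFFFFFFFu >> shamt);  // copy the sign bit into the vacated bits
      }
      return result;
    }
    case 6: return rs1 | rs2;                                     // OR
    default: return rs1 & rs2;                                    // AND (funct3 = 111)
  }
}
```

Passing 0 for rs1 with funct3 = 011 reproduces the SNEZ idiom from the quoted text: the result is 1 exactly when rs2 is nonzero.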


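=============================
Control Transfer Instructions
=============================
RV32I provides two types of control transfer instructions:
unconditional jumps and conditional branches.
Control transfer instructions in RV32I have no architecturally visible delay slots.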
  RV32I provides two types of control transfer instructions: unconditional jumps and conditional
  branches. Control transfer instructions in RV32I do not have architecturally visible delay slots.


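-------------------
Unconditional Jumps
-------------------
The JAL (jump and link) instruction uses the UJ-type format, in which the J-immediate encodes a signed offset in multiples of 2 bytes.
The offset is sign-extended and added to the PC to form the jump target address.
Jumps can therefore reach a range of ±1 MiB.
JAL stores the address of the instruction following the jump (PC+4) in rd.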
  The jump and link (JAL) instruction uses the UJ-type format, where the J-immediate encodes a
  signed offset in multiples of 2 bytes. The offset is sign-extended and added to the pc to form the
  jump target address. Jumps can therefore target a ±1 MiB range. JAL stores the address of the
  instruction following the jump (pc+4) into register rd. The standard software calling convention
  uses x1 as the return address register.
  Plain unconditional jumps (assembler pseudo-op J) are encoded as a JAL with rd=x0.

UJ-type instruction format
    +----------------------+-------+---------+
    |          offset      |   rd  |  opcode |
    +----------------------+-------+---------+
JAL | bbbbbbbbbbbbbbbbbbbb | bbbbb | 1101111 |
    +----------------------+-------+---------+
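The indirect jump instruction JALR (jump and link register) uses the I-type format.
The target address is obtained by adding the 12-bit signed I-immediate to rs1,
then setting the least-significant bit of the result to zero.
The address of the instruction following the jump (PC+4) is written to register rd.
If the result is not needed, the zero register can be used as the destination.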
  The indirect jump instruction JALR (jump and link register) uses the I-type encoding. The target
  address is obtained by adding the 12-bit signed I-immediate to the register rs1, then setting the
  least-significant bit of the result to zero. The address of the instruction following the jump (pc+4)
  is written to register rd. Register x0 can be used as the destination if the result is not required.


I-type instruction format
     +--------------+-------+--------+-------+---------+
     |      imm     |  rs1  | funct3 |   rd  |  opcode |
     +--------------+-------+--------+-------+---------+
JALR | bbbbbbbbbbbb | bbbbb |   000  | bbbbb | 1100111 |
     +--------------+-------+--------+-------+---------+
   
The JAL and JALR instructions can raise a misaligned instruction fetch exception
if the target address is not aligned to a four-byte boundary.
  The JAL and JALR instructions can generate a misaligned instruction fetch exception if the target
  address is not aligned to a four-byte boundary.
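
The target and link computations for JAL and JALR described above can be sketched as follows. `Jump`, `exec_jal` and `exec_jalr` are hypothetical names, the immediates are assumed to be already decoded and sign-extended, and the misalignment flag only reports the four-byte condition stated above.

```cpp
#include <cassert>
#include <cstdint>

// Outcome of a jump: where to go, what to write to rd, and whether the
// target would raise a misaligned instruction fetch exception.
struct Jump
{
  uint32_t  target;
  uint32_t  link;
  bool      misaligned;
};

// JAL: target = pc + J-immediate, link = pc + 4.
inline Jump  exec_jal(uint32_t  pc, int32_t  j_imm)
{
  assert((j_imm & 1) == 0);  // the J-immediate is always a multiple of 2
  const uint32_t  target = pc + static_cast<uint32_t>(j_imm);
  return Jump{target, pc + 4u, (target & 0x3u) != 0};
}

// JALR: target = (rs1 + I-immediate) with bit 0 cleared, link = pc + 4.
inline Jump  exec_jalr(uint32_t  pc, uint32_t  rs1, int32_t  i_imm)
{
  const uint32_t  target = (rs1 + static_cast<uint32_t>(i_imm)) & ~1u;
  return Jump{target, pc + 4u, (target & 0x3u) != 0};
}
```

A caller would write `link` to rd through the register file, so that using the zero register as rd discards it without any special case here.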


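--------------------
Conditional Branches
--------------------
All branch instructions use the SB-type instruction format.
The 12-bit B-immediate encodes a signed offset in multiples of 2,
which is added to the current PC to give the target address.
The conditional branch range is ±4 KiB.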
  All branch instructions use the SB-type instruction format. The 12-bit B-immediate encodes signed
  offsets in multiples of 2, and is added to the current pc to give the target address. The conditional
  branch range is ±4 KiB.


SB-type instruction format
     +---------+-------+-------+--------+--------+---------+
     |  offset |  rs2  |  rs1  | funct3 | offset |  opcode |
     +---------+-------+-------+--------+--------+---------+
 BEQ | bbbbbbb | bbbbb | bbbbb |  000   |  bbbbb | 1100011 |
 BNE | bbbbbbb | bbbbb | bbbbb |  001   |  bbbbb | 1100011 |
 BLT | bbbbbbb | bbbbb | bbbbb |  100   |  bbbbb | 1100011 |
 BGE | bbbbbbb | bbbbb | bbbbb |  101   |  bbbbb | 1100011 |
BLTU | bbbbbbb | bbbbb | bbbbb |  110   |  bbbbb | 1100011 |
BGEU | bbbbbbb | bbbbb | bbbbb |  111   |  bbbbb | 1100011 |
     +---------+-------+-------+--------+--------+---------+

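Branch instructions compare the contents of two registers.
BEQ and BNE branch if rs1 and rs2 are equal or not equal, respectively.
BLT and BLTU branch if rs1 is less than rs2, using signed and unsigned comparison respectively.
BGE and BGEU branch if rs1 is greater than or equal to rs2, using signed and unsigned comparison respectively.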
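
The six comparisons above can be sketched as a function of funct3, followed by the choice of the next PC. `branch_taken` and `next_pc` are hypothetical names, the B-immediate is assumed to be already decoded, and the signed comparisons rely on two's-complement conversion.

```cpp
#include <cassert>
#include <cstdint>

// Decide whether a branch (opcode 1100011) is taken.
inline bool  branch_taken(uint32_t  funct3, uint32_t  rs1, uint32_t  rs2)
{
  const int32_t  s1 = static_cast<int32_t>(rs1);
  const int32_t  s2 = static_cast<int32_t>(rs2);

  switch(funct3)
  {
    case 0: return rs1 == rs2;  // BEQ
    case 1: return rs1 != rs2;  // BNE
    case 4: return s1 <  s2;    // BLT  (signed)
    case 5: return s1 >= s2;    // BGE  (signed)
    case 6: return rs1 <  rs2;  // BLTU (unsigned)
    case 7: return rs1 >= rs2;  // BGEU (unsigned)
    default: return false;      // 010 and 011 are not listed in the table
  }
}

// Next PC: the branch target if taken, otherwise the following instruction.
inline uint32_t  next_pc(uint32_t  pc, bool  taken, int32_t  b_imm)
{
  assert((b_imm & 1) == 0);  // the B-immediate is always a multiple of 2
  return taken? (pc + static_cast<uint32_t>(b_imm)) : (pc + 4u);
}
```

The difference between the signed and unsigned variants shows up with 0xFFFFFFFF: it is less than 0 for BLT (it is -1) but not for BLTU.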


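===========================
Load and Store Instructions
===========================
RV32I is a load-store architecture:
only load and store instructions access memory,
and arithmetic instructions operate only on CPU registers.


RV32I provides a 32-bit user address space that is byte-addressed and little-endian.
The execution environment defines which portions of
the address space are legal to access.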

  RV32I is a load-store architecture, where only load and store instructions access memory and
  arithmetic instructions only operate on CPU registers. RV32I provides a 32-bit user address space
  that is byte-addressed and little-endian. The execution environment will define what portions of
  the address space are legal to access.


I-type instruction format
    +--------------+-------+--------+-------+---------+
    |     offset   |  rs1  | funct3 |   rd  |  opcode |
    +--------------+-------+--------+-------+---------+
 LB | bbbbbbbbbbbb | bbbbb |  000   | bbbbb | 0000011 |
 LH | bbbbbbbbbbbb | bbbbb |  001   | bbbbb | 0000011 |
 LW | bbbbbbbbbbbb | bbbbb |  010   | bbbbb | 0000011 |
LBU | bbbbbbbbbbbb | bbbbb |  100   | bbbbb | 0000011 |
LHU | bbbbbbbbbbbb | bbbbb |  101   | bbbbb | 0000011 |
    +--------------+-------+--------+-------+---------+


S-type instruction format
   +---------+-------+-------+--------+--------+---------+
   |  offset |  rs2  |  rs1  | funct3 | offset |  opcode |
   +---------+-------+-------+--------+--------+---------+
SB | bbbbbbb | bbbbb | bbbbb |   000  |  bbbbb | 0100011 |
SH | bbbbbbb | bbbbb | bbbbb |   001  |  bbbbb | 0100011 |
SW | bbbbbbb | bbbbb | bbbbb |   010  |  bbbbb | 0100011 |
   +---------+-------+-------+--------+--------+---------+
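Load and store instructions transfer a value between registers and memory.
Loads are encoded in the I-type format and stores in the S-type format.
The effective byte address is obtained by adding the sign-extended 12-bit offset to register rs1.
Loads copy a value from memory to register rd.
Stores copy the value in register rs2 to memory.
LW loads a 32-bit value from memory into rd.
LH loads a 16-bit value from memory, then sign-extends it to 32 bits before storing it in rd.
LHU loads a 16-bit value from memory, then zero-extends it to 32 bits before storing it in rd.
LB and LBU do the same for 8-bit values.
SW, SH, and SB store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory.
For best performance, the effective address of every load and store should be naturally aligned for its data type
(for example, a four-byte boundary for 32-bit accesses and a two-byte boundary for 16-bit accesses).
The base ISA supports misaligned accesses, but they might run extremely slowly depending on the implementation.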
  Load and store instructions transfer a value between the registers and memory. Loads are encoded
  in the I-type format and stores are S-type. The effective byte address is obtained by adding register
  rs1 to the sign-extended 12-bit offset. Loads copy a value from memory to register rd. Stores copy
  the value in register rs2 to memory.
  The LW instruction loads a 32-bit value from memory into rd. LH loads a 16-bit value from memory,
  then sign-extends to 32-bits before storing in rd. LHU loads a 16-bit value from memory but then
  zero extends to 32-bits before storing in rd. LB and LBU are defined analogously for 8-bit values.
  The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register
  rs2 to memory.
  For best performance, the effective address for all loads and stores should be naturally aligned
  for each data type (i.e., on a four-byte boundary for 32-bit accesses, and a two-byte boundary for
  16-bit accesses). The base ISA supports misaligned accesses, but these might run extremely slowly
  depending on the implementation. Furthermore, naturally aligned loads and stores are guaranteed
  to execute atomically, whereas misaligned loads and stores might not, and hence require additional
  synchronization to ensure atomicity.
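
The load and store behaviour described above can be sketched against a small little-endian byte array. This is a simplified model for illustration only: `Memory`, `exec_load` and `exec_store` are hypothetical names, the memory is a flat vector starting at address 0, and alignment and atomicity are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat, byte-addressed, little-endian memory.
struct Memory
{
  std::vector<uint8_t>  bytes;

  explicit Memory(std::size_t  size): bytes(size, 0) {}

  // Read `count` bytes (1, 2 or 4), least-significant byte first.
  uint32_t  read(uint32_t  addr, unsigned  count) const
  {
    assert(count <= 4 && std::size_t(addr) + count <= bytes.size());
    uint32_t  value = 0;
    for(unsigned  i = 0; i < count; ++i)
    {
      value |= static_cast<uint32_t>(bytes[addr + i]) << (8 * i);
    }
    return value;
  }

  // Write the low `count` bytes of `value`, least-significant byte first.
  void  write(uint32_t  addr, unsigned  count, uint32_t  value)
  {
    assert(count <= 4 && std::size_t(addr) + count <= bytes.size());
    for(unsigned  i = 0; i < count; ++i)
    {
      bytes[addr + i] = static_cast<uint8_t>(value >> (8 * i));
    }
  }
};

// Loads (opcode 0000011): returns the value to write to rd.
inline uint32_t  exec_load(const Memory&  mem, uint32_t  funct3, uint32_t  rs1, int32_t  offset)
{
  const uint32_t  addr = rs1 + static_cast<uint32_t>(offset);
  switch(funct3)
  {
    case 0: return (mem.read(addr, 1) ^ 0x80u) - 0x80u;      // LB  (sign-extend 8 bits)
    case 1: return (mem.read(addr, 2) ^ 0x8000u) - 0x8000u;  // LH  (sign-extend 16 bits)
    case 2: return mem.read(addr, 4);                        // LW
    case 4: return mem.read(addr, 1);                        // LBU (zero-extend)
    case 5: return mem.read(addr, 2);                        // LHU (zero-extend)
    default: assert(false); return 0;                        // not listed in the table
  }
}

// Stores (opcode 0100011): SB, SH, SW write 1, 2, 4 bytes of rs2.
inline void  exec_store(Memory&  mem, uint32_t  funct3, uint32_t  rs1, uint32_t  rs2, int32_t  offset)
{
  assert(funct3 <= 2);
  mem.write(rs1 + static_cast<uint32_t>(offset), 1u << funct3, rs2);
}
```

Storing the word 0x80FF7F01 at address 4 leaves the bytes 01 7F FF 80 at addresses 4 to 7, so a byte load from address 6 sees 0xFF and returns 0xFFFFFFFF for LB but 0x000000FF for LBU.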


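============
Memory Model
============
The base RISC-V ISA supports multiple concurrent threads of execution within a single user address space.
Each RISC-V thread has its own user register state and program counter,
and executes an independent sequential instruction stream.
The execution environment defines how RISC-V threads are created and managed.
RISC-V threads can communicate and synchronize with other threads
either through calls to the execution environment, which are documented separately in the specification of each execution environment,
or directly through the shared memory system.
RISC-V threads can also interact with I/O devices, and indirectly with each other, through loads and stores to the portions of the address space assigned to I/O.
In the base RISC-V ISA, each RISC-V thread observes its own memory operations as if they executed sequentially in program order.
RISC-V has a relaxed memory model between threads:
an explicit FENCE instruction is required to guarantee any specific ordering between memory operations from different RISC-V threads.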
  The base RISC-V ISA supports multiple concurrent threads of execution within a single user address
  space. Each RISC-V thread has its own user register state and program counter, and executes an
  independent sequential instruction stream. The execution environment will define how RISC-V
  threads are created and managed. RISC-V threads can communicate and synchronize with other
  threads either via calls to the execution environment, which are documented separately in the
  specification for each execution environment, or directly via the shared memory system. RISC-V
  threads can also interact with I/O devices, and indirectly with each other, via loads and stores to
  portions of the address space assigned to I/O.
  In the base RISC-V ISA, each RISC-V thread observes its own memory operations as if they
  executed sequentially in program order. RISC-V has a relaxed memory model between threads,
  requiring an explicit FENCE instruction to guarantee any specific ordering between memory operations
  from different RISC-V threads. Chapter 6 describes the optional atomic memory instruction
  extensions “A”, which provide additional synchronization operations.


      +------+----+----+----+----+----+----+----+----+-------+--------+-------+---------+
      |   0  | PI | PO | PR | PW | SI | SO | SR | SW |  rs1  | funct3 |  rd   |  opcode |
      +------+----+----+----+----+----+----+----+----+-------+--------+-------+---------+
FENCE | 0000 |  b |  b |  b |  b |  b |  b |  b |  b | bbbbb |   000  | bbbbb | 0001111 |
      +------+----+----+----+----+----+----+----+----+-------+--------+-------+---------+
The FENCE instruction orders device I/O and memory accesses as seen by other RISC-V threads and by external devices or coprocessors.
In the table above, the P bits select the predecessor set and the S bits select the successor set.
Any combination of device input (I), device output (O), memory reads (R), and memory writes (W)
may be ordered with respect to any combination of the same.
Informally, no other RISC-V thread or external device can observe any operation
in the successor set, which follows the FENCE, before any operation in the predecessor set, which precedes the FENCE.
The execution environment defines which I/O operations are possible and, in particular,
which load and store instructions are treated and ordered as device input and device output
operations rather than as memory reads and writes.
For example, memory-mapped I/O devices are typically accessed with uncached loads and stores
that are ordered using the I and O bits rather than the R and W bits.
Instruction-set extensions might also define new coprocessor
I/O instructions that are ordered using the I and O bits of a FENCE.
The unused fields of the FENCE instruction, imm[11:8], rs1, and rd, are reserved for finer-grain
fences in future extensions.
For forward compatibility, base implementations shall ignore these fields,
and standard software shall zero them.

  The FENCE instruction is used to order device I/O and memory accesses as viewed by other RISC-V
  threads and external devices or coprocessors. Any combination of device input (I), device output
  (O), memory reads (R), and memory writes (W) may be ordered with respect to any combination
  of the same. Informally, no other RISC-V thread or external device can observe any operation
  in the successor set following a FENCE before any operation in the predecessor set preceding the
  FENCE. The execution environment will define what I/O operations are possible, and in particular,
  which load and store instructions might be treated and ordered as device input and device output
  operations respectively rather than memory reads and writes. For example, memory-mapped I/O
  devices will typically be accessed with uncached loads and stores that are ordered using the I and O
  bits rather than the R and W bits. Instruction-set extensions might also describe new coprocessor
  I/O instructions that will also be ordered using the I and O bits in a FENCE.
  The unused fields in the FENCE instruction, imm[11:8], rs1, and rd, are reserved for finer-grain
  fences in future extensions. For forward compatibility, base implementations shall ignore these
  fields, and standard software shall zero these fields.
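
As a small illustration of the FENCE layout in the table above, the sketch below pulls the eight ordering bits out of an instruction word. The bit positions follow from the table (imm[11:8] occupies inst[31:28], so PI is inst[27] and SW is inst[20]); `FenceSets` and `decode_fence` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Predecessor (p*) and successor (s*) sets of a FENCE instruction.
struct FenceSets
{
  bool  pi, po, pr, pw;  // device input, device output, memory reads, memory writes
  bool  si, so, sr, sw;
};

inline FenceSets  decode_fence(uint32_t  inst)
{
  assert((inst & 0x7Fu) == 0x0Fu);  // opcode 0001111
  FenceSets  f;
  f.pi = ((inst >> 27) & 1u) != 0;
  f.po = ((inst >> 26) & 1u) != 0;
  f.pr = ((inst >> 25) & 1u) != 0;
  f.pw = ((inst >> 24) & 1u) != 0;
  f.si = ((inst >> 23) & 1u) != 0;
  f.so = ((inst >> 22) & 1u) != 0;
  f.sr = ((inst >> 21) & 1u) != 0;
  f.sw = ((inst >> 20) & 1u) != 0;
  return f;
}
```

The reserved fields (imm[11:8], rs1 and rd) are deliberately not inspected, in line with the rule that base implementations ignore them.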


        +--------------+-------+--------+-------+---------+
        |      imm     |  rs1  | funct3 |   rd  |  opcode |
        +--------------+-------+--------+-------+---------+
FENCE.I | 000000000000 | 00000 |   001  | 00000 | 0001111 |
        +--------------+-------+--------+-------+---------+
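The FENCE.I instruction synchronizes the instruction and data streams.
RISC-V does not guarantee that stores to instruction memory
become visible to instruction fetches
on the same RISC-V thread until a FENCE.I instruction is executed.
A FENCE.I instruction only
ensures that a subsequent instruction fetch on a RISC-V thread sees any previous data stores
already visible to the same RISC-V thread.
FENCE.I does not ensure that the instruction fetches of other RISC-V threads observe the local thread's stores in a multiprocessor system.
To make a store to instruction memory visible to all RISC-V threads,
the writing thread has to execute a data FENCE
before requesting that all remote RISC-V threads execute a FENCE.I.

The unused fields of the FENCE.I instruction, imm[11:0], rs1, and rd, are reserved for finer-grain fences in future extensions.
For forward compatibility, base implementations shall ignore these fields,
and standard software shall zero them.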
  The FENCE.I instruction is used to synchronize the instruction and data streams. RISC-V does
  not guarantee that stores to instruction memory will be made visible to instruction fetches on
  the same RISC-V thread until a FENCE.I instruction is executed. A FENCE.I instruction only
  ensures that a subsequent instruction fetch on a RISC-V thread will see any previous data stores
  already visible to the same RISC-V thread. FENCE.I does not ensure that other RISC-V threads’
  instruction fetches will observe the local thread’s stores in a multiprocessor system. To make a
  store to instruction memory visible to all RISC-V threads, the writing thread has to execute a data
  FENCE before requesting that all remote RISC-V threads execute a FENCE.I.
  The unused fields in the FENCE.I instruction, imm[11:0], rs1, and rd, are reserved for finer-grain
  fences in future extensions. For forward compatibility, base implementations shall ignore these
  fields, and standard software shall zero these fields.


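========================================
Control and Status Register Instructions
========================================
SYSTEM instructions are used to access system functionality that might require privileged access.
They are encoded using the I-type instruction format.
They fall into two main classes:
those that atomically read-modify-write control and status registers (CSRs),
and all other potentially privileged instructions.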


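----------------
CSR Instructions
----------------
The full set of CSR instructions is defined here,
although in the standard user-level base ISA
only a handful of read-only counter CSRs are accessible.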
  We define the full set of CSR instructions here, although in the standard user-level base ISA, only
  a handful of read-only counter CSRs are accessible.


       +--------------+-------+--------+-------+---------+
       |     csr      |  rs1  | funct3 |  rd   |  opcode |
       +--------------+-------+--------+-------+---------+
 CSRRW | bbbbbbbbbbbb | bbbbb |   001  | bbbbb | 1110011 |
 CSRRS | bbbbbbbbbbbb | bbbbb |   010  | bbbbb | 1110011 |
 CSRRC | bbbbbbbbbbbb | bbbbb |   011  | bbbbb | 1110011 |
       +--------------+-------+--------+-------+---------+
       +--------------+-------+--------+-------+---------+
       |     csr      |  zimm | funct3 |  rd   |  opcode |
       +--------------+-------+--------+-------+---------+
CSRRWI | bbbbbbbbbbbb | bbbbb |   101  | bbbbb | 1110011 |
CSRRSI | bbbbbbbbbbbb | bbbbb |   110  | bbbbb | 1110011 |
CSRRCI | bbbbbbbbbbbb | bbbbb |   111  | bbbbb | 1110011 |
       +--------------+-------+--------+-------+---------+


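The CSRRW (Atomic Read/Write CSR) instruction atomically swaps values between a CSR and an integer register.
CSRRW reads the old value of the CSR, zero-extends it to XLEN bits, and writes it to integer register rd.
The initial value of rs1 is written to the CSR.
If rd is x0, the instruction does not read the CSR and causes none of the side effects that a CSR read might have.

The CSRRS (Atomic Read and Set Bits in CSR) instruction reads the value of the CSR, zero-extends it to XLEN bits, and writes it to integer register rd.
The initial value of integer register rs1 is treated as a bit mask that specifies the bit positions to set in the CSR.
Any bit that is high in rs1 causes the corresponding bit to be set in the CSR, if that CSR bit is writable.
Other bits in the CSR are unaffected.

The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the value of the CSR, zero-extends it to XLEN bits, and writes it to integer register rd.
The initial value of integer register rs1 is treated as a bit mask that specifies the bit positions to clear in the CSR.
Any bit that is high in rs1 causes the corresponding bit to be cleared in the CSR, if that CSR bit is writable.
Other bits in the CSR are unaffected.

For both CSRRS and CSRRC, if rs1=x0 the instruction does not write to the CSR at all,
and so causes none of the side effects of a CSR write.
Note that if rs1 names a register other than x0 whose value happens to be zero,
the instruction still attempts to write the unmodified value back to the CSR and causes any attendant side effects.

The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively,
except that they update the CSR with an XLEN-bit value obtained by zero-extending the 5-bit immediate (zimm) encoded in the rs1 field, instead of a value from an integer register.
For CSRRSI and CSRRCI, if the zimm field is zero, the instruction does not write to the CSR and causes none of the side effects of a CSR write.
For CSRRWI, if rd is x0, the instruction does not read the CSR and causes none of the side effects of a CSR read.
Some CSRs, such as the instructions-retired counter instret, may be modified as a side effect of instruction execution.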
  The CSRRW (Atomic Read/Write CSR) instruction atomically swaps values in the CSRs and
  integer registers. CSRRW reads the old value of the CSR, zero-extends the value to XLEN bits,
  then writes it to integer register rd. The initial value in rs1 is written to the CSR. If rd=x0, then
  the instruction shall not read the CSR and shall not cause any of the side-effects that might occur
  on a CSR read.
  The CSRRS (Atomic Read and Set Bits in CSR) instruction reads the value of the CSR, zero-extends
  the value to XLEN bits, and writes it to integer register rd. The initial value in integer
  register rs1 is treated as a bit mask that specifies bit positions to be set in the CSR. Any bit that
  is high in rs1 will cause the corresponding bit to be set in the CSR, if that CSR bit is writable.
  Other bits in the CSR are unaffected (though CSRs might have side effects when written).
  The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the value of the CSR, zero-extends
  the value to XLEN bits, and writes it to integer register rd. The initial value in integer
  register rs1 is treated as a bit mask that specifies bit positions to be cleared in the CSR. Any bit
  that is high in rs1 will cause the corresponding bit to be cleared in the CSR, if that CSR bit is
  writable. Other bits in the CSR are unaffected.
  For both CSRRS and CSRRC, if rs1=x0, then the instruction will not write to the CSR at all, and
  so shall not cause any of the side effects that might otherwise occur on a CSR write, such as raising
  illegal instruction exceptions on accesses to read-only CSRs. Note that if rs1 specifies a register
  holding a zero value other than x0, the instruction will still attempt to write the unmodified value
  back to the CSR and will cause any attendant side effects.
  The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively,
  except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit
  immediate (zimm[4:0]) field encoded in the rs1 field instead of a value from an integer register. For
  CSRRSI and CSRRCI, if the zimm[4:0] field is zero, then these instructions will not write to the
  CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write. For
  CSRRWI, if rd=x0, then the instruction shall not read the CSR and shall not cause any of the
  side-effects that might occur on a CSR read.
  Some CSRs, such as the instructions retired counter, instret, may be modified as side effects
  of instruction execution. In these cases, if a CSR access instruction reads a CSR, it reads the
  value prior to the execution of the instruction. If a CSR access instruction writes a CSR, the
  update occurs after the execution of the instruction. In particular, a value written to instret by
  one instruction will be the value read by the following instruction (i.e., the increment of instret
  caused by the first instruction retiring happens before the write of the new value).
  The assembler pseudo-instruction to read a CSR, CSRR rd, csr, is encoded as CSRRS rd, csr, x0.
  The assembler pseudo-instruction to write a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr,
  rs1, while CSRWI csr, zimm, is encoded as CSRRWI x0, csr, zimm.
  Further assembler pseudo-instructions are defined to set and clear bits in the CSR when the old
  value is not required: CSRS/CSRC csr, rs1; CSRSI/CSRCI csr, zimm.
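
The pseudo-instruction mappings above can be written out as an encoder. This is a sketch under assumptions: the Python function names are hypothetical, the funct3 values and the SYSTEM opcode come from the RISC-V encoding tables, and CSRS/CSRC are assumed to use rd = x0 because the old value is discarded.

```python
OPCODE_SYSTEM = 0b1110011
CSRRW, CSRRS, CSRRC = 0b001, 0b010, 0b011
CSRRWI = 0b101

def encode_csr(funct3, rd, csr, src):
    """Pack csr[31:20] | src[19:15] | funct3[14:12] | rd[11:7] | opcode[6:0]."""
    assert 0 <= csr < 4096 and 0 <= rd < 32 and 0 <= src < 32
    return (csr << 20) | (src << 15) | (funct3 << 12) | (rd << 7) | OPCODE_SYSTEM

def csrr(rd, csr):
    return encode_csr(CSRRS, rd, csr, 0)      # CSRRS rd, csr, x0

def csrw(csr, rs1):
    return encode_csr(CSRRW, 0, csr, rs1)     # CSRRW x0, csr, rs1

def csrwi(csr, zimm):
    return encode_csr(CSRRWI, 0, csr, zimm)   # CSRRWI x0, csr, zimm

def csrs(csr, rs1):
    return encode_csr(CSRRS, 0, csr, rs1)     # assumed: CSRRS x0, csr, rs1

def csrc(csr, rs1):
    return encode_csr(CSRRC, 0, csr, rs1)     # assumed: CSRRC x0, csr, rs1

print(hex(csrr(10, 0xC00)))   # read CSR 0xC00 into x10
print(hex(csrw(0x340, 5)))    # write x5 into CSR 0x340
```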


===================
Timers and Counters
===================
              +--------------+-------+--------+-------+---------+
              |      csr     |  rs1  | funct3 |  rd   |  opcode |
              +--------------+-------+--------+-------+---------+
   RDCYCLE[H] | bbbbbbbbbbbb | 00000 |   010  | bbbbb | 1110011 |
    RDTIME[H] | bbbbbbbbbbbb | 00000 |   010  | bbbbb | 1110011 |
 RDINSTRET[H] | bbbbbbbbbbbb | 00000 |   010  | bbbbb | 1110011 |
              +--------------+-------+--------+-------+---------+
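
The layout in the table can be turned into concrete instruction words once the csr field is filled in. This is a sketch under assumptions: the CSR addresses come from the RISC-V CSR listing rather than from this section, and `encode_rd_counter` is a hypothetical helper name.

```python
# Counter CSR addresses (from the RISC-V CSR listing, not from this section)
COUNTER_CSRS = {
    "cycle": 0xC00, "time": 0xC01, "instret": 0xC02,
    "cycleh": 0xC80, "timeh": 0xC81, "instreth": 0xC82,
}

def encode_rd_counter(name, rd):
    """Encode RD<name> rd, which is CSRRS rd, <name>, x0."""
    csr = COUNTER_CSRS[name]
    return (csr << 20) | (0b00000 << 15) | (0b010 << 12) | (rd << 7) | 0b1110011

for name in COUNTER_CSRS:
    print(f"rd{name:9s} x10 -> {encode_rd_counter(name, 10):#010x}")
```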

RV32I provides a number of 64-bit read-only user-level counters
They are mapped into the 12-bit CSR address space and accessed in 32-bit pieces using CSRRS instructions


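The RDCYCLE pseudo-instruction reads the low XLEN bits of the cycle CSR, which holds a count of the clock cycles
executed by the processor on which the hardware thread is running, from an arbitrary start time in the past
RDCYCLEH is an RV32I-only instruction that reads bits 63–32 of the same cycle counter
The underlying 64-bit counter should never overflow in practice
The rate at which the cycle counter advances depends on the implementation and operating environment
The execution environment should provide a means to determine the current rate (cycles/second) at which the cycle counter is incrementing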

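The RDTIME pseudo-instruction reads the low XLEN bits of the time CSR, which counts wall-clock real time
that has passed from an arbitrary start time in the past
RDTIMEH is an RV32I-only instruction that reads bits 63–32 of the same real-time counter
The underlying 64-bit counter should never overflow in practice
The execution environment should provide a means of determining the period of the real-time counter (seconds/tick)
The period must be constant
The real-time clocks of all hardware threads in a single user application should be synchronized to within one tick of the real-time clock
The environment should provide a means to determine the accuracy of the clock

The RDINSTRET pseudo-instruction reads the low XLEN bits of the instret CSR, which counts the number of instructions
retired by this hardware thread from some arbitrary start point in the past
RDINSTRETH is an RV32I-only instruction that reads bits 63–32 of the same instruction counter
The underlying 64-bit counter should never overflow in practice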
  RV32I provides a number of 64-bit read-only user-level counters, which are mapped into the 12-bit
  CSR address space and accessed in 32-bit pieces using CSRRS instructions.
  The RDCYCLE pseudo-instruction reads the low XLEN bits of the cycle CSR which holds a count
  of the number of clock cycles executed by the processor on which the hardware thread is running
  from an arbitrary start time in the past. RDCYCLEH is an RV32I-only instruction that reads bits
  63–32 of the same cycle counter. The underlying 64-bit counter should never overflow in practice.
  The rate at which the cycle counter advances will depend on the implementation and operating
  environment. The execution environment should provide a means to determine the current rate
  (cycles/second) at which the cycle counter is incrementing.
  The RDTIME pseudo-instruction reads the low XLEN bits of the time CSR, which counts wall-clock
  real time that has passed from an arbitrary start time in the past. RDTIMEH is an RV32I-only instruction
  that reads bits 63–32 of the same real-time counter. The underlying 64-bit counter should
  never overflow in practice. The execution environment should provide a means of determining the
  period of the real-time counter (seconds/tick). The period must be constant. The real-time clocks
  of all hardware threads in a single user application should be synchronized to within one tick of the
  real-time clock. The environment should provide a means to determine the accuracy of the clock.
  The RDINSTRET pseudo-instruction reads the low XLEN bits of the instret CSR, which counts
  the number of instructions retired by this hardware thread from some arbitrary start point in the
  past. RDINSTRETH is an RV32I-only instruction that reads bits 63–32 of the same instruction
  counter. The underlying 64-bit counter should never overflow in practice.
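
Since RV32I reads each 64-bit counter in two 32-bit pieces, the low half can roll over between the two reads. A commonly used workaround is to read the high half, then the low half, then the high half again, and retry if the high half changed. This is a sketch under assumptions: `read_counter64` and `FakeCounter` are hypothetical names, and the fake counter stands in for the RDCYCLE/RDCYCLEH pair.

```python
def read_counter64(read_lo, read_hi):
    """Combine two 32-bit reads into a consistent 64-bit value."""
    while True:
        hi = read_hi()
        lo = read_lo()
        if read_hi() == hi:          # high half unchanged: lo belongs to hi
            return (hi << 32) | lo

class FakeCounter:
    """Stand-in for a hardware counter that advances on every read."""

    def __init__(self, value, step=1):
        self.value = value
        self.step = step

    def lo(self):
        v = self.value & 0xFFFFFFFF
        self.value += self.step
        return v

    def hi(self):
        v = self.value >> 32
        self.value += self.step
        return v

counter = FakeCounter(0xFFFFFFFF)    # about to carry into the high half
print(hex(read_counter64(counter.lo, counter.hi)))
```

In this run the first attempt straddles the carry and is discarded, so the function never returns a mixed value such as an old high half paired with a new low half.
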


================================
Environment Call and Breakpoints
================================
       +--------------+-------+--------+-------+---------+
       |    funct12   |   rs1 | funct3 |   rd  |  opcode |
       +--------------+-------+--------+-------+---------+
 ECALL | 000000000000 | 00000 |   000  | 00000 | 1110011 |
EBREAK | 000000000001 | 00000 |   000  | 00000 | 1110011 |
       +--------------+-------+--------+-------+---------+
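
Both rows of the table reduce to fixed 32-bit words, since every field except funct12 and the opcode is zero. This is a minimal sketch, with `encode_system` and `classify` as hypothetical helper names.

```python
OPCODE_SYSTEM = 0b1110011

def encode_system(funct12):
    """Pack funct12[31:20] with rs1 = 0, funct3 = 0, rd = 0 and the SYSTEM opcode."""
    return (funct12 << 20) | OPCODE_SYSTEM

ECALL = encode_system(0b000000000000)
EBREAK = encode_system(0b000000000001)

def classify(word):
    """Return the mnemonic for an instruction word, or None if it is neither."""
    if word == ECALL:
        return "ECALL"
    if word == EBREAK:
        return "EBREAK"
    return None

print(f"{ECALL:#010x} {EBREAK:#010x}")
```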
       

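The ECALL instruction is used to make a request to the supporting execution environment, which is usually an operating system
The system ABI defines how parameters for the environment request are passed
Usually these will be in defined locations in the integer register file
The EBREAK instruction is used by debuggers to cause control to be transferred back to a debugging environment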
  The ECALL instruction is used to make a request to the supporting execution environment, which is
  usually an operating system. The ABI for the system will define how parameters for the environment
  request are passed, but usually these will be in defined locations in the integer register file.
  The EBREAK instruction is used by debuggers to cause control to be transferred back to a debugging