科目:計算機結構【資工系碩士班】√ #### 1. (20%) The cost of an integrated circuit can be expressed in three simple equations: cost per die = $$\frac{\text{cost per wafter}}{\text{dies per wafter} \times \text{yield}}$$ dies per wafter $\approx \frac{\text{wafer area}}{\text{die area}}$ yield = $\left[1 + \left(\text{defects per area} \times \frac{\text{die area}}{\alpha}\right)\right]^{-\alpha}$ The first equation is straightforward to derive. The second is an approximation, since it does not subtract the area near the border of the round wafer that cannot accommodate the rectangular dies. The final equation is based on years of empirical observations of yields at integrated circuit factories, with the exponent $\alpha$ related to the number of critical processing steps in the manufacturing process. Assume that $\alpha = 2$ in this problem. - 1.1 (5%) What is the relationship between the die cost and die area? Express the relationship in terms of a polynomial $f(x) = \sum_{i=0}^{n} a_i x^i$ where x denotes the die area and f(x) denotes the die cost. What is the degree of the polynomial? - 1.2 (5%) What is the approximate cost of a die of area 250 mm<sup>2</sup> in an 8-inch (200 mm) diameter wafer that contains 165 dies where each die is a Pentium 4 processor? Assume that an 8-inch wafer costs US dollars \$1000 and that the defect density is 1 per square centimeter (cm<sup>2</sup>). - 1.3 (5%) DRAM chips have significantly increased in die size with each generation, yet yields have stayed about the same. The following table shows key statistics for DRAM production over 12 years. Given the increase in die area of DRAMs, what parameter (see the above equations) must improve to maintain yield? Derive a formula for the improving parameter in terms of other parameters. | Year | Capacity (Kbits) | Die area (sq. cm) | Wafer diameter (inches) | Yield | |------|------------------|-------------------|-------------------------|-------| | 1980 | 64 | 0.16 | . 5 | 48% | | 1983 | 256 | 0.24 | 5 | 46% | | 1985 | 1,024 | 0.42 | 6 | 45% | | 1989 | 4,096 | 0.65 | 6 | 43% | | 1992 | 16,384 | 0.97 | 8 | 48% | 1.4 (5%) In general, the total power consumption of integrated circuits is composed of two factors: the dynamic switching power $P_{sw}$ and the static leakage power $P_{leak}$ . Dynamic power can be expressed as $P_{sw} = a*C*V^2*f$ where a is a constant; C is the parasitic capacitance; V is the switching voltage; f is the switching frequency. The static leakage power in general depends on the process technology. In technology A, the static power consumption takes about 10% of the total power while in technology B, the static power takes about 25% of the total power. Assume a CPU runs at a frequency of 2G Hz with supply voltage Vdd=1.5 volts in technology A and consumes a total power of 20 watt. What is the total power of the same CPU design fabricated using process technology B with Vdd=1.0 volt running #### 科目:計算機結構【資工系碩士班】 at 4G Hz? Assume that the overall parasitic capacitance of technology B is about half of technology A. #### 2. (20%) 2.1 (5%) Assume that the MIPS architecture has 5-stage pipeline: instruction fetch (IF), instruction decode (ID), execution (EX), memory access (MA), and write back (WB). Identify all of the data dependencies in the following sequence of MIPS instructions. Which dependencies are data hazards that can be resolved via forwarding? add \$3, \$4, \$2 # add the contents of registers \$4 and \$2 and put the result into register \$3 sub \$5, \$3, \$1 # subtract the contents of \$1 from \$3 and put the result into \$5 lw \$6, 200(\$3) # load into \$6 the value in memory with address calculated as # the sum of register content of \$3 and the immediate value 200 add \$7, \$3, \$6 2.2 (5%) Consider executing the following code on the pipelined datapath in the following figure: add \$2, \$3, \$1 sub \$4, \$3, \$5 add \$5, \$3, \$7 add \$7, \$6, \$1 add \$8, \$2, \$6 At the end of the fifth cycle, which registers are being read and which register will be written? # 科目:計算機結構【資工系碩士班】 2.3 (5%) Compare performance for multi-cycle and pipelined control using the SPECint2000 instruction mix and assuming the same cycle times per unit. Assume the following functional unit execution times: 200ps for memory access, 100ps for ALU operation, 50ps for register file read or write. Assume the following instruction frequencies: 52% ALU instructions. 2% jumps, 10% stores, 11% branches, 25% loads, In the multi-cycle design, the entire instruction execution is divided into the following 5 states with the one-cycle execution time for each state: (1) instruction-fetch, (2) instruction-decode/register-fetch, (3) memory-access, execution/address-computation/branch-jump-completion, (4) memory-read-completion. The load instruction requires states (1)(2)(3)(4)(5); the store instruction needs states (1)(2)(3)(4); each ALU instruction needs states (1)(2)(3)(5); branch or jump instruction needs states (1)(2)(3). Calculate the CPI (cycles per instruction) for the multi-cycle design. What is the cycle time of this multi-cycle design? What is the cycle time of the single cycle design where all the five states are executed within a single cycle? 2.4 (5%) Continue with the above problem. Calculate the CPI for the pipelined design. For pipelined execution, assume that half of the load instructions are immediately followed by an instruction that uses the result, that the branch delay on misprediction is 1 clock cycle, and that one quarter of the branches are mispredicted. Assume that jumps always pay 1 full clock cycle of delay, so their average time is 2 clock cycles. Ignore any other hazards. ## 3. (20%) ί - 3.1 (5%) Here is a series of address references given as word addresses: 2, 3, 11, 16, 21, 13, 64, 48, 19, 11, 3, 22, 4, 27, 6, and 11. Assuming a direct-mapped cache with 16 one-word blocks that is initially empty, label each reference in the above list as a hit or a miss and show the final contents of the cache. 3.2 (5%) Assume the same address reference patterns in the above problem. Show the hits and misses and final cache contents for a direct-mapped cache with four-word blocks and a total size of 16 words. Hint: there are 4 blocks in this cache with each block size of 4 words (or 4\*32=128 bits) - 3.3 (5%) Cache C1 is direct-mapped with 16 one-word blocks. Cache C2 is a direct-mapped with 4 four-word blocks. Assume that the miss penalty for C1 is 4 memory bus clock cycles and the miss penalty for C2 is 7 memory bus clock cycles. Assuming that the caches are initially empty, find an address list for which C2 has a lower miss rate but spends more memory bus clock cycles on cache misses than C1. What are the corresponding miss rates for C1 and C2 in this reference string? - 3.4 (5%) To capture the fact that the time to access data for both hits and misses affects performance, designers often use average memory access time (AMAT) defined as AMAT = Time for a hit + Miss rate \* Miss penalty which is the average time to access memory considering both hits and misses and the frequency of different accesses. Find the AMAT for a processor with a 2 ns clock, a miss penalty of 20 clock cycles, a miss rate of 0.05 misses per instruction, and a cache access time (including hit detection) of 1 clock cycle. Assume that the read and write miss penalties are the same and ignore other write stalls. # 科目:計算機結構【資工系碩士班】 Suppose we can improve the miss rate to 0.03 misses per reference by doubling the cache size. This causes the cache access time to increase to 1.2 clock cycles. Using the AMAT as a metric, determine if this is a good trade-off. Why? ### 4. (20%) Find the word or phrase from the following list below that best matches the description in the following questions. Use the corresponding number of the chosen word in the answer. - (1) abstraction, (2) Amdahl's law, (3) assembler, (4) bit, (5) branch target buffer, (6) branch history table or branch prediction buffer, (7) cache, (8) CPU, (9) chip, (10) compiler, (11) computer family, (12) control, (13) datapath, (14) desktop PC, (15) DVD, (16) defect, (17) DRAM, (18) embedded system, (19) instruction, (20) instruction set architecture, (21) LAN, (22) linker, (23) loader, (24) memory, (25) Moore's law, (26) operating system or OS, (27) productivity, (28) semiconductor, (29) server, (30) SRAM, (31) supercomputer, (32) MOSFET, (33) VLSI, (34) wafer, (35) WAN, (36) yield - 4.1 (2%) approach to the design of hardware or software. The system consists of hierarchical layers, with each lower layer hiding details from the level above. - 4.2 (2%) integrated circuits commonly used to construct cache. - 4.3 (2%) program that converts a symbolic instruction into the binary codes. - 4.4 (2%) a system program that places an object program in main memory for execution. - 4.5 (2%) technology in which single chip can contain billions of transistors. - 4.6 (2%) a device acting as an on/off switch controller by electricity. - 4.7 (2%) specify interface that the hardware provides the low-level software. - 4.8 (2%) computer inside another device used for running one predetermined programs. - 4.9 (2%) a rule that states that the performance enhancement possible with a given improvement is limited by the amount that the improved feature is used. - 4.10 (2%) a small memory that is indexed by the lower portion of the address of the branch instruction and that contains one or more bits indicating whether the branch was recently taken or not. ## 5 (20%) Briefly answer the following questions and explain your answers. - 5.1 (5%) Neglecting overflow, can a *logical* left shift instruction replace a signed integer multiply by a power of 2? Neglecting overflow, can a *logical* right shift instruction replace a signed integer division by a power of 2? Why? Hint: a logical left shift inserts zeros into least significant bits; a logical right shift inserts zeros into most significant bits. - 5.2 (5%) Are processors with lower cycles per instruction (CPI) always faster? Why? - 5.3 (5%) Does cache with larger block size have smaller miss rate? Why? - 5.4 (5%) Is it true that the number of transistors in a chip grows linearly with respect to advances of years? Why? # 科目:作業系統與資料結構【資工系碩士班甲組】 INSTRUCTIONS: If any question is unclear or you believe some assumptions need to be made, state your assumptions clearly at the beginning of your answer. 1. (15%) Consider the following C program that uses the Pthreads API. What would be the output of the program? (Note that the line numbers are for references only.) ``` 1 #include <stdio.h> 2 #include <stdlib.h> 3 #include <unistd.h> 4 #include <pthread.h> 5 #include <sys/types.h> 7 int value = 5; 9 static void *runner(void *param); 10 int main(int argc, char **argv) 12 { pid_t pid = fork(); 13 14 if (pid > 0) { printf("A = %d\n", ++value); 15 16 else if (pid == 0) { 17 18 pid_t pid = fork(); 19 if (pid > 0) { 20 printf("B = %d\n", ++value); 21 else if (pid == 0) { 22 pid_t pid = fork(); 23 pthread_t tid; 25 pthread_attr_t attr; 26 pthread_attr_init(&attr); 27 pthread_create(&tid, &attr, runner, NULL); pthread_join(tid, NULL); 28 if (pid > 0) 29 printf("C = %d\n", ++value); 30 31 printf("D = %d\n", ++value); 32 33 } 34 else { 35 exit(1); } 36 37 } 38 else { 39 exit(1); }- 40 41 return 0; 42 } 43 44 static void *runner(void *param) 45 { value++; 46 47 pthread_exit(0); 48 } ``` 2. (10%) Consider the interprocess-communication scheme where mailboxes are used. Suppose a process P wants to wait for two messages, one from mailbox A and one from mailbox B. What sequence of send and receive should it execute so that the messages can be received in any order without from being blocked by each other? # 科目:作業系統與資料結構【資工系碩士班甲組】 - 3. Given a UNIX *i*-node with eight direct blocks and three levels of indirect blocks (i.e., a single, a double, and a triple) and assuming that the sizes of a pointer and a block are, respectively, 8 bytes and 8 Kbytes, answer the following questions. - (a) (5%) What would be the size of the smallest file allowed in bytes? - (b) (10%) What would be the size of the largest file allowed in bytes? - 4. Assume a page reference string for a process with m frames (initially all empty). The page reference string has length p with n distinct page numbers occurring in it. For any page-replacement algorithms, - (a) (5%) What is a lower bound on the number of page faults? - (b) (5%) What is an upper bound on the number of page faults? - 5. Given 4 matrices, $M_1$ , $M_2$ , $M_3$ , and $M_4$ , where $M_1$ is a $10 \times 20$ matrix, $M_2$ a $20 \times 50$ matrix, $M_3$ a $50 \times 1$ matrix, and $M_4$ a $1 \times 100$ matrix, answer the following questions. - (a) (8%) What is the minimum cost (number of multiplications) for computing the product $M = M_1 \times M_2 \times M_3 \times M_4$ ? - (b) (7%) What is the minimum cost order? - 6. (10%) Analyze the behavior of QUICKSORT in the case where a schizophrenic adversary picks the best possible splitter (partitioning element) instead of the worst, every other time (i.e., he alternates between best and worst). What running time is induced by this "adversary?" - 7. (15%) The Ackermann function A(m, n) is defined recursively for non-negative integers m and n as follows: $$A(m,n) = \begin{cases} n+1 & \text{if } m = 0, \\ A(m-1,1) & \text{if } m > 0 \text{ and } n = 0, \\ A(m-1,A(m,n-1)) & \text{if } m > 0 \text{ and } n > 0. \end{cases}$$ Its value grows very quickly, even for small values of m and n. For instance, A(4,1)=65533. What would be the value of A(2,2)? 8. (10%) Compute the failure function f of the KMP pattern matching algorithm for the pattern p = abcabcacab. ## 科目:離散數學【資工系碩士班甲組】 / - 1. [15%] Prove the following statement by the Well-Ordering Principle: If $n \in \mathbb{Z}^+$ and n is composite, then there is a prime p such that p|n. - 2. [15%] Prove the following statement by the Pigeonhole Principle: If $m \in \mathbb{Z}^+$ and m is odd, then there exists a positive integer n such that $m|(2^n-1)$ . 3. (a) [5%] Evaluate $$\binom{n}{0} + 2\binom{n}{1} + 2^2\binom{n}{2} + \dots + 2^k\binom{n}{k} + \dots + 2^n\binom{n}{n}, n \in \mathbb{Z}^+$$ . (b) [5%] Evaluate $$\begin{pmatrix} -5\\8 \end{pmatrix} + \begin{pmatrix} -7\\5 \end{pmatrix}$$ . - 4. Determine the number of integer solutions of $x_1 + x_2 + x_3 + x_4 = 32$ , where - (a) [5%] $x_i \ge 0$ , $1 \le i \le 4$ . - (b) [5%] $x_1, x_2 \ge 5, x_3, x_4 \ge 7.$ - (c) [5%] $x_1, x_2, x_3 > 0, 0 < x_4 \le 25.$ - 5. Solve the following recurrence relations. (a) [5%] $$a_n = 5a_{n-1} + 6a_{n-2}$$ , $n \ge 2$ , $a_0 = 1$ , $a_1 = 3$ . (b) [5%] $$a_n - 6a_{n-1} + 9a_{n-2} = 0$$ , $n \ge 2$ , $a_0 = 5$ , $a_1 = 12$ . (c) [5%] $$a_{n+2} + 3a_{n+1} + 2a_n = 3^n$$ , $n \ge 0$ , $a_0 = 0$ , $a_1 = 1$ . - 6. (a) [5%] Determine the sequence generated by the generating function 1/(3-x). - (b) [5%] Determine the sequence generated by the exponential generating function 1/(1-x). - 7. [10%] If |A| = 30 and the equivalence relation R on A partitions A into equivalence classes $A_1$ , $A_2$ , and $A_3$ , where $|A_1| = |A_2| = |A_3|$ , what is |R|? - 8. [10%] Let R be the relation on $A = \{1, 2, 3, 4, 5, 6, 7\}$ , where the directed graph associated with R consists of the two components, each a directed cycle, shown below. Find all integers n's such that 1 < n < 30 and $R^n = R$ . # 科目:電子學【資工系碩士班乙組】 ✓ - 1. (15%) For the circuit shown in Figure 1, we assume the op amp has a open-loop gain $\mu = 10^4$ , input resistance $R_{id} = 200 k\Omega$ , and output resistance $r_o = 1 k\Omega$ . - (a) Find out the voltage gain $\frac{v_o}{v_s}$ , the input resistance $R_{in}$ , and the output resistance $R_{out}$ of the entire circuit. (12%) - (b) What type of the feedback does this circuit use? (3%) 2. (15 %) Figure 2 shows a differential MOS circuit. All transistors have the same size of $W=120 \mu m$ and $L=20 \mu m$ . The remaining transistor parameters are threshold voltage $|V_m|=|V_{tp}|=1$ V, transconductance $k'_n = u_n C_{ox} = k'_p = u_p C_{ox} = 20 \mu A/V^2$ and early voltage $V_A = 20 V$ . Note that the output resistance $r_o$ of the MOS transistor is equal to $V_A/I_D$ . - (a) Find out the voltage gain. (10%) - (b) For what external load does the gain reduce by a factor of 3? (5%) Figure 2 # 科目:電子學【資工系碩士班乙組】 - 3. (20 %) Assume $V_{BE} = 0.7V$ , and $\beta = 100$ for the transistors in the BJT circuit shown in Figure 3. - (a) Draw the equivalent small-signal circuit. You should specify the small-signal resistance values of the BJT transistors. (8 %) - (b) Calculate the mid-band gain. (6 %) - (c) Find which of the following frequencies is closest to the circuit's lower-3dB frequency: (1) 10MHz (2) 75MHz (3) 150 MHz (4) 200 MHz. (6 %) 4. (15 %) Sketch the transfer characteristics (output voltage vesus input voltage) for the following three circuits assuming 0.6V forward drop for the diodes, and $V_{BE} = 0.7V$ for BJT transistors. The ratio of transistor junction areas of Q3 and Q2 in Figure 4(c) is 5/2. ## 科目:電子學【資工系碩士班乙組】 - 5. (35 %) Answer the following questions regarding the digital logic circuit: - (a) For a pseudo-NMOS inverter shown in Figure 5 that has threshold voltages $|V_m| = |V_{tp}| = 0.8 \text{V}$ and MOS transconductance $k'_n = u_n C_{ox} = 50 \,\mu\text{A}/V^2$ , $k'_p = u_p C_{ox} = 20 \,\mu\text{A}/V^2$ , find $V_{OL}$ and $V_{OH}$ (9%) - (b) Consider what effect (an increase, a decrease or none) will increasing $W_p$ have on (1) $NM_H$ (2) $t_{PLH}$ (3) $V_{OH}$ (4) $V_{OL}$ (5) static power dissipation. (You do not need to provide the explanations for your answer.) (10 %) - (c) Draw the circuit diagram of a pseudo-NMOS gate that realizes the logic function $Y = (\overline{A} + \overline{BC})\overline{D}$ . (6%) - (d) Draw the circuit diagram of a pass transistor logic gate that realizes the XOR function of input signals A, and B assuming signals $A, \overline{A}, B$ , and $\overline{B}$ are available. (4%) - (e) A logic CMOS inverter switches between high and low at a frequency of 100MHz. Assume the supply voltage is 5V and the inverter drives a 2-pf load capacitance. Find the static and dynamic power dissipation, respectively. (The transistor parameters used in this inverter are. $$|V_{tn}| = |V_{tp}| = 1.5 \text{V} \text{ and } k_n = k_p = u_n C_{ox} \frac{W_n}{L_n} = u_p C_{ox} \frac{W_p}{L_n} = 200 \,\mu\text{A}/V^2.$$ (6%) Figure 5 ## 科目:工程數學【資工系碩士班乙組】 > #### 總分100分,請依序作答。 1. (15%) Suppose that the frequency response of a continuous-time LTI system is: $$H(j\omega) = \frac{4 - \omega^2}{1 + j\omega}$$ and the input is: $x(t) = 4\cos(t) + \cos(2t)$ . Determine the output y(t) of the system. - 2. (24%) Given a continuous-time signal x(t). The Fourier Transform of x(t) can be denoted as $X(j\omega)$ . ( $\delta(t)$ is unit impulse function; u(t) is unit step function.) - (a). Find the Fourier Transforms of the following signals (12%): (i). $$x(t) = \delta(t+1) + 2\delta(t) + \delta(t-1)$$ (ii). $$x(t) = \frac{\sin(100\pi(t-2))}{\pi(t-2)}$$ (iii). $$x(t) = e^{-t}u(t) - e^{-t}u(t-4)$$ (b). Determine the corresponding time function y(t) of the Fourier transform (12%): $$Y(j\omega) = \frac{1 - \omega^2 + j\omega}{2 - \omega^2 + 3j\omega}$$ 3. (16%) Consider a stable LTI system that is characterized by the differential equation $$\frac{d^{2}y(t)}{dt^{2}} + 4\frac{dy(t)}{dt} + 3y(t) = \frac{dx(t)}{dt} + 2x(t)$$ - (a). Determine the corresponding frequency response $H(j\omega)$ and impulse response h(t). (8%) - (b). Suppose that the input is $x(t) = e^{-t}u(t)$ , where u(t) represents unit step function. Find the output response: y(t), for the given input x(t). (8%) ## 科目:工程數學【資工系碩士班乙組】 4. (15%) Find the z-transform of a discrete-time signal (u[n] is unit step function): $$x[n] = -a^n u[-n-1]$$ 5. (20%) A linear time-invariant system has the following impulse response: - (a). Is the LTI system stable? Give a reason to support your answer. (4%) - (b). Plot $h(t-\tau)$ vs. $\tau$ , for t=2. Label your plot carefully. (4%) - (c). If the input is x(t) = u(t), i.e. unit step function, use the convolution integral to find y(2), i.e. y(t) when t=2. (12%) - 6. (10%) A finite impulse response (FIR) digital filter can be described as: $$y[n] = \sum_{k=0}^{M} b_k x[n-k]$$ Determine the filter coefficients $\{b_k\}$ of the difference equation for the FIR filter when the impulse response h[n] is shown below.