TRANSACTIONS OF

K.C. WONG EDUCATION FOUNDATION

SUPPORTED LECTURES

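K.C. Wong Education Foundation

# Collected Academic Lectures

Editor-in-Chief: Qian Weichang

Volume 11

Edited by the K.C. Wong Education Foundation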

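# K.C. Wong Education Foundation

# Collected Academic Lectures

(Volume 11)

Editor-in-Chief: Qian Weichang

Edited and published by the K.C. Wong Education Foundation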

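# Distributed free of charge to relevant institutions to promote academic exchange in China and abroad

## K.C. Wong Education Foundation *Collected Academic Lectures*

Volume 11

1996

Edited and published by: K.C. Wong Education Foundation

Printed by: Shanghai No. 3 Printing Factory

Contact address: Shanghai Academic Affairs Office, K.C. Wong Education Foundation

(Inside Shanghai University, 149 Yanchang Road, Shanghai; postal code 200072)

Format: 787 × 1092 mm, 1/16; color plates: 1; printed sheets: 9.25; word count: 423,000; first edition, July 1996; first printing, July 1996; copies printed: 1 to 1,050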

# With Compliments

# Presented by the K.C. Wong Education Foundation (date: ________)


DEDICATED TO THE MEMORY OF MR K.C. WONG, FOUNDER OF THE FOUNDATION AND THE LATE CHAIRMAN OF THE BOARD OF DIRECTORS

K.C. WONG EDUCATION FOUNDATION



Mr. K.C. Wong

K. C. WONG (1907–1986)

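# About the K.C. Wong Education Foundation

Mr. K.C. Wong (1907–1986) was a well-known patriot in Hong Kong who was devoted to education in China; during his lifetime he made active contributions to education in his hometown, Ningbo. In 1985 he single-handedly donated a large sum to found the K.C. Wong Education Foundation, whose purpose is to train senior scientific and technical personnel for the country and to serve China's Four Modernizations.

While he was alive, Mr. Wong invited well-known scholars from China and abroad to serve on the Foundation's Selection Committee and Academic Affairs Committee and to plan its work together. They adopted a policy of "sending out" and "inviting in": training specialists in various disciplines for the country and, to raise the teaching standards of higher-education institutions on the mainland and in Hong Kong and Macao, funding mutual visits by academics to promote cultural exchange between China and other countries. Under this policy, in 1985 and 1986, with the support of the State Education Commission, the Foundation selected 85 students to pursue doctoral degrees in the United Kingdom, the United States, Canada, West Germany, Switzerland and Australia. It also planned to fund mainland scholars to lecture in Hong Kong and Macao, Hong Kong and Macao scholars to lecture on the mainland, and American scholars to lecture in China. Just as the Foundation's work was taking shape and flourishing, Mr. Wong fell gravely ill, and he passed away at the end of 1986. This was a heavy loss to the Foundation; all his colleagues remember him deeply and profoundly regret his passing.

Under the direction of the new Chairman of the Board, Mr. Zhang Erming (张二铭), and directors including Hu Baiquan (胡百全) and Lin Yanxin (林延新), the K.C. Wong Education Foundation continues to pursue Mr. Wong's wish to train talent for the country. Besides carrying out its existing plans, it has developed new projects in cooperation with British academic institutions. The Foundation has always worked by the principle of "fairness" (公正) that Mr. Wong advocated, and it will surely continue to do so. On the occasion of the publication of the Foundation's *Collected Academic Lectures*, I offer this brief introduction. The Foundation's daily workload is heavy, and directors Wang Mingyuan (王明远), Wang Mingqin (王明勤), Lin Yanxin (林延新) and others have spared no effort and made active contributions.

Qian Weichang, June 1996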

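# Preface

The K.C. Wong Education Foundation was founded in 1985 by the late Mr. K.C. Wong (1907–1986), a Standing Committee member of the National Committee of the Chinese People's Political Consultative Conference (CPPCC) and a renowned Hong Kong industrialist and businessman, who out of patriotic devotion contributed US$100 million and registered the Foundation in Hong Kong.

In 1987 the Foundation launched its "Academic Lectures" program. The program is directed by Professor Qian Weichang, then a Standing Committee member of the CPPCC National Committee and now its Vice-Chairman, a renowned scientist, Academician of the Chinese Academy of Sciences, President of Shanghai University, and Chairman of both the Foundation's Selection Committee for Loan-Funded Overseas Students and its Academic Affairs Committee. Professor Qian personally drafted the regulations establishing the program, which funds mainland scholars to lecture in Hong Kong and Macao and funds American scholars and Hong Kong and Macao scholars to lecture on the mainland, in order to promote academic exchange between China and other countries and to improve the quality of teaching at higher-education institutions on the mainland and in Hong Kong and Macao.

In addition to funding scholars under the "Academic Lectures" program, the Foundation, under projects directed by Professor Qian, also funds scholars recommended by relevant Chinese universities to attend international academic conferences in Europe, America, Asia and Australia. Each visiting scholar submits a paper to the conference attended; these papers are also of considerable quality and are included in this collection for reference.

Academic Affairs Committee, K.C. Wong Education Foundation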

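# Editorial Notes

### (1) Order of Arrangement

The lecture manuscripts from the Foundation's Academic Lectures program, and the papers presented by Foundation-funded scholars at international conferences in Europe, America, Asia and Australia, are arranged by the date the manuscripts were received or by content, without division into categories.

### (2) Publication in Separate Volumes, with Brief Introductions

Because there are many manuscripts, the collection is published in separate volumes of roughly 150 to 200 pages each, so that it is easy to carry, read and search. For the reader's reference, each lecture manuscript states the author's name, degree, position, the date and place of the lecture, and the name of the host institution. Each paper by a mainland, Hong Kong or Macao scholar at an international conference in Europe, America, Australia or Asia states the scholar's name, the name, date and place of the conference, and the recommending institution. Both types of article are marked as funded by the K.C. Wong Education Foundation.

### (3) Language

This book is a collection of academic articles. Each is printed in the language of the original lecture manuscript or of the original paper submitted to the conference: manuscripts or papers written in Chinese are printed in Chinese, and those written in a foreign language are printed in that language.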

# **CONTENTS**

| No. | Title | Author |
|-----|-------|--------|
| 1 | Maximum Time-difference Pipelining | Peisu Xia (夏培肃) |
| 2 | Computing Techniques in Ancient China (中国古代的计算技术) | Peisu Xia (夏培肃) |
| 3 | General Relativity on Spinorial Space-time | |
| 4 | Homogeneous Catalysis: An Odyssey from Commodity Chemicals to Specialty Products | |
| 5 | Recovery and Separation of Metals by Supported Liquid Membrane (SLM) | |
| 6 | Histochemistry and Morphology of Porcine Mast Cells | |
| 7 | Moisture Control by Air Gaps in Envelopes | |
| 8 | Good Air Source for Rise Buildings | |
| 9 | Metabolism of Propafenone and Lidocaine in Combination in Rats Live Supernatant | Tang Yaonian (唐耀年) |
| 10 | A Model for Selecting Actively Traded Call Options When Trading Volume is not Available | Cai Xunsheng (蔡训生) |
| 11 | Multiobjective Optimization Design of Transonic Airfoils | |

# Maximum Time-difference Pipelining

### Peisu Xia\*

#### Abstract

This article presents the principle of maximum time-difference pipelining, an effective approach to increasing the clock frequency of a vector processor threefold or even fivefold without inserting storage elements. Its implementation in ECL and CMOS technology is also described.

### 1. The Principle of Maximum Time-Difference Pipelining

A pipeline is composed of two kinds of circuits, combinational logic circuits and storage elements, as shown in Fig. 1. On each clock, data move from one stage of storage elements to the next through routes made up of various logic circuits and transmission lines. Some routes are long and others short. The clock period $T_c$ must be equal to or longer than the longest route delay between any two adjacent storage stages plus the time spent in the storage element and the clock skew, i.e.,

$$T_c \geqslant \max\big[(T_k)_{\max}\big]_1^n + T_s + (\Delta T_c)_{\max} \tag{1}$$

where

 $T_c$  is the clock period;

$T_k$ is the delay of a route between the $k$-th pair of adjacent storage stages, $k = 1, \dots, n$;

 $T_s$  is the time spent in the storage element;

 $\Delta T_c$  is the clock skew.
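As a minimal sketch of Eq. (1), assume each interval between adjacent storage stages is described simply by a list of its route delays in nanoseconds. The function name, data layout and numbers below are hypothetical illustrations, not measurements from the lecture:

```python
def conventional_min_period(stage_routes, t_s, skew_max):
    """Lower bound on T_c from Eq. (1).

    stage_routes: one list of route delays (ns) per interval k = 1..n
                  between adjacent storage stages (hypothetical layout).
    t_s:          time spent in the storage element (ns).
    skew_max:     maximum clock skew (ns).
    """
    # (T_k)_max for each interval, then the maximum over all n intervals
    longest = max(max(routes) for routes in stage_routes)
    return longest + t_s + skew_max


# Hypothetical route delays (ns) for two stage intervals
stages = [[9.0, 4.0, 6.5], [7.0, 3.0]]
print(conventional_min_period(stages, t_s=1.5, skew_max=0.5))  # 11.0
```

Only the single longest route matters here; the shorter routes sit idle for part of every clock, which is the slack the method described below exploits.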

It is well known that the vector performance of a pipelined processor increases linearly with the clock frequency. To raise the clock frequency, more storage stages can be inserted into the pipeline. However, the delay added by these storage stages increases the go-through delay, so scalar calculations take longer.
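The trade-off can be illustrated with a deliberately idealized model, which is an assumption for illustration rather than the author's analysis: a combinational block of total delay D is split evenly into m stages, and every stage pays the storage time $T_s$ plus the clock skew. The function names and numbers are hypothetical:

```python
def split_pipeline(d_total, m, t_s, skew):
    """Return (clock period, go-through latency) for m equal stages.

    Idealized model: the logic delay splits evenly and each stage
    adds storage time t_s and clock skew.
    """
    t_c = d_total / m + t_s + skew
    latency = m * t_c  # one clock per stage from input to output
    return t_c, latency


def vector_time(d_total, m, t_s, skew, n_elements):
    """Time to stream n_elements: fill the pipeline, then one result per clock."""
    t_c, latency = split_pipeline(d_total, m, t_s, skew)
    return latency + (n_elements - 1) * t_c


# Hypothetical numbers (ns): 40 ns of logic, 2 ns storage time, 1 ns skew
for m in (1, 2, 4):
    t_c, latency = split_pipeline(40.0, m, t_s=2.0, skew=1.0)
    total = vector_time(40.0, m, 2.0, 1.0, n_elements=64)
    print(f"{m} stage(s): T_c = {t_c} ns, scalar latency = {latency} ns, "
          f"64-element vector = {total} ns")
```

With these numbers, going from one to four stages cuts the clock period from 43 ns to 13 ns (a 64-element vector drops from 2752 ns to 871 ns), but the scalar go-through time rises from 43 ns to 52 ns.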

Examining how pipelines work, one finds that some storage stages in a processor serve only to synchronize data. In Fig. 2, suppose there are only two data routes, say r and s, between stages A and C. Route r is longer than route s. If route s is lengthened so that its data arrive at the same time as those in route r, then the data in the two routes are synchronized, and the intermediate storage stage is therefore no longer required and can be removed. There are still two clocks between stages A and C, and each clock transmits a set of data.

In practice, there are usually more than two routes between storage stages. In such a case, the delays of all the shorter routes are simply increased to match that of route r.
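The padding step can be sketched as follows, assuming the route delays are known exactly (real padding circuits can only approximate them). The function name, route names and delays are hypothetical:

```python
def padding_needed(route_delays):
    """Extra delay (ns) to add to each route so that all match the longest one.

    route_delays: dict mapping a route name to its delay in ns.
    """
    longest = max(route_delays.values())
    return {name: longest - delay for name, delay in route_delays.items()}


# Hypothetical routes between two storage stages; "r" is the longest
print(padding_needed({"r": 12.0, "s": 5.0, "t": 9.5}))
# {'r': 0.0, 's': 7.0, 't': 2.5}
```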

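<sup>\*</sup> The author, Ms. Xia Peisu, is an Academician of the Chinese Academy of Sciences and a research professor at the Institute of Computing Technology, Chinese Academy of Sciences. Funded by the K.C. Wong Education Foundation, she lectured at the University of Hong Kong in November 1994; this is the text of her public lecture.

Fig. 1

If the short routes are lengthened further, the clock period $T_c$ can be decreased so that there will be more than two clocks between stages A and C. $T_c$ can then be as small as the difference between the longest and the shortest route delays (or slightly larger), as shown in Fig. 3. The expression for $T_c$ is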

$$T_c \geqslant \max\big[(T_k)_{\max} - (T_k)_{\min}\big]_1^n + T_s + (\Delta T_c)_{\max} \tag{2}$$

If the first term on the right-hand side of Eq. (2) approaches zero, then $T_c \rightarrow T_s + (\Delta T_c)_{\max}$.
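Eq. (2) can be compared with Eq. (1) numerically. In this sketch, with hypothetical route delays that have already been partly padded, each interval contributes the spread between its longest and shortest routes instead of its longest route:

```python
def conventional_min_period(stage_routes, t_s, skew_max):
    """Eq. (1): longest route over all n intervals + storage time + skew."""
    return max(max(routes) for routes in stage_routes) + t_s + skew_max


def wave_min_period(stage_routes, t_s, skew_max):
    """Eq. (2): largest (longest - shortest) route spread + storage time + skew."""
    spread = max(max(routes) - min(routes) for routes in stage_routes)
    return spread + t_s + skew_max


# Hypothetical padded route delays (ns), one list per stage interval
stages = [[20.0, 16.0, 18.0], [15.0, 12.0]]
t_conv = conventional_min_period(stages, t_s=1.5, skew_max=0.5)
t_wave = wave_min_period(stages, t_s=1.5, skew_max=0.5)
print(t_conv, t_wave)  # 22.0 6.0
```

Here the routes still differ by up to 4 ns, so the bound drops from 22 ns to 6 ns; as the spread approaches zero, the bound approaches $T_s + (\Delta T_c)_{\max}$, which is 2 ns with these numbers.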



Fig. 2

Because the clock period is bounded by the maximum time difference between routes rather than by the maximum route delay itself, we call this method maximum time-difference pipelining. I proposed it in 1968, but its practical value was not demonstrated until 1986, when we completed the implementation of a very fast processor, the GF-10/13. In the USA it is now called wave pipelining.

It is obvious that the maximum time-difference pipelining method possesses the following merits:

- Much faster than conventional pipelining in vector calculation without lengthening scalar calculation.
- Fewer storage stages.
- Easier for clock distribution because of the removal of storage stages.



It can also be seen that the method has some shortcomings:

- Sophisticated design is required.
- Padding circuits are needed to lengthen the shorter routes.
- There are more difficulties at the system level. The system can run only within a relatively narrow range of clock frequencies; if it operates at a substantially different frequency, the number of clocks between two storage stages may be incorrect.
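The narrow operating range can be made concrete with a simplified timing model, which is an assumption for illustration and not taken from the lecture. Data launched at stage A at time 0 are captured at stage C on the k-th following clock edge. The slowest route must settle in time, $(T_k)_{\max} + T_s + \Delta T_c \le k\,T_c$, and the next data set, launched one clock later, must not reach stage C before that capture, $(T_k)_{\min} + T_c \ge k\,T_c$ (hold time ignored). Combining the two gives $T_c \ge (T_k)_{\max} - (T_k)_{\min} + T_s + \Delta T_c$, the same form as Eq. (2), and for a fixed k it yields a window of usable clock periods. The function name and delays are hypothetical:

```python
def clock_window(t_max, t_min, t_s, skew, k):
    """Range (lo, hi) of clock periods for which k clocks separate stages A and C.

    Simplified model (an assumption):
      setup:      t_max + t_s + skew <= k * T_c
      next wave:  (k - 1) * T_c <= t_min   (hold time ignored)
    Returns None if no clock period satisfies both for this k.
    """
    lo = (t_max + t_s + skew) / k
    hi = float("inf") if k == 1 else t_min / (k - 1)
    return (lo, hi) if lo <= hi else None


# Hypothetical padded route delays between stages A and C (ns)
for k in (1, 2, 3, 4):
    print(k, clock_window(t_max=20.0, t_min=16.0, t_s=1.5, skew=0.5, k=k))
```

With these delays, two clocks between A and C work for periods from 11 ns to 16 ns and three clocks only from about 7.3 ns to 8 ns, while four clocks are impossible. Periods between 8 ns and 11 ns, or between 16 ns and 22 ns, fit no whole number of clocks at all.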

### 2. The Implementation of Maximum Time-Difference Pipelining

ECL technology is more suitable than CMOS technology for implementing maximum time-difference pipelining. ECL circuits deliver true and complemented outputs simultaneously, and their delays are little influenced by their loads or by the number of inputs. CMOS circuits are constrained by the following factors:

- Delays differ between gate types.
- For the same gate, delay varies with fan-in and fan-out as well as with the input data pattern.
- The complement of an input can be obtained only by using an inverter.

However, CMOS offers low power and high density, which make it attractive for VLSI applications.

Fig. 4

The block diagram of the GF-10/13, an ECL processor based on the principle of maximum time-difference pipelining, is shown in Fig. 4.

Some characteristics of the GF-10/13 are as follows:

- Word length: 32 bits
- Gate delay: 2.0 ns (Motorola 10K); 0.8 ns (Motorola 100K)
- Clock period: 9.8 ns (Motorola 10K); 5.5 ns (Motorola 100K)
- Peak performance: 200 MOPS
- Number of boards: 16
- Board size: $305 \times 276\ \text{mm}^2$
- Cooling: forced air at room temperature

Built with the same chips, the GF-10/13 runs at a clock frequency about five times that of our conventionally pipelined design. For scalar calculations, the GF-10/13 also shows its advantage in go-through time. Table 1 compares the go-through times of the Cray-1 and the GF-10/13.

Table 1

|                          | Cray-1       | GF-10/13 (Motorola 10K) | GF-10/13 (Motorola 100K) |
|--------------------------|--------------|-------------------------|--------------------------|
| Gate delay $t_{pd}$ (ns) | 0.7          | 2.0                     | 0.8                      |
| $T_c$ (ns)               | 12.5         | 9.8                     | 5.5                      |
| Addition time (ns)       | 25 (24 bits) | 30 (32 bits)            | (36 bits)                |
| Multiplication time (ns) | 75 (24 bits) |                         | 50 (32 bits)             |

For CMOS technology, the following logic-design rules are laid down:

- fan-in ≤ 3
- fan-out of 3-input gates = 1
- fan-out of 2-input gates ≤ 2

To summarize, fan-in + fan-out ≤ 4 (buffers not included).
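The three CMOS logic-design rules can be checked mechanically. This sketch assumes a netlist reduced to hypothetical `Gate` records (name, fan-in, fan-out, buffer flag); the record format and gate names are illustrative, and buffers are skipped as the rules state:

```python
from dataclasses import dataclass


@dataclass
class Gate:
    name: str
    fan_in: int
    fan_out: int
    is_buffer: bool = False


def rule_violations(gates):
    """Return messages for gates that break the three CMOS design rules."""
    problems = []
    for g in gates:
        if g.is_buffer:
            continue  # buffers are excluded from the rules
        if g.fan_in > 3:
            problems.append(f"{g.name}: fan-in {g.fan_in} > 3")
        elif g.fan_in == 3 and g.fan_out != 1:
            problems.append(f"{g.name}: 3-input gate with fan-out {g.fan_out} != 1")
        elif g.fan_in == 2 and g.fan_out > 2:
            problems.append(f"{g.name}: 2-input gate with fan-out {g.fan_out} > 2")
    return problems


# Hypothetical netlist fragment
netlist = [
    Gate("g1", 3, 1),
    Gate("g2", 2, 3),
    Gate("g3", 4, 1),
    Gate("buf1", 1, 8, is_buffer=True),
]
print(rule_violations(netlist))
# ['g2: 2-input gate with fan-out 3 > 2', 'g3: fan-in 4 > 3']
```

Keeping every non-buffer gate within these limits is what keeps gate delays nearly uniform, which in turn makes the route delays predictable enough to pad.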

A 36-bit arithmetic unit chip and an $8 \times 8$ multiplier chip were implemented in $1.5\,\mu\text{m}$ CMOS laser-cut gate-array technology. Both chips were designed and simulated with the help of ORCAD and SPICE. A pipelinable carry-lookahead scheme is adopted in the 36-bit arithmetic unit, as shown in Fig. 5, and the modified Booth algorithm and a Wallace tree are employed in the $8 \times 8$ multiplier, as shown in Fig. 6. The measured go-through times and clock periods of the two chips are also given in Figs. 5 and 6; in both cases the speedup is 2.7.
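The quoted speedup of 2.7 for both chips is consistent with reading speedup as the ratio of go-through time to clock period, i.e., roughly how many data sets are in flight through the unit at once. This interpretation is inferred from the figures rather than stated in the lecture; the sketch below simply applies it to the measured values of Figs. 5 and 6:

```python
def speedup(go_through_ns, clock_period_ns):
    """Inferred definition: go-through time divided by clock period."""
    return go_through_ns / clock_period_ns


# Measured values from Figs. 5 and 6 (ns)
chips = {
    "36-bit arithmetic unit (Fig. 5)": (45.0, 16.75),
    "8 x 8 multiplier (Fig. 6)": (62.0, 23.0),
}
for name, (t_go, t_c) in chips.items():
    print(f"{name}: {speedup(t_go, t_c):.1f}")
# 36-bit arithmetic unit (Fig. 5): 2.7
# 8 x 8 multiplier (Fig. 6): 2.7
```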



Go-through time: 45 ns; clock period: 16.75 ns\*; speedup: 2.7

Fig. 5

Go-through time: 62 ns; clock period: 23 ns; speedup: 2.7

Fig. 6

### 3. Conclusion

Maximum time-difference pipelining is an effective approach to boosting the clock frequency of a pipeline without adding storage elements. Speedups of 5 and 2.7 were obtained for units built with SSI and MSI ECL chips and with laser-cut gate-array CMOS chips, respectively. A custom design will improve both the performance and the speedup of the CMOS implementation. More complex chips have since been designed and implemented with a rebuilt cell library suited to maximum time-difference pipelining. Meanwhile, chips in other technologies, including BiCMOS, ECL and GaAs, are under study.

# Computing Techniques in Ancient China

### Peisu Xia\*

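The Chinese nation has a splendid culture several thousand years old, and ancient China produced many inventions. Besides the Four Great Inventions (the compass, papermaking, gunpowder and printing), there were also brilliant achievements in computing. Here I introduce the following four inventions:

1. The decimal number system
2. Rod calculation (筹算)
3. Abacus calculation (珠算)
4. The binary digit (bit)

### 1. The Decimal Number System

China has used decimal notation ever since it has had written records. The oracle-bone inscriptions unearthed at the Yin ruins in Anyang, Henan, record the culture of the Shang dynasty (16th to 11th century BC).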

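The symbols for 1 to 9 are:

- 1: one horizontal stroke
- 2: two horizontal strokes (=)
- 3: three horizontal strokes (≡)
- 4: four horizontal strokes (≣)
- 5: X
- 6: forms resembling 介
- 7: +
- 8: )(
- 9: see Fig. 1

Fig. 1 Numerals in oracle-bone inscriptions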

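<sup>\*</sup> The author, Ms. Xia Peisu, is a research professor at the Institute of Computing Technology, Chinese Academy of Sciences, and an Academician of the Chinese Academy of Sciences. Funded by the K.C. Wong Education Foundation, she lectured at the University of Hong Kong in November 1994; this is one of her lecture texts.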