Table 1 presents an analysis of the storage used for all captured documents. From March 1996 to June 1998, the system captured 41,174 documents with a total of 169,643 pages. Of these documents, 2,468 originated on the copier. Storing all the information required by the system took 9 GB. Analysis of printed vs. copied documents shows that a printed document contains 4.7 pages on average, and a copied document contains 3.5 pages. The complete data set contains 94% printed documents and 6% copied documents.
| time | source¹ | number of docs | number of pages | total storage | storage per page |
|---|---|---|---|---|---|
| March 1996 to June 1998 | printers + copier | 41,174 | 169,643 | 9.0 GB | 56 KB |
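The derived figures in Table 1 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes binary units (1 GB = 2^30 bytes, 1 KB = 2^10 bytes), which is the interpretation under which the table's 56 KB per page is reproduced. With decimal units the result would be closer to 53 KB.

```python
# Figures taken from Table 1 and the accompanying text.
TOTAL_DOCS = 41_174
TOTAL_PAGES = 169_643
COPIED_DOCS = 2_468
TOTAL_STORAGE_GB = 9.0

# Assumption: binary units (not stated in the source).
BYTES_PER_GB = 2**30
BYTES_PER_KB = 2**10


def storage_per_page_kb(total_gb: float, pages: int) -> float:
    """Average storage per page in KB for a given total in GB."""
    return total_gb * BYTES_PER_GB / pages / BYTES_PER_KB


def share(part: int, whole: int) -> float:
    """Fraction of `whole` represented by `part`."""
    return part / whole


kb_per_page = storage_per_page_kb(TOTAL_STORAGE_GB, TOTAL_PAGES)
copied_share = share(COPIED_DOCS, TOTAL_DOCS)

print(f"storage per page: {kb_per_page:.1f} KB")   # about 55.6 KB, i.e. 56 KB rounded
print(f"copied documents: {copied_share:.1%}")     # about 6.0%
```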
Table 2 presents an analysis of the source of accessed documents. These data were produced by analyzing the web server logs from a three-month period (April to June 1998). Every 72 dpi page image viewed by a user was recorded along with the time of that access. The date when the image was created was also available. Documents that had originally been printed were distinguished from those that had been copied. The results in Table 2 show that of the 1,261 pages viewed, 67% originated as printed documents and 28% as copied documents; the remaining 5% came from miscellaneous sources. The share of copied pages among those viewed (28%) is much higher than the proportion of copied documents in the population (6%), which suggests, not surprisingly, that copied documents are used more frequently, relative to their share of the collection, than printed ones.
| printer | copier | misc. |
|---|---|---|
| 848 (67%) | 354 (28%) | 59 (5%) |
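The comparison between view share and population share can be made concrete. The following is a minimal sketch, not the authors' analysis: the variable names are hypothetical, and, like the text, it compares a share of viewed pages against a share of captured documents, so the ratio is only a rough indicator.

```python
# Page views by source, taken from Table 2.
VIEWS = {"printer": 848, "copier": 354, "misc": 59}

# Document counts taken from the text accompanying Table 1.
TOTAL_DOCS = 41_174
COPIED_DOCS = 2_468

total_views = sum(VIEWS.values())

# Share of all viewed pages attributed to each source.
view_share = {source: count / total_views for source, count in VIEWS.items()}

# Share of copied documents in the whole collection.
copier_population_share = COPIED_DOCS / TOTAL_DOCS

# How over-represented copied material is among viewed pages,
# relative to its share of the collection (1.0 would mean proportional use).
copier_over_representation = view_share["copier"] / copier_population_share

for source, fraction in view_share.items():
    print(f"{source}: {fraction:.0%} of {total_views} viewed pages")
print(f"copier over-representation: {copier_over_representation:.1f}x")
```

Under these figures, copied material appears among viewed pages at between four and five times the rate its share of the collection would predict.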
The age of accessed documents is presented in Table 3. Of the pages accessed between April and June of 1998, 19% were originally created in 1997 and 12% were created in 1996. That is, a significant percentage were captured about two years before they were accessed. This suggests that users find old documents useful.






