Survive the Data Deluge

DN Staff

October 6, 2003

Today's tests can produce thousands or millions of values from dozens or hundreds of sensors. It's easy to get swamped in growing waves of data that will become useless unless saved in some meaningful way. In many test labs, the engineers (the data creators) recognize the need to save test data in a form that lets others (the data users) retrieve, analyze, and report results. Without some way to catalog and retrieve test data, though, it becomes difficult or impossible to compare and correlate test results acquired at some expense.

As a first step, engineers can employ a spreadsheet such as Excel or a database such as Access or Paradox to save test results. Someone in the lab may take on the ad hoc role of "librarian" to manage the flow of test results, catalog them, and arrange for their storage, often on a server. This person also may try to enforce some standards for data acquisition and data formatting. Those standards should involve a model that specifies what data gets saved, the formats for the data, and so on. This process requires considerable planning, because once a lab establishes a format and starts using it, the format should remain fixed.

Although that sort of grass-roots approach can work, spreadsheets and databases don't dictate any structure or protocol for saving test results. So, even though a lab adopts a specific format, other users may not know what the columns of data represent or how they relate to one another. In addition, engineers who each lay out their data in their own way complicate the task of sharing and correlating test results.

Most software developed for office use adapts poorly to engineering applications that manipulate thousands of values from many sources. Excel, for example, doesn't let users copy a report from one sheet and apply it to data in another sheet. Custom applications aren't much better. Although programmers can produce data-extraction and data-analysis scripts in Visual Basic, C, or another language, someone must maintain this proprietary software and keep up with upgrades of underlying commercial software.

Because automobile manufacturers produce so much test data, they were among the first to recognize the need for a standard data-storage model. Just imagine an automobile producer that runs over 100 engine test cells and accumulates data day after day, and you can appreciate the data-overload problem they face.

Late in 1998, 33 auto manufacturers and equipment suppliers gathered in Stuttgart, Germany, to form the Association for Standardization of Automation and Measuring Systems (www.asam.net). It now includes close to 100 member companies, and its efforts include standards that cover data storage as well as automatic calibration of instruments, generic device interfaces, and the description and integration of measurement-control systems.

DATA INVASION: It looks like a battle plan, but instead it's a map of information from a vehicle test moving into a database established using the ASAM's Open Data Services standard. A base model accommodates data from similar tests, which makes it easy to share information across a company.

The ASAM's data-storage standard, called Open Data Services (ODS), covers methods for saving data in a way that makes it easy to access, share, analyze, and report information. The flexible standard can apply to many endeavors and different types of data, so even engineers who don't work in the automotive industry can adopt the ASAM ODS standard.

At its simplest, the ASAM ODS standard includes four parts:

  • A common data model that provides a base structure for data.

  • A set of application program interface (API) tools that create and access the data, regardless of what system actually stores it.

  • An ASAM Transport Format (ATF) that lets different systems exchange data in a standard format.

  • A model for the physical storage of data that allows for sharing data and the use of structured-query-language (SQL) commands.

The data model specifies base elements and base relationships. The base elements define physical units, such as newtons and centimeters, and they also define physical quantities, such as force and distance. A base relationship links a unit and a quantity so software will know that a given test measures force in units of newtons, for example. Base elements also cover information such as physical dimensions, test equipment in use, test sequences, and so on.
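
To make the idea of base elements and base relationships concrete, here is a minimal Python sketch. The class and field names (PhysicalQuantity, Unit, Measurement) are hypothetical and are not taken from the ODS documents; the sketch only illustrates how linking a unit to a quantity lets software tell what a column of numbers means.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PhysicalQuantity:
    """A base element: what is being measured (hypothetical name)."""
    name: str  # e.g. "force" or "distance"


@dataclass(frozen=True)
class Unit:
    """A base element: how the quantity is expressed (hypothetical name)."""
    name: str
    symbol: str
    quantity: PhysicalQuantity  # the base relationship: unit -> quantity


@dataclass
class Measurement:
    """A series of values that carries its unit, and so its quantity."""
    label: str
    unit: Unit
    values: List[float]

    def describe(self) -> str:
        return (f"{self.label}: {self.unit.quantity.name} "
                f"in {self.unit.name} ({self.unit.symbol})")


# Illustrative values only; they do not come from a real test.
FORCE = PhysicalQuantity("force")
NEWTON = Unit("newton", "N", FORCE)
clamp_load = Measurement("clamp_load", NEWTON, [101.5, 99.8, 100.2])

print(clamp_load.describe())
```

Because every measurement points to a unit, and every unit points to a quantity, a data user who never saw the test can still find out that clamp_load holds force values in newtons.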

Finally, an application model defines the elements in use for a specific test. An engine test and a stress test would require different elements, which the standard allows through the use of different application models. Data does not have to be force-fit into a rigid structure, as long as a model defines the format for each type of test.

A database management system (DBMS) such as Oracle or DB2, noted as a server application in the ODS documents, has no knowledge of the ASAM data model. A DBMS simply stores information, whether it comes from financial accounts or from engine tests. So to provide "structure" for test data in a standard database, the ODS specification includes an API that links application software, also called a client application, with the appropriate databases. The API defines how a DBMS will create databases that use the data model, and how applications can save and retrieve this information. The API also includes the capability to "map" older data, which doesn't comply with ODS models, into useful formats.
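
The division of labor between client and server can be sketched in a few lines of Python. This is not the ODS API; the MeasurementStore interface, the InMemoryStore class, and the LEGACY_COLUMN_MAP table are all hypothetical names chosen for illustration. The point is only that client code talks to an interface rather than to a particular DBMS, and that a mapping step can bring older column names into line with the model.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class MeasurementStore(ABC):
    """Hypothetical client-side interface; hides which system stores the data."""

    @abstractmethod
    def save(self, test_name: str, channel: str, values: List[float]) -> None:
        ...

    @abstractmethod
    def load(self, test_name: str, channel: str) -> List[float]:
        ...


class InMemoryStore(MeasurementStore):
    """Stand-in for a real server; a DBMS-backed class would share the interface."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], List[float]] = {}

    def save(self, test_name: str, channel: str, values: List[float]) -> None:
        self._data[(test_name, channel)] = list(values)

    def load(self, test_name: str, channel: str) -> List[float]:
        return list(self._data[(test_name, channel)])


# Assumed mapping from legacy column headings to names used in the model.
LEGACY_COLUMN_MAP = {"Trq": "torque", "RPM": "engine_speed"}


def import_legacy(store: MeasurementStore, test_name: str,
                  legacy_columns: Dict[str, List[float]]) -> None:
    """Save legacy columns under their mapped names; unknown names pass through."""
    for old_name, values in legacy_columns.items():
        store.save(test_name, LEGACY_COLUMN_MAP.get(old_name, old_name), values)


# Illustrative data only.
store = InMemoryStore()
import_legacy(store, "engine_cell_7",
              {"Trq": [210.0, 212.5], "RPM": [3000.0, 3050.0]})
torque = store.load("engine_cell_7", "torque")
print(torque)
```

Swapping InMemoryStore for a class that talks to a real database would leave import_legacy and any analysis code unchanged, which is the benefit the API layer is meant to provide.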

Because some applications require that different servers and databases share test data, the ODS standard includes a transport-format specification that uses the ubiquitous ASCII characters. These characters communicate text well, but they're inefficient at transmitting numeric values. Thus the standard allows numeric data to exist in binary files. (A separate, associated ASCII file defines the structure of the data in the binary file.) It also defines a syntax that programmers can use to parse ASCII data into an ODS-compliant data structure.
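
The pairing of a binary data file with an ASCII description can be sketched as follows. The key=value descriptor layout here is invented for illustration and is not the ATF syntax; the function names write_channel and read_channel are likewise hypothetical.

```python
import struct
import tempfile
from pathlib import Path


def write_channel(directory, name, unit, values):
    """Write values as little-endian 64-bit floats plus an ASCII descriptor file."""
    directory = Path(directory)
    binary_path = directory / f"{name}.bin"
    binary_path.write_bytes(struct.pack(f"<{len(values)}d", *values))

    # The descriptor tells a reader how to interpret the raw bytes.
    descriptor = "\n".join([
        f"channel={name}",
        f"unit={unit}",
        "datatype=float64",
        "byteorder=little",
        f"count={len(values)}",
        f"file={binary_path.name}",
    ])
    descriptor_path = directory / f"{name}.txt"
    descriptor_path.write_text(descriptor + "\n", encoding="ascii")
    return descriptor_path


def read_channel(descriptor_path):
    """Parse the ASCII descriptor, then decode the binary file it points to."""
    descriptor_path = Path(descriptor_path)
    text = descriptor_path.read_text(encoding="ascii")
    fields = dict(line.split("=", 1) for line in text.splitlines() if line)
    if fields["datatype"] != "float64" or fields["byteorder"] != "little":
        raise ValueError("this sketch only reads little-endian float64 data")
    raw = (descriptor_path.parent / fields["file"]).read_bytes()
    values = list(struct.unpack(f"<{int(fields['count'])}d", raw))
    return fields, values


# Illustrative values only.
with tempfile.TemporaryDirectory() as tmp:
    path = write_channel(tmp, "torque", "N*m", [12.5, 13.0, 12.75])
    fields, values = read_channel(path)

print(fields["unit"], values)
```

Each value occupies exactly eight bytes in the binary file regardless of how many digits it would take to print, while the small text file stays readable by people and by any system that can parse ASCII.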

The physical data-storage portion of the standard sets out the structure for a relational database that will save the information that arrives from a test. The structure specifies the tables, table elements, and other requirements that a database specialist would need to know about to set up a system for actual use.
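
As a rough picture of what such a relational structure involves, the sketch below builds a tiny schema with Python's built-in sqlite3 module. The table and column names are assumptions made for illustration and do not reproduce the tables that the ODS standard prescribes; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the example

# Hypothetical tables: quantities, units tied to quantities, tests, and samples.
conn.executescript("""
CREATE TABLE quantity (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE unit (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    quantity_id INTEGER NOT NULL REFERENCES quantity(id)
);
CREATE TABLE test_run (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE sample (
    test_run_id INTEGER NOT NULL REFERENCES test_run(id),
    unit_id     INTEGER NOT NULL REFERENCES unit(id),
    channel     TEXT NOT NULL,
    seq         INTEGER NOT NULL,
    value       REAL NOT NULL
);
""")

conn.execute("INSERT INTO quantity (id, name) VALUES (1, 'force')")
conn.execute("INSERT INTO unit (id, name, quantity_id) VALUES (1, 'newton', 1)")
conn.execute("INSERT INTO test_run (id, name) VALUES (1, 'bracket stress test')")
conn.executemany(
    "INSERT INTO sample (test_run_id, unit_id, channel, seq, value) "
    "VALUES (1, 1, 'clamp_load', ?, ?)",
    [(0, 100.0), (1, 102.0), (2, 98.0)],
)

# A data user can recover the meaning of the numbers with plain SQL.
row = conn.execute("""
    SELECT t.name, q.name, u.name, COUNT(*), AVG(s.value)
    FROM sample s
    JOIN test_run t ON t.id = s.test_run_id
    JOIN unit u     ON u.id = s.unit_id
    JOIN quantity q ON q.id = u.quantity_id
    GROUP BY t.name, q.name, u.name
""").fetchone()
print(row)
```

Because the units and quantities live in their own tables, a query can report what was measured and in which units without anyone having to remember what a spreadsheet column stood for.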

The ASAM's ODS standard lays the groundwork for a practical way to save data in a form that lets data creators and data users access information without worrying about who saved the data and the format they used to save it. But although a standard exists, engineers and managers cannot yet buy "shrink-wrapped" software that performs all of the necessary data-management tasks. For now, companies must rely on consultants, and they must commit a significant sum to put an ASAM ODS-compliant system in place.

Potential users should keep in mind that the ODS standard does not include applications that manipulate the data and generate reports. Shrink-wrapped software packages such as DIAdem, NWA Quality Analyst, Origin, MATLAB, and Mathcad offer these and other capabilities. (Many software packages can access databases using SQL commands and a variety of protocols.) So, small to medium-size companies may decide to follow the ASAM ODS standard as a guide and implement their own systems using software such as the packages noted above, which work right out of the box.

Physical storage matters

Although commercial and homegrown database standards don't typically specify physical media, users should consider this implementation detail. Even if you store data in a form that will have lasting value, you may not have the capability to obtain that data when you need it. Anyone familiar with 8-track cartridges or Betamax videocassettes knows that physical media go out of date.

Even if a company saves data on a server, eventually it must save large quantities of older, unused information on some medium, usually magnetic tape, for permanent storage. If engineers expect to need the information, they must ensure they will be able to extract it when they need it. Magnetic tape can preserve information for as long as 30 years, but a company must maintain magnetic-tape readers in tip-top condition for the same period!

The same fast-paced technology that makes older storage media obsolete also has a benefit. It may reduce the need to save old data. A company that manufactures cell phones, for example, may not need to save test data for phones made several generations ago.
