Selective detection of hazardous VOCs for indoor air quality applications using a virtual gas sensor array

An approach for detecting hazardous volatile organic compounds (VOCs) in ppb and sub-ppb concentrations is presented. Using three types of metal oxide semiconductor (MOS) gas sensors in temperature cycled operation, formaldehyde, benzene and naphthalene in trace concentrations, reflecting threshold limit values as proposed by the WHO and European national health institutions, are successfully identified against a varying ethanol background of up to 2 ppm. For signal processing, linear discriminant analysis is applied to single sensor data and sensor fusion data. Integrated field test sensor systems for monitoring of indoor air quality (IAQ) using the same types of gas sensors were characterized using the same gas measurement setup and data processing. Performance of the systems is reduced due to gas emissions from the hardware components. These contaminations have been investigated using analytical methods. Despite the reduced sensitivity, concentrations of the target VOCs in the ppb range (100 ppb of formaldehyde; 5 ppb of benzene; 20 ppb of naphthalene) are still clearly detectable with the systems, especially when using the sensor fusion method for combining data of the different MOS sensor types.


Introduction
Indoor air quality (IAQ) is determined by the contamination of the air with various chemical compounds, such as carbon dioxide (CO₂), carbon monoxide (CO), nitrogen dioxide (NO₂) and volatile organic compounds (VOCs). Several investigations have been performed to determine the occurrence of these substances in indoor air, e.g., by Bernstein et al. (2008) or in European projects like the Airmex study (Geiss et al., 2011) and the INDEX project (Koistinen et al., 2008).
Negative health effects of exposure to these substances, even at low concentrations, have been observed, mainly affecting the respiratory system and causing skin irritation (Jones, 1999). Additionally, some VOCs (e.g., benzene) are carcinogenic, while others (e.g., formaldehyde) are suspected to be carcinogenic (Gou et al., 2004).
Hazardous VOCs pose a special problem. Although threshold limits for single substances are recommended for indoor air, e.g., by the WHO (World Health Organization, 2010), there is currently no online measurement technology commercially available to identify and quantify different volatile organic substances reliably and at reasonable cost. Monitoring total VOC (TVOC) concentrations is state of the art (Umweltbundesamt, 2007), but this parameter is not meaningful in terms of health effects since it also includes benign substances and cannot be attributed to symptoms like the sick building syndrome (Burge, 2004; Brinke et al., 1998). Selective VOC detection and quantification is today based on gas sampling and analytical techniques, especially gas chromatography coupled with mass spectrometry (GC-MS; Wu et al., 2004). The resulting high cost of individual measurements prevents ubiquitous VOC monitoring in IAQ applications today.
A possible application for selective VOC monitoring is demand-controlled ventilation in smart buildings. VOC levels can be used as a parameter for controlling indoor ventilation in addition to other indicators like temperature and CO₂ levels. In that case, selective measurement of single VOCs is necessary since ventilation should be increased only if thresholds of hazardous VOCs are exceeded.
From the wide range of VOCs, three compounds were selected for further investigations on selective detection: formaldehyde, benzene and naphthalene, which are three of the first priority harmful VOCs (Koistinen et al., 2008; World Health Organization, 2010). The selected target concentrations of these gases are 10 ppb for formaldehyde, 0.5 ppb for benzene and 2 ppb for naphthalene, based on international and European national regulations (e.g., World Health Organization, 2010; French decree no. 2011-1727, 2011; Sagunski and Heger, 2004). For benzene, the World Health Organization even states that there is no safe level due to its high carcinogenicity (World Health Organization, 2010). Thus, identifying these gases requires not only high selectivity but also very high sensitivity in order to detect ppb levels of these specific VOCs.
One type of sensor that can detect VOCs in this concentration range is the metal oxide semiconductor (MOS) gas sensor (Schüler et al., 2013). MOS sensors in temperature cycled operation (TCO) are used here to measure the selected VOCs against a high background of interfering gas, similar to Reimann and Schütze (2012). These sensors were also integrated in low-cost sensor systems designed for field testing and as a basis for future commercial online VOC monitoring devices.

TCO optimization
Semiconductor gas sensors are very sensitive, but they are usually broadband sensors and show little selectivity to specific gases. One method to improve selectivity, sensitivity and also stability is temperature cycled operation (Lee and Reedy, 1999; Gramm and Schütze, 2003; Schüler et al., 2013). By modulating the operating temperature of the MOS sensing layer, different states of the sensor material itself (i.e., surface coverage with oxygen) and its interaction with gas molecules are activated, and thus different sensing characteristics are obtained. Figure 1 shows normalized sensor signals of the same temperature cycle when different gases are applied to a MOS gas sensor. The differences in the recorded signal shapes (i.e., slopes, average values in different sections) are obvious; these features are characteristic of specific gases.
Three types of ceramic substrate MOS gas sensors were evaluated for detection of the target VOCs: GGS 1330, GGS 2330 (both SnO₂ based) and GGS 5330 (WO₃ based) by UST Umweltsensortechnik GmbH (Geschwenda, Germany).
A method for optimizing the TCO cycle was evaluated. In order to find the most sensitive and most selective temperature transitions, the relaxation behavior from a high temperature to different lower temperatures was investigated. Specifically, temperature changes from 400 to 200 °C, 250 °C and 300 °C were performed with a GGS 1330 SnO₂ sensor with benzene and ethanol as test gases. The results are shown in Fig. 2.
The sensor response was calculated by dividing the sensor signal (conductivity of the sensitive layer) of a cycle in gas by the sensor signal of a cycle in pure air for each point of the cycle. The response has distinct peaks several seconds after the temperature steps from the high temperature to the lower temperatures, e.g., for ethanol 50 s after changing the sensor temperature from 400 to 200 °C. The sensor response after cooldown from 400 to 200 °C reaches approx. 67 for ethanol and then drops to approx. 9 at the steady state. Thus, the sensitivity is significantly increased in TCO mode due to the non-equilibrium state of the sensor surface after temperature changes (Sauerwald et al., 2014). For benzene, the sensor response rises to 2.1 at the peak 36 s after the temperature transition from 400 to 250 °C compared to 1.3 at the steady state, corresponding to an almost 4-fold increase in sensitivity.
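The response definition above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and reading the "almost 4-fold" figure as the ratio of the response excess over the air baseline (response minus 1) is an assumption that happens to reproduce the quoted number.

```python
def sensor_response(gas_cycle, air_cycle):
    """Point-by-point ratio of the signal in gas to the signal in pure air."""
    if len(gas_cycle) != len(air_cycle):
        raise ValueError("cycles must have the same number of samples")
    return [g / a for g, a in zip(gas_cycle, air_cycle)]


def peak_gain(peak_response, steady_response):
    """Ratio of the response excess over the air baseline (response - 1).

    Assumed interpretation of 'increase in sensitivity'; not stated in the text.
    """
    return (peak_response - 1.0) / (steady_response - 1.0)


# Benzene values quoted in the text: 2.1 at the peak, 1.3 at the steady state.
benzene_gain = peak_gain(2.1, 1.3)
# Ethanol values quoted in the text: approx. 67 at the peak, approx. 9 at the steady state.
ethanol_gain = peak_gain(67.0, 9.0)
print(round(benzene_gain, 2), round(ethanol_gain, 2))
```

Under this reading, benzene gains a factor of about 3.7 and ethanol a factor of about 8 from evaluating the transient peak instead of the steady state.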
Based on these results, the temperature steps and the lengths of these steps were defined. For the SnO₂ sensors (GGS 1330 and GGS 2330) a two-step temperature cycle with ramp transitions was chosen (see Fig. 3). The ramps were implemented in order to achieve a defined heating up and cooling down of the sensitive layer independent of ambient temperature and humidity. The durations of the ramps are the result of the heating and cooling characteristics of the sensors. Due to the size of the ceramic substrates, heating up the sensors takes up to 20 s and cooling down even longer, up to 30 s. These are the values chosen for the respective ramps.
The length of the low temperature step is 100 s to cover all the response peaks of the previous optimization measurement. The total duration of the temperature cycle is 180 s, which is sufficiently short for the target application in IAQ monitoring.
The WO₃-based sensor (GGS 5330) did not show any delayed response maxima, but only a temperature-dependent response. Thus, a simple ramp up and down between 400 and 200 °C was selected, covering the range of maximum sensitivity to the target gases (Fig. 4). The duration of a cycle is 60 s; to synchronize all sensors, three cycles of the WO₃-based sensor are run during one cycle of the two SnO₂-based sensors.
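The two cycle shapes described above can be written down as set-point lists with one value per second. The sketch below makes several assumptions that are not given in the text: the SnO₂ plateau temperatures (400 and 200 °C are placeholders), a high-temperature plateau of 30 s (what remains of the 180 s after the 100 s low step and the 20 s and 30 s ramps), the low step coming first (inferred from the Fig. 5 caption, which places the end of the low step at 99 s), and a symmetric 30 s up / 30 s down ramp for the WO₃ sensor.

```python
def linear_ramp(start, stop, seconds):
    """Set-points for a linear ramp, ending exactly at `stop`."""
    return [start + (stop - start) * (s + 1) / seconds for s in range(seconds)]


def sno2_cycle(t_low=200.0, t_high=400.0):
    """180 s two-step cycle: low step, ramp up, high step, ramp down.

    Plateau temperatures and segment order are assumptions for illustration.
    """
    return (
        [t_low] * 100                      # low temperature step (100 s)
        + linear_ramp(t_low, t_high, 20)   # heating ramp (20 s)
        + [t_high] * 30                    # high temperature step (assumed 30 s)
        + linear_ramp(t_high, t_low, 30)   # cooling ramp (30 s)
    )


def wo3_cycle(t_low=200.0, t_high=400.0, half=30):
    """60 s ramp up and down; the symmetric split is an assumption."""
    return linear_ramp(t_low, t_high, half) + linear_ramp(t_high, t_low, half)


# Three WO3 cycles cover exactly one SnO2 cycle.
synchronized_wo3 = wo3_cycle() * 3
print(len(sno2_cycle()), len(synchronized_wo3))
```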

Sensor characterization measurements
The three target VOCs were applied in two concentrations each: one at the respective threshold limit value and one at the 10-fold value. Additionally, the measurements were performed with two concentrations of ethanol as a background interference gas and two values for the relative humidity (RH). Table 1 gives an overview of all concentration and humidity values.
The measurements were conducted with a gas mixing system which was designed and set up specifically for trace gas generation with wide concentration ranges by Helwig et al. (2014). The VOCs were diluted into a carrier gas stream of synthetic air (purity 5.0) either from a gas cylinder or from a permeation furnace. Total gas flow was 200 mL min⁻¹; the three sensors were set up in a stainless steel sensor chamber. Each of the 36 VOC gas configurations was applied for 30 min; between the VOC exposures the sensors were flushed with background (humid air plus ethanol) for 30 min to allow their return to the baseline and prevent carryover. The complete data set contained 940 temperature cycles for the SnO₂-based sensors and 2820 cycles for the WO₃-based sensor. Not all of the cycles were used for signal processing; for the "background" groups without the target VOCs, six sections with a length of approx. 15 SnO₂ cycles each were selected, one after each change of the background conditions (humidity, ethanol).

Sensor characterization
As a first analysis of the data, quasistatic sensor signals are examined. These are generated by choosing one point of the temperature cycle and extracting the signal value at this point for every cycle of the measurement. These values are then plotted against the respective cycle number, which yields the sensor signal at a specific point of the cycle over time. An example is given in Fig. 5.
The sensor reactions to all target gases and especially to the ethanol background are clearly visible. This method is helpful for checking the nominal performance of the gas mixing system and the general response of the sensors to the gases. It is independent of the pattern recognition data analysis.
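Extracting a quasistatic signal amounts to sampling the continuous recording at a fixed offset in every cycle. A minimal sketch, assuming the recording is a flat list with a constant number of samples per cycle (the function and parameter names are hypothetical):

```python
def quasistatic_signal(samples, samples_per_cycle, index_in_cycle):
    """Pick the value at the same position of every complete temperature cycle."""
    if not 0 <= index_in_cycle < samples_per_cycle:
        raise ValueError("index_in_cycle must lie inside one cycle")
    n_cycles = len(samples) // samples_per_cycle
    return [samples[c * samples_per_cycle + index_in_cycle] for c in range(n_cycles)]


# Toy recording: three cycles of four samples each.
recording = [0, 1, 2, 3, 10, 11, 12, 13, 20, 21, 22, 23]
print(quasistatic_signal(recording, samples_per_cycle=4, index_in_cycle=3))
```

With one sample per second and a 180 s cycle, `index_in_cycle=99` would correspond to the point used in Fig. 5.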
For further signal processing, the method of linear discriminant analysis (LDA) is applied (Gutierrez-Osuna, 2002). This pattern recognition technique can be used to separate different classes of input data while grouping data sets of the same type. In this case, it is used to assign the temperature cycle sensor signals to the different target gases. Thus, in the resulting plots, the algorithm should arrange all cycles of each target gas and background into one compact group while separating the groups of the different target VOCs and the background without VOCs from each other.
The approach used here is essentially the same as that presented by Bur et al. (2014). Input data sets for the LDA algorithm ("training", i.e., determination of LDA coefficients for the projection, and evaluation) are generated by extracting a set of features from each temperature cycle sensor signal. The temperature cycle is divided into several sections; 20 sections were chosen for the 180 s cycle for the GGS 1330/2330 sensors (see Fig. 6). From each section, features are calculated, in this case the mean value of the sensor signal and the slope of a linear fit. These features were chosen with regard to later implementation of the LDA calculations on the field test system microcontroller since they are easy to calculate. For the GGS 5330 sensor, the 60 s cycle was divided into 14 sections. This generates a data set of 40 values (28 for the GGS 5330) for each sensor for each temperature cycle, which is used as input for the LDA.
As mentioned above, the aim of the presented measurement is identification of the target VOCs. The extracted data sets were therefore assigned to four groups, one group for each target VOC and one for the background gas without any of the three targets. Each of the three VOC groups thus contains the cycles that ran during the application of one VOC with both VOC concentrations, both gas humidities and all ethanol backgrounds, i.e., a total of 12 different conditions. The "background" group contains sections of synthetic air with both humidities and all ethanol background concentrations.
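The feature extraction described above (mean and linear-fit slope per section) can be sketched as follows. Equal-length sections are an assumption made here for simplicity; the actual section boundaries are those shown in Fig. 6 and may differ. The slope is the ordinary least-squares slope in signal units per sample.

```python
def section_features(cycle, n_sections):
    """Return [mean_1, slope_1, mean_2, slope_2, ...] for equal-length sections.

    Samples left over after integer division are ignored (assumption).
    Each section needs at least two samples for the slope to be defined.
    """
    size = len(cycle) // n_sections
    if size < 2:
        raise ValueError("each section needs at least two samples")
    features = []
    for k in range(n_sections):
        y = cycle[k * size:(k + 1) * size]
        n = len(y)
        x_mean = (n - 1) / 2
        y_mean = sum(y) / n
        sxx = sum((i - x_mean) ** 2 for i in range(n))
        sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
        features += [y_mean, sxy / sxx]
    return features


# Toy 180-sample cycle (a straight line), split into 20 sections.
toy_cycle = [float(i) for i in range(180)]
toy_features = section_features(toy_cycle, 20)
print(len(toy_features))
```

With 20 sections this yields the 40 values per cycle mentioned in the text, and with 14 sections the 28 values of the GGS 5330.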
The result of the LDA calculation for the GGS 1330 sensor is shown in Fig. 7. Separation of the four groups is quite successful, but there is still some overlap. As a validation of the results, leave-one-out cross-validation (LOOCV) is performed (Gutierrez-Osuna, 2002). This method checks how many feature vectors are classified correctly if the LDA is trained with all other vectors. For the GGS 1330 sensor, 98.9 % of the 435 data sets used are classified correctly if k-nearest-neighbors classification (kNN, k = 5) is applied. So despite the overlap of the groups, nearly all TCO feature sets are assigned to the correct gas.
Figure 8 shows the result of the LDA for the GGS 2330 sensor. Separation of the groups does not appear quite as distinct as for the GGS 1330 sensor, especially with formaldehyde and benzene having slightly more overlap. Leave-one-out cross-validation with kNN results in a correct classification of 96.6 % of all temperature cycles.
The result for the GGS 5330 sensor (Fig. 9) also shows a partial overlap of the groups, especially for formaldehyde and air; compared to the GGS 2330, however, the validation shows a slightly higher number of correct classifications at 98.4 %.
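The validation loop can be illustrated with a small, dependency-free sketch. It shows only the leave-one-out loop with kNN voting (k = 5) on points that are assumed to be already projected; the LDA step is deliberately omitted, and in a complete implementation the LDA coefficients would be recomputed in every fold without the held-out cycle. The data points and labels below are invented toy values.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(points, labels, k=5):
    """Fraction of points classified correctly when each is held out once."""
    correct = 0
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(rest_points, rest_labels, points[i], k) == labels[i]:
            correct += 1
    return correct / len(points)


# Hypothetical projected cycles: two well-separated groups of six points.
toy_points = [(0.0, 0.1 * i) for i in range(6)] + [(10.0, 0.1 * i) for i in range(6)]
toy_labels = ["background"] * 6 + ["benzene"] * 6
print(loocv_accuracy(toy_points, toy_labels))
```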
In addition to evaluating the single sensors, a combined processing of the data from the sensors is applied. In this sensor fusion, the feature vectors of two or three sensors are merged into a single data set for each temperature cycle, e.g., a 108-value vector for fusion of all three sensors. LDA calculation with the combined data results in a much better separation of the groups, shown in Fig. 10 for the combination of all three sensor types. Now there is no overlap of the gas groups. Validation shows a classification accuracy of 100 %; all temperature cycles are classified correctly.
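At the feature level, the fusion described above is a concatenation of the per-sensor feature vectors before the LDA. A minimal sketch with placeholder values follows; how the three 60 s cycles of the GGS 5330 are reduced to a single 28-value set per 180 s cycle is not specified in the text and is left open here.

```python
def fuse_features(*sensor_features):
    """Concatenate the feature vectors of several sensors for one cycle."""
    fused = []
    for features in sensor_features:
        fused.extend(features)
    return fused


# Placeholder feature vectors with the lengths given in the text.
ggs1330 = [0.0] * 40
ggs2330 = [0.0] * 40
ggs5330 = [0.0] * 28
print(len(fuse_features(ggs1330, ggs2330, ggs5330)))
```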

Field test sensor system characterization
For use in field tests, the sensors were integrated into field test electronics (Conrad et al., 2014). The systems are designed to operate two MOS gas sensors independently in temperature cycled operation, with different temperature cycles. Each sensor is mounted on a plug-in PCB (printed circuit board), which also contains an EEPROM (electrically erasable programmable read-only memory) for calibration data and LDA parameters of the individual sensor. With this setup, fast replacement of a sensor is possible without having to perform a new calibration of the overall system. The sensor signals are acquired at a rate of up to 10 ksps and are stored on an SD memory card, which also contains general configuration data and the temperature cycle data sets. An on-board sensor measures air temperature and humidity; in addition, the system can be equipped with a dual-beam NDIR (nondispersive infrared) CO₂ sensor. Online preview of the measured data is possible via a selection of communication interfaces. The electronics are installed in a polymer housing (Fig. 11).
www.j-sens-sens-syst.net/3/253/2014/
Figure 11. Exterior view of modular field test sensor system containing electronics (PCB) with two MOS gas sensors (Conrad et al., 2014).
The performance of the systems was determined using the same test gas profile as for the sensor characterization in the stainless steel sensor chamber (Table 1). Three systems were characterized simultaneously, each equipped with two different UST gas sensor types with the temperature profiles identified during the lab optimization. A total of six MOS sensors were operated, two of every type; one sensor of every type was used for offline LDA signal processing. The systems were placed in a stainless steel measurement chamber with a volume of 3.5 L. The total gas flow was set to 800 mL min⁻¹, resulting in an air exchange rate of 13.7 ach (air changes per hour). Signal acquisition, pre-processing and feature extraction were performed identically to the characterization measurement of the sensors in the stainless steel sensor chamber.
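The air exchange rate follows directly from the flow and the chamber volume. A short sketch of the unit conversion (the function name is hypothetical):

```python
def air_changes_per_hour(flow_ml_per_min, volume_l):
    """Air exchange rate: hourly flow in litres divided by chamber volume in litres."""
    flow_l_per_hour = flow_ml_per_min * 60.0 / 1000.0
    return flow_l_per_hour / volume_l


# 800 mL/min through the 3.5 L measurement chamber.
ach_characterization = air_changes_per_hour(800, 3.5)
print(round(ach_characterization, 1))
```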
The LDA result obtained with data from one of the GGS 1330 sensors is shown in Fig. 12.
Separation of the different gases is significantly less successful compared to the sensor characterization measurement (Fig. 7). Each VOC group is split into two sub-groups, reflecting the two tested VOC concentrations. While the higher concentrations are still discriminated from the background, the lower concentrations can no longer be separated from the background group. Using LOOCV, only 71.7 % of temperature cycles are now classified correctly, a significant reduction compared to the result of the sensor in the stainless steel sensor chamber, which achieved 98.9 %. Similar results were obtained for the other two sensor types. In the plot of the GGS 2330 LDA output (Fig. 13), the data groups of naphthalene and especially benzene are hardly separated from the background group and only the high formaldehyde concentration can be clearly discriminated. Only 66.1 % of the temperature cycles are assigned to the correct gas.
The GGS 5330 type sensor is much more sensitive to benzene and naphthalene than to formaldehyde. This clearly shows in the LDA result (Fig. 14), where both formaldehyde concentrations are plotted overlapping with the background group. However, only the high concentrations of benzene and naphthalene are separated from the background, while the lower concentrations are not. The ratio of correct classifications is 62.0 %.
Data fusion was applied to the field test system sensor data as well; the resulting LDA plot for fusion of one sensor of each of the three sensor types is shown in Fig. 15. As for the sensor characterization setup, discrimination of the gases is significantly improved. Not only can the high concentrations of all three target gases be clearly discriminated, but now also the low concentrations are separated more clearly from the background compared to the results obtained with the individual sensors in the systems. LOOCV yields 83.4 % of all temperature cycles classified correctly, an improvement of 11.7 percentage points over the best single sensor (71.7 % for the GGS 1330).
The results of the LDA validations for all the sensors and all possibilities of sensor fusion are listed in Table 2. For the sensors in the stainless steel sensor chamber, fusion of two sensors (GGS 1330 combined with either of the other two sensors) is already sufficient for reliable identification of the VOC. For the sensors integrated in the field test sensor systems, fusion of all three sensors is necessary for best selectivity. Detailed LOOCVs of the LDA results of the sensors integrated in the systems are listed in Table 3. The different gas sensitivities of the three sensor types are clearly shown by the validation results for the different VOCs. While the GGS 1330 sensor has a similar sensitivity to all the gases, the GGS 2330 has an enhanced sensitivity to formaldehyde and a reduced sensitivity to the other two target VOCs. The GGS 5330 sensor is not very sensitive to formaldehyde but has higher numbers of correct classifications for naphthalene and especially benzene compared to the GGS 2330. These values show that the data from the different sensor types can reasonably be used in sensor fusion as the sensors complement each other in their responses to the target gases.
The reason for the reduced sensitivity of the sensors integrated in the field test systems was investigated further (Leidinger et al., 2014). Gas emissions from the sensor system hardware components were identified as the main problem. These emissions were identified and quantified using analytical methods, namely GC/MS VOC measurements according to the ISO 16000 standard. For gas sampling, Tenax tubes were inserted into the outlet gas flow of the stainless steel measurement chamber containing three field test systems. Due to the requirements of this sampling method, air flow had to be reduced to 120 mL min⁻¹ or 2.06 ach. The most significant results of the GC/MS analysis of the gas samples are listed in Table 4. The results obtained with the low flow rate were converted to the high flow rate of 13.7 ach used for the system characterization measurements, assuming that the gas emission rate from the systems is constant and independent of the gas flow at these air exchange rates. The conversion factor is 0.15, which is the ratio of the two gas flows (120 mL min⁻¹ vs. 800 mL min⁻¹).
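The conversion rests on the steady-state balance of a well-mixed chamber with a clean inlet: for a constant emission rate, the outlet concentration is the emission rate divided by the flow, so it scales inversely with the flow. A minimal sketch under exactly that assumption (the function name and the example concentration are hypothetical):

```python
def scale_to_flow(conc_at_sampling_flow, sampling_flow, target_flow):
    """Rescale a concentration measured at one flow to another flow.

    Assumes a constant emission rate, so concentration is proportional to 1/flow.
    Both flows must use the same unit.
    """
    return conc_at_sampling_flow * sampling_flow / target_flow


conversion_factor = scale_to_flow(1.0, 120, 800)
# Hypothetical value: 100 µg/m³ measured at 120 mL/min, expressed at 800 mL/min.
print(conversion_factor, scale_to_flow(100.0, 120, 800))
```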
The TVOC value (last row of Table 4) shows that there are significant VOC emissions from the systems, especially when heated up during operation. Measured TVOC emissions of three operating systems increase by a factor of approx. 20 compared to the unloaded test chamber and by a factor of 12 compared to the systems being switched off and at room temperature. Thus, VOCs are produced by the systems, i.e., by outgassing from either the PCB or the polymer housing (cf. Fig. 11). This is also confirmed by the reduced contamination observed after a heat treatment of the field test sensor systems (cf. last row in Table 4). Looking at the specific gases, the amount of benzene measured is especially conspicuous. A concentration of 11.4 µg m⁻³ was determined, corresponding to 3.6 ppb. This strong benzene background, generated by the systems themselves, readily explains the reduced sensitivity to the applied benzene concentrations, especially the lower concentration of 0.5 ppb, compared to the single sensor measurements.
Naphthalene is not emitted from the systems in relevant amounts; the concentration measured with the systems operating is 0.2 µg m⁻³ or 0.04 ppb. Similarly, the concentration of formaldehyde was 1.2 ppb, or only approx. 12 % of the lower test gas concentration of the calibration measurement. The most significant compound identified in the GC/MS analysis is 1,2-dimethoxyethane, with 168.8 µg m⁻³ or 45.8 ppb. The origin of this substance could not be determined.
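The mass-to-volume conversions quoted above can be reproduced with the ideal gas molar volume. The sketch assumes 24.45 L mol⁻¹ (25 °C, 1 atm); the reference conditions are not stated in the text, but this value matches the quoted ppb figures. The molar masses are standard values, not taken from the source.

```python
MOLAR_VOLUME_L_PER_MOL = 24.45  # ideal gas at 25 °C and 1 atm (assumed conditions)

MOLAR_MASS_G_PER_MOL = {
    "benzene": 78.11,
    "naphthalene": 128.17,
    "1,2-dimethoxyethane": 90.12,
}


def ug_m3_to_ppb(conc_ug_m3, molar_mass_g_mol):
    """Convert a mass concentration in µg/m³ to a volume mixing ratio in ppb."""
    return conc_ug_m3 * MOLAR_VOLUME_L_PER_MOL / molar_mass_g_mol


for name, conc in [("benzene", 11.4), ("naphthalene", 0.2), ("1,2-dimethoxyethane", 168.8)]:
    print(name, round(ug_m3_to_ppb(conc, MOLAR_MASS_G_PER_MOL[name]), 2))
```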
Despite the high contamination levels, discrimination is still possible for the high concentrations of the target gases, as shown for the GGS 1330 sensor in Fig. 16. The high concentrations of formaldehyde and naphthalene can be mostly separated from the background. For benzene, discrimination does not seem as clear, but LOOCV shows that 94.5 % of temperature cycles are classified correctly. Sensor fusion further improves discrimination. Figure 17 shows the LDA plot based on data fusion of a GGS 1330 and a GGS 5330 sensor; the LOOCV results are listed in Table 5. The results can be improved further by calculating three-dimensional LDAs. Then the ratio of correct classifications reaches more than 99 % with sensor data fusion (Table 5, last column). One example of the 3-D LDA plot is given in Fig. 18.
A method to prevent or at least reduce gas emissions from the systems (PCB and housing) is heat treatment of the devices. This was performed in a climate chamber where the systems were kept for 13 h at 70 °C inside the stainless steel chamber while pure air was continuously flowing through the chamber in order to flush out all emissions from the systems. Afterwards, another gas sample was taken; see Table 4, last column. VOC emissions have been reduced significantly, by approx. 40 %, but are still more than 7 times higher compared to the unloaded test chamber. Thus, further heat treatment at higher temperature and/or different materials for the housing are required to achieve acceptable contamination levels of the integrated sensor systems.

Conclusions
We have demonstrated that standard metal oxide semiconductor gas sensors operated in dynamic mode using TCO can detect and identify hazardous VOCs at ppb and sub-ppb levels, even in the presence of a much higher background concentration of ethanol (up to a factor of 4000 higher compared to the lower benzene concentration in the measurements).
In the sensor characterization measurements, when the sensors were installed in a stainless steel sensor chamber, the data sets from the sensor signals, containing several ethanol concentrations as well as gas humidities, could be assigned to the correct target gas with high reliability using a one-step LDA algorithm. The results of the data evaluation were improved significantly by sensor fusion, i.e., based on features obtained from two or three different sensors. For this measurement, 100 % of the temperature cycles were assigned to the correct gas by this method as verified by LOOCV. Further optimization of the sensor performance, e.g., using hierarchical data analysis (Schütze et al., 2004) or taking into account the information of further sensors, will be studied in the future.
For the integrated field test systems, however, the classification rate was reduced significantly compared to the sensor tests. Even with sensor fusion, only 83.4 % of the temperature cycles were classified correctly. This was attributed to VOC gas emissions from the system hardware, which have a profound effect on the performance of the individual sensors and the combined sensor array; the sensing capabilities are clearly impaired by the VOC emissions. Using only the high test gas concentrations for LDA processing, the ratio of correct classification rises to more than 95 % in a 2-D LDA and over 99 % in a 3-D LDA. These VOC concentrations, still in the ppb range, can be identified by the systems with a high success rate.
A first test of baking out the system showed promising results, as VOC emissions were significantly reduced. Separate heat treatment of the PCB and the housing would allow for application of a higher temperature to the PCB and should reduce gas emissions even further. The expected positive influence of reduced emissions on the sensing performance of the integrated sensor systems will be verified in future experiments. With these optimized integrated sensor systems, field tests will be carried out in various typical indoor environments, e.g., offices and meeting rooms, to validate the performance of these systems for continuous monitoring of indoor air quality.

Figure 1. Temperature cycle (solid line) and normalized temperature cycle sensor signals (UST GGS 1330) in the presence of different gases.

Figure 2. Sensor responses to 25 ppb of benzene and 500 ppb of ethanol during TCO optimization cycle at 12.5 % relative humidity (Fricke et al., 2014).

Figure 4. 60 s temperature cycle for the GGS 5330 WO₃-based sensor; three cycles are run in order to synchronize the signals with the 180 s cycle for the SnO₂ sensors.

Figure 5. Section of the quasistatic sensor signal, UST GGS 1330, 60 % RH; the selected point of the cycle is the end of the low temperature step at 99 s (see Figs. 3/6).

Figure 6. Selected feature ranges of the GGS 1330 and GGS 2330 sensor.

Figure 10. LDA plot based on data fusion of all three sensors.

Figure 12. LDA plot of the lab characterization of a UST GGS 1330 sensor integrated in a field test system.

Figure 13. LDA plot of the lab characterization of a UST GGS 2330 sensor integrated in a field test system.

Figure 14. LDA result plot of the lab characterization of a UST GGS 5330 sensor integrated in a field test system.

Figure 15. LDA plot based on data fusion of three sensors (one of every type) integrated in field test systems.

Figure 16. LDA result plot of the lab characterization of a UST GGS 1330 sensor integrated in a field test system, evaluated only for the high VOC concentrations.

Figure 17. LDA plot based on data fusion of a GGS 1330 and a GGS 5330 integrated in field test systems, evaluated only for the high VOC concentrations.

Figure 18. 3-D LDA plot based on data fusion of three sensors (one of every type) integrated in field test systems, evaluated only for the high VOC concentrations.

Table 2. List of leave-one-out cross-validation results with kNN-5 for the LDAs of the single sensors and sensor fusions.

Table 3. Detailed LOOCV results of the LDAs of the field test system MOS sensors; ratio of correct classifications for the single groups and overall.

Table 4. Measured contaminations caused by outgassing of the field test system, converted from 2.06 ach to 13.7 ach; in µg m⁻³ according to the ISO 16000 standard (n.d.: not detectable; n/a: data not available).

Table 5. List of LOOCV results for the 2-D and 3-D LDAs of the single sensors and sensor fusions of the field test system sensors for the high VOC concentrations.