Is obsidian sourcing about geochemistry or archaeology? A reply to Speakman and Shackley




Journal of Archaeological Science 40 (2013) 1444–1448


Commentary

Ellery Frahm*
Department of Archaeology, The University of Sheffield, Northgate House, West Street, Sheffield S1 4ET, United Kingdom
* Tel.: +44 74 0299 0202. E-mail addresses: e.frahm@sheffield.ac.uk, [email protected].
http://dx.doi.org/10.1016/j.jas.2012.10.001


Article history: Received 14 August 2012; received in revised form 25 September 2012; accepted 1 October 2012

Abstract

The response from Speakman and Shackley to my paper highlights a number of important issues currently facing archaeological sourcing research. Many of these issues, however, have little to do with HHpXRF itself and more to do with an artificial crisis triggered by specialists' concerns about a hitherto restricted technique becoming available to a wider community. Despite their mantra of "validity and reliability," Speakman and Shackley erroneously equate these two concepts with the accuracy of an instrument's measurements. Additionally, they mischaracterise a self-contained test, conducted with specific parameters (i.e., "off-the-shelf" operation), as an endeavour to facilitate or endorse poor-quality data. Paradoxically, their disparagement of experimental internal consistency as "silliness" is incongruous with their own data. Furthermore, a discussion focused on handheld instruments is obfuscated by their inclusion of desktop instruments that are "portable" only in the sense of luggable to an electrical outlet in a laboratory context. Most troubling is that they envision themselves as the arbiters of science vs. "playing scientist." Contrary to their claims, HHpXRF proliferation will improve reproducibility and archaeological results. © 2012 Elsevier Ltd. All rights reserved.

Keywords: Portable XRF; Obsidian sourcing; Accuracy; Reliability; Validity

1. Introduction

Speakman and Shackley (hereafter S&S) hold that my experiment, meant to test the limits of handheld portable XRF (HHpXRF) for obsidian sourcing, promotes research in which (1) the results are (only) "internally consistent," (2) "results cannot be verified externally," and (3) "reliability and validity no longer is [sic] the cornerstone of scientific inquiry." To the contrary, I assert that validity and reliability are essential in sourcing and that reproducibility among researchers may be achieved through correlating measurements. S&S, however, mistakenly equate validity, reproducibility, and correlation with geochemical accuracy. Furthermore, S&S demonstrate that their own data are only internally consistent, and they claim that, every time labs compare data, there are systematic offsets that must be "corrected." Consequently, it is nonsensical that S&S object to the accuracy of my raw measurements, collected with the express purpose of testing a HHpXRF instrument without special calibration, relative to fully calibrated data. The goal of my experiment was not to achieve accuracy but rather to determine if the analyser, under particular test conditions, could discern the origins of obsidian artefacts.

Our approaches are equally verifiable and valid, but sourcing "results" are more than merely reporting an instrument's measurements. Obsidian sourcing is a kind of archaeological typology, where "results" are artefacts' source assignments, not X-ray counts.

2. Why conduct an experiment?

S&S wonder why I would conduct an experiment in which the accuracy of measurements is, at least initially, ignored. On one hand, there are two dozen articles that use HHpXRF for obsidian sourcing. Most report correct source assignments despite systematic differences between HHpXRF and labXRF data (e.g., Nazaroff et al., 2010; Millhauser et al., 2011). On the other hand, XRF experts proclaim these studies are flawed (e.g., Shackley, 2010). The wider archaeological community faces an apparent contradiction: the tantalisingly successful use of HHpXRF for sourcing (even when the geochemical measurements are inaccurate) vs. the assertion that HHpXRF studies must always follow experimental protocols developed for lab-based systems. Can artefacts' source assignments be correct if the authors did not use 40 standards or third-party calibration software? Hence, a test: take a challenging archaeological context, ignore "decades of protocol" for labXRF, forego efforts to optimise accuracy, and see if artefacts' source identifications are correct. Even for this "worst case scenario," 94% of the artefacts were sourced successfully, and the Kömürcü subsource of Göllü Dağ volcano was also distinguished. In contrast, lab-based studies have reported difficulties discerning Kömürcü from other Göllü Dağ subsources (e.g., Carter and Shackley, 2007). S&S should appreciate a scientific approach to testing, rather than simply accepting, prevailing opinions.

3. Reproducibility ≠ accuracy, results ≠ raw data

I concur with S&S about the importance of reproducibility (i.e., reliability; Frahm, 2012), but our views diverge about the nature of sourcing "results" and, consequently, their reproducibility. Sourcing is not geochemistry. We do not analyse artefacts to elucidate chemical processes that regulate geological systems. Our research questions are not formulated in terms of planetary processes such as mantle convection. Sourcing is archaeology, specifically a form of archaeological typology. We endeavour to sort artefacts into types that correspond to the materials' geographical origins and to interpret any patterns in terms of human behaviour. There are different approaches to creating types. Geochemical measurements are the most common, but such measurements are not our goal. Our questions are archaeological, so our sourcing "results" should be archaeological. It is laudable to also acquire geochemical results that, for example, a volcanologist would find useful for magma differentiation research, but it is certainly not a prerequisite. I look forward to a time when archaeologists have sufficient funding to subsidise geochemists' research.

Thus, reproducibility in sourcing should concern the source "types" and which artefacts are sorted into them. My experiment demonstrates the reproducibility of the HHpXRF results using an independent analytical technique (i.e., EMPA): 94% of the HHpXRF source assignments are correct compared to the EMPA source assignments. Hence, two independent techniques yielded the same archaeological results 94% of the time. Using independent techniques and attaining the same results is reproducibility. Another form of reproducibility is when an experiment is conducted repeatedly and the conclusions corroborated. I concluded that HHpXRF can yield correct source assignments for artefacts even if the geochemical measurements are systematically inaccurate. This reproduces similar conclusions from independent observers (e.g., Nazaroff et al., 2010; Millhauser et al., 2011).

In contrast, S&S are focused on a non-archaeological goal: ensuring reported geochemical measurements are sufficiently accurate to be directly comparable among different studies and put into a central data repository (a silo?) from which science eventually emerges. Thus, in their view, reproducibility is only focused on the accuracy of instruments' measurements. Accuracy, however, is merely a result of "correcting" internally consistent data to better fit one's expectations and has nothing to do with reproducibility. The only reproducibility differences between my experiment and theirs are those that affect comparisons between any two data sets.

HHpXRF will increase, rather than decrease, reproducibility in sourcing studies because the proliferation of these instruments enables an increase in the number of independent observers who can conduct sourcing studies. For the last four decades, obsidian sourcing has largely been limited to specialised archaeometric laboratories. Despite the benefits of such laboratories, anthropologists have noted that the ways in which material, technical, and human resources are organised within a laboratory affect the types of research questions, facts, and counter-facts emerging from that place (Latour and Woolgar, 1979). Furthermore, using pXRF, artefacts will be analysed non-destructively and remain in their countries of origin for others to study. It will also improve archaeological results to have obsidian experts do their analyses on-site, participate in the excavations and interpretation, and collaborate with other team members on the overall research design. Additionally, studies that rely on lab-based techniques must often interpret a vast assemblage based on a handful of artefacts. HHpXRF, though, will enable much larger portions of assemblages to be sourced, so archaeological interpretations can become more meaningful and nuanced.
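To make the distinction concrete, here is a minimal Python sketch (the assignments are invented for illustration) of reproducibility measured as agreement between archaeological results, i.e., source assignments, rather than between raw measurements:

```python
# Hypothetical source assignments for the same five artefacts from two
# independent techniques (source names illustrative only).
hhpxrf = ["Kömürcü", "Nenezi Dağ", "Kömürcü", "Bingöl A", "Kömürcü"]
empa   = ["Kömürcü", "Nenezi Dağ", "Kömürcü", "Bingöl A", "Nenezi Dağ"]

# Reproducibility, in the sense used here, is the rate at which the two
# techniques sort the same artefact into the same source "type", even if
# their raw ppm values disagree.
matches = sum(a == b for a, b in zip(hhpxrf, empa))
print(f"{matches / len(hhpxrf):.0%} agreement")  # 80% for these toy data
```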

4. "Systematic offsets" = internal consistency

S&S assert that internal consistency is "silliness." Ironically, the two data sets included in their response are only internally consistent (i.e., the instruments yielded different measurements). Their Fig. 1 highlights this. For each 100 ppm of Nb measured by one instrument, the other one measured roughly 80 ppm of Nb. This occurred for each measurement. One instrument measures 100, 200, and 300 ppm of Nb, while the other measures about 80, 160, and 240 ppm of Nb. Similar incongruities occur for every element. One instrument measures 100, 200, and 300 ppm of Rb, but the other measures 85, 170, and 255 ppm. S&S refer to the data sets as having "systematic offsets," but this is merely a euphemism for "internally consistent" data. Ultimately, S&S claim that these offsets occur every time two laboratories compare their data sets, which means that all laboratories are generating only internally consistent measurements.

To approximate consistency between their instruments, S&S apply an empirical "correction" to one data set to make it fit with the other. There is little difference between their corrections and the corrections that I used to make my data sufficiently compatible with the MURR values. Such a correction is the only difference between data deemed accurate and data deemed inaccurate. S&S mistakenly equate accuracy with reproducibility, validity, and correlation, so they believe that the corrections for accuracy also affect these other concepts.
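A minimal sketch, reusing the Nb figures above as toy values (Python with numpy; the arrays are invented), of how such a systematic offset behaves: a single constant factor separates two internally consistent data sets, so a single empirical correction reconciles them:

```python
import numpy as np

# Hypothetical paired Nb readings (ppm) for the same specimens: each
# instrument is internally consistent but offset from the other.
instrument_1 = np.array([100.0, 200.0, 300.0])
instrument_2 = np.array([80.0, 160.0, 240.0])

ratios = instrument_2 / instrument_1
print(ratios)                         # [0.8 0.8 0.8] -- one constant factor
print(instrument_2 / ratios.mean())   # [100. 200. 300.] -- offset "corrected"
```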

5. Validity ≠ accuracy

I fully concur with S&S that validity is the proper evaluation concept with which to assess HHpXRF. Their interpretation of validity, though, is contrary to its established meaning. Validity must not uncritically be equated with accuracy. In Research Methods in Anthropology, Bernard (2011) explains that validity is a different concept than accuracy:

What if the spring [in a bathroom scale] were not calibrated correctly (there was an error at the factory where the scale was built...) and the scale were off?... Suppose it turned out that your scale was always incorrectly lower by 5 pounds..., then a simple correction formula would be all you'd need in order to feel confident that the data from the instrument were pretty close to the truth. The data from this instrument are valid (it has already been determined that the scale is measuring weight – exactly what you think it's measuring); the data are reliable (you get the same answer every time you step on it); and they are precise enough for your purposes. But they are not accurate. (43)

The bathroom scale is validly measuring weight, but it is poorly calibrated, so its measurements of weight are not accurate. It is not valid to measure height or beauty using a bathroom scale. Validity depends on the context in which an instrument will be used (e.g., Carmines and Zeller, 1979). I use Neff's (1998) formulation of validity for sourcing studies: "The instrument is a valid indicator to the extent that composition really does measure 'source' as a location in geographic space and not some other concept" (116). I follow this definition because I am interested in establishing where past peoples acquired lithic raw materials, not the Sr or Rb content of the magma.

In contrast, S&S contend that validity is when one’s measurements match the “true” values of standards. That is accuracy, not validity. The difference is considerable. Suppose we multiplied every value in the MURR XRF obsidian database by three. Every element concentration would be three times higher and, therefore, inaccurate. But if the data were plotted or run through statistical analyses, artefacts’ source assignments would be the same. Thus, these data would still be valid for obsidian sourcing, even if no geochemist would want to include the data in a study on magmatism.
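A minimal sketch of this thought experiment, assuming a simple nearest-centroid assignment as a stand-in for the actual plots and statistics (all values hypothetical): scaling every value by three preserves the relative structure on which source assignments depend:

```python
import numpy as np

# Hypothetical (Rb, Zr) source centroids (ppm) and a single artefact.
sources = {"Source A": np.array([150.0, 90.0]),
           "Source B": np.array([210.0, 160.0])}
artefact = np.array([155.0, 95.0])

def nearest_source(point, centroids):
    # A crude stand-in for the plots and statistics used to sort artefacts.
    return min(centroids, key=lambda name: np.linalg.norm(point - centroids[name]))

print(nearest_source(artefact, sources))                # Source A
tripled = {name: 3 * c for name, c in sources.items()}
print(nearest_source(3 * artefact, tripled))            # still Source A
```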

6. Correlation ≠ accuracy

S&S apparently also mistakenly equate correlation and accuracy. This is important because correlation is how one compares data sets and endeavours to make them compatible. S&S, though, make incorrect statements about correlation among the three data sets in my paper:

it is of interest here that Frahm's plots of [MURR ElvaX] data vs. NAA data show higher correlated values than plots comparing the ElvaX PXRF data to Niton PXRF data. The reason for this is quite simple – the MURR ElvaX instrument was calibrated by Mike Glascock using obsidian that had been analysed by INAA and/or other analytical methods at MURR with the express intent of generating data that would be as comparable to MURR's extant NAA obsidian database... In contrast, the Niton XRF data are poorly correlated with MURR [ElvaX] data because the Niton was not calibrated for obsidian.

In reality, the reverse is true. Consider the Pearson's r correlation coefficients in Tables 1 and 2 here, calculated for the relationships shown in Figs. 6 and 7 in my paper. Table 1 shows the Pearson's r coefficients for Fig. 6 (NITON data with no special calibration). For the five elements, once the r values are tied (Mn), twice the r values are better for the NAA vs. ElvaX data (Rb and Sr), and twice the r values are better for the ElvaX vs. NITON data (Zn and Zr). Table 2 shows the correlations for one element (Zr) after my data were calibrated by linear regression analysis. The best correlation (r = 0.995) is for the NITON vs. NAA data, followed by the NITON vs. ElvaX (r = 0.988), and the worst correlation is between the two MURR data sets (r = 0.982). Thus, not only is there no evidence for a claim that the NITON data "are poorly correlated with" the ElvaX data, but also these coefficients reveal that calibration has little, if any, bearing on correlation.

I can only explain their claims by concluding S&S have confused correlation with accuracy. It seems they assess correlation based on the trend line slopes in Fig. 6. Generally the slopes are closer to 1 for the ElvaX vs. NAA data than the ElvaX vs. NITON data. That S&S attribute "higher correlation" between the two MURR data sets to Glascock's calibrations further indicates that they are looking at the slopes: calibrations affect the slopes much more than the correlations (e.g., linear regression calibration markedly changed the slopes between Figs. 6 and 7). Assuming one of the data sets is "true," trend line slopes can serve as a proxy for accuracy, but one must use a correlation coefficient to determine the correlation between data sets.

Table 1
The Pearson's r coefficients for the data in Fig. 6 of the original paper (i.e., NITON data with no special calibration). The best correlation for each element is marked with an asterisk.

Element   NAA vs. ElvaX   ElvaX vs. NITON
Mn        0.97*           0.97*
Zn        0.95            0.98*
Rb        0.95*           0.81
Sr        0.99*           0.95
Zr        0.98            0.99*

Table 2
The Pearson's r coefficients for Zr after the NITON data were calibrated using linear regression analysis (Fig. 7 in the paper). The best correlation is marked with an asterisk.

Element   ElvaX vs. NAA   NITON vs. NAA   ElvaX vs. NITON
Zr        0.982           0.995*          0.988
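The slope/correlation distinction is easy to check numerically. In this minimal sketch (invented Zr values; Python with numpy), a linear "calibration" moves the trend-line slope but leaves Pearson's r untouched:

```python
import numpy as np

rng = np.random.default_rng(42)
naa = rng.uniform(80, 300, size=25)        # hypothetical "true" Zr values (ppm)
niton = 0.8 * naa + rng.normal(0, 5, 25)   # offset, uncalibrated readings

# A linear calibration y = a*x + b moves the trend-line slope toward 1 but
# cannot change Pearson's r: correlation is invariant under any positive
# linear transformation.
calibrated = 1.25 * niton + 2.0
print(np.corrcoef(niton, naa)[0, 1])       # high r for the raw readings
print(np.corrcoef(calibrated, naa)[0, 1])  # identical r after "calibration"
```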

7. Accuracy is not straightforward

Accuracy is a useful concept, but it is not as straightforward as S&S suggest. As defined by NIST, accuracy is the qualitative "closeness of the agreement between the result of a measurement" and the "true" value of the quantity being measured (Taylor and Kuyatt, 1994). Accuracy compares a measurement to the "truth." In Statistics in Spectroscopy, Mark and Workman (2003) explain that accuracy is complex "because the usual statement of 'accuracy' compares the result obtained with 'truth' [but]... 'truth' is usually unknown, making this comparison difficult" (214). Thus, accuracy involves comparing measurements to a constructed truth.

S&S contend that accuracy can be established if data "conform to international standards as published," so let us consider a specific example: Smithsonian VG-568, Yellowstone National Park obsidian. Eugene Jarosewich at the Smithsonian Institution created this and many other reference standards, still distributed today and used in hundreds of laboratories worldwide, with the official compositions published in Geostandards Newsletter (Jarosewich et al., 1980). Table 3 shows the official Smithsonian VG-568 values alongside values from three different data sets.

Place your hand over the right half of this table. You are now only looking at the official data and mine. Look at TiO2: 1200 ppm (0.12%) in Jarosewich et al. (1980) and 800 ppm (0.08%) in my data. Also look at MnO: 3000 ppm in Jarosewich et al. (1980) and 200 ppm in my data. Which of these two data sets would be considered "accurate" and which "inaccurate"? Compared to the official data, my data would be seen as inaccurate. The Smithsonian data have authority that mine do not. Now look at the entire table. The two other data sets match my values for TiO2 and MnO. Now which values will be seen as "true"? These data have not changed, but the basis of authority has. Accuracy and "truth" have jumped from one set of values to the other.

Table 3
The official values for Smithsonian standard VG-568 (Yellowstone National Park obsidian, wt%) alongside values from three independent data sets. The elements discussed in the text are TiO2 and MnO.

          Official values:            Frahm (2010)       Streck and Wacaster (2006)   Rowe et al. (2008)
          Jarosewich et al. (1980)    Mean     Std dev   Mean     Std dev             Mean     Std dev
SiO2      76.71                       76.91    0.35      76.96    0.53                76.55    0.67
TiO2       0.12                        0.08    0.01       0.08    0.03                 0.08    0.01
Al2O3     12.06                       12.03    0.10      12.17    0.12                12.44    0.19
FeO(T)     1.23                        1.12    0.11       1.08    0.05                 1.11    0.07
MnO        0.30                        0.02    0.01       0.02    0.02                 0.02    0.02
MgO       <0.10                        0.03    0.01       0.03    0.02                 0.03    0.02
CaO        0.50                        0.43    0.02       0.45    0.03                 0.39    0.02
Na2O       3.75                        3.68    0.15       3.52    0.11                 3.51    0.24
K2O        4.89                        5.01    0.08       4.93    0.20                 4.91    0.13
P2O5      <0.01                        0.00    0.01       0.00    0.01                 0.01    0.01
SO3       –                            0.00    0.01       0.00    0.00                 0.00    0.00
Cl        –                            0.10    0.01       0.10    0.01                 0.12    0.01
Total     99.44                       99.39    0.37      99.31    0.64                99.18    0.92

Imagine a laboratory has been using VG-568 for three decades and that the analysts worked to produce data that "conform to [this standard] as published." Thus, the TiO2 and MnO values for obsidians they analysed were deemed accurate. Now imagine that the researchers learn their TiO2 values are all 50% too high and their MnO values are 15 times too high. The data are now deemed inaccurate. Should they toss out decades of analyses? Of course not. Their TiO2 and MnO values may be made accurate by applying corrections. Thus, accuracy is also malleable. As shown in my paper, the (initially) uncalibrated measurements were readily and sufficiently "accuratised" using linear regression equations calculated by a spreadsheet program.
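A minimal sketch of such a correction (all TiO2 values invented for illustration): a least-squares line computed from reference standards, exactly the sort of calculation a spreadsheet performs, re-anchors archived analyses to the newly accepted values:

```python
import numpy as np

# Hypothetical TiO2 values (wt%) for four reference standards: the lab's
# long-standing measurements vs. the newly accepted "true" compositions.
measured  = np.array([0.12, 0.22, 0.36, 0.50])
certified = np.array([0.08, 0.15, 0.24, 0.33])

# One least-squares line per element is enough to "accuratise" decades of
# archived analyses rather than discarding them.
slope, intercept = np.polyfit(measured, certified, 1)
old_analyses = np.array([0.18, 0.27, 0.41])   # archived obsidian measurements
print(slope * old_analyses + intercept)       # re-anchored to the new "truth"
```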

8. Moving beyond accuracy tests

As an example of a HHpXRF test they support, S&S cite a paper from the Journal of Arizona Archaeology. Dybowski (2012) compared measurements from a HHpXRF instrument and a labXRF instrument. Seven geological specimens were measured three times with HHpXRF and once with the labXRF instrument. Dybowski assumes the one labXRF measurement is "true" and concludes the HHpXRF data are inaccurate because there are differences between the data sets. He does not source a single artefact, nor does he assess whether the observed differences would be detrimental to sourcing. Simply put, Dybowski does not evaluate validity for obsidian sourcing. Similarly, the so-called "pXRF shootout," co-organised by S&S at the Society for American Archaeology meeting in Memphis, involved no attempt to source artefacts, meaning that validity could not be evaluated. It was an exercise only in geochemical accuracy, and S&S were uninterested in participants' chaîne analytique to go from a collection of artefacts to valid source identifications.

Thus, S&S apparently prefer HHpXRF tests that focus on accuracy, do not source artefacts, and show that HHpXRF measurements do not match labXRF measurements. Such tests are flawed, in part, because accuracy is the product of instrument–specimen–analyst interactions. Two analysts can make different choices based on their theoretical and practical "know-how" (connaissances and savoir-faire in the terminology of Lemonnier, 1992), and their measurements may differ in accuracy even when using the same instrument and specimens. S&S admit that there will always be offsets between different instruments, no matter the technique. Hence, it makes little sense to keep conducting tests of accuracy that simply judge the measurements of one instrument relative to another. This is not a new problem. Compatibility issues date back to the earliest years of obsidian sourcing, when Renfrew and colleagues changed from OES to NAA, and they continue today (e.g., Glascock, 1999). This has not stopped us from doing archaeology. We adapt through methodology.

9. HHpXRF vs. luggable desktops

S&S suggest that my statement, that to attain "extreme portability, [HHpXRF] analysers sacrifice performance," is equivalent to proposing an iPhone has lesser capabilities than the larger circa-1994 Motorola 8900X-2. Not only is such an analogy contrived, but their belief that "size doesn't matter" is also easily disproved. Consider the instruments from one HHpXRF manufacturer: Bruker. Bruker states their "best in class" desktop XRF, the S2, offers "unrivaled analytical performance." It is 120 kg (260 lbs). Bruker engineers are not needlessly making instruments too large. The S2 uses mains electricity, so it has a 50-W X-ray tube (the higher the wattage, the more intense the excitation beam). The Bruker HHpXRF analyser, however, must run on batteries for hours at a time, so it has a 1-W tube. Hence, there are sacrifices to build a handheld 2-kg (4.5-lbs) instrument.


It is a mistake, however, to interpret my categories of HHpXRF and labXRF as principally technical ones. Instead, the categories are based on context of practice. HHpXRF analysers may be brought to an excavation and used on-site, used to survey an archaeological site for chemical signals of activity areas, or even used on the slopes of a volcano to analyse obsidian outcrops. In contrast, labXRF instruments are not used in such contexts. They operate either in the traditional context of scientific practice, the laboratory, or in conditions very much like it. Hence, even if the same components were put into a 35-kg (77-lbs) desktop instrument and a 2-kg (4.5-lbs) handheld instrument, the technical performance of these instruments could be similar, but their contexts of practice, and thus their possible archaeological applications, would differ greatly.

Thus, it is dubious that S&S add desktop XRF systems to a discussion explicitly focused on handheld analysers. As S&S demonstrate, the "pXRF" label is too vague, since it has been used to describe desktop instruments that could be lugged to a vehicle and transported to a room with an electrical outlet. This is how S&S achieve a 60-entry bibliography of "pXRF" studies. Readers are encouraged to consult, as an example, the system described by Craig et al. (2007: 2015). Their "kit" cannot be used, for example, to conduct in situ geochemical analyses. Astonishingly, S&S argue that the ElvaX "desktop" instrument at the MURR Archaeometry Laboratory, led by Mike Glascock, is a pXRF system. The ElvaX is 18–35 kg (40–77 lbs) and needs a computer and external power source. Glascock and other MURR researchers repeatedly refer to the ElvaX as a "desktop" system and choose not to identify it as a "pXRF" system (Glascock et al., 2008, 2011; Giesso et al., 2011; Blomster and Glascock, 2011; Glascock and Giesso, 2012). But is the ElvaX a "labXRF" instrument? Yes, it is, according to a recent JAS paper coauthored by Glascock (Millhauser et al., 2011). The authors test the performance of a "pXRF" instrument (an InnovX HHpXRF analyser) against "laboratory XRF (lXRF)." In their test, the "lXRF" instrument is the MURR "ElvaX desktop EDXRF spectrometer" (3145)! Hence, Glascock and collaborators explicitly conceptualise the ElvaX as a laboratory XRF, not pXRF, instrument. Although classifying the ElvaX as pXRF or labXRF has no bearing on my experiment, it is disingenuous to claim MURR researchers consider this instrument "pXRF." Additionally, an article S&S cite as using the ElvaX instrument in Argentina used, in reality, a HHpXRF analyser (Barberena et al., 2011: 28).

10. Arbiters of science vs. new paradigms

S&S make a number of unsubstantiated claims in their response (e.g., "Most archaeological PXRF users lack experience," "many studies... published are founded on poor science"). It appears their concerns are ultimately due to the proliferation of an analytical technique to which specialists like themselves previously held the reins. Now the broader archaeological community has direct access to the technology. The danger, to quote S&S, is that social scientists will "eject... the 'science' aspect of our discipline to 'play scientist' with portable XRF." Their analogy is revealing:

A professional photographer recently told one of us that, as a consequence of widespread availability of digital cameras, now everyone is a photographer. It would seem that the same holds true for portable analytical instrumentation – now everyone can be a scientist.
The photographer is an expert, has specialised technical knowledge about cameras that the public does not, and has made a profession out of taking photographs. Armed with digital cameras, the public is merely “playing photographer,” and the professional deems what is “real” photography. S&S apparently identify with the photographer in their analogy.


Consider, however, that the proliferation of digital cameras has directly led to innumerable novel applications. Digital cameras are available to many more people, including innovators who use them in ways that once would have been cost-prohibitive. Consumers' digital cameras operate in paradigms different from a professional's specialised equipment. Photographs can be sent from our phones around the world in seconds. Users may not understand f-stops, and it might frustrate the professional that most users just turn a knob of presets. Think, however, of all the photographs of people, places, and events that have been taken when the professional was absent or simply did not deem a photo worth taking. Ultimately, as a direct result of this proliferation, never before has photography made such an impact on the world.

The spread of digital cameras is exciting because of the novel things that can be done with the technology. Exactly the same is true for HHpXRF. I am excited by all the potential ways that HHpXRF can be integrated into archaeological research, and it might mean that HHpXRF should not be limited by "decades of protocol developed for laboratory XRF analysis" (Shackley, 2010: 18). In my paper, I argue that "there is great value in establishing HHpXRF as a successful tool, even if that necessitates methodological changes to source artefacts." I stress the importance of appropriate methodology with HHpXRF; however, S&S proclaim that it must be the traditional methodology of labXRF and what they have deemed to be "correct within the discipline."

11. Concluding remarks

We are left with three questions. The first is the title of this reply: Is obsidian sourcing about geochemistry or archaeology? Archaeology is primary, and geochemistry can be kept or ignored as one's methodology permits. The second is from S&S: "Why would any archaeologist accept results that could not be verified?" S&S have expressed an extremely narrow view of reproducibility and results in sourcing, and I contend that HHpXRF will improve both in our discipline. Neither science nor archaeology occurs by focussing on accuracy simply to stockpile geochemical measurements in a central repository, which will inevitably become outdated, and expecting knowledge to emerge out of the numbers. Focused studies, however, using HHpXRF or another technique, can yield reliable and valid archaeological results, regardless of whether one's measurements of, for example, Mexican obsidians are directly compatible with another researcher's measurements of Aegean obsidians. Last is the title of Shackley's (2010) editorial: Is there reliability and validity in pXRF? Absolutely, and in varied contexts, including some that may violate "decades of protocol" for labXRF. Without breaking out of traditional lab-based paradigms, there is little exciting about HHpXRF. Identifying new contexts and paradigms will require tests, and during this phase, we must keep in mind how Lévi-Strauss defined a scientist in Le Cru et le Cuit (1964): "The scientist is not a person who gives the right answers, he's one who asks the right questions." Now get out there and play scientist!

Acknowledgements

Drs. Roger Doonan, Joshua Feinberg, Haskell Scheimberg, and Andrea Alveshere, as well as two anonymous reviewers, are thanked for their comments that clarified this reply.

References

Barberena, R., Hajduk, A., Gil, A., Neme, G., Duran, V., Glascock, M., Giesso, M., Borrazzo, K., de La Paz Pompei, M., Salgan, M., Cortegoso, V., Villarosa, G., Rughini, A., 2011. Obsidian in the south-central Andes: geological, geochemical, and archaeological assessment of north Patagonian sources (Argentina). Quaternary International 245 (1), 25–36.
Bernard, H., 2011. Research Methods in Anthropology: Qualitative and Quantitative Approaches, fifth ed. AltaMira Press, Lanham, Maryland.
Blomster, J., Glascock, M., 2011. Obsidian procurement in Formative Oaxaca, Mexico: diachronic changes in political economy and interregional interaction. Journal of Field Archaeology 36, 21–41.
Carmines, E., Zeller, R., 1979. Reliability and Validity Assessment. Quantitative Applications in the Social Sciences. Sage Publications, Thousand Oaks, California.
Carter, T., Shackley, M., 2007. Sourcing obsidian from Neolithic Çatalhöyük (Turkey) using energy dispersive X-ray fluorescence. Archaeometry 49, 437–454.
Craig, N., Speakman, R., Popelka-Filcoff, R., Glascock, M., Robertson, J., Shackley, M., Aldenderfer, M., 2007. Comparison of XRF and PXRF for analysis of archaeological obsidian from southern Perú. Journal of Archaeological Science 34, 2012–2024.
Dybowski, D., 2012. pXRF/WDXRFS inter-unit data comparison of Arizona obsidian samples. Journal of Arizona Archaeology 2, 1–7.
Frahm, E., 2010. The Bronze-Age Obsidian Industry at Tell Mozan (Ancient Urkesh), Syria. Ph.D. dissertation, Department of Anthropology, University of Minnesota. Available online at the University of Minnesota's Digital Conservancy: http://purl.umn.edu/99753.
Frahm, E., 2012. Evaluation of archaeological sourcing techniques: reconsidering and re-deriving Hughes' four-fold assessment scheme. Geoarchaeology 27, 166–174.
Giesso, M., Duran, V., Neme, G., Glascock, M., Cortegoso, V., Gil, A., Sanhueza, L., 2011. A study of obsidian source usage in the Central Andes of Argentina and Chile. Archaeometry 53, 1–21.
Glascock, M., 1999. An inter-laboratory comparison of element compositions for two obsidian sources. International Association for Obsidian Studies Bulletin 23, 13–25.
Glascock, M., Beyin, A., Coleman, M., 2008. X-ray fluorescence and neutron activation analysis of obsidian from the Red Sea coast of Eritrea. International Association for Obsidian Studies Bulletin 38, 6–10.
Glascock, M., Giesso, M., 2012. New perspectives on obsidian procurement and exchange at Tiwanaku, Bolivia. In: Liritzis, I., Stevenson, C. (Eds.), Obsidian and Ancient Manufactured Glasses. University of New Mexico Press.
Glascock, M., Kuzmin, Y., Grebennikov, A., Popov, V., Medvedev, V., Shewkomud, I., Zaitsev, N., 2011. Obsidian provenance for prehistoric complexes in the Amur River basin (Russian Far East). Journal of Archaeological Science 38, 1832–1841.
Jarosewich, E., Nelen, J., Norberg, J., 1980. Reference samples for electron microprobe analysis. Geostandards Newsletter 4, 43–47.
Latour, B., Woolgar, S., 1979. Laboratory Life: The Construction of Scientific Facts. Princeton University Press, Princeton.
Lemonnier, P., 1992. Elements for an Anthropology of Technology. University of Michigan Museum of Anthropology, Ann Arbor.
Lévi-Strauss, C., 1964. Le Cru et le Cuit (The Raw and the Cooked, 1969). Plon, Paris.
Mark, H., Workman Jr., J., 2003. Statistics in Spectroscopy, second ed. Elsevier Academic Press, San Diego, California.
Millhauser, J., Rodríguez-Alegría, E., Glascock, M., 2011. Testing the accuracy of portable X-ray fluorescence to study Aztec and Colonial obsidian supply at Xaltocan, Mexico. Journal of Archaeological Science 38, 3141–3152.
Nazaroff, A., Prufer, K., Drake, B., 2010. Assessing the applicability of portable X-ray fluorescence spectrometry for obsidian provenance research in the Maya lowlands. Journal of Archaeological Science 37, 885–895.
Neff, H., 1998. Units in chemistry-based ceramic provenance investigations. In: Ramenofsky, A., Steffen, A. (Eds.), Unit Issues in Archaeology: Measuring Time, Space, and Material. University of Utah Press, pp. 115–127.
Rowe, M., Thornber, C., Gooding, D., Pallister, S., 2008. Catalog of Mount St. Helens 2004–2005 Tephra Samples with Major- and Trace-element Geochemistry. United States Geological Survey, Open-File Report 2008-1131. Available online: http://pubs.usgs.gov/of/2008/1131/.
Shackley, M., Nov. 2010. Is There Reliability and Validity in Portable X-ray Fluorescence Spectrometry (PXRF)? The SAA Archaeological Record, pp. 17–18, 20.
Streck, M., Wacaster, S., 2006. Plagioclase and pyroxene hosted melt inclusions in basaltic andesites of the current eruption of Arenal Volcano, Costa Rica. Journal of Volcanology and Geothermal Research 157, 236–253.
Taylor, B., Kuyatt, C., 1994. Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results. NIST Technical Note 1297. Available online: http://www.nist.gov/pml/pubs/tn1297/index.cfm.