For analyses that are run in GENASYS, users tag statistics via the "Flags" section of the Load Results window. For statistics that are imported into PARcore or GENASYS, tags are configured in the layout and users tag statistics in the import file. Although the number of available tags is large and varied, it is expected that a given program will use only a small subset.
The tags are organized into five categories. The purpose of all of these tags is to identify every set of statistics loaded to the statistics service and to allow an appropriately structured query to draw the right set of statistics into the IBIS® Application for a given task, or into PARcore when running an SRR. Before discussing the structure and logic of a query, it will be helpful to review in detail each of the available tags and their expected use.
Tag Category | Description |
--- | --- |
Data Quality | These are intended to describe the test taker sample on which the statistics are based. As a general rule, at least one data quality tag should be set whenever statistics are loaded. |
Usage | These tags describe the uses to which the tagged statistics are to be put, or the role that they are expected to play. As a general rule, at least one usage tag should be set whenever statistics are loaded. |
Delivery Mode | These tags identify the administration mode under which the data producing a set of statistics were collected. As a general rule, this tag would only be set for testing programs that produce statistics from both paper and computer delivered samples. |
All Item Flags | The tags in this category provide additional information about the nature or origin of a set of statistics (IRT Scaled Parameters, IRT Reported Score Metric, Interim, Default, Do Not Use). Many tests will not use these tags. |
User Defined | These tags are available to cover unusual and unforeseen situations and needs. Note that the "Other" category in the Data Quality and Usage families is intended for the same purpose. Many tests will not use these tags. |
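The five categories above can be thought of as fields attached to each loaded statistics set. As a rough illustration only, here is a minimal Python sketch; the class and field names are hypothetical and do not reflect the actual statistics service schema:

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical sketch of the tag fields attached to each loaded
# statistics set; names do not reflect the actual service schema.
@dataclass
class StatisticsTags:
    data_quality: Set[str] = field(default_factory=set)    # e.g. {"PIA"}, {"EQT"}, {"FIA"}
    usage: Set[str] = field(default_factory=set)           # e.g. {"Item Review", "Test Assembly"}
    delivery_mode: Optional[str] = None                    # "Paper", "Computer", or "Both"
    all_item_flags: Set[str] = field(default_factory=set)  # e.g. {"Interim"}, {"Do Not Use"}
    user_defined: Set[str] = field(default_factory=set)    # program-specific labels

# As a general rule, at least one data quality tag and at least one
# usage tag should be present whenever statistics are loaded.
tags = StatisticsTags(data_quality={"PIA"}, usage={"Item Review"})
print(sorted(tags.data_quality), sorted(tags.usage))
```

A program would typically populate only the small subset of categories it actually uses, leaving the rest empty.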
1. Data Quality: Many programs produce multiple sets of item analysis (IA) statistics at various times during the course of a post-administration analysis cycle. Because the size and degree of representativeness of the test taker samples supporting each IA can differ substantially, the resulting statistics may be suitable for some purposes but not others. A common example is an IA run on a very early sample that includes a small fraction of the expected total sample. The statistics from this sample may adequately support a PIA item review but may not be ideal for future form assembly. Statistics for future form assembly are best computed from later, more complete data samples. All statistics that are tagged for any purpose should have a data quality tag set.
Data Quality Tags | Description |
--- | --- |
PIA | Preliminary Item Analysis - An early sample that can be both small in size and unrepresentative of the test taker population. |
EQT | Equating - A later, often much larger data sample, but still not 100% of the test taker population. |
FIA | Final Item Analysis - The final data sample constituting essentially the full test taker population. |
DIF | Differential Item Functioning - This tag indicates that statistics were produced from a DIF analysis. |
Subgroup | This tag indicates that statistics were produced from a subgroup population sample. |
Research | This tag is available to identify statistics produced from data sets not collected during the course of an operational administration. For example, AP does pretesting in non-operational administrations. Other examples are special administrations, pretest-only administrations, or on-line tryouts of writing prompts. |
Accumulated | This tag indicates that statistics were estimated not from a single administration but from data pooled across multiple administrations (and perhaps even multiple forms). |
Other | A program/test definable tag intended to cover data quality circumstances not explicitly covered by the other tags. |
HAI Agreement Statistics | This tag indicates that IRR statistics were produced from Human-AI scoring. |
HH Agreement Statistics | This tag indicates that IRR statistics were produced from Human-Human scoring. |
Model Build Evaluation | This tag indicates that IRR statistics were produced from model build evaluation. |
2. Usage: These tags indicate the purposes that statistics are expected to serve. There is generally a strong connection between data quality and expected use. For example, statistics tagged with a Data Quality of PIA would be tagged as intended to support item review. Similarly, statistics from the "best" data quality sample (FIA or Accumulated) would usually be tagged as supporting both test assembly and item review. The usage tags are generally not mutually exclusive, and often multiple tags will be applied to a given set of statistics.
Usage Tags | Description |
--- | --- |
Item Review | This tag identifies statistics intended to support item review at any time and for any purpose. Statistics tagged with both Usage: Item Review and Data Quality: PIA will be available for the PIA review function within the IBIS Application. Statistics tagged with both Usage: Item Review and Data Quality: FIA will be available for the FIA review function within the IBIS Application. |
Test Assembly | This tag identifies statistics that can be used for test assembly. This tag is generally set for statistics that (a) were based on a large and reasonably representative sample, (b) used the primary criterion score of choice, and (c) were computed with a standard set of analysis options. For most situations, it is expected that the Test Assembly tag would only be set in conjunction with Data Quality tags of EQT or FIA. |
External CFIB | Client Facing Item Bank – These are statistics that are deemed acceptable for external review or use. In general, this would be synonymous with the Test Assembly tag (so it would not be necessary to set this tag). However, there may be instances where they differ, and this tag is intended to allow for that possibility. |
Internal IB | Internal Item Banks – These are statistics that can be appropriately exposed to systems such as program specific data explorer tools. |
Other | A program/test definable category. |
Item Lifecycle | This tag identifies statistics intended for Item Lifecycle Evaluation within the IBIS application. |
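Because usage tags are combined with data quality tags at retrieval time, a query is essentially an intersection filter across tag categories. The in-memory filter below is illustrative only (the record layout, ids, and function are hypothetical, not the statistics service's query API), but it shows the logic behind combinations such as Data Quality: PIA + Usage: Item Review:

```python
# Each record pairs a statistics set identifier with its tags.
# This in-memory filter only illustrates the intersection logic;
# it is not the statistics service's actual query interface.
records = [
    {"id": "stats-001", "data_quality": {"PIA"}, "usage": {"Item Review"}},
    {"id": "stats-002", "data_quality": {"EQT"}, "usage": {"Test Assembly"}},
    {"id": "stats-003", "data_quality": {"FIA"}, "usage": {"Item Review", "Test Assembly"}},
]

def match(records, data_quality=None, usage=None):
    """Return ids of records carrying all of the requested tags."""
    out = []
    for r in records:
        if data_quality and data_quality not in r["data_quality"]:
            continue
        if usage and usage not in r["usage"]:
            continue
        out.append(r["id"])
    return out

# PIA review function: Data Quality: PIA + Usage: Item Review
print(match(records, data_quality="PIA", usage="Item Review"))
# FIA review function: Data Quality: FIA + Usage: Item Review
print(match(records, data_quality="FIA", usage="Item Review"))
```

Note that because a single statistics set can carry multiple usage tags, the same record (here `stats-003`) can satisfy more than one kind of query.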
3. Delivery Mode: The mode in which the items were delivered to test takers (Paper, Computer, or Both).
4. All Item Flags: These are available to convey additional information regarding the nature of a set of statistics.
All Item Flags | Description |
--- | --- |
IRT Scaled Parameters | This tag is only set when loading IRT requests and is intended to identify IRT parameters that have already been linked or scaled via a TBLT (Transforming B-parameters using a Least-squares Technique) or Robust request. |
IRT Reported Score Metric | This tag is relevant only to IRT requests, and indicates that the IRT parameters are on a scale other than the theta metric. This is to support programs inherited from CTB/McGraw-Hill, where Pardux conflated the IRT and reporting metrics into a single scale. |
Interim | This tag identifies statistics that are being loaded for the purpose of creating some processing report and are not expected to be used subsequently. For example, classical item statistics computed by subform that are used only to evaluate potential concerns of test book specific manufacturing errors and only exposed to a data visualization tool. Once reviewed, these statistics might be of no further interest other than as process documentation. |
Default | This tag parallels the default tag in a historical Test Assembly System (TAS) process. However, it is expected that most programs will make no use of this tag, instead identifying the best set of statistics for a particular purpose through the data quality and usage tags. If the most recent instance of statistics tagged as Usage: Test Assembly should be the default used in test assembly, then it is not necessary to continue to set the default tag. Instead, when setting up the SRR for test assembly, use the default sorting (run date within administration date) with the default resolution (choose the first in the sorted list). |
Do Not Use | This tag should be used if the item statistics for that date/time should not be used for retrieval. For statistics that are calculated in the system, this tag can only be set for item(s) that have already been saved to the statistics service and have at least one tag saved. For statistics that are imported, this can be specified in the appropriate tag field for an item in the import file. Statistics can still be retrieved for items with this tag if the item carries the tag/category specified in the SRR and the SRR run specifies an 'As of Date' earlier than the date on which the item was tagged as Do Not Use. |
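The default resolution (sort by run date within administration date and choose the first) and the interaction between Do Not Use and the 'As of Date' can be sketched together. This is a simplified illustration under assumed field names (`admin_date`, `run_date`, `do_not_use_since` are hypothetical), not the actual SRR configuration:

```python
from datetime import date

# Hypothetical statistics records; field names are illustrative only.
records = [
    {"id": "run-A", "admin_date": date(2023, 10, 1),
     "run_date": date(2023, 10, 20), "do_not_use_since": None},
    {"id": "run-B", "admin_date": date(2023, 10, 1),
     "run_date": date(2023, 11, 5), "do_not_use_since": date(2023, 12, 1)},
]

def resolve(records, as_of=None):
    """Sketch of default SRR resolution: drop Do Not Use records (unless
    the run's As of Date predates the Do Not Use tagging), sort by run
    date within administration date with the most recent first, and
    choose the first record in the sorted list."""
    eligible = []
    for r in records:
        dnu = r["do_not_use_since"]
        if dnu is not None and (as_of is None or as_of >= dnu):
            continue  # excluded: tagged Do Not Use as of the retrieval date
        eligible.append(r)
    eligible.sort(key=lambda r: (r["admin_date"], r["run_date"]), reverse=True)
    return eligible[0]["id"] if eligible else None

print(resolve(records))                            # run-B excluded by Do Not Use
print(resolve(records, as_of=date(2023, 11, 15)))  # run-B visible: As of Date precedes the tag
```

With no 'As of Date', the most recent non-excluded run wins; supplying an 'As of Date' earlier than the Do Not Use tagging restores the excluded run to the candidate list.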
5. User Defined: This final set of tags is available for any program-specific identification of statistics that is not otherwise covered by the collection of system-defined tags. For example, a user-defined tag may be used to identify statistics that are produced for a specific report or to support a particular analysis.