Fall 2015

Date | Event | Speaker | Abstract/Details

08/26/2015 | Incorporating World Knowledge to Heterogeneous Information Networks | Ming Zhang | Location: KOELBEL 203

The key challenges of applying world knowledge are how to adapt the world knowledge to domains and how to represent it for learning. In this talk, we provide an example of using world knowledge for domain-dependent document clustering. We provide three ways to specify the world knowledge to domains by resolving the ambiguity of the entities and their types, and represent the data with world knowledge as a heterogeneous information network. Then we propose a clustering algorithm that can cluster multiple types and incorporate the sub-type information as constraints. Experimental results using Freebase and YAGO2 on two text benchmark datasets (20newsgroups and RCV1) show that incorporating world knowledge as indirect supervision can significantly outperform the state-of-the-art clustering algorithms as well as clustering algorithms enhanced with world knowledge features.
09/09/2015 | Cultural Heritage Linked Data on the Semantic Web | Eero Hyvönen | Location: KOELBEL 355

Cultural Heritage (CH) (meta)data is often heterogeneous, multilingual, distributed, semantically interlinked, and produced independently by organizations and individuals using different schemas, tools, and practices. As a result, a fundamental problem area in dealing with CH data is to make the content mutually interoperable, so that it can be searched, linked, and presented in a harmonized way across the boundaries of the datasets and data silos. Semantic Web and Linked Data standards and practices of W3C are a promising approach to address these issues [1]. However, this is not enough: we also need a content infrastructure, i.e., the actual domain ontologies, metadata models, and data shared by the CH community, and web services that make their integration and use in CH data systems easy and cost-efficient. This talk describes our experiences in building a national-level Linked Data content infrastructure in Finland.
09/16/2015 | Matrix Completion and Robust PCA: new data analysis tools | Stephen Becker | Location: KOELBEL 203

Matrix completion is a generalization of compressed sensing that seeks to determine missing matrix entries under some (non-Bayesian) assumptions about the matrix. The technique has generated a lot of excitement due to rigorous guarantees in some cases, and also due to applications to machine learning (e.g., the Netflix prize problem). This talk discusses basic matrix completion, including efficient algorithms suitable for big data, as well as an extension of matrix completion known as robust PCA, which can handle large outliers in the data. We continue with several applications: inferring the structure of chromosomes, functional imaging of the brain, removing clouds from multi-spectral satellite image data, and verifying the properties of a quantum state or a quantum gate.
09/23/2015 | N-minute madness | | Location: ENG Clark Conference Room
09/30/2015 | AMR and AMR Parsing | Martha, Wei-Te, Wayne Ward | Location: Fleming 279

• Broad-coverage CCG Semantic Parsing with AMR
• A Transition-based Algorithm for AMR Parsing
• Parsing English into Abstract Meaning Representation Using Syntax-Based Machine Translation
10/07/2015 | NN for SRL | Bill Foland, Jim Martin | Location: Fleming 279
10/14/2015 | Topic modeling for sentence annotation - brainstorming
10/21/2015 | Aligning perspectives to scientific literature | Jin-Dong Kim | Location: Fleming 279

Scientific literature holds the accumulation of our scientific discoveries. By accessing this accumulated knowledge, new knowledge can be developed more efficiently. Because the size of the scientific literature is increasing exponentially, semantic indexing of literature is important to allow instant and fine-grained access to the sources of scientific assertions. There are many ongoing projects to produce semantic indexing of scientific literature, a.k.a. literature annotation. Literature annotation projects are particularly active in the life sciences, partly due to the existence of public literature databases, e.g., PubMed. Although many of these annotation projects are conducted individually, they fundamentally share the same target, i.e., PubMed articles. Since it is impossible for a single group to annotate the whole PubMed collection for every important aspect, individual projects annotate different parts of PubMed for different aspects of the life sciences. It is like many blind men annotating a giant elephant from their individual perspectives. The annotations produced by any one project may be limited, but if all the annotations are collected and aligned, the chances of figuring out the whole picture will be maximized. The PubAnnotation system is being developed to provide a platform for collecting and aligning various annotations made to a collection of literature, currently a collection of life science literature represented by PubMed articles. The Biomedical Linked Annotation Hackathon (BLAH) community is supporting the developments around PubAnnotation, working towards publicly shared resources of linked literature annotation.
10/28/2015 | Verb semantics | Bill Croft
11/04/2015 | Document Classification by Topic Using Neural Networks | Scott Denning

Presented is a method for classifying patent documents by technology type. This method is enabled by the creation of document indexes using latent semantic indexing. The indexes are input into an artificial neural network and based on learned patterns of categories and corresponding indexes, the neural network determines the most appropriate topic category. Testing has shown that this system achieves 99.5% accuracy in correctly classifying documents of a particular technology category if there are at least fifty patents in that category’s training set.
11/11/2015 | | James Gung, James Pustejovsky, Annie Zaenen
11/18/2015 | Topic modeling for sentence annotation - brainstorming
12/02/2015 | AMR parsing | Wei-Te Chen
12/09/2015 | NAACL Paper Clinic